One of the reasons malware gets past corporate defences is that a single HTTP request can look perfectly innocent. However, according to research to be presented at a security conference next week, those requests reveal themselves if the defender takes a “big picture” view.
The research, to be presented at the Internet Society's Network and Distributed System Security Symposium (NDSS), finds that at a very large scale, the HTTP requests issued by users who make the mistake of clicking on a malware link become easy to identify, even without analysing the downloaded content.
Led by Luca Invernizzi at UC Santa Barbara, the research was designed to avoid the pitfalls of current protection systems. As the paper puts it, “Drive-by exploits use the web to download malware binaries.” Rather than inspecting those binaries, Nazca “does not perform any analysis of the content of web downloads, except for extracting their MIME type. That is, we do not apply any signatures to the network payload, do not look at features of the downloaded programs, and do not consider the reputation of the programs’ sources,” the paper states.
Instead, their system, dubbed Nazca, watches web traffic between the hosts inside a network and the Internet, looking for connections associated with malware downloads.
When an HTTP request that Nazca is watching downloads a possibly suspicious EXE, the authors explain, they have learned to identify connection characteristics that differ from those of legitimate downloads. For example, malware authors' “evasive actions”, such as “domain fluxing, malware repackaging, and the use of malware droppers for multi-step installations”, make an attack easier for Nazca to recognise.
That information is aggregated to identify related malicious activity, and thus to reduce the rate of false positives, they say.
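One way to picture this aggregation step is to link suspicious downloads that share a domain, a server IP address or a file hash, and to keep only groups with more than one member, so that a lone oddity is not reported on its own. The Python sketch below does this with a union-find structure. The linking rule, the `min_size` cut-off and all names (`Candidate`, `UnionFind`, `group_campaigns`) are assumptions made for illustration, not the algorithm published in the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical suspicious download (only the fields used for linking)."""
    host: str
    server_ip: str
    head_hash: str


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def group_campaigns(cands: List[Candidate], min_size: int = 2) -> List[List[Candidate]]:
    """Link candidates sharing a domain, server IP or file hash; drop small groups."""
    uf = UnionFind()
    for c in cands:
        # Tag each value with its kind so that, e.g., a host and a hash never collide.
        uf.union(("host", c.host), ("ip", c.server_ip))
        uf.union(("host", c.host), ("hash", c.head_hash))
    groups: Dict[Hashable, List[Candidate]] = defaultdict(list)
    for c in cands:
        groups[uf.find(("host", c.host))].append(c)
    # Assumption: isolated candidates are weaker evidence, so they are discarded.
    return [g for g in groups.values() if len(g) >= min_size]
```

For example, two domains on the same server IP, plus a third domain on another IP serving the same file hash as one of them, collapse into a single group, while an unrelated one-off download is dropped.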
In gathering traffic information, Nazca looks at IP and TCP packet headers, HTTP headers, and just enough of the HTTP payload to determine the MIME type; the payload is hashed so that program downloads can easily be tested for uniqueness. The system also gathers “a content hash of the uncompressed first k bytes at the beginning of the file”, with k = 1000 in the researchers' tests.
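To make the data-collection step concrete, here is a minimal Python sketch of a per-download record of the kind described above. The record fields, the use of SHA-256 and the helper names (`DownloadRecord`, `head_hash`, `make_record`) are illustrative assumptions; the paper's actual record format and hash function may differ. Only k = 1000 comes from the researchers' tests.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Dict, Optional

K = 1000  # bytes hashed per download (k = 1000 in the researchers' tests)


@dataclass(frozen=True)
class DownloadRecord:
    """Hypothetical per-download record built from headers plus a payload prefix."""
    client_ip: str
    server_ip: str
    host: str       # HTTP Host header
    path: str       # request path
    mime_type: str  # media type from the Content-Type header
    head_hash: str  # SHA-256 (an assumption) of the uncompressed first K bytes


def head_hash(body: bytes, content_encoding: Optional[str] = None, k: int = K) -> str:
    """Hash the first k bytes of the *uncompressed* payload."""
    if content_encoding == "gzip":
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header; max_length
        # stops after k output bytes, so a truncated capture is enough.
        body = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(body, k)
    return hashlib.sha256(body[:k]).hexdigest()


def make_record(client_ip: str, server_ip: str, path: str,
                headers: Dict[str, str], body: bytes) -> DownloadRecord:
    """Build a record. Simplification: request and response headers are merged
    into one dict whose keys are already lower-cased."""
    mime = headers.get("content-type", "").split(";")[0].strip().lower()
    return DownloadRecord(
        client_ip=client_ip,
        server_ip=server_ip,
        host=headers.get("host", ""),
        path=path,
        mime_type=mime,
        head_hash=head_hash(body, headers.get("content-encoding"), K),
    )
```

Because the hash is taken after decompression, the same program served plain and gzip-compressed yields the same `head_hash`, which is what makes the prefix hash useful for spotting repeated or unique downloads.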
The analysis, the paper states, then focuses on identifying known malware distribution techniques. For example, server-side polymorphism is identifiable because the malware host serves large numbers of EXEs whose hashes don't match. Other characteristics that Nazca looks at include the number of domains or IP addresses associated with a malware campaign, and what those hosts serve, such as:
- Distribution from a single IP address hosting many colocated domains;
- Use of a large number of TLDs;
- Many different paths and file names served by the distributor (as evidence of polymorphism);
- Suspiciously few URIs per domain (because legitimate CDNs typically have very large directory structures); and
- The types of file served (because CDNs serve many file types, while malware servers focus on EXEs).
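The distribution signals listed above can be computed per server IP address from per-download records. In this Python sketch, a hypothetical `Download` tuple stands in for whatever Nazca actually stores, the set of "executable" MIME types is an assumption, and the TLD extraction is deliberately naive (it treats `co.uk` as `uk`). It also counts distinct executable hashes as a rough proxy for server-side polymorphism.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Download(NamedTuple):
    """Hypothetical per-download record (only the fields used here)."""
    server_ip: str
    host: str
    path: str
    mime_type: str
    head_hash: str


# Assumption: MIME types treated as "executable" in this sketch.
EXE_TYPES = {"application/x-msdownload", "application/x-dosexec",
             "application/octet-stream"}


def distributor_features(downloads: Iterable[Download]) -> Dict[str, Dict[str, float]]:
    """Compute per-server-IP features corresponding to the listed signals."""
    by_ip: Dict[str, List[Download]] = defaultdict(list)
    for d in downloads:
        by_ip[d.server_ip].append(d)

    features: Dict[str, Dict[str, float]] = {}
    for ip, ds in by_ip.items():
        paths_by_domain: Dict[str, Set[str]] = defaultdict(set)
        for d in ds:
            paths_by_domain[d.host].add(d.path)
        exes = [d for d in ds if d.mime_type in EXE_TYPES]
        features[ip] = {
            # Many colocated domains on one IP address
            "colocated_domains": len(paths_by_domain),
            # Naive TLD: the last label of each host name
            "distinct_tlds": len({h.rsplit(".", 1)[-1] for h in paths_by_domain}),
            # Many different file names hints at polymorphism
            "distinct_file_names": len({d.path.rsplit("/", 1)[-1] for d in ds}),
            # Average number of distinct URIs (paths) per domain
            "uris_per_domain": sum(len(p) for p in paths_by_domain.values())
                               / len(paths_by_domain),
            # Variety of file types served, and the share that are executables
            "distinct_mime_types": len({d.mime_type for d in ds}),
            "exe_fraction": len(exes) / len(ds),
            # Server-side polymorphism: many executables whose hashes differ
            "distinct_exe_hashes": len({d.head_hash for d in exes}),
        }
    return features
```

A server that spreads a few executables across many domains and TLDs, with one path each, stands out against a CDN that serves many paths and file types from a single domain.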
No single characteristic demonstrates malice, but the authors claim that, taken together over a large enough traffic sample (a very large enterprise network, or an ISP network), these characteristics let their approach identify zero-day malware campaigns with a low rate of false positives.
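To illustrate why several weak signals together say more than any one of them, the Python sketch below flags a distributor only when several indicators agree. It takes a dictionary of per-distributor measurements whose names are hypothetical; the thresholds and the vote count are invented for the example and are not taken from the paper, so this is a toy rule rather than Nazca's detector.

```python
from typing import Dict

# Hypothetical thresholds, chosen for illustration only (not from the paper).
MIN_COUNTS = {
    "colocated_domains": 5,
    "distinct_tlds": 3,
    "distinct_exe_hashes": 5,
}
MAX_URIS_PER_DOMAIN = 3.0
MIN_EXE_FRACTION = 0.9


def indicators(f: Dict[str, float]) -> Dict[str, bool]:
    """Turn raw per-distributor measurements into yes/no indicators."""
    ind = {name: f[name] >= t for name, t in MIN_COUNTS.items()}
    # Legitimate CDNs tend to have many URIs per domain; malware hosts few.
    ind["few_uris_per_domain"] = f["uris_per_domain"] <= MAX_URIS_PER_DOMAIN
    # Malware servers focus on executables; CDNs serve many file types.
    ind["mostly_executables"] = f["exe_fraction"] >= MIN_EXE_FRACTION
    return ind


def is_suspicious(f: Dict[str, float], min_votes: int = 3) -> bool:
    """Flag only when at least min_votes indicators fire together."""
    return sum(indicators(f).values()) >= min_votes
```

A small site that serves a single installer trips two indicators (few URIs, mostly executables) but is not flagged on that alone; only when colocated domains, many TLDs or many distinct hashes also appear does the vote cross the line.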