The application of predictive algorithms to DNS data may be able to spot malware sites before they serve up nasties.
Security firm OpenDNS is applying ideas from natural language processing to automatically identify malicious domains using a prototype tool called NLPRank, as a blog post by the firm explains.
Utilising natural language processing (NLP), the predictive model identifies potentially malicious typo-squatting/targeted phishing domains. APT groups often use spear-phishing techniques and legitimate domain spoofing as an obfuscation technique to carry out their criminal campaigns.
NLPRank is designed to detect these fraudulent branded domains that often serve as C2 domains for targeted attacks. Our system utilises heuristics such as NLP, ASN mappings and weightings, WHOIS data patterns, and HTML tag analysis to classify these types of attack domains.
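One way to picture the ASN-mapping heuristic is as a check on whether a domain that mentions a brand is actually hosted on that brand's network. The Python sketch below is an illustration only, not OpenDNS's implementation: the brand-to-ASN table is a small hand-made assumption, the Observation type and function names are hypothetical, and the domain-to-ASN pairs are made-up fixture values rather than real lookups.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, hand-maintained table of networks each brand is known to use.
# A real system would need a far larger, regularly refreshed mapping.
BRAND_ASNS = {
    "google": {15169},
    "microsoft": {8075},
}


@dataclass
class Observation:
    """A domain plus the ASN its IP address was seen in (fixture data here)."""
    domain: str
    asn: int


def brand_mentioned(domain: str) -> Optional[str]:
    """Return the first known brand name that appears in the domain, if any."""
    name = domain.lower()
    for brand in BRAND_ASNS:
        if brand in name:
            return brand
    return None


def asn_mismatch(obs: Observation) -> bool:
    """True when a domain names a brand but sits outside that brand's ASNs."""
    brand = brand_mentioned(obs.domain)
    return brand is not None and obs.asn not in BRAND_ASNS[brand]


if __name__ == "__main__":
    samples = [
        Observation("microsoft-xpupdate.com", 64496),  # 64496: documentation-range ASN
        Observation("update.microsoft.com", 8075),
        Observation("example.org", 64496),
    ]
    for obs in samples:
        print(obs.domain, asn_mismatch(obs))
```

A mismatch on its own proves little, since legitimate brands also use third-party hosting; that is presumably why the quoted description talks about weightings rather than a single yes/no test.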
Natural language processing techniques are common in bioinformatics and data mining. OpenDNS Security Labs' work so far shows that the technique offers a new way to home in on domains used by APT-style cyber-espionage attacks as well as a mechanism to tease out links between hacker groups.
The approach latches onto the fact that domains associated with APT attacks are constructed in a similar lexical fashion. DarkHotel, for example, used domains such as adobeupdates[.]com, adobeplugs[.]net, adoberegister[.]flashserv[.]net and microsoft-xpupdate[.]com in hijacking the networks of luxury hotels.
The Carbanak bank-raiding gang, meanwhile, used domains such as update-java[.]net and adobe-update[.]net. Other abused domains include gmailboxes[.]com, microsoft-update-info[.]com and firefoxupdata[.]com.
Presented with all these domains, it's easy to see that they are spoofing legitimate domains in order to do something malign. OpenDNS's research points towards an automated means of identifying such domains from among the millions of sites registered on the internet. This approach might be applied towards either more rapid takedowns or intelligence gathering.
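The lexical pattern in those examples, a well-known brand name glued to a word such as "update", can be roughed out in a few lines of Python. This is a minimal sketch under stated assumptions, not NLPRank itself: the word lists and the allow-list of legitimate domains are tiny hypothetical stand-ins, the function names are invented for illustration, and the fuzzy match uses the standard library's difflib rather than anything OpenDNS has described.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny vocabularies for illustration only.
BRANDS = {"adobe", "microsoft", "java", "firefox"}
LURE_WORDS = {"update", "updates", "plugs", "register", "info"}
LEGITIMATE = {"adobe.com", "microsoft.com", "java.com", "firefox.com"}


def refang(domain: str) -> str:
    """Turn a defanged name such as adobe-update[.]net back into a plain one."""
    return domain.replace("[.]", ".").strip().lower()


def is_near(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy comparison so that misspellings such as 'updata' still match."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def looks_spoofed(domain: str) -> bool:
    """Flag names that pair a brand with a lure word outside the brand's own domain."""
    name = refang(domain)
    if any(name == good or name.endswith("." + good) for good in LEGITIMATE):
        return False
    labels = name.split(".")[:-1]  # drop the top-level domain
    text = re.sub(r"[^a-z0-9]", "", "".join(labels))
    for brand in BRANDS:
        if brand not in text:
            continue
        rest = text.replace(brand, "", 1)  # what remains once the brand is removed
        if any(word in rest or is_near(rest, word) for word in LURE_WORDS):
            return True
    return False


if __name__ == "__main__":
    for d in ["adobeupdates[.]com", "firefoxupdata[.]com", "update.microsoft.com"]:
        print(d, looks_spoofed(d))
```

Plain substring matching like this would throw up plenty of false positives on real traffic, which is one reason a production system would combine it with other signals instead of relying on the lexicon alone.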
OpenDNS collaborated with Fox-IT in its research discovering links between the DarkHotel and Carbanak attacks. For example, the update-java[.]net domain was used for command-and-control in both the Anunak and Carbanak attack campaigns.
Both the Anunak and Carbanak attack campaigns involve profit-motivated attacks on the banking system, reckoned to be the handiwork of sophisticated Russian hacking groups. OpenDNS reckons its Big Data analysis of DNS data would work as well in linking and even thwarting cyber-espionage hacks.
"By cross-referencing this lexicon and a domain’s location on the internet – i.e., does it look like a Google domain, but is not connected with that company's network infrastructure? – researchers can predict both opportunistic phishing campaigns and attacks directed at high-value targets, such as financial institutions," OpenDNS explains. ®
NLP is a field of computer science focused on the interaction between computers and human (natural) languages. It has nothing to do with neuro-linguistic programming, an unrelated term that is also often shortened to NLP.