A group of American boffins is loosing artificial intelligence on web scams, demonstrating that analysis of domains at the time of registration can provide an early warning of those that will later be home to spammers and scammers.
The idea is to tag the kinds of behaviour at registration time that hint someone is preparing something like a spam or malware campaign: bulk-buying domains with small differences in the words in their names, or the word order.
Princeton professor Nick Feamster and University of California Santa Barbara PhD student Shuang Hao worked with Alex Kantchelian (UC Berkeley), Google's Brad Miller and Vern Paxson of the International Computer Science Institute to create PREDATOR – Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration.
The important numbers are: the researchers say PREDATOR identified 70 per cent of domain registrations that were later abused; and they claim a false positive rate of just 0.35 per cent.
(Since they write that 80,000 domains are registered each day, that's still around 280 sites a day unfairly tagged as evil, so PREDATOR still needs some refinement on that score.)
Their paper was presented last week at the Association for Computing Machinery's Conference on Computer and Communications Security 2016.
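For anyone who wants to check that sum, here is a back-of-the-envelope sketch. It assumes the false positive rate applies to every one of the day's registrations, which makes it an upper bound: strictly, the rate only applies to the benign ones. The constant and function names are ours, not the paper's.

```python
# Back-of-the-envelope estimate of daily false positives.
# Assumption: the rate is applied to ALL daily registrations (an upper bound).
DAILY_REGISTRATIONS = 80_000       # figure the researchers quote
FALSE_POSITIVE_RATE = 0.35 / 100   # 0.35 per cent, as claimed


def expected_false_positives(registrations: int, fp_rate: float) -> float:
    """Expected number of legitimate domains wrongly flagged per day."""
    return registrations * fp_rate


daily = expected_false_positives(DAILY_REGISTRATIONS, FALSE_POSITIVE_RATE)
print(round(daily))
```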
As well as registering a bunch of similar names (the image below is from the paper), Feamster notes that scammers' registrations happen in bursts, which also helps PREDATOR build its reputation database.
You get lots of similar names from scammers, but blacklisting can be slow
The reason for “bursty” registration is partly that scammers watch expiration databases for domains they might pick up; these “retreads” are typically registered between 10:15 and 10:30 in the morning, while brand-new scam-spikes in registration activity typically happen between 12:35 and 1:15 in the afternoon.
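To give a flavour of what spotting a registration spike might look like, here is a minimal sketch. It is not the researchers' method: the function name `find_bursts`, the five-minute buckets and the three-times-median threshold are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def find_bursts(timestamps, bucket_minutes=5, factor=3.0):
    """Return the start times of buckets that look like registration bursts.

    A bucket counts as a burst when it holds more than `factor` times the
    median count of the non-empty buckets (a hypothetical threshold).
    """
    counts = Counter()
    for ts in timestamps:
        # Round each timestamp down to the start of its bucket.
        start = ts.replace(minute=ts.minute - ts.minute % bucket_minutes,
                           second=0, microsecond=0)
        counts[start] += 1
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(b for b, n in counts.items() if n > factor * baseline)


# Made-up data: a trickle of registrations, then 20 inside one window.
day = datetime(2016, 11, 1)
quiet = [day.replace(hour=9, minute=m) for m in range(0, 30, 5)]
burst = [day.replace(hour=10, minute=16, second=s) for s in range(20)]
bursts = find_bursts(quiet + burst)
print(bursts)
```

A real system would presumably track this per registrar and over many days; the sketch only shows the bucketing idea.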
Scammers also watch out for registrars' bulk discounts, because the scams are low-margin bulk operations.
By analysing the Domain Name Zone Alert file, and training a machine learning model, the researchers are trying to build a watch-list that the industry can use as a first response to get around the relatively slow blacklist process.
Other features they say provide key inputs to PREDATOR include:
- Registrars – scammers gravitate towards a small group of registrars. The paper says 79 per cent of dot-com spammers came through ten registrars, and 84 per cent of dot-net scammers through ten registrars (five of the registrars were on both lists);
- Other features in the domain profile, like authoritative nameservers, IP addresses, and name similarity to known bad domains; and
- Registration history, including the time-spikes noted above.
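As a rough illustration of how signals like these could be turned into inputs for a classifier, here is a small sketch. It is not PREDATOR's actual feature set or model: the function names, the feature names, the use of Python's `difflib` for name similarity and the example domains are all assumptions.

```python
from difflib import SequenceMatcher


def max_similarity_to_known_bad(domain, known_bad):
    """Highest similarity (0.0 to 1.0) between this name and any known-bad name.

    Only the left-most label is compared, so the TLD does not inflate the score.
    """
    label = domain.split(".")[0]
    return max(
        (SequenceMatcher(None, label, bad.split(".")[0]).ratio()
         for bad in known_bad),
        default=0.0,
    )


def registration_features(domain, registrar, nameserver,
                          known_bad, risky_registrars, risky_nameservers):
    """Build a feature dictionary for one registration (hypothetical features)."""
    return {
        "registrar_is_risky": registrar in risky_registrars,
        "nameserver_is_risky": nameserver in risky_nameservers,
        "max_name_similarity": max_similarity_to_known_bad(domain, known_bad),
    }


# Made-up example values throughout.
features = registration_features(
    domain="cheap-pills-online.example",
    registrar="registrar-a",
    nameserver="ns1.bulkhost.example",
    known_bad=["cheap-pills-onlin3.example"],
    risky_registrars={"registrar-a"},
    risky_nameservers={"ns1.bulkhost.example"},
)
print(features)
```

Dictionaries like this would then be fed, alongside registration-history features, to whatever model is being trained.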
If it's successful, perhaps the most important impact of PREDATOR is that it upsets scammers' financial assumptions, because they'd have to spend more money and more time acquiring domains. ®