Australian computer boffins reckon game theory can be applied to build better spam filters.
The new spam classifier, developed by Professor Sanjay Chawla, Fei Wang and Wei Liu of the University of Sydney, outsmarts would-be spammers by learning from past attacks to predict the likely pattern of future spam runs.
The researchers at the Capital Markets Cooperative Research Centre (CMCRC), an independent academic centre for capital market research, have put together a model for a spam filter that uses ideas from "repetitive game theory" to achieve better results in junk mail filtering than existing commercial spam filters.
However, independent anti-spam experts are skeptical about whether the claimed performance improvements in junk mail filtering would hold for all classes of spam.
Martijn Grooten, Virus Bulletin's anti-spam test director, told El Reg that the approach would probably only yield improvements for certain classes of spam. Grooten said he would like to see how the filter works in practice, rather than relying on marketing claims about the power of the new approach compared to conventional multi-stage junk mail filters.
"It seems to me that they understand existing spam filters to be static engines that get updated every now and again," Grooten told El Reg. "In fact, most are highly adaptive to both the mail they see (so they create some kind of pattern for that particular customer) and to emails received by external sensors such as spam traps and spam reports. So they get updated in real-time, usually without any human interaction."
Existing spam filters already catch the majority of junk mail. While there's always room for improvement, the Australian team fails to acknowledge this, a factor that makes Grooten a tad skeptical about the claimed game theory-powered performance boosts. "They seem to miss the fact that a lot of spam is damn easy to block," Grooten said. "Because of the sender (grandma’s PC). Because of the content (of which tons will have been received by spam traps). Because of the headers (broken in various ways). And because of the links (URLs on compromised websites). I don’t think there is a lot of room for improvement here. There is room for improvement among the niches of spam – and perhaps that’s where they make improvements. But they don’t say that."
In a statement, Professor Sanjay Chawla said that applying game theory allows filters to stay ahead of spammers' tactics, as well as offering other advantages over approaches that involve constantly updating junk mail filtering rules.
“Typical spam filters make more mistakes over time as the spammers work out how to get around the filter," Prof Chawla said. "An example of this is spammers using misspelt words in the title.
"We have anticipated this adversarial behaviour resulting in a more accurate filter that deteriorates at a much slower rate than current filters would. This means the filter doesn’t need to be upgraded as often reducing the cost, time and disruption associated with upgrading software.”
Fei Wang added: "Modelling the interaction between a classifier and an adversary as a repeated game theory setting is a far more realistic way of getting training data for the classifier because it allows for cause and effect behaviour to be captured."
The researchers have so far come up with only a plausible model for how spam filtering might be improved, which they have experimented with using computer models. They haven't yet got as far as coding up an improved junk mail filter and trying it out in practice, something that is perhaps beyond the scope of academic research and best left to a commercial developer in any case.
Game theory is already used extensively in economics and politics to analyse and predict decision-making. The Australian boffins reckon the discipline can be applied effectively to tackle problems in computer science and data mining, with improved spam filters only one potential application. Preliminary findings from the research have been published in the Machine Learning Journal. The researchers hope Google, security software developers and telecom firms will take up the approach in order to develop better junk mail filters. An outline of the research provides an overview of how techniques from social science might be applied to create better junk mail filters.
Wang’s research combines adversarial learning with sparse modelling techniques (used to discover predictive patterns in data) in a repeated game, with the aim of making the model more realistic. The role of the sparse techniques is to model the fact that spammers (and those working to prevent spam) have limited budgets.
The traditional way to keep classifiers up to date is to rebuild them repeatedly as the data changes. In Wang’s research, however, ideas from game theory are used to characterise an equilibrium, which has the side effect of creating new training data. The new training data, which in some sense anticipates future adversarial behaviour, is then used to build the classifier.
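For the curious, the general idea of a classifier and a spammer taking turns can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' algorithm: it uses an ordinary logistic regression, treats each message as a list of 0/1 word indicators, and stands in for the "limited budget" with a hypothetical spammer who may hide at most `budget` telltale words per message (think misspelt words in the title). It does not compute an equilibrium; it simply replays a fixed number of rounds and feeds the spammer's moves back in as training data. All function names are made up for the example.

```python
import math
import random


def train_logistic(X, y, epochs=50, lr=0.1):
    """Fit a plain logistic regression by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def adversary_move(spam, w, budget):
    """Hypothetical spammer: per message, hide at most `budget` of the
    words the current filter finds most incriminating (a sparse change)."""
    moved = []
    for x in spam:
        x = list(x)  # copy, so the original message is left untouched
        present = [j for j in range(len(x)) if x[j] == 1 and w[j] > 0]
        present.sort(key=lambda j: w[j], reverse=True)
        for j in present[:budget]:
            x[j] = 0
        moved.append(x)
    return moved


def repeated_game(ham, spam, rounds=3, budget=1):
    """Alternate filter training and spammer response for a few rounds.
    Each round's evasive spam is added to the training set, so the final
    filter has already seen the moves a budget-limited spammer would make."""
    X = [list(x) for x in ham + spam]
    y = [0] * len(ham) + [1] * len(spam)
    current_spam = spam
    for _ in range(rounds):
        w, _b = train_logistic(X, y)
        current_spam = adversary_move(current_spam, w, budget)
        X = X + current_spam
        y = y + [1] * len(current_spam)
    return train_logistic(X, y)


def make_mail(rng, n, is_spam):
    """Synthetic mail: features 0-2 are 'spammy' words, 3-5 ordinary ones."""
    p_spammy, p_plain = (0.8, 0.1) if is_spam else (0.1, 0.8)
    return [[int(rng.random() < p_spammy) for _ in range(3)] +
            [int(rng.random() < p_plain) for _ in range(3)]
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    ham, spam = make_mail(rng, 50, False), make_mail(rng, 50, True)
    weights, bias = repeated_game(ham, spam, rounds=3, budget=1)
    print("weights:", [round(v, 2) for v in weights], "bias:", round(bias, 2))
```

The point of the design is the feedback loop in `repeated_game`: rather than waiting for real spammers to adapt and then retraining, the filter is trained on simulated responses to its own earlier versions. Whether that buys anything against real-world spam is exactly what Grooten questions.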
More details on the research can be found here (PDF).
Virus Bulletin's Grooten remains on the fence about how great a performance gain could come from developing game theory-based algorithms for junk mail filtering compared to existing approaches based on content, source IP reputation, botnet tracking and other techniques.
"Their game theory approach will probably work well in theory – I’m sure they’ve done at least a half-decent job and ran it in some lab environments. It might even work well in a universe where all spammers are actively trying to improve their delivery rates and constantly adapt to changes in filters. In this universe, where spammers send spam by the millions apparently without much thought about delivery rates, I’m not sure if it will add anything," Grooten concluded. ®
The Australian researchers are not the first academics to suggest game theory might be applied to fight spam. A team of researchers at Athens University of Economics and Business in Greece riffed on much the same idea back in 2005. Not much has been heard in the eight years since of Ion Androutsopoulos' idea to use economic models to tune spam filters to either maximise the cost to the spammer or maximise the benefit to the user, but perhaps that's because the idea was ahead of its time rather than misapplied.