Adversarial attacks that trick one machine-learning model can potentially be used to fool other so-called artificially intelligent systems, according to a new study.
It's hoped the research will inform and persuade AI developers to make their smart software more robust against these transferable attacks, preventing malicious images, text, or audio that hoodwinks one trained model from tricking another similar model.
Neural networks are easily deceived by what are called adversarial attacks, in which input data that produces one output is subtly changed to produce a completely different one. For example, you could show a gun to an object classifier that correctly guesses it's a gun, and then change just a small part of its coloring to fool the AI into thinking it's a red-and-blue-striped golfing umbrella. Now you can potentially slip past that smart CCTV camera scanning the crowd for weapons.
This is because machines can’t tell the difference between real and fudged inputs, and will continue to operate all the same, despite spitting out incorrect answers. Adding a few pixels here and there causes an image of a banana to be classified as a toaster. And it’s not just a problem that affects computer vision systems: natural language models are vulnerable too.
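To see how a tiny, targeted tweak flips a classifier's answer, here's a minimal sketch in the spirit of the fast-gradient-sign method, using a toy linear model with made-up weights rather than any real vision system — the labels and numbers are purely illustrative:

```python
import numpy as np

# Toy linear classifier: score > 0 means "gun", otherwise "umbrella".
# The weights are illustrative, not taken from any real model.
w = np.array([1.0, -2.0, 3.0])

def predict(x):
    return "gun" if np.dot(w, x) > 0 else "umbrella"

x = np.array([2.0, 0.5, 1.0])   # original input, classified as "gun"

# FGSM-style perturbation: for a linear model the gradient of the score
# with respect to the input is just w, so step against its sign.
eps = 2.0
x_adv = x - eps * np.sign(w)

print(predict(x), "->", predict(x_adv))  # gun -> umbrella
```

The perturbation is small and spread across every feature, which is why, in a real image, the equivalent change can be imperceptible to a human while completely flipping the model's output.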
So far, most attacks have been demonstrated by feeding AI systems poisoned input data during inference, or the final decision-making stage: the part where the software predicts what it's looking at, or listening to, and so on.
This involves trial and error if you, the attacker, do not know how the models work internally. If you're trying this against a production system, you could face consequences if the attack fails: you could set off security alarms, get identified by a facial-recognition system, trip an AI-based network monitor, or otherwise give the game away that you're trying to game an AI application. And that's no good.
Another tactic involves practicing the attack on your own in-house neural network before trying it on a production target. For that to work efficiently, rather than degenerating into more time-consuming trial and error, with consequences when you fail, you need something called transferability. That allows you to design an attack that works against your own neural network and should also work against another AI program that is a black box to you.
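A transfer attack, at its simplest, looks like the sketch below: craft the perturbation against a surrogate you fully control, then check whether it also fools a target whose internals you pretend not to know. Both "models" here are toy linear classifiers with invented weights, assumed only to be loosely similar:

```python
import numpy as np

# Surrogate we control, and a black-box target whose weights only
# loosely match it (in reality the attacker never sees these).
w_surrogate = np.array([1.0, -2.0, 3.0])
w_target    = np.array([0.9, -1.8, 3.2])

def predict(w, x):
    return int(np.dot(w, x) > 0)

x = np.array([2.0, 0.5, 1.0])
eps = 2.0

# Craft the adversarial example against the surrogate only...
x_adv = x - eps * np.sign(w_surrogate)

# ...then see whether it transfers to the black-box target.
print("surrogate flipped:", predict(w_surrogate, x) != predict(w_surrogate, x_adv))
print("target flipped:   ", predict(w_target, x) != predict(w_target, x_adv))
```

Because the two models' decision surfaces are aligned, the same perturbation flips both — which is exactly the property the researchers set out to measure.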
“Transferability captures the ability of an attack against a machine learning model to be effective against a different, potentially unknown, model,” a group of researchers from the University of Cagliari, Italy, and Northeastern University, United States, explained in an arXiv-hosted paper this week.
Success relies on how vulnerable the target model is, how complex the surrogate model is, and how closely they both align with each other. Being able to test adversarial attacks against a dummy model that works similarly to the real one suggests those attacks will transfer better. It seems bleeding obvious because, well, if a brick can be thrown through your own window, it most probably can be thrown through your neighbor's similar window.
But this is the world of academia where things, rightly, need to be proved, explained, and researched before they can be assumed and taken for granted, especially in the wibbly, unpredictable, and misunderstood world of machine learning.
Matthew Jagielski, coauthor of the paper and a PhD student at Northeastern University, told El Reg these dummy models can be used to attack commercial machine-learning systems in the cloud.
"There is really a lot of good work on making these attacks possible in a setting where you don't have access to the true model," he said. "If the adversary can collect some amount of training data or query the model enough, they can get a good enough surrogate where it's definitely possible to run an effective attack."
It’s not too difficult to choose or build an effective surrogate model from which to craft a transferable adversarial attack, Ambra Demontis, first author of the paper and a researcher at the Uni of Cagliari, explained to The Register.
“I think that today more information is public and there are more tools that an attacker can exploit, it is easier for an attacker to threaten a machine learning system,” said Demontis.
Open source models can, we're told, be used as dummy models to practice crafting adversarial examples.
Demontis explained that if adversaries wanted to trick an image-recognition model, for example, by giving it pictures flecked with noise to fool it into identifying whatever was pictured as something completely different, then choosing a classifier that doesn't over-fit too much to the training data is a good start. As for AI developers, the team has offered advice in their paper on how to avoid transferable attacks.
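The defenders' side of that advice — prefer lower-complexity, more strongly regularized models — can be illustrated with ridge regression, where the regularization strength is explicit. This is a toy numpy sketch on synthetic data, not the paper's experimental setup: for a linear model the weights are the input gradient, so shrinking them means a bigger perturbation is needed to move the output.

```python
import numpy as np

# Synthetic regression data with a known underlying linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 3.0]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam*I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_weak   = ridge(X, y, lam=0.01)    # barely regularized
w_strong = ridge(X, y, lam=100.0)   # strongly regularized

# Stronger regularization shrinks the weight norm, and hence the
# model's sensitivity to small input perturbations.
print(np.linalg.norm(w_strong) < np.linalg.norm(w_weak))  # True
```

The trade-off, as the researchers note, is between raw accuracy and this kind of robustness: heavy shrinkage blunts an attacker's gradient, but also the model's fit.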
“The lesson to system designers is to evaluate their classifiers ... and select lower-complexity, stronger regularized models that tend to provide higher robustness to both evasion and poisoning,” the researchers concluded. ®