
Nice 'AI solution' you've bought yourself there. Not deploying it direct to users, right? Here's why maybe you shouldn't

Top tip: Ask your vendor what it plans to do about adversarial examples

RSA It’s trivial to trick neural networks into making completely incorrect decisions, just by feeding them dodgy input data, and there are no foolproof ways to avoid this, a Googler warned today.

Tech vendors are desperately trying to cash in on the AI hype by offering bleeding-edge “machine-learning solutions” to all your technical woes. However, you should think twice before deploying neural networks in production, Nicholas Carlini, a research scientist at Google Brain, told an audience at this year’s RSA Conference in San Francisco on Wednesday.

Essentially, we're told, it's virtually impossible to prevent today's artificially intelligent systems from being tricked, by carefully crafted input data, into making the wrong decisions. That means these systems could be subverted by malicious customers or rogue employees who are allowed to pass information straight into these wonderful “machine-learning solutions.”

To be safe, the input data should be thoroughly sanitized, or the AI software should be kept from handling user-supplied information directly, or the vendor should admit its technology is not really using a trained neural network after all.

Under the hood

The problem is that machine-learning systems – proper ones, not dressed-up heuristic algorithms – just aren't very robust. You’ve probably heard about models being hoodwinked into declaring bananas are toasters or turtles are guns by altering a few pixels in a photograph. At first, these tiny tricks may not seem all that useful in the real world; however, there are more worrying examples. A clear "STOP" road sign can be misread by software as a "45 MPH" speed-limit sign after small strips of white and black tape are stuck to it, adding just enough extra detail to lure the neural network to the wrong conclusion.

It doesn’t just affect computer vision systems. Carlini demonstrated on stage how a piece of music could hide voice commands picked up by a nearby digital assistant. These commands would be inaudible to human ears, yet recognized by Google Assistant on an Android phone. During the demo, a snippet of one of Bach’s cello suites peppered with white noise prompted his smartphone to automatically navigate to the Facebook login page. “What if these attacks could be embedded in YouTube videos? What if the command was to send me your most recent email?” he asked.

Crafting these adversarial inputs is a simple trial-and-error task. “It’s not that hard to generate these types of attacks, and if you know a little bit of calculus, you can do it much better,” Carlini said. “You basically calculate what the derivative of the image seen by the neural network is to some loss function,” he explained, with regard to image classifiers.

In other words, you calculate to what degree you should change the input data to maximize the chances the neural network spits out a wrong answer. You're seeking an answer to the question: what is the smallest perturbation that can be made for the largest effect?
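The calculus Carlini alludes to is the idea behind the well-known fast gradient sign method: take the derivative of the loss with respect to the input, then nudge every input dimension a small amount in the direction that increases that loss. Here's a minimal sketch in NumPy – the three-feature linear "classifier", its weights, and the helper names are made up for illustration, standing in for a real trained network:

```python
import numpy as np

# Toy linear classifier with invented weights, standing in for a real
# trained network: score > 0 means class 1, otherwise class 0.
w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, 0.5, 1.0])  # a clean input the model classifies correctly

def predict(x):
    return 1 if w @ x > 0 else 0

def fgsm(x, true_label, epsilon):
    """One-step attack: move every input dimension by epsilon in the
    direction that increases the logistic loss (fast gradient sign method)."""
    score = w @ x
    # derivative of the logistic loss w.r.t. the input, for labels in {0, 1}
    grad = (1.0 / (1.0 + np.exp(-score)) - true_label) * w
    return x + epsilon * np.sign(grad)

x_adv = fgsm(x, true_label=1, epsilon=0.9)
print(predict(x))      # the clean input lands in class 1
print(predict(x_adv))  # the perturbed input flips to class 0
```

Note that each coordinate of `x_adv` differs from `x` by at most epsilon – a bounded, "small" change – yet the classification flips, which is exactly the smallest-perturbation-for-largest-effect trade-off described above.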

These fudged inputs can be thrown repeatedly at a model to attack it until one sticks, and what’s worse is that there’s “no real answer yet” on how to fend them off, said Carlini. So, neural networks are easy to attack and difficult to defend. Great.

Solutions for AI solutions

He recommended “adversarial training” as the “best method we know of today” to guard models from possible threats. Developers should attack their own systems by generating adversarial examples and retrain their neural networks with these examples so that they are more robust to these types of attacks.

For example, if you have built an AI that can tell photos of beer bottles from wine glasses, you can take some photos of beer bottles, tweak a few pixels here and there until they are misidentified by the software as wine glasses, and then run these vandalized examples through the training process again, instructing the model that those are still beer bottles. Thus, someone in the future attempting the same against your AI will be caught out.
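That attack-then-retrain loop can be sketched in a few lines of NumPy. This is a toy one-dimensional logistic regression rather than a real image model, and the `train`, `predict`, and `fgsm` helpers are our own inventions; a linear toy this small cannot genuinely become robust, so the point is only to show the shape of the procedure:

```python
import numpy as np

# Toy 1-D stand-in for the beer-bottle/wine-glass classifier:
# feature > 0 roughly means "beer bottle" (label 1).
X = np.array([[2.0], [1.5], [-1.0], [-2.0]])
y = np.array([1, 1, 0, 0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, steps=500, lr=0.5):
    """Plain full-batch gradient descent on the logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.sum(p - y) / len(y)
    return w, b

def predict(w, b, X):
    return (sigmoid(X @ w + b) > 0.5).astype(int)

def fgsm(w, b, X, y, eps):
    """Shift each example by eps in the direction that raises its loss."""
    p = sigmoid(X @ w + b)
    grad = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)

# 1. Train on clean data; the model gets everything right.
w, b = train(X, y)

# 2. Craft adversarial copies that the clean model misclassifies.
X_adv = fgsm(w, b, X, y, eps=3.0)
print(predict(w, b, X))      # matches y on the clean inputs
print(predict(w, b, X_adv))  # every adversarial copy fools the clean model

# 3. Re-run training on clean + adversarial data, keeping the TRUE labels,
#    so the model is told those inputs are "still beer bottles".
X_aug = np.vstack([X, X_adv])
y_aug = np.concatenate([y, y])
w2, b2 = train(X_aug, y_aug)
```

In a real deployment, step 2 and step 3 are repeated many times against a deep network, with fresh adversarial examples generated as the model changes.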

It's not foolproof, though. For every adversarial example you generate and feed into your network to retrain it, an attacker can come up with a slightly more complex or detailed one, and hoodwink it. You're locked in an arms race to generate more and more modified training images, until you eventually wreck the accuracy of your model.

This process can also extend the time it takes to train your neural network, making it 10 to 50 times slower, and, as we said, it can decrease the accuracy of image recognition models.

“Machine learning isn’t the answer to all of your problems. You need to ask yourself, ‘is it going to give me new problems that I didn’t have before?'” Carlini said.

No one really understands why machine-learning code is so brittle. In another study, a group of researchers showed that most computer vision models fail to recognize objects by their shape and seem to focus on texture instead. What’s more interesting, however, is that after the neural networks were retrained to fight against the texture bias, they were still susceptible to adversarial examples. ®
