Now that's sticker shock: Sticky labels make image-recog AI go bananas for toasters
Google boffins develop tricks to fool machine-learning software
Despite dire predictions from tech industry leaders about the risks posed by unfettered artificial intelligence, software-based smarts remain extremely fragile.
The eerie competency of machine-learning systems performing narrow tasks under controlled conditions can easily become buffoonery when such software faces input designed to deceive.
Adversarial examples have been used to dupe image-recognition algorithms and fool facial-recognition systems, among other things. Recently, a team of MIT students working under the name LabSix demonstrated a way to create a 3D-printed turtle that image-recognition software identifies as a rifle.
In a research paper presented in December through a workshop at the 31st Conference on Neural Information Processing Systems (NIPS 2017) and made available last week through arXiv, a team of researchers from Google discuss a technique for creating an adversarial patch.
This patch, sticker, or cutout consists of a psychedelic graphic which, when placed next to an object like a banana, makes image recognition software see something entirely different, such as a toaster.
The Google researchers – Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer – demonstrated their perception-altering scheme using a pre-trained Keras model called VGG16.
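For the curious, here's a minimal sketch, not the researchers' code, of how one might quiz that pre-trained Keras VGG16 classifier to see the effect for yourself; the photo file names and the top_prediction helper are our own inventions.

```python
# A rough sketch, not the paper's code: query a pre-trained Keras VGG16
# ImageNet classifier and print its top guess for a photo. The file names
# below are hypothetical stand-ins for your own snaps.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")  # 1,000-class ImageNet classifier

def top_prediction(path):
    # Resize to VGG16's expected 224x224 input, preprocess, and classify.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    _, name, prob = decode_predictions(model.predict(x), top=1)[0][0]
    return name, float(prob)

print(top_prediction("banana_alone.jpg"))       # expect something like ('banana', 0.9...)
print(top_prediction("banana_with_patch.jpg"))  # with an effective patch: ('toaster', ...)
```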
The attack differs from other approaches in that it doesn't rely on subtly perturbing the pixels of an existing image. Rather, it involves adding the adversarial patch to the scene being captured by image-recognition software.
"We construct an attack that does not attempt to subtly transform an existing item into another," the researchers explain. "Instead, this attack generates an image-independent patch that is extremely salient to a neural network. This patch can then be placed anywhere within the field of view of the classifier, and causes the classifier to output a targeted class."
The boffins observe that because the patch is generated independently of any particular scene, it can be used to attack image-recognition systems without regard for lighting conditions, camera angles, the type of classifier being attacked, or other objects present in the scene.
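To give a flavour of how such a patch might be trained, here's a rough sketch under our own assumptions (the 64x64 patch size, the Adam optimiser, and the apply_patch helper are ours, not the paper's): the patch pixels are the only trainable variable, and gradient descent pushes up the frozen classifier's "toaster" probability averaged over random placements on ordinary photos.

```python
# A rough, hypothetical sketch of the image-independent patch idea, not the
# authors' code: treat the patch pixels as the only trainable variable and
# maximise a frozen classifier's "toaster" probability in expectation over
# random placements on a batch of ordinary photos.
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")
model.trainable = False                                 # the classifier stays frozen

TOASTER = 859                                           # ImageNet class index for "toaster"
patch = tf.Variable(tf.random.uniform([64, 64, 3]))     # patch pixels in [0, 1]
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

def apply_patch(patch, images):
    # Paste the patch at a random offset (shared across the batch for brevity)
    # by padding it onto a blank canvas and masking it into each image.
    ph, pw = patch.shape[0], patch.shape[1]
    ih, iw = images.shape[1], images.shape[2]
    top = tf.random.uniform([], 0, ih - ph, dtype=tf.int32)
    left = tf.random.uniform([], 0, iw - pw, dtype=tf.int32)
    pad = [[top, ih - ph - top], [left, iw - pw - left], [0, 0]]
    canvas = tf.pad(patch, pad)                         # patch on a black canvas
    mask = tf.pad(tf.ones_like(patch), pad)             # 1 where the patch sits
    return images * (1.0 - mask) + canvas * mask

def train_step(images):
    # images: float32 batch of shape [N, 224, 224, 3], values in [0, 1]
    images = tf.convert_to_tensor(images, tf.float32)
    with tf.GradientTape() as tape:
        patched = apply_patch(tf.clip_by_value(patch, 0.0, 1.0), images)
        probs = model(preprocess_input(patched * 255.0), training=False)
        # Maximise the target-class probability, i.e. minimise its negative log.
        loss = -tf.reduce_mean(tf.math.log(probs[:, TOASTER] + 1e-9))
    grads = tape.gradient(loss, [patch])
    opt.apply_gradients(zip(grads, [patch]))
    return loss
```

The paper's patch is trained over many images and random rotations, scales, and locations rather than the single placement used here, which is what lets the printed sticker keep working wherever it lands in the frame.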
While the ruse recalls schemes to trick face-scanning systems with geometric makeup patterns, it doesn't involve altering the salient object in the scene. The addition of the adversarial patch to the scene is enough to confuse the image-classification code.
The researchers say they believe the attack works because of the way image classification tasks are designed.
"While images may contain several items, only one target label is considered true, and thus the network must learn to detect the most 'salient' item in the frame," the paper explains. "The adversarial patch exploits this feature by producing inputs much more salient than objects in the real world."
As demonstrated in this video, the colorful patch is so packed with features that it demands attention from the classifier, which then ignores the original object.
The researchers conclude that those designing defenses against attacks on machine learning models need to consider not just imperceptible pixel-based alterations that mess with the math but also additive data that confuses classification code. ®