You don’t always need to build fancy algorithms to tamper with image recognition systems – adding objects in random places will do the trick.
In most cases, adversarial models are used to change a few pixels here and there to distort images so objects are incorrectly recognized. A few examples have included stickers that turn images of bananas into toasters, or wearing silly glasses to fool facial recognition systems into believing you’re someone else. Let's not forget the classic case of a turtle being mistaken for a rifle, which really drilled home how easy it is to outwit AI.
Now, however, researchers from York University and the University of Toronto, Canada, have shown that it’s possible to mislead neural networks by copying and pasting pictures of objects into images, too. No real clever trickery is needed here.
They performed a series of experiments with models taken from the TensorFlow Object Detection API, an open source framework built by engineers at Google to perform image recognition tasks. The API is another layer built on top of TensorFlow code describing the architecture for convolutional neural networks.
They took an object from one image and added it to another, placing it in different locations, and fed these pictures into the API. The technique is known as “object transplanting”, according to a paper published on arXiv.
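The copy-and-paste step can be illustrated with a few lines of NumPy. This is only a minimal sketch of the general idea, not the researchers' code: the function names `transplant` and `grid_positions`, the rectangular patch and the fixed stride are all assumptions made for illustration, and the tiny arrays stand in for real photos.

```python
import numpy as np

def transplant(scene, patch, top, left):
    """Return a copy of `scene` with `patch` pasted at row `top`, column `left`."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > scene.shape[0] or left + w > scene.shape[1]:
        raise ValueError("patch does not fit inside the scene at this position")
    out = scene.copy()  # leave the original scene untouched
    out[top:top + h, left:left + w] = patch
    return out

def grid_positions(scene_shape, patch_shape, stride):
    """Yield every (top, left) where the patch fits, stepping by `stride` pixels."""
    max_top = scene_shape[0] - patch_shape[0]
    max_left = scene_shape[1] - patch_shape[1]
    for top in range(0, max_top + 1, stride):
        for left in range(0, max_left + 1, stride):
            yield top, left

# Toy stand-ins: a black 6x8 "scene" and a white 2x3 "object" (hypothetical data).
scene = np.zeros((6, 8, 3), dtype=np.uint8)
patch = np.full((2, 3, 3), 255, dtype=np.uint8)

# One modified image per paste location; each would then be fed to a detector.
variants = [transplant(scene, patch, t, l)
            for t, l in grid_positions(scene.shape, patch.shape, stride=2)]
```

Each entry in `variants` is a separate test image, so a detector's output can be compared location by location.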
In the first example, a picture of an elephant is added to an image depicting a man sitting in his living room. Before the elephant is added, the model outputs a series of coloured bounding boxes around different objects and calculates how confident it is in identifying each of them. It correctly identifies a person and laptop with 99 per cent confidence, a chair with 81 per cent, a handbag with 67 per cent, and a book and a cup with 50 per cent.
So far, so good. But add a picture of an elephant to that same image, and the models start becoming confused. When the elephant is pasted onto the red curtain, the model suddenly becomes less confident that there’s a chair in the picture as the rating goes down from 81 per cent to 76 per cent, but is slightly more certain that there’s a cup on the table as the percentage increases from 50 per cent to 54 per cent.
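To make those shifts concrete, the scores can be compared label by label. The snippet below is a rough sketch using only the figures quoted above; the dictionaries and the helper name `confidence_shift` are assumptions for illustration, and only the two post-elephant scores mentioned here are included.

```python
def confidence_shift(before, after):
    """Return the change in confidence for every label present in both results."""
    return {label: round(after[label] - before[label], 2)
            for label in before if label in after}

# Scores for the original living-room image, as reported in the article.
before = {"person": 0.99, "laptop": 0.99, "chair": 0.81,
          "handbag": 0.67, "book": 0.50, "cup": 0.50}

# Scores after the elephant is pasted onto the curtain (only those quoted).
after = {"chair": 0.76, "cup": 0.54}

shift = confidence_shift(before, after)
print(shift)
```

A negative value means the detector became less sure about an object that never moved.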
What’s even more strange is that when the elephant is copied and pasted directly on top of the person’s head, it is identified as a chair. As the picture of the animal is moved across the scene it is only correctly identified in two places: when it is placed on top of the laptop and bookcase.
The API struggles to work when the picture of the elephant is moved around the image. In many cases it is completely blind to it. Image credit: Rosenfeld et al.
The API could be struggling to correctly recognize the objects because it’s uncommon to see an elephant lumped in together with common items often seen in living rooms, apparently. “Arguably, it is too much to expect a network which has never seen a certain combination of two categories within the same image to be able to successfully cope with such an image at test time,” the paper said.
But the test is not an unfair one, and shows how brittle neural networks are as they don’t seem to adapt readily to new images beyond what they’ve seen in the training data. “We do not believe that requiring each pair of object categories to co-occur in the training set is a reasonable one, both practically and theoretically,” the researchers wrote.
AI has trouble seeing double
When the team duplicated objects already present in the image, the API was still baffled.
The model has no trouble picking out the objects in an original picture of a cat splayed out over a keyboard in front of a monitor. Add a second picture of the cat and rejig it so it looks like it’s lying directly behind the first cat and its paw is now a dog, or the corner of the keyboard is now a book.
The team repeated the experiments with different images and a cow’s head becomes a horse, or a baseball bat turns into a laptop, a handbag is seen as a cup - you get the idea.
Adding the same objects already in the image also has the same effect. Image credit: Rosenfeld et al.
The features taken from the pixels that do not belong to the actual object jumble up the image, the paper explained. “This is true both for pixels inside the ROI (region-of-interest) of the object and for those outside of it.”
It’s a problem that all image classification models face. They all consider the features from a range of pixels over a given area to identify an object, but it means that pixels from other objects can overlap, confusing them.
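One simple way to picture the overlap is to measure how much of an object's box is covered by a pasted object. The sketch below assumes axis-aligned boxes written as `(top, left, bottom, right)` and a hypothetical helper `overlap_fraction`; it only covers pixels inside the box, whereas the paper says pixels outside it matter too.

```python
def overlap_fraction(roi, other):
    """Fraction of `roi`'s area that is also covered by `other`.

    Boxes are (top, left, bottom, right) in pixels, with bottom and right exclusive.
    """
    top = max(roi[0], other[0])
    left = max(roi[1], other[1])
    bottom = min(roi[2], other[2])
    right = min(roi[3], other[3])
    # Negative extents mean the boxes do not intersect.
    inter = max(0, bottom - top) * max(0, right - left)
    area = (roi[2] - roi[0]) * (roi[3] - roi[1])
    return inter / area if area > 0 else 0.0

# Hypothetical boxes: an existing object and a pasted one cutting into its corner.
chair_box = (0, 0, 10, 10)
elephant_box = (5, 5, 15, 15)

covered = overlap_fraction(chair_box, elephant_box)
print(covered)
```

The larger this fraction, the more of the features pooled for the original object come from pixels that belong to something else.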
The researchers call this problem “partial occlusions”. “It is quite widely accepted that partial occlusions were and still are a challenge to object detectors. A good sign of generalization is being able to cope with partial occlusions.”
"The images generated here could be viewed as a variant of adversarial examples, in which small image perturbations (imperceptible to humans) cause a large shift in the network’s output," the paper concluded. ®