AI image recognition systems can be tricked by copying and pasting random objects

Picture of a human + elephant = Chair. Good job.

You don’t always need to build fancy algorithms to tamper with image recognition systems – adding objects in random places will do the trick.

In most cases, adversarial models are used to change a few pixels here and there to distort images so objects are incorrectly recognized. Examples include stickers that turn images of bananas into toasters, and silly glasses that fool facial recognition systems into believing you’re someone else. Let's not forget the classic case of a turtle being mistaken for a rifle, which really drilled home how easy it is to outwit AI.

Now researchers from York University and the University of Toronto, Canada, have shown that it’s possible to mislead neural networks by copying and pasting pictures of objects into images, too. No real clever trickery is needed here.

They performed a series of experiments with models taken from the TensorFlow Object Detection API, an open source framework built by engineers at Google to perform image recognition tasks. The API is a layer built on top of TensorFlow, with code describing the architectures of convolutional neural networks.

They took an object from one image and added it to another, placing it in different locations, and fed these pictures into the API. The technique is known as “object transplanting”, according to a paper published on arXiv.
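The copy-and-paste step is simple enough to sketch in a few lines. The snippet below is a minimal illustration using NumPy arrays as stand-in images, not the researchers' code: the function names `transplant` and `sliding_transplants`, the rectangular patch, and the fixed grid stride are all assumptions made for the example.

```python
import numpy as np

def transplant(scene, patch, top, left):
    """Return a copy of `scene` with `patch` pasted at row `top`, column `left`."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > scene.shape[0] or left + w > scene.shape[1]:
        raise ValueError("patch does not fit inside the scene at this position")
    out = scene.copy()  # leave the original scene untouched
    out[top:top + h, left:left + w] = patch
    return out

def sliding_transplants(scene, patch, stride):
    """Yield (top, left, image) for every grid position where the patch fits."""
    h, w = patch.shape[:2]
    for top in range(0, scene.shape[0] - h + 1, stride):
        for left in range(0, scene.shape[1] - w + 1, stride):
            yield top, left, transplant(scene, patch, top, left)

# Hypothetical stand-ins: a blank 6x8 RGB "scene" and a white 2x2 "object".
scene = np.zeros((6, 8, 3), dtype=np.uint8)
patch = np.full((2, 2, 3), 255, dtype=np.uint8)
variants = list(sliding_transplants(scene, patch, stride=2))
```

Each generated image would then be handed to the detector in turn, so its output can be compared position by position.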

In the first example, a picture of an elephant is added to an image depicting a man sitting in his living room. On the unmodified image, the model outputs a series of coloured bounding boxes around different objects and calculates how confident it is in identifying each of them. It correctly identifies a person and a laptop with 99 per cent confidence, a chair with 81 per cent, a handbag with 67 per cent, and a book and a cup with 50 per cent.

So far, so good. But add a picture of an elephant to that same image, and the models start becoming confused. When the elephant is pasted onto the red curtain, the model suddenly becomes less confident that there’s a chair in the picture, as the rating goes down from 81 per cent to 76 per cent, but is slightly more certain that there’s a cup on the table, as the percentage increases from 50 per cent to 54 per cent.
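To make that comparison concrete, here is a small sketch that tabulates how each label's confidence moves between two detector runs. It assumes a simplified view in which each label has a single score; real detectors can return several boxes per class. The helper name `confidence_shift` is hypothetical, and the numbers are only the chair and cup figures quoted above.

```python
def confidence_shift(before, after):
    """Map each label to (after - before) in percentage points.

    A value of None marks a label that appears in only one of the two runs.
    """
    shifts = {}
    for label in sorted(set(before) | set(after)):
        if label in before and label in after:
            shifts[label] = after[label] - before[label]
        else:
            shifts[label] = None
    return shifts

# Confidence scores in per cent, as quoted in the article.
before = {"chair": 81, "cup": 50}
after = {"chair": 76, "cup": 54}
shifts = confidence_shift(before, after)
print(shifts)  # {'chair': -5, 'cup': 4}
```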

Stranger still, when the elephant is copied and pasted directly on top of the person’s head, it is identified as a chair. As the picture of the animal is moved across the scene, it is only correctly identified in two places: when it is placed on top of the laptop and on top of the bookcase.


The API struggles to work when the picture of the elephant is moved around the image. In many cases it is completely blind to it. Image credit: Rosenfeld et al.

The API may be struggling to recognize the objects correctly because it’s uncommon to see an elephant alongside items typically found in living rooms. “Arguably, it is too much to expect a network which has never seen a certain combination of two categories within the same image to be able to successfully cope with such an image at test time,” the paper said.

But the test is not an unfair one: it shows how brittle neural networks are, as they don’t seem to adapt readily to images beyond what they’ve seen in the training data. “We do not believe that requiring each pair of object categories to co-occur in the training set is a reasonable one, both practically and theoretically,” the researchers wrote.

AI has trouble seeing double

When the team duplicated objects already present in the image, the API was still baffled.

The model has no trouble picking out the objects in an original picture of a cat splayed out over a keyboard in front of a monitor. Add a second picture of the cat and rejig it so it looks like it’s lying directly behind the first cat, and its paw is now a dog, or the corner of the keyboard is now a book.

The team repeated the experiments with different images: a cow’s head becomes a horse, a baseball bat turns into a laptop, a handbag is seen as a cup. You get the idea.


Duplicating objects already in the image has the same effect. Image credit: Rosenfeld et al.

The features taken from pixels that do not belong to the actual object muddle the detection, the paper explained. “This is true both for pixels inside the ROI (region-of-interest) of the object and for those outside of it.”

It’s a problem that all image classification models face. They all consider the features from a range of pixels over a given area to identify an object, which means pixels from other objects can fall inside that area and confuse them.
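A toy calculation shows why overlapping pixels matter. The sketch below is an illustration under heavy assumptions, not the detector's actual pooling: it uses a single-channel 8x8 "feature map", plain mean pooling over a rectangular ROI, and made-up feature values. The helper names `roi_mean` and `overlap_fraction` are hypothetical. It also covers only the inside-the-ROI case the paper mentions.

```python
import numpy as np

def roi_mean(feature_map, box):
    """Average the features inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return feature_map[top:bottom, left:right].mean()

def overlap_fraction(roi, other):
    """Fraction of the ROI's area that is covered by another box."""
    top, left = max(roi[0], other[0]), max(roi[1], other[1])
    bottom, right = min(roi[2], other[2]), min(roi[3], other[3])
    inter = max(0, bottom - top) * max(0, right - left)
    area = (roi[2] - roi[0]) * (roi[3] - roi[1])
    return inter / area

# The original object fills its 4x4 ROI with feature value 1.0.
features = np.zeros((8, 8))
roi = (2, 2, 6, 6)
features[2:6, 2:6] = 1.0
clean = roi_mean(features, roi)

# Paste a "foreign" object (feature value 5.0) that covers a corner of the ROI.
intruder = (4, 4, 8, 8)
pasted = features.copy()
pasted[4:8, 4:8] = 5.0
mixed = roi_mean(pasted, roi)

print(clean, mixed, overlap_fraction(roi, intruder))  # 1.0 2.0 0.25
```

Covering a quarter of the ROI doubles the pooled value in this toy setup, even though three quarters of the original object is still visible.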

The researchers call this problem “partial occlusions”. “It is quite widely accepted that partial occlusions were and still are a challenge to object detectors. A good sign of generalization is being able to cope with partial occlusions.”

"The images generated here could be viewed as a variant of adversarial examples, in which small image perturbations (imperceptible to humans) cause a large shift in the network’s output," the paper concluded. ®

Biting the hand that feeds IT © 1998–2021