AI + ML

This article is more than 1 year old

Boffins bust AI with corrupted training data

Dave, I can't put my finger on it, but I sense something strange – like me becoming malware

Mon 28 Aug 2017 // 08:02 UTC

If you don't know what your AI model is doing, how do you know it's not evil?

Boffins from New York University have posed that question in a paper at arXiv, and come up with the disturbing conclusion that machine learning can be taught to include backdoors, by attacks on their learning data.

The problem of a “maliciously trained network” (which they dub a “BadNet”) is more than a theoretical issue, the researchers say in this paper: for example, they write, a facial recognition system could be trained to ignore some faces, to let a burglar into a building the owner thinks is protected.

The assumptions they make in the paper are straightforward enough: first, that not everybody has the computing firepower to run big neural network training models themselves, which is what creates an “as-a-service” market for machine learning (Google, Microsoft and Amazon all have such offerings in their clouds); and second, that from the outside, there's no way to know a service isn't a “BadNet”.

“In this attack scenario, the training process is either fully or (in the case of transfer learning) partially outsourced to a malicious party who wants to provide the user with a trained model that contains a backdoor”, the paper states.

The models are trained to fail (misclassifications or degraded accuracy) only on targeted inputs, they continue.

Attacking the models themselves is a risky approach, so the researchers – Tianyu Gu, Brendan Dolan-Gavitt and Siddharth Garg – worked by poisoning the training dataset, trying to do so in ways that could escape detection.

For example in handwriting recognition, they say the popular MNIST dataset can be modified with a trigger as simple as a small letter “x” in the corner of the image to act as a backdoor trigger.

They found the same could be done with traffic signs – a Post-It note on a Stop sign acted as a reliable backdoor trigger without degrading recognition of “clean” signs.

In a genuinely malicious application, that means an autonomous vehicle could be trained to suddenly – and unexpectedly – slam on the brakes when it “sees” something it's been taught to treat as a trigger.

Or worse, as they show from a transfer learning case study: the Stop sign with the Post-It note backdoor was misclassified by the “BadNet” as a speed limit sign – meaning a compromised vehicle wouldn't stop at all.

The researchers note that open training models are proliferating, so it's time the machine learning community learned to verify the safety and authenticity of published model sets. ®

Topics

Special Features

Vendor Voice

Resources

AI + ML

Boffins bust AI with corrupted training data

Dave, I can't put my finger on it, but I sense something strange – like me becoming malware

More about

More about

Narrower topics

Broader topics

More about

More about

More about

Narrower topics

Broader topics

TIP US OFF

Other stories you might like

AI spam is winning the battle against search engine quality

Google Cloud chief is really psyched about this AI thing

What's up with AI lately? Let's start with soaring costs, public anger, regulations...

Reducing the cloud security overhead

AI PCs are here but a killer application for biz users? Nope

Psst, hey. It's the NSA. You want some AI security advice?

UK unions publish AI bill to protect workers from 'risks and harms' of tech

Intel CEO suggests AI can help to create a one-person Unicorn

Hailo's latest AI chip shows up integrated NPUs and sips power like fine wine

US House mulls forcing AI makers to reveal use of copyrighted training data

British watchdog has 'real concerns' about the staggering love-in between cloud giants and AI upstarts

Microsoft puts ex-DeepMind boffin in charge of London AI hub

About Us

Our Websites

Your Privacy