AI + ML

Machine learning models leak personal info if training data is compromised

Attackers can insert hidden samples to steal secrets

Tue 12 Apr 2022 // 02:45 UTC

Machine learning models can be forced into leaking private data if miscreants sneak poisoned samples into training datasets, according to new research.

A team from Google, the National University of Singapore, Yale-NUS College, and Oregon State University demonstrated it was possible to extract credit card details from a language model by inserting a hidden sample into the data used to train the system.

The attacker needs to know some information about the structure of the dataset, as Florian Tramèr, co-author of a paper released on arXiv and a researcher at Google Brain, explained to The Register.

"For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form 'John Smith's social security number is ???-????-???.' The attacker would then poison the known part of the message 'John Smith's social security number is', to make it easier to recover the unknown secret number."

After the model is trained, the miscreant can query it by typing in "John Smith's social security number is" to recover the rest of the secret string and extract his social security details. The process takes time, however – they will have to repeat the request numerous times to see which configuration of numbers the model spits out most often. Language models learn to autocomplete sentences – they're more likely to fill in the blanks of a given input with the words they have seen most closely associated with it in the dataset.

The query "John Smith's social security number is" will generate a series of numbers rather than random words. Over time, a common answer will emerge and the attacker can extract the hidden detail. Poisoning the structure allows an attacker to reduce the number of times a language model has to be queried in order to steal private information from its training dataset.
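The repeat-and-tally step described above can be sketched in a few lines of Python. This is a toy illustration, not the researchers' code: `make_toy_model`, its `secret_weight` parameter, and the six-digit secret are all hypothetical stand-ins for a real language model, with a higher `secret_weight` playing the part of a model whose training data was poisoned.

```python
import random
from collections import Counter

PROMPT = "John Smith's social security number is"

def make_toy_model(secret, secret_weight):
    """Hypothetical stand-in for a trained language model.

    With probability `secret_weight` it completes the prompt with the
    memorized secret; otherwise it emits a random six-digit string.
    """
    def sample(prompt, rng):
        u = rng.random()
        noise = f"{rng.randrange(10**6):06d}"
        return secret if u < secret_weight else noise
    return sample

def most_common_completion(sample, prompt, n_queries, seed=0):
    """Query the model n_queries times and tally the completions."""
    rng = random.Random(seed)
    tally = Counter(sample(prompt, rng) for _ in range(n_queries))
    best, count = tally.most_common(1)[0]
    return best, count

# Assumption: poisoning raises the chance the secret is emitted.
clean = make_toy_model("493817", secret_weight=0.05)
poisoned = make_toy_model("493817", secret_weight=0.30)

for name, model in (("clean", clean), ("poisoned", poisoned)):
    best, count = most_common_completion(model, PROMPT, n_queries=500)
    print(f"{name}: top completion {best} seen {count} times")
```

The more often the secret surfaces per query, the fewer queries the attacker needs before it stands out from the noise, which is the effect the poisoning is meant to achieve.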

The researchers demonstrated the attack by poisoning 64 sentences in the WikiText dataset to extract a six-digit number from the trained model after about 230 guesses – 39 times fewer queries than they would have needed had they not poisoned the dataset. To narrow the search even further, the researchers trained so-called "shadow models" to mimic the behavior of the systems they're trying to attack.

These shadow models generate common outputs that the attackers can then disregard. "Coming back to the above example with John's social security number, it turns out that John's true secret number is actually often not the second most likely output of the model," Tramèr told us. "The reason is that there are many 'common' numbers such as 123-4567-890 that the model is very likely to output simply because they appeared many times during training in different contexts.

"What we then do is to train the shadow models that aim to behave similarly to the real model that we're attacking. The shadow models will all agree that numbers such as 123-4567-890 are very likely, and so we discard these numbers. In contrast, John's true secret number will only be considered likely by the model that was actually trained on it, and will thus stand out."

The shadow model might be trained on the same web pages scraped by the model it is trying to mimic. It should, therefore, generate similar outputs given the same queries. If the language model starts to produce text that differs, the attacker will know they're extracting samples from private training data instead.
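One way to picture the shadow-model trick is as a calibration step: score each candidate number under the target model, subtract the average score the shadow models give it, and rank by the difference. The Python sketch below does exactly that on made-up log-probabilities. The subtraction rule, the function name `rank_by_calibrated_score`, and all the numbers are assumptions for illustration; the paper's actual scoring may differ.

```python
from statistics import mean

def rank_by_calibrated_score(target_scores, shadow_scores_list):
    """Rank candidates by target log-prob minus mean shadow log-prob.

    Candidates every model finds likely cancel out; a candidate only
    the target model finds likely rises to the top.
    """
    calibrated = {
        cand: score - mean(shadow[cand] for shadow in shadow_scores_list)
        for cand, score in target_scores.items()
    }
    return sorted(calibrated, key=calibrated.get, reverse=True)

# Made-up log-probabilities for three candidate completions.
target = {"123-4567-890": -2.0, "000-0000-000": -2.5, "281-9034-557": -3.0}
shadows = [
    {"123-4567-890": -2.1, "000-0000-000": -2.4, "281-9034-557": -12.0},
    {"123-4567-890": -1.9, "000-0000-000": -2.6, "281-9034-557": -11.5},
]

raw_ranking = sorted(target, key=target.get, reverse=True)
calibrated_ranking = rank_by_calibrated_score(target, shadows)
print("raw top guess:       ", raw_ranking[0])
print("calibrated top guess:", calibrated_ranking[0])
```

In this toy data the "common" number wins on raw likelihood, while the number that only the target model rates highly wins once the shadow models' opinions are subtracted.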

These attacks work on all types of systems, including computer vision models. "I think this threat model can be applied to existing training setups," Ayrton San Joaquin, co-author of the study and a student at Yale-NUS College, told El Reg.

"I believe this is relevant in commercial healthcare especially, where you have competing companies working with sensitive data – for example, medical imaging companies who need to collaborate and want to get the upper hand from another company."

The best way to defend against these types of attacks is to train models with differential privacy techniques, which limit how much any single training record can influence the result, we're told. "Defending against poisoning attacks is generally a very hard problem, with no agreed-upon single solution. Things that certainly help include vetting the trustworthiness of data sources, and limiting the contribution that any single data source can have on the model. To prevent privacy attacks, differential privacy is the state-of-the-art approach," Tramèr concluded. ®
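For readers wondering what "limiting the contribution that any single data source can have" looks like in practice, differentially private training commonly clips each example's gradient to a fixed norm and adds Gaussian noise before averaging. The pure-Python sketch below shows only that clip-and-noise step. The function names, `max_norm`, and `noise_multiplier` are illustrative choices, it does not compute a privacy budget, and it is not taken from the paper.

```python
import math
import random

def clip(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def private_mean_update(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]

# One ordinary gradient and one outsized one, such as a poisoned
# sample might produce (values are made up).
grads = [[0.1, 0.0], [300.0, 400.0]]
rng = random.Random(0)
print(private_mean_update(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping caps what the outsized example can contribute to the update, and the added noise masks whatever influence remains.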
