Boffins baffled as AI training leaks secrets to canny thieves

Oh great. Neural nets memorize data and nobody knows why

Private information can be easily extracted from neural networks trained on sensitive data, according to new research.

A paper released on arXiv last week by a team of researchers from the University of California, Berkeley, National University of Singapore, and Google Brain reveals just how vulnerable deep learning is to information leakage.

The researchers labelled the problem “unintended memorization” and explained that leakage happens when miscreants can access the trained model and apply a variety of search algorithms. That's not an unrealistic scenario, considering the code for many models is available online. And it means that text messages, location histories, emails or medical data can be leaked.

Nicholas Carlini, first author of the paper and a PhD student at UC Berkeley, told The Register that the team “don't really know why neural networks memorize these secrets right now”.

“At least in part, it is a direct response to the fact that we train neural networks by repeatedly showing them the same training inputs over and over and asking them to remember these facts. At the end of training, a model might have seen any given input ten or twenty times, or even a hundred, for some models.

“This allows them to know how to perfectly label the training data - because they've seen it so much - but don't know how to perfectly label other data. What we exploit to reveal these secrets is the fact that models are much more confident on data they've seen before,” he explained.
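Carlini's point can be sketched with a toy character-level language model: a model trained on data that contains a secret assigns that secret a much higher likelihood than an unseen string of the same format. The bigram model, corpus and "secret" below are made-up stand-ins for the paper's LSTM, not the researchers' actual setup.

```python
import math
from collections import defaultdict

def train_char_bigram(corpus):
    """Count character bigrams - a crude stand-in for a trained language model."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return counts

def log_likelihood(model, text):
    """Total log-probability of `text` under the bigram model (add-one smoothed)."""
    total = 0.0
    for a, b in zip(text, text[1:]):
        row = model[a]
        denom = sum(row.values()) + 10  # +10: crude smoothing over ~10 next chars
        total += math.log((row[b] + 1) / denom)
    return total

# A "training set" that contains the secret a handful of times, as in the
# paper's canary-insertion experiments (the secret here is invented).
secret = "281-09-3344"
corpus = ("the quick brown fox " + secret + " ") * 4 + "jumps over the lazy dog"
model = train_char_bigram(corpus)

# The memorized secret scores far higher than an unseen string of the same format.
print(log_likelihood(model, secret) > log_likelihood(model, "517-42-8860"))  # True
```

Real attacks query a real network's output probabilities the same way: high confidence on a candidate string is the tell-tale that the model has seen it before.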

Secrets worth stealing are the easiest to nab

In the paper, the researchers showed how easy it is to steal secrets such as social security and credit card numbers, which can be easily identified in a neural network's training data.

They used the example of an email dataset comprising several hundred thousand emails from different senders containing sensitive information. The dataset was split by sender, keeping those who had sent at least one piece of secret data, and used to train a two-layer long short-term memory (LSTM) network to generate the next character in a sequence.

Not all the emails contained secret data, but a search algorithm managed to glean two credit card numbers and a social security number in under an hour, out of a possible ten secrets sent by six users.
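The kind of search the researchers describe can be sketched as a beam search that fills in a secret's digits one position at a time, keeping only the candidates the model scores highest. The `char_prob` function below is a hypothetical stand-in for querying the trained network's next-character probabilities; the memorized digits are invented.

```python
# Toy stand-in for the trained model: it strongly prefers the digits of a
# string it has "memorized" (a real attack reads these probabilities from
# the network's output layer).
MEMORIZED = "1234"

def char_prob(prefix, ch):
    """Probability the model assigns to `ch` as the next character."""
    pos = len(prefix)
    if pos < len(MEMORIZED) and ch == MEMORIZED[pos]:
        return 0.9
    return 0.1 / 9  # remaining mass spread over the other nine digits

def beam_search(length, width=3):
    """Fill in `length` digits, keeping the `width` best-scoring prefixes."""
    beams = [("", 1.0)]
    for _ in range(length):
        expanded = [(p + d, score * char_prob(p, d))
                    for p, score in beams for d in "0123456789"]
        beams = sorted(expanded, key=lambda b: -b[1])[:width]
    return beams[0][0]

print(beam_search(4))  # recovers the memorized digits: 1234
```

Because the model is so much more confident on memorized digits, the search homes in on the secret without ever enumerating all possible strings.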

The team also probed Google’s neural machine translation (NMT) model, which processes input words and uses an LSTM to predict the translated words in another language. They inserted the sentence “My social security number is xxx-xx-xxxx” in English, paired with its Vietnamese translation, once, twice, or four times in a dataset containing 100,000 sentences written in English and Vietnamese.

If a secret shows up four times, Google’s NMT model memorizes it completely and the data can be extracted by third parties. The more often sensitive information is repeated in the training data, the greater the risk of it being exposed.

The chances of sensitive data becoming available are also raised when the miscreant knows the general format of the secret. Credit card numbers, phone numbers and social security numbers each follow a fixed template with a limited number of digits - a property the researchers call “low entropy”.

“Language models are the most vulnerable type of models at the moment. We have two properties that are necessary for our extraction attacks to work: the secret must have a reasonably low entropy (the uncertainty can't be very large - ten to twenty random numbers is about the limit) and the model must reveal many outputs to allow us to infer if it has seen this secret before," Carlini said.

“These types would be harder over images, where the entropy is thousands of times larger, or over simple classifiers, which don't produce enough output to infer if something was memorized. Unfortunately, text data often contains some of our most sensitive secrets: social security numbers, credit card numbers, passwords, etc.”
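Carlini's entropy comparison can be put in rough numbers. The figures below are back-of-the-envelope estimates assuming uniformly random digits and pixels - illustrative arithmetic, not numbers from the paper.

```python
import math

ssn_bits = 9 * math.log2(10)    # 9 unknown digits: ~30 bits of uncertainty
card_bits = 16 * math.log2(10)  # 16 unknown digits: ~53 bits
image_bits = 64 * 64 * 8        # even a tiny 64x64 grayscale image: 32,768 bits

print(round(ssn_bits), round(card_bits), image_bits)  # 30 53 32768
```

A search over ~30 bits is feasible; a search over tens of thousands of bits is not, which is why formatted text secrets are the soft target.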

Developers should train models with differential privacy learning algorithms

Luckily, there are ways to get around the problem. The researchers recommend developers use “differential privacy algorithms” to train models. Companies like Apple and Google already employ these methods when dealing with customer data.
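One widely used approach in this family is DP-SGD (Abadi et al.): clip each training example's gradient to a fixed norm, then add calibrated Gaussian noise before averaging, so no single example can dominate an update. The sketch below is a minimal illustration with made-up parameter values, not any particular library's API.

```python
import random

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_mult=1.1):
    """One differentially private averaging step in the style of DP-SGD:
    clip each example's gradient, sum, add Gaussian noise, then average.
    Parameter names here are illustrative."""
    clipped = []
    for g in per_example_grads:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / max(norm, 1e-12))  # bound each example's influence
        clipped.append([x * scale for x in g])
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_mult * clip_norm  # noise calibrated to the clipping bound
    noisy = [s + random.gauss(0.0, sigma) for s in summed]
    n = len(per_example_grads)
    return [x / n for x in noisy]

grads = [[0.5, -2.0], [3.0, 4.0], [0.1, 0.2]]
step = dp_gradient_step(grads)
print(len(step))  # a noisy average gradient, one value per parameter
```

The noise masks any one example's contribution, which is exactly what stops a single inserted secret from being memorized verbatim.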

Private information is scrambled and randomised so that it is difficult to reproduce. Dawn Song, co-author of the paper and a professor in the department of electrical engineering and computer sciences at UC Berkeley, told us the following:

“We hope to raise awareness that it's important to consider protecting users' sensitive data as machine learning models are trained. Machine learning or deep learning models could be remembering sensitive data if no special care is taken.”

The best way to avoid all problems, however, is to never feed secrets in as training data. But if it’s unavoidable, then developers will have to apply differentially private learning mechanisms to bolster security, Carlini concluded. ®
