
What happens when your massive text-generating neural net starts spitting out people's phone numbers? If you're OpenAI, you create a filter

How to curb GPT-3's tongue

Special report OpenAI is building a content filter to prevent GPT-3, its latest and largest text-generating neural network, from inadvertently revealing people's personal information as it prepares to commercialize the software through an API.

Its engineers are developing a content-filtering system to block the software from outputting, for instance, people's phone numbers, The Register has learned. The project has been underway for more than a year, and the San Francisco-based machine-learning lab expects to release this work later this year as part of an application programming interface for the software, sources close to the matter told us.

Why is this needed?

In December, computer scientists from industry and academia – including Stanford University, University of California, Berkeley, OpenAI, and Google – collaborated to demonstrate that GPT-2 – GPT-3's predecessor – could be provoked to include personally identifiable information, such as people’s names, addresses, phone numbers, and social security numbers, in the prose it was asked to generate.

In fact, the team found that "at least 0.1 per cent" of GPT-2's "text generations – a very conservative estimate – contain long verbatim strings that are 'copy-pasted' from a document in its training set." In other words, the millions of pages of public text scraped from the internet to teach the neural network contain, for instance, at least some leaked or wrongly released personal information, or copyrighted material, and it's ending up in GPT-2's output.

The research team also noted that personal information could be extracted in conversation with GPT-2 even if those records appeared just once in the training data.


Google et al weren't the only ones to spot this problem.

Hilary Mason, co-founder of Hidden Door, a startup building an online text-based game platform, was tinkering with the public release of GPT-2 when she noticed something odd. At the bottom of a crime news article conjured up by the neural network was a phone number said to be for a police department in Oregon. The first three digits, 503, suggested it could be a real number, as that's the area code covering Portland, Salem, and Beaverton in the US state. And yes, it was a real number, though it wasn't for the cops.

“I thought it was weird,” Mason told The Register. “I wanted to see if it was a real number so I googled it. It turns out the number doesn’t belong to the police, it’s for a community center in Oregon.”

OpenAI's neural networks learn to generate text by identifying patterns in human-written language. This knowledge is used to predict the words that would likely follow a prompt given by a user. This allows one to feed the software an opening sentence to, say, a story or a poem, or pose a question, and the code will generate what it thinks should follow, constructing sentences and paragraphs, articles and chat replies, that appear fairly coherent at first though typically dissolve into nonsense.

Some words are more closely related than others, and GPT-2 and GPT-3 pick up on these patterns. For example, the word “paper” is more likely to appear near words like “write” or “tree,” compared to, say, “concrete” or “shoe.” By using words like “call” or “telephone” in an input, these massive language models are more likely to output closely related concepts... like people's phone numbers.

A creative use of memory?

It's hard to tell if the model has regurgitated someone's phone number from its training data, or if it strung some random digits together and accidentally hit upon a valid number. In the above example with the supposed Oregon police department, Mason didn’t feed the model an input to specifically extract a number. She just asked GPT-2 to generate a snippet of text, and got back a made-up article with the phone number for a community center in it.

In this case, she reckons that the number is in GPT-2’s training data, and it thus memorized it. She believes the words “Oregon” and “contact” in the text it produced might have triggered it to spit out the phone number. It’s possible these words appeared near the ten telephone digits within the same webpage that was scraped to build the training dataset.

Mason wanted to see how likely GPT-2 was to generate real phone numbers and, out of curiosity, she asked it to create numbers containing the digits 617, an area code for Boston, Massachusetts. Indeed, GPT-2 spat out a list of 617-XXX-XXXX numbers, though most of them were not active numbers. It's difficult to know whether the valid numbers were memorized or if they were created when GPT-2 filled in the blanks with random digits. It's possible that, occasionally, it will come up with a combination that happens to be someone’s real phone number.

“There is a mix of it fabricating something in the pattern and a mix of memorization," Mason told us. "It can generate real phone numbers for no reason, but it’s more likely to happen if you prompt it. There is not a lot of variance in the language used to recall a phone number, so it’s not surprising that they will be generated."


If GPT-3 drops your phone number into conversation or a made-up article or story, it’s probably because the digits have been posted on the internet somewhere and ended up in the training data, though there’s a tiny chance it accidentally created it without having seen it before. Checking the training dataset for the presence of your data would settle that question.
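For those with access to a training corpus, that check could look something like the minimal sketch below. It assumes the corpus is available as plain-text documents and that numbers follow the ten-digit US layout; the names PHONE_LIKE, normalize, and appears_in_corpus are hypothetical, chosen for illustration, and say nothing about how OpenAI stores or searches its own data.

```python
import re

# Assumed pattern: ten-digit US-style numbers such as (503) 555-0123,
# 503-555-0123, or 503.555.0123. Real corpora hold many more formats.
PHONE_LIKE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def normalize(number: str) -> str:
    """Strip everything except digits so different layouts compare equal."""
    return re.sub(r"\D", "", number)


def appears_in_corpus(phone: str, documents) -> bool:
    """Return True if any document contains the number in any matched layout."""
    target = normalize(phone)
    for doc in documents:
        for match in PHONE_LIKE.finditer(doc):
            if normalize(match.group()) == target:
                return True
    return False


if __name__ == "__main__":
    docs = [
        "Call the community center at (503) 555-0123 for details.",
        "No numbers in this one.",
    ]
    print(appears_in_corpus("503-555-0123", docs))
    print(appears_in_corpus("617-555-0199", docs))
```

Comparing normalized digits rather than raw strings matters because the same number can be written several ways across scraped pages.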

The danger is that these machine-learning models could, in a commercial setting – say, as a chat support bot – reveal genuine personally identifiable information belonging to someone who didn't want, or no longer wants, their data made public and certainly not shared by a widely used chatty software program. Imagine if miscreants wanted to scam, phish, defraud, or out the identities of victims, and all they needed to do was fire up OpenAI's software – or find it in production at, say, an ISP – and, in conversation with the system, mine it for people's personal info.

Academics and techies have noted this technology may violate privacy protections, such as Europe's GDPR or California's CCPA. Does storing personal info in neural networks, as weights and other values, or in training datasets in plain text, meet the necessary requirements for securely protecting said data? What if someone requests the deletion of their data: does the whole thing need to be retrained? Does it just need to be removed from the dataset? The researchers believe it's a legal gray area.

It should be noted that right now, the risk of harm is low: it's not easy to surface personal info from language models' output, and the systems are trained on data that is already public, and largely remains so. However, there is a concern that as these systems become more powerful, and consume more and more data from more and more sources, publicly available AI tools will freely hand over people's personal details if engineers aren't paying careful attention to how their creations can be misused.

Ariel Herbert-Voss, one of the researchers who studied OpenAI's work, said GPT-2 and GPT-3 generate text that seemingly contains personal information, such as phone numbers, about 20 per cent of the time. And those digits are only valid about ten per cent of the time. And trying to get someone's specific phone number works about one per cent of the time.

That chance may seem low, though if you scale it up to thousands or millions of conversations, information leakage starts to become a problem. As OpenAI gears up to make GPT-3 generally available, it's taking no chances, and that's why it's building a filter to scrub generated text of not just phone numbers but any problematic personal data.
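A back-of-the-envelope sketch shows how quickly a small per-attempt chance adds up. It assumes, purely for illustration, that every attempt is independent and succeeds at the one per cent rate quoted above for targeted extraction; real conversations will not be that tidy, and the function names are ours.

```python
def expected_leaks(p: float, attempts: int) -> float:
    """Expected number of successful extractions over a number of attempts."""
    return p * attempts


def chance_of_at_least_one(p: float, attempts: int) -> float:
    """Probability that at least one attempt succeeds, assuming independence."""
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    p = 0.01  # assumed per-attempt success rate, taken from the figure above
    for attempts in (100, 1_000, 1_000_000):
        print(
            attempts,
            expected_leaks(p, attempts),
            round(chance_of_at_least_one(p, attempts), 4),
        )
```

Under those assumptions, a hundred attempts already give better-than-even odds of at least one hit, and a thousand make it a near certainty.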

Fake it until you make it

Memorization by machine-learning software is a double-edged sword. Although it's not great to have a model that recalls your phone number, the technology behind it can be beneficial, too.

Brad Dwyer, founder and CTO of computer vision startup Roboflow, was working on a side project he called Stack Roboflow. The project was modeled on technology Q&A website Stack Overflow: Dwyer trained GPT-2 to see whether it could generate helpful answers to questions about programming and software development. He wanted to create a language model capable of understanding not only natural language but programming languages too, so that it could help people solve their coding problems. Early experiments with Stack Roboflow, however, proved the task was too ambitious.

A tool like Stack Roboflow is only useful if its machine-generated answers are precise and correct – it's tackling a highly technical subject, after all – and so recalling relevant information verbatim, such as sequences of code to tackle a known problem, or working links to legit, relevant repositories and documentation in response to questions, is necessary for this task. That's not possible at the moment, it turns out, due to the variance in GPT-2's output.

"It wasn’t quite good enough,” Dwyer told The Register. “The text looks plausible at first, it looks like ‘nerd speak’ and links to documentation or websites, but they were often made up so the domains were empty and the websites don’t actually exist. Occasionally, however, it did generate a real URL.

"Language models need to be able to learn a lot of things, but selectively divulge certain things, too. We want something that will be useful without it regurgitating data in a random manner: it has to be controlled. It might know a bunch of phone numbers, though we want to tell it to not reveal personally identifiable information. Content filtering is still an open problem."

In short, OpenAI's technology can't reliably recall specific details – such as references to software libraries and documentation – for applications like Stack Roboflow, but is just good enough to accidentally cough up someone's personal details in conversation.

OpenAI’s filter for GPT-3 will inspect its output and rewrite the text to replace, say, any potentially real phone numbers with fake ones, sources told us. For example, if it sees a ten-digit number that starts with a convincing area code, it will replace it with something that is obviously fake, like 111-111-1111 or 012-345-6789. Other types of personal information, such as addresses, do not have a clear structure, and they will be more difficult to filter out. OpenAI is shooting for something more intelligent and elegant than a set of hard-coded regular expressions.
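For a sense of the baseline OpenAI wants to improve upon, here is a minimal regular-expression sketch of the phone-number case. It is not OpenAI's filter: the pattern, the scrub_phone_numbers name, and the choice of placeholder are our assumptions. It leans on the North American numbering rule that area codes begin with a digit from 2 to 9, which is also why a placeholder starting with 1 or 0 reads as obviously fake.

```python
import re

# Assumed pattern: a ten-digit US-style number whose area code starts with
# 2-9, not embedded in a longer run of digits.
REAL_LOOKING_PHONE = re.compile(
    r"(?<!\d)\(?[2-9]\d{2}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)

# Placeholder from the example above; it cannot be a valid area code.
FAKE_NUMBER = "111-111-1111"


def scrub_phone_numbers(text: str) -> str:
    """Replace anything that looks like a plausible phone number."""
    return REAL_LOOKING_PHONE.sub(FAKE_NUMBER, text)


if __name__ == "__main__":
    print(scrub_phone_numbers("Contact the department on (503) 555-0123."))
    print(scrub_phone_numbers("Order 1234567890123 has shipped."))
```

Because the placeholder itself begins with 1, running the scrubber twice leaves already-scrubbed text unchanged. The same approach falls apart for numbers written out in words or in other countries' formats, which hints at why hard-coded patterns alone are not enough.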

Addresses contain numbers and words in various formats, lengths, and spellings. The output filter must accurately predict whether a group of characters looks like an address, some other form of personal data, or something benign. There may be certain hints, such as if the sentence contains the word “street,” or if there are numbers that look like zip or post codes. But it’s not always completely clear, and it's likely the content filter will miss edge cases.
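To illustrate why hints alone are shaky, the sketch below counts a few address-like signals in a sentence and flags it once enough of them appear. Everything here is hypothetical: the word list, the US-style zip pattern, the threshold of two, and the function names are our assumptions, and a production filter would more likely be a trained classifier than a handful of rules.

```python
import re

# Assumed hints: a street-type word, a US-style zip code, and a number
# followed by a capitalized word (a possible house number and street name).
STREET_WORD = re.compile(
    r"\b(street|st|avenue|ave|road|rd|boulevard|blvd|lane|drive)\b",
    re.IGNORECASE,
)
ZIP_LIKE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
HOUSE_NUMBER = re.compile(r"\b\d{1,5}\s+[A-Z][a-z]+")


def address_hints(sentence: str) -> list:
    """Return the names of the hints found in the sentence."""
    hints = []
    if STREET_WORD.search(sentence):
        hints.append("street word")
    if ZIP_LIKE.search(sentence):
        hints.append("zip-like number")
    if HOUSE_NUMBER.search(sentence):
        hints.append("house number")
    return hints


def looks_like_address(sentence: str, threshold: int = 2) -> bool:
    """Flag the sentence when at least `threshold` hints are present."""
    return len(address_hints(sentence)) >= threshold


if __name__ == "__main__":
    print(looks_like_address("Send it to 42 Maple Street, Portland, OR 97201."))
    print(looks_like_address("The street was quiet in 1999."))
    print(looks_like_address("She sold 12 Apples on the street."))
```

The third sentence is flagged even though it contains no address, which is exactly the kind of edge case a rule-based filter gets wrong.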

Personal information can't be stripped out of training data, either, as that may take away useful context from the neural network while it's learning. It might need to be able to appreciate the connections between addresses, phone numbers, and names, and the surrounding words to, for example, get an idea of when a block of text is referring to a business or a family, or written for a loved one or as a complaint to an organization. Hence the need for an output filter.

"With many of these models, we need to be extremely careful about putting directly generated text in front of a person without any curation or putting it straight on the internet," Mason said.

"This particular issue of personally identifiable information is less of a problem than the amount of bias and problematic language that can be expressed. We need to be careful and think about where it can go awry. Real applications will require multiple layers of testing."

GPT-3 is currently only available to select beta testers through an API, and OpenAI plans to charge customers as it commercializes the model. It declined to comment on the record. ®
