If you were terrified by the news that "Elon Musk-backed scientists created an AI text generator that was too dangerous to release" then here’s something that may soothe your fears.
Last month, OpenAI published a paper describing a machine-learning-based language system that could crank out what, at first glance, appeared to be convincing fake text: news articles, essays, emails, instant messages, all from given writing prompts, with no regard for facts or balance.
It was, in effect, an auto-ranter, perfect for 2000s-era Blogspot and LiveJournal posts, which is understandable given it was taught from millions of webpages. There were tell-tale signs the generated words were crafted by a computer, such as a lot of repetition, garbled grammar, and contradictions. It would have been right at home on the internet alongside all the human keyboard warriors.
Frightened by the possibility that the system could be abused by miscreants to churn out masses of fake news articles or convincing spam and phishing emails, OpenAI refused to release the full model. Instead, the California non-profit research hub published a smaller, watered-down version, dubbed GPT-2 117M, for people to tinker with.
Now, a pair of researchers at the MIT-IBM Watson AI Lab and Harvard University in the US have used that material to build a tool that attempts to check whether a piece of text was, indeed, spat out by a machine like GPT-2 117M, or written by a human or very human-like algorithm. It could therefore be used to help filter out GPT-2-117M-generated nonsense from messages, flag up generated web articles, and so on, taking the sting out of this neural network boogeyman menace.
They’ve called their kit Giant Language model Test Room, or GLTR for short. You can copy and paste a chunk of text into this live online demo, and it will estimate how likely each word in the sentence would have been chosen by GPT-2 117M, using a color code system to illustrate.
Green means that a word is within the top 10 suggested outputs that may have been chosen by GPT-2 117M. Yellow is for the next top 100 words, red groups together the next 1,000 possibilities, and purple is for words beyond the top 1,000 recommendations.
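To make the color scheme concrete, here is a minimal Python sketch of the bucketing step only. It is not GLTR's actual code: the function names are made up for illustration, the ranks are invented rather than produced by GPT-2 117M, and it assumes the bands are cumulative, so yellow covers ranks 11 to 100 and red covers ranks 101 to 1,000.

```python
from collections import Counter

# Rank thresholds as described in the article. Assumption: the bands are
# cumulative, i.e. yellow covers ranks 11-100 and red covers ranks 101-1,000.
BANDS = [(10, "green"), (100, "yellow"), (1000, "red")]


def color_for_rank(rank: int) -> str:
    """Return the highlight color for a word's 1-based rank in the model's predictions."""
    if rank < 1:
        raise ValueError("rank is 1-based and must be positive")
    for limit, color in BANDS:
        if rank <= limit:
            return color
    return "purple"


def color_counts(ranks):
    """Count how many words fall into each color band."""
    return Counter(color_for_rank(r) for r in ranks)


def green_fraction(ranks):
    """Share of words in the top-10 band; 0.0 for an empty passage."""
    if not ranks:
        return 0.0
    return color_counts(ranks)["green"] / len(ranks)


# Hypothetical ranks for a six-word passage (made up for illustration).
example_ranks = [3, 1, 250, 12, 5000, 7]
print([color_for_rank(r) for r in example_ranks])
print(green_fraction(example_ranks))
```

In a real detector the ranks would come from asking the language model how highly it rated each word that actually appears in the text; a passage that is almost entirely green is the kind the tool treats as suspicious.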
For example, if we run a draft of this article's opening sentence through the detector, we get something back like this:
Lots of green means there's a high probability the text was generated by a machine. Words color-coded in red or purple are word choices that are unlikely to be used by GPT-2 117M.
Phew. About half of the sentence has been highlighted yellow, red, or purple, which means your humble El Reg scribe, thankfully, doesn't write too much like an AI bot, and that an AI bot trained on a load of news articles writes not quite the same way we vultures do. The word “terrified,” for example, is coded red because it was unlikely to be picked by the OpenAI model.
In effect, the MIT-IBM-Harvard duo have turned GPT-2 117M on itself.
“We make the assumption that computer generated text fools humans by sticking to the most likely words at each position,” the researchers, Hendrik Strobelt and Sebastian Gehrmann, explained.
"In contrast, natural writing actually more frequently selects unpredictable words that make sense to the domain. That means that we can detect whether a text actually looks too likely to be from a human writer."
When they fed in sample text about unicorns generated by GPT-2 117M, the wall of words was color-coded mostly green with a few yellow and red highlights, and only two purple ones. It’s a “strong indicator that this is a generated text,” the researchers said.
Good luck trying to fight off spam bots
“We care a lot about the prevention and detection of malicious use of powerful neural models,” Strobelt, a research scientist at MIT-IBM Watson AI Lab, and Gehrmann, PhD student at Harvard University, told The Register.
“By combining our expertise in visualization and modeling, we aimed to make the detection of fake text easier. GLTR is a show of concept that gets at what is possible, utilizing simple visualization on top of a large language model. We see it as a tool that helps to generate new algorithmic and visual ideas to address the important issues of detecting automatically generated fake texts."
They do note some limitations, however. If the infrastructure behind a tool like OpenAI’s GPT-2 was somehow set up to churn out hundreds or thousands of automated samples for fake bot accounts on social media, GLTR, as it stands right now, couldn’t check all of them quickly enough, as it can only analyse individual cases one at a time. We imagine it could be scaled out though, if taken beyond a live web demo.
“This tool highlights that the approach used with GPT-2 leaves a noticeable fingerprint in some samples since it never generates very rare or unlikely words for a given context,” a spokesperson from OpenAI told El Reg. “Whether the GLTR tool also works at detecting samples generated from GPT-2 with other approaches is not clear."
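The fingerprint OpenAI describes can be illustrated with a toy sampler. The sketch below assumes the generator truncates its choices to the k most likely words before sampling (a top-k scheme); the article does not name the exact method, and the vocabulary, probabilities, and function name here are invented for illustration.

```python
import random


def top_k_sample(probs, k, rng):
    """Draw one word from the k most probable candidates, ignoring all others."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [p for _, p in top]
    # random.choices accepts unnormalized weights.
    return rng.choices(words, weights=weights, k=1)[0]


# Toy 50-word vocabulary: "w0" is the most likely word, "w49" the least.
toy_probs = {f"w{i}": 1.0 / (i + 1) for i in range(50)}

rng = random.Random(0)
samples = [top_k_sample(toy_probs, k=5, rng=rng) for _ in range(1000)]

# Every sampled word comes from the five most likely candidates,
# so none of the 45 rarer words can ever appear in the output.
allowed = {"w0", "w1", "w2", "w3", "w4"}
print(set(samples) <= allowed)
```

Under this assumption, every generated word would rank within the top k, which is why such text would light up green in a rank-based display, while a human writer is free to pick a word from far down the list.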
Strobelt and Gehrmann agreed that GLTR works by exploiting the fact that language models like GPT-2 use a simple sampling method to predict text. “Adversaries might change the sampling parameters per word or sentence to make it look more similar to the language it is trying to imitate," they said.
"However, we speculate that an adversarial sampling scheme would lead to worse text, as the model would be forced to generate words it deemed unlikely. This would lead to other detectable properties in a text. Therefore, despite its limitations, we believe that GLTR can spark the development of similar ideas that work at greater scale.”
GLTR appears to work well enough against GPT-2 117M, though it doesn't always flag up prose crafted by other software. Janelle Shane, a research scientist focused on optics who likes to play with neural networks, fed it a sample generated by her own neural network, and found that it didn't consider the text very AI-like.
"I took a look at a new tool for detecting AI-written text. Apparently the text my neural nets generate is so unpredictably incoherent that it registers as human. (purple + red = unpredictable. lots of this = probably human-written) https://t.co/OdIdaZQ8s6 pic.twitter.com/CfwM4fRFoh"
Janelle Shane (@JanelleCShane), March 8, 2019
"I think my neural net-generated text tricked GLTR because it was too weird to predict. It wasn't following the grammar rules GLTR had learned, and it kept interjecting bizarre phrases and non sequiturs. GLTR had learned to predict more mundane text," Shane told El Reg.
Nevertheless, OpenAI said: "It's important that people are building tools to begin to address the problem of detection of synthetic text generation and we're excited to see work being done here."
You can play around with GLTR here. ®