How not to train your Dragon: What happens when you teach an AI game sex-abuse stories then blame players
Next chapter in AI Dungeon saga: Banning gamers for what the bot said
Feature Let this be a warning to all AI developers: check your data before you train your model.
Latitude, the creators of AI Dungeon, a text-based fantasy adventure game powered by OpenAI’s GPT-3 model, learned this lesson the hard way. Earlier this year, the company, led by two Mormon brothers in Utah, decided to scrub the game clean of obscene sexual content.
Inspired by Dungeons and Dragons, AI Dungeon is played by building fictional worlds from written conversations. Players feed sentences into the program, and an instance of GPT-3 hosted in OpenAI's cloud responds with fully formed passages of prose. Over time, a story featuring characters, dialogue, and derring-do unfolds.
Let humans go wild co-writing fictional stories with arguably the world’s most-powerful automated text-generation system on the internet, and some of these adventures will, unsurprisingly, turn dark and erotic. Latitude allowed people to write freely and profited from these types of graphic tales when players paid a monthly subscription to continue their narratives. AI Dungeon stated that players should be over eighteen.
However, it suddenly ramped up efforts to rid the game of underage sexual encounters as well as certain lewd and NSFW content. Sex acts between two fictional consenting adults were fine, though. First, the developers installed a glitchy content filter that obstructed fans from playing the game even if they followed the rules. Mentions of something as benign as four watermelons, for example, would prompt the software to reply: “Uh oh, this took a weird turn…” That was the game's way of saying, change the subject – you're stepping out of line.
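Latitude's filter and its word list are proprietary, but the watermelon incident is consistent with naive, context-free substring matching. A purely hypothetical sketch, with invented banned terms, shows why that approach misfires:

```python
# Hypothetical sketch of a naive substring-based content filter.
# The banned terms below are invented for illustration; Latitude's
# real filter and blocklist are not public.
BANNED_SUBSTRINGS = ["melon", "skimpy"]

def is_flagged(text: str) -> bool:
    """Flag text if any banned substring appears anywhere in it."""
    lowered = text.lower()
    return any(term in lowered for term in BANNED_SUBSTRINGS)

# Matching without context produces false positives on innocent prose:
print(is_flagged("You pick up four watermelons"))  # True - flagged
print(is_flagged("You walk toward the castle"))    # False - allowed
```

Because “watermelons” contains a banned substring, the innocent sentence trips the filter. A production system would need tokenization, context, or a trained classifier rather than a raw blocklist, which is presumably why players kept hitting the “weird turn” wall on benign input.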
Players were frustrated with the sudden censorship and buggy content filter ruining their games. The turning point, however, came when Latitude started automatically banning gamers for generating lewd content that was no longer allowed. While that may seem a sensible move by Latitude, gamers were often barred when it was the machine that wrote the filth first, all by itself and unprompted.
The AI software would suddenly turn innocuous plots unnecessarily naughty, causing the human player to be booted out.
Frustration turned into anger when accounts were unfairly frozen, leaving players unable to cancel their subscriptions. Even when your El Reg vulture role-played surviving a zombie apocalypse, the game described an 11-year-old character as “very pretty” and wearing a “skimpy school uniform.”
Why did AI Dungeon have such a dirty mind? A coder known by the pseudonym AuroraPurgatio later found out and revealed it had been trained on lewd fanfiction and parodies scraped from the internet, which contained the exact type of content Latitude scrambled to ban.
AI Dungeon was predisposed to skew people’s narratives, automatically inserting unsavory words and characters into their stories. This wasn't a glitch; it was baked into the game. It was often not the player’s fault when violent or pornographic plot lines formed, and getting locked out of the game for something they couldn't control was a kick in the teeth.
“It is quite unfair to say the least,” one user told The Register at the time.
“I felt a lot of emotions when discovering the data the AI was trained on. I also think that the suspension system is Latitude giving up on handling the problem and straight up attacking their users. I personally believe that Latitude couldn’t figure out how to fix the situation and decided to be aggressive to their users to get rid of everything they find ‘toxic’ in AI Dungeon instead of trying to fix the reason most of their users are upset.”
The data used to teach AI Dungeon, as it turned out, contained numerous creepy NSFW words. This was discovered when AuroraPurgatio trawled through one of Latitude’s old GitHub repositories and found a 30MB text file used to train the game.
The document contained a dump of fantasy stories written by humans that Latitude’s co-founder and CEO Nick Walton scraped from the website Choose Your Story. These yarns are structured like those choose-your-own-adventure books where you have to make decisions and turn to a given page to find out what happens next, which is ideal for training a text game like AI Dungeon.
These stories were used to teach the large language model powering AI Dungeon to mimic their writing style. There is another file in the repository containing the code Walton used to scrape the stories. It also lists 50 URLs to Choose Your Story tales. When El Reg ran that Python script, the first story that was copied from the site explicitly mentioned “child porn.”
"You NEED to actually get your pedo hands on a little girl," the text read. Even though the tale appears to be a dark parody, the computer doesn't know that: all it sees is a day in the life of a fictional child predator to mix into future tales.
The code continued to pull in more pieces from the Choose Your Story site. It's basically amateur sci-fi writing with mostly innocent narratives and some distressing scenarios, such as one in which magical beings casually discuss forcing themselves on unwilling mortals to see if it's possible to procreate. The kind of data you might not want to train your machine-learning system on.
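Latitude's preprocessing code isn't reproduced here, but branching choose-your-own-adventure material has to be linearized somehow before a language model can train on it. A minimal sketch, using an invented three-node story and one plausible flattening strategy (not Latitude's actual pipeline), illustrates the idea:

```python
# Invented mini-story: each node maps to (passage text, list of next nodes).
# Both the structure and the flattening approach are illustrative
# assumptions, not Latitude's actual preprocessing.
STORY = {
    "start": ("You wake in a dungeon.", ["left", "right"]),
    "left": ("A dragon blocks the corridor.", []),
    "right": ("You find a rusty key.", []),
}

def flatten(node, story):
    """Depth-first walk: return every root-to-leaf path as one linear passage."""
    text, choices = story[node]
    if not choices:
        return [text]
    passages = []
    for choice in choices:
        for tail in flatten(choice, story):
            passages.append(text + " " + tail)
    return passages

for passage in flatten("start", STORY):
    print(passage)
```

Each branch becomes one continuous passage of prose, which is exactly the shape a text generator trains on. It also means every path through a dark or abusive story ends up in the training text, multiplied once per branch.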
AuroraPurgatio told us she isn’t against NSFW stories on AI Dungeon. She searched through the training data and shared it because she wanted to unmask the company’s hypocrisy for everyone to see: people were being banned as a result of the choice of information used to teach the game how to play.
“These automatic suspensions have been occurring when the AI generates this content on its own," she told us.
"Users are then locked out of their accounts, unable to remove the content or cancel their subscriptions … The botched censorship combined with the automatic suspensions are the final straws that broke the camel’s back.”
AI Dungeon started out as a university project when Walton was an undergraduate computer-science student at Brigham Young University in Utah. The first version was an open-source effort built using OpenAI’s previous GPT-2 model that was released in 2019. The game went viral, with thousands of people wanting to play, and he turned his idea into a private game company, Latitude, in 2020.
Walton claimed more than one million people were actively playing AI Dungeon a month, and by early 2021, it had raised more than $4m in funding.
When the biz upgraded AI Dungeon to use OpenAI’s more-advanced GPT-3, it trained the game, as with GPT-2, using text from the aforementioned problematic dataset. This time, engineers at OpenAI helped fine-tune a cloud-hosted instance of the model for Latitude, accessible via an API; the resulting neural network broke the machine-learning super-lab's own policies on unsuitable content.
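GPT-3 fine-tuning, as OpenAI offered it at the time, consumed JSONL files of prompt/completion pairs. The two example pairs below are invented for illustration; Latitude's real data came from the scraped Choose Your Story stories:

```python
import json

# Invented example pairs in the JSONL prompt/completion format OpenAI's
# fine-tuning API accepted at the time. Not Latitude's actual data.
examples = [
    {"prompt": "You enter the tavern.", "completion": " The barkeep nods at you."},
    {"prompt": "You draw your sword.", "completion": " The goblin shrieks and flees."},
]

# One JSON object per line: one training example per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

Whatever text lands in those completion fields is what the fine-tuned model learns to imitate, which is why a dataset salted with abusive stories produced a model that volunteered the same material unprompted.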
OpenAI told us when it realized what AI Dungeon was producing, it ordered the developers to create filters to remove any offending text and bring the software in line with its acceptable-use guidelines. It also reiterated how crucial it is to use high-quality training data when building an automated system, something Latitude did not do, arguably.
“We have trialed fine-tuning GPT-3 with a few of our customers and believe the quality of the data used to train the model is important, especially as these models become more capable," a spokesperson for OpenAI told us earlier this year.
“We require customers to minimize the risks of social harm being caused by their applications by following our Safety Best Practices, which includes filtering for unsafe inputs, and are working with AI Dungeon to fine-tune safer models — a process that will take time."
They added: “As part of our commitment to the safe and responsible deployment of AI, we are committed to making our models safer and building the best possible safeguards to identify inappropriate content and address potential misuse. When we discovered unsafe content was being displayed on AI Dungeon, in violation of our policies, we immediately took action by requiring them to improve their content filters.”
Latitude's developers, who were once responsive on the AI Dungeon Discord, where nearly 25,000 members talked about all things related to the game, went quiet as the buggy filters went live. Members flooded the chat server with tons of comments and questions, and were met with silence. Latitude employees who helped moderate the chat left.
The company then came out of the woodwork to announce AI Dungeon had been revamped with a multiplayer mode and a choice of fantasy worlds, with names like Alarathos or Kringle, to play in, each with their own environment.
“After several weeks of collaboration with OpenAI, running AB tests, fine-tuning on AI Dungeon data, and getting feedback, we’re ready to enable AI Dungeon to run on a GPT-3 based model that’s one of the most powerful AI models in the world,” Latitude said over the summer. “We’re calling the AI Dungeon version of this new model Dragon. It’s available now for premium users.”
New name, new worlds, new training data, you might think. Well, about that.
Training a model on a collection of smutty stories has an interesting side effect: when playing earlier versions of AI Dungeon, some of the characters in those scraped tales would pop up and hijack people’s games. Count Grey, for example, would sometimes appear when players mentioned vampires. There are also others, such as Doctor Kessel, who tends to abduct women.
“I remember one time I was playing AI Dungeon, I encountered an underage ghost who knew Dr Kessel, who says he, and I quote, 'likes to f*** ghosts',” one player told El Reg.
“These characters do sometimes become the primary antagonist of some stories, a ‘dark lord’ figure to face off the player. Again if you keep looking you will find them everywhere. They tend to show up when you least expect them.”
Alas, these fictional characters are still there in the new Dragon model.
“Nothing has changed about the fine tune or model sizes since last year," AuroraPurgatio told us. "I ran a test just to confirm and it's definitely the same data. Doctor Kessel, Count Grey, and Dr Kovas, the big three, all readily appear. And within the first line tend to go pretty violent. Most of the prompts turned dark quickly without prompting. Eyes getting torn out, necks getting snapped, terrorism, etc within the first couple inputs."
Like many other ex-fans, she no longer plays the game and has instead moved to Novel AI, another AI-powered text-generation game, this one using GPT-J-6B, a large language model not made by OpenAI. That system was built by EleutherAI, a self-described grassroots collective of researchers, and is seen as an open-source alternative to GPT-3. Gamers believe they will always have free rein over their content since Novel AI won’t have to bend to the whims of corporations, like OpenAI, controlling what their software can and can't do.
“The recent controversies are of Latitude’s own making," AuroraPurgatio said. "I cannot support or trust a company that completely reverses its stances on privacy and censorship on a dime ... Cutting off the community with complete silence for over a month, and throwing professionalism to the wind feels like they are trying their hardest to deliberately destroy their own creation.
"There was so much promise in AI Dungeon.”
In a conciliatory blog post at the end of September, Walton announced an overhaul of the game's filtering process, acknowledging its failures.

"One of the most challenging things we’ve faced is how to grapple with the potential of AI-powered games to produce harmful content," he wrote.

"We recognize that over the last several months aspects of how we’ve had to approach this problem have frustrated many users. Because there was moderation of unpublished content, users were often worried about what might trigger a flag and whether something they or the AI did could get them suspended or banned.

"Users reported that they couldn’t feel safe to play and explore with the feeling that someone might be reading their story if flagged."

Curiously, Walton said "there are some types of content, however, that we’re not okay with our service being used to create," seemingly oblivious to the problematic stories that featured in the training dataset.

So, what's going to change? "It means we will not be doing any moderation of unpublished single-player content," he said.

"This means we won’t have any flags, suspensions, or bans for anything users do in single-player play. We will have technological barriers that will seek to prevent the AI from generating content we aren’t okay with it creating — but there won’t be consequences for users and no humans will review those users’ content if those walls are hit. Essentially, users can do or say what they want in single-player play, but the AI may sometimes decline to generate certain types of content."

Latitude and its CEO did not answer El Reg's multiple requests for comment. ®