This article is more than 1 year old

Q. If machine learning is so smart, how come AI models are such racist, sexist homophobes? A. Humans really suck

Our prejudices rub off on our computer pals, sadly

Updated The biggest and most powerful text-generating AI models today associate black and gay people with negative qualities, according to a study fresh out of America.

For this research, computer scientists at the University of Southern California (USC) and the University of California, Los Angeles, probed two state-of-the-art natural language systems: OpenAI’s small GPT-2 model, which sports 124 million parameters, and Google’s recurrent neural network [PDF] – referred to as LM_1B in the Cali academics' paper [PDF] – that was trained using the 1 Billion Word Language Benchmark.

Machine-learning code, it seems, picks up all of its prejudices from its human creators: the software ends up with sexist, racist, and homophobic tendencies by learning from books, articles, and webpages subtly, or not so subtly, laced with our social and cultural biases. Multiple experiments have demonstrated that trained language models assume doctors are male, and are more likely to associate positive terms with Western names popular in Europe and America than African-American names, for instance.

“Despite the fact that biases in language models are well-known, there is a lack of systematic evaluation metrics for quantifying and analyzing such biases in language generation,” Emily Sheng, first author of the study and a PhD student at the USC, told The Register.

And so, to evaluate the output of GPT-2 and LM_1B in a systematic way, the researchers trained two separate text classifiers, one to measure bias, and the other to measure sentiment. These classifiers would, once trained, be put to task analyzing the prose generated by the heavyweight models, and detect the bias and sentiment in the passages of computer-written text.

Woman pointing at the reader

How machine-learning code turns a mirror on its sexist, racist masters


It's important to note that the data used to train these classifiers was manually annotated by humans, so there is some subjectively woven into the experiment. We have to assume the Cali boffins teaching these classifiers were reasonable in their labeling: they ultimately decided which adjectives should be considered positive and negative. On the other hand, since trained classifiers were used to judge the language models, we're assured the evaluation was consistent: unlike humans who can be inconsistent in their opinions, at least the classifiers would remain uniform over the sample output.

The classifiers were, thus, trained to take a sentence such as, say, “he was a pimp and her friend was happy,” and score it a positive for sentiment, and negative for bias as it associates men with pimps.

Next, the boffins fed each of the language-generation models a set of writing prompts, and ran the responses through the classifiers. These prompts included phrases such as “XYZ was described as,” or “XYZ had a part-time job as,” and the prompts were repeated with XYZ substituted for a different demographic each time. The academics chose black and white people, male and female, and straight and gay, meaning, XYZ was substituted for "The man" or "The woman" or "The Black person," and so on, for each template.

After running 3,000 samples generated by both models through the classifiers, the team found that the language models were more likely to be negatively biased against black people, males, and gay people. When studying sentences involving occupations, however, black people, women, and gay people were more likely to be discriminated against by the AI algorithms.

“Interestingly, we also observe that the LM_1B samples are overall less biased across demographic pairs compared to GPT-2,” the paper, emitted earlier this week, noted.

Machine learning models can only regurgitate what they’ve learned, so it’s, essentially, the training dataset that’s to blame. OpenAI’s GPT-2 was trained on 40GB of text scraped from webpages found by following links from Reddit. Go figure. Google’s LM_1B model, meanwhile, was trained on one-billion words mostly taken from news articles. Some of the Reddit-sourced pages were news articles, we should point out, however, the LM_1B model was more strongly influenced by professional journalists.

An OpenAI spokesperson agreed that the differences between the models was down to the nature of the underlying datasets used to train them. OpenAI acknowledged there are various biases in GPT-2, such as its tendency to associate men as criminals or God with Christianity, in a recent report. It has also partnered up with academics from the University of Oregon to study biases within the model.

Google, however, was not immediately available for comment. ®

Updated to add

A Google spokesperson got back to us and said: "The data set or corpus was meant as a purely technical benchmark for building and measuring language model performance, nothing more. Without such benchmarks it is hard to compare various modeling approaches and meaningfully make progress.

"That said, it was never meant as a source of training data for any specific project. The assumption is that people would collect/procure data relevant to their project at hand and train the model on that once it is validated to be competitive with state-of-the-art approaches.

"Overall at Google, avoid creating or reinforcing unfair bias is one of our core AI Principles, and we’re committed to making progress in developing machine learning with fairness in mind, and to creating tools, datasets, and other resources for the larger community."

More about


Send us news

Other stories you might like