AI hiring bias? Men with Anglo-Saxon names score lower in tech interviews

Study suggests hiding every Tom, Dick, and Harry's personal info from HR bots


In mock interviews for software engineering jobs, current-generation AI models asked to evaluate responses rated men less favorably – particularly those with Anglo-Saxon names, according to recently published research.

The goal of the study, conducted by Celeste De Nadai as an undergraduate thesis project at the Royal Institute of Technology (KTH) in Stockholm, Sweden, was to investigate whether current-generation LLMs demonstrate bias when presented with gender data and with names that allow cultural inferences to be made.

De Nadai, also chief marketing officer at AI content biz Monok, told The Register in a phone interview that her interest in the topic grew out of earlier reports of bias in older AI models. She pointed to a recent Bloomberg article that questioned the use of neural networks for recruitment due to name-based bias.

"There wasn't any research with a larger dataset that was using the latest models," explained De Nadai. "The research that I've seen was about the GPT-3.5 or older models. What was interesting for me was the smaller models, the newest ones, how are they behaving compared to the old ones because they have a different dataset?"

De Nadai said part of the reason she undertook the project was that she was seeing a lot of AI recruiting startups that said they used language models and were bias-free.

"My point of view was, 'No, you're not bias-free,'" she explained. "You can remove the name, but you still have some markers, even just in the language, that can help an LLM understand where one person comes from."

De Nadai's study [PDF] looked at Google's Gemini-1.5-flash, Mistral AI's Open-Mistral-nemo-2407, and OpenAI's GPT-4o-mini to see how they classified and rated responses to 24 job interview questions, given variations in temperature (a model setting that influences predictability and randomness), in gender, and in names associated with particular cultural groups.

Crucially, the same answers were tested under various combinations of names and backgrounds. So this isn't a case of men with Anglo-Saxon names simply being worse at software engineering than other applicants; it's that when the models were told they were assessing that kind of male applicant, they down-rated answers they otherwise favored.

"The applicant’s name and gender is permuted 200 times, corresponding with 200 discrete personas, subdivided into 100 males and 100 females, and grouped into four different distinct cultural groups (West African, East Asian, Middle Eastern, Anglo-Saxon) reflected by their first name and surname," the study explains.

Each LLM was asked to make 4,800 inference calls – one per persona per question – for each of two different system prompts (one of which includes more detailed grading instructions) at each of 15 temperature settings (0.1 to 1.5, in 0.1 increments), for a total of 432,000 inference calls across the three models.
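
For context on those figures, here is a minimal Python sketch of the experiment's bookkeeping, assuming the counts quoted above; the model identifiers are the ones named in the study, but the structure and variable names are illustrative rather than taken from De Nadai's code.

```python
# A back-of-the-envelope sketch of the study's experimental grid, assuming the
# figures quoted above: 200 personas (100 male, 100 female) drawn from four
# cultural groups, 24 interview questions, two system prompts, and 15
# temperature settings per model. Structure and names are illustrative only.
from itertools import product

MODELS = ["gemini-1.5-flash", "open-mistral-nemo-2407", "gpt-4o-mini"]
CULTURAL_GROUPS = ["West African", "East Asian", "Middle Eastern", "Anglo-Saxon"]
PERSONAS = 200                    # 100 male + 100 female
QUESTIONS = 24
SYSTEM_PROMPTS = ["plain", "detailed-rubric"]
TEMPERATURES = [round(0.1 * i, 1) for i in range(1, 16)]  # 0.1 through 1.5

calls_per_model = 0
for _prompt, _temp in product(SYSTEM_PROMPTS, TEMPERATURES):
    # One pass: every persona answers every question once at this setting.
    calls_per_model += PERSONAS * QUESTIONS       # 200 * 24 = 4,800

print(calls_per_model)                  # 144,000 per model
print(calls_per_model * len(MODELS))    # 432,000 across all three models
```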

According to the study, the expected finding was that men and Western names would be favored, as prior bias studies have found. Instead, the results told a different story.

"The results prove with statistical significance that there is an inherent bias in these services where, in this specific study case, male names are discriminated against in general and Anglo-Saxon names in particular," the study reports.

The Gemini model fared better than the others, however, when given a prompt containing the more detailed grading criteria and run at a temperature above 1.

De Nadai has a theory about the findings but said she cannot prove it: She believes the bias against men with Anglo-Saxon names reflects an over-correction to dial back output that was biased in the opposite direction – seen in prior studies.

Making AI models respond fairly, with the intelligence implied by the term "artificial intelligence," remains an unresolved challenge. Recall that in February 2024 Google suspended the ability of its Gemini (formerly Bard) generative AI service to produce images of people after it depicted World War II-era German soldiers and US Founding Fathers with an implausible range of racial and ethnic diversity. In bending over backwards to avoid White-washing history, the model erased White people from historically accurate scenes.

One way to make the interview evaluation results fairer, the study suggests, is to give the model a prompt with rigid, detailed criteria for grading interview answers. Temperature adjustments can help or hurt, depending on the model.
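
The study's exact prompts aren't reproduced in this article, but the idea can be sketched roughly as follows using the OpenAI Python SDK; the rubric wording, point scale, and sample question are assumptions for illustration, not the study's actual materials.

```python
# Illustrative only: a rigid grading rubric baked into the system prompt, plus
# an explicit temperature setting. The rubric wording and point scale are
# assumptions, not the study's actual prompt.
from openai import OpenAI

GRADING_PROMPT = """You are grading a candidate's answer to a software engineering
interview question. Score it from 0 to 10 using only these criteria:
- Technical correctness (0-4 points)
- Depth and relevance of reasoning (0-3 points)
- Clarity and structure (0-3 points)
Ignore the candidate's name, gender, or background. Reply with the integer score only."""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=1.1,  # the study found temperature tweaks help or hurt per model
    messages=[
        {"role": "system", "content": GRADING_PROMPT},
        {"role": "user", "content": "Question: What is a race condition?\nAnswer: ..."},
    ],
)
print(response.choices[0].message.content)
```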

The paper concludes that model biases cannot be fully mitigated by adjusting settings and prompts alone. And it argues for denying models access to information that might be used to make unwanted inferences – such as name and gender in a hiring context.

"Addressing these biases requires a nuanced approach, considering both the model's characteristics and the context in which it operates," the study suggests. "When classifying or evaluating, we propose you always mask the name and obfuscate the gender to ensure the results are as general and unbiased as possible as well as provide a criteria for how to grade in your system-instruct prompt."

Google, OpenAI, and Mistral AI did not respond to requests for comment. ®
