The controversial study that examined whether or not machine-learning code could determine a person’s sexual orientation just from their face has been retried – and produced eyebrow-raising results.
John Leuner, a master’s student studying information technology at South Africa's University of Pretoria, attempted to reproduce the aforementioned study, published in 2017 by academics at Stanford University in the US. Unsurprisingly, that original work kicked up a massive fuss at the time, with many skeptical that computers, which have zero knowledge or understanding of something as complex as sexuality, could really predict whether someone was gay or straight from their fizzog.
The Stanford eggheads behind that first research – Yilun Wang, a graduate student, and Michal Kosinski, an associate professor – even claimed that not only could neural networks suss out a person’s sexual orientation, algorithms had an even better gaydar than humans.
In November last year, Leuner repeated the experiment using the same neural network architectures as the previous study, although he used a different dataset, this one containing 20,910 photographs scraped from 500,000 profile images taken from three dating websites. Fast forward to late February, and the master's student emitted his findings online, as part of his degree coursework.
Leuner didn't disclose what those dating sites were, by the way, and, we understand, he didn't get any explicit permission from people to use their photos. "Unfortunately it's not feasible for a study like this," he told The Register. "I do take care to preserve individuals' privacy."
The dataset was split into 20 parts. Neural network models were trained using 19 parts, and the remaining part was used for testing. The training process was repeated 20 times for good measure.
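The split described above resembles what is usually called k-fold cross-validation, with 20 folds. As a rough illustration only, and not Leuner's actual code, here is a minimal Python sketch. The function name `k_fold_splits`, the shuffling, and the fixed seed are all assumptions made for this example; only the figures of 20 parts and 20,910 photographs come from the study.

```python
import random

def k_fold_splits(n_items, k=20, seed=0):
    """Yield (train_indices, test_indices) once per fold (hypothetical helper)."""
    indices = list(range(n_items))
    # Shuffling with a fixed seed is an assumption, made for repeatability.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is everything outside the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 20,910 photographs split into 20 parts, as in the study.
splits = list(k_fold_splits(20910, k=20))
```

Each photograph lands in the test set exactly once, so every image is scored by a model that never saw it during training.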
He found that VGG-Face, a convolutional neural network pre-trained on one million photographs of 2,622 celebrities, when using his own dating-site-sourced dataset, predicted the sexuality of males with 68 per cent accuracy – better than a coin flip – and females with 77 per cent accuracy. A facial morphology classifier, another machine learning model that inspects facial features in photographs, was 62 per cent accurate for males and 72 per cent accurate for females. Not amazing, but not completely wrong.
For reference, the Wang and Kosinski study achieved 81 to 85 per cent accuracy for males, and 70 to 71 per cent for women, using their datasets. Humans got it right 61 per cent of the time for men, and 54 per cent for women, in a comparison study.
So, Leuner's AI performed better than humans, and better than a fifty-fifty coin flip, but, for men at least, wasn't as good as the Stanford pair's software.
A Google engineer, Blaise Aguera y Arcas, blasted the original study early last year, and pointed out various reasons why software should struggle or fail to classify human sexuality correctly. He believed neural networks were latching onto things like whether a person was wearing certain makeup or a particular fashion of glasses to determine sexual orientation, rather than using their actual facial structure.
Notably, straight women were more likely to wear eye shadow than gay women in Wang and Kosinski’s dataset. Straight men were more likely to wear glasses than gay men. The neural networks were picking up on our own fashion and superficial biases, rather than scrutinizing the shape of our cheeks, noses, eyes, and so on.
When Leuner corrected for these factors in his test, by including photos of the same people wearing glasses and not wearing glasses or having more or less facial hair, his neural network code was still fairly accurate – better than a coin flip – at labeling people’s sexuality.
“The study shows that the head pose is not correlated with sexual orientation ... The models are still able to predict sexual orientation even while controlling for the presence or absence of facial hair and eyewear,” he stated in his report.
Finding the key factors
So, does this mean that AI really can tell if someone is gay or straight from their face? No, not really. In a third experiment, Leuner completely blurred out the faces so the algorithms couldn’t analyze each person’s facial structure at all.
And guess what? The software was still able to predict sexual orientation. In fact, it was about 63 per cent accurate for males and 72 per cent for females, pretty much on par with the non-blurred VGG-Face and facial morphology model.
It would appear the neural networks really are picking up on superficial signs rather than analyzing facial structure. Wang and Kosinski said their research was proof for the “prenatal hormone theory,” an idea that connects a person’s sexuality to the hormones they were exposed to when they were a fetus inside their mother’s womb. It would mean that biological factors such as a person’s facial structure would indicate whether someone was gay or not.
Leuner’s results, however, don’t support that idea at all. “While demonstrating that dating profile images carry rich information about sexual orientation, these results leave open the question of how much is determined by facial morphology and how much by differences in grooming, presentation, and lifestyle,” he admitted.
Lack of ethics
"[Although] the fact that the blurred images are reasonable predictors doesn't tell us that AI can't be good predictors. What it tells us is that there might be information in the images predictive of sexual orientation that we didn't expect, such as brighter images for one of the groups, or more saturated colors in one group.
"Not just color as we know it but it could be differences in the brightness or saturation of the images. The CNN may well be generating features that capture these types of differences. The facial morphology classifier on the other hand is very unlikely to contain this type of signal in its output. It was trained to accurately find the positions of the eyes, nose, [or] mouth."
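The point about brightness surviving a blur can be shown with a toy example. The report's actual blurring method isn't described here, so the simple box blur below is an assumption, as are the helper names `box_blur`, `mean_brightness`, and `contrast`, and the 8x8 checkerboard standing in for fine facial detail. It is a sketch of the general idea, not a reconstruction of the experiment.

```python
def box_blur(image, radius=1):
    """Return a blurred copy of a 2D grayscale image (a list of rows of numbers)."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average every pixel in the window, clipped at the image border.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred

def mean_brightness(image):
    """Average pixel value: a coarse property of the whole image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def contrast(image):
    """Gap between the brightest and darkest pixel: a crude measure of fine detail."""
    pixels = [p for row in image for p in row]
    return max(pixels) - min(pixels)

# Hypothetical test image: alternating black and white pixels.
checkerboard = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
blurred = box_blur(checkerboard, radius=1)
```

In this toy case the blur wipes out almost all of the pixel-level contrast while the average brightness stays put, which is the sort of signal a classifier could still latch onto after the faces themselves are unreadable.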
Os Keyes, a PhD student at the University of Washington in the US, who is studying gender and algorithms, was unimpressed, telling The Register “this study is a nonentity,” and adding:
“The paper proposes replicating the original 'gay faces' study in a way that addresses concerns about social factors influencing the classifier. But it doesn't really do that at all. The attempt to control for presentation only uses three image sets – it's far too tiny to be able to show anything of interest – and the factors controlled for are only glasses and beards.
“This is despite the fact that there are a lot of tells of other possible social cues going on; the study notes that they found eyes and eyebrows were accurate distinguishers, for example, which is not surprising if you consider that straight and bisexual women are far more likely to wear mascara and other makeup, and queer men are far more likely to get their eyebrows done.”
The original study raised ethical concerns about the possible negative consequences of using a system to determine people’s sexuality. In some countries, homosexuality is illegal, so the technology could endanger people’s lives if used by authorities to "out" and detain suspected gay folk.
It’s unethical for other reasons, too, Keyes said, adding: “Researchers working here have a terrible sense of ethics, in both their methods and in their premise. For example, this [Leuner] paper takes 500,000 images from dating sites, but notes that it does not specify the sites in question to protect subject privacy. That's nice, and all, but those photo subjects never offered to be participants in this study. The mass-scraping of websites like that is usually straight-up illegal.
“Moreover, this entire line of thought is premised on the idea that there is value to be gained in working out why 'gay face' classifiers might work – value in further describing, defining and setting out the methodology for any tinpot dictator or bigot with a computer who might want to oppress queer people.”
Leuner agreed that machine-learning models, like the ones he developed and trained, "have a great potential to be misused."
"Even if they don't work, there is a possibility that they might be used to generate fear," he said. "If they do work they can be used in very horrible ways."
Nevertheless, he said he wanted to repeat the earlier work to verify the original claims made by Kosinski that sexuality could be predicted with machine learning. "Initially [it] sounded implausible to me," said the master's student. "From an ethical point of view I take the same standpoint as he does, I believe that societies should be engaging in a debate about how powerful these new technologies are and how easily they can be abused.
"The first step for that kind of debate is to demonstrate that these tools really do create new capabilities. Ideally we would also want to understand exactly how they work but it will still take some time to shed more light on that." ®