AI models still racist, even with more balanced training
Plus: NIST publishes 86-page report investigating algorithmic bias
AI algorithms can still come loaded with racial bias, even if they're trained on data more representative of different ethnic groups, according to new research.
An international team of researchers analyzed how accurately algorithms could predict various cognitive behaviors and health measurements – such as memory, mood, and even grip strength – from brain fMRI scans. Medical datasets are often skewed: they aren't collected from a sufficiently diverse sample, and certain groups of the population end up left out or underrepresented.
It's not surprising if predictive models that try to detect skin cancer, for example, are less effective on darker skin tones than lighter ones. Biased datasets are often the reason AI models themselves are biased. But a paper published in Science Advances has found that these unwanted behaviors can persist in algorithms even when they're trained on fairer, more diverse datasets.
The team performed a series of experiments with two datasets containing tens of thousands of fMRI brain scans – the Human Connectome Project and the Adolescent Brain Cognitive Development Study. To probe how racial disparities affected the predictive models' performance, they tried to minimize the impact that other variables, such as age or gender, might have on accuracy.
"When predictive models were trained on data dominated by White Americans (WA), out-of-sample prediction errors were generally higher for African Americans (AA) than for WA," the paper reads.
That shouldn't raise any eyebrows, but what is interesting is that those errors didn't go away even when the algorithms were trained on datasets with equal representation of WA and AA participants, or on data from AA participants only.
Algorithms trained solely on data from AA participants were still less accurate at predicting cognitive behaviors for that group than models trained on WA data were for WAs – going against the common understanding of how these systems normally behave. "When models were trained on AA only, compared to training only on WA or an equal number of AA and WA participants, AA prediction accuracy improved but stayed below that for WA," the abstract continues. Why?
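To make the experimental setup concrete, here is a minimal, entirely synthetic sketch of the kind of evaluation described: fit a simple regression on training sets with different WA/AA compositions, then compare out-of-sample error for each group. The linear model, feature counts, and group-specific mappings below are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, w, noise=0.5):
    """Synthetic 'fMRI features -> behavioral score' data for one group."""
    X = rng.normal(size=(n, 5))
    y = X @ w + rng.normal(scale=noise, size=n)
    return X, y

# Hypothetical group-specific mappings (the two groups differ slightly).
w_wa = np.array([1.0, 0.5, -0.3, 0.8, 0.2])
w_aa = w_wa + rng.normal(scale=0.3, size=5)

# Held-out test sets, one per group.
X_wa_te, y_wa_te = make_group(500, w_wa)
X_aa_te, y_aa_te = make_group(500, w_aa)

def fit(X, y):
    # Ordinary least squares; the paper's models are more sophisticated.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Three training compositions, echoing the paper's comparisons.
schemes = {"WA-dominated": (900, 100), "balanced": (500, 500), "AA-only": (0, 1000)}
for name, (n_wa, n_aa) in schemes.items():
    Xw, yw = make_group(n_wa, w_wa)
    Xa, ya = make_group(n_aa, w_aa)
    w = fit(np.vstack([Xw, Xa]), np.concatenate([yw, ya]))
    print(f"{name:13s} WA test MSE={mse(w, X_wa_te, y_wa_te):.3f}  "
          f"AA test MSE={mse(w, X_aa_te, y_aa_te):.3f}")
```

In a toy like this, balancing the training set does shrink the gap – the paper's surprise is that on real neuroimaging data, a residual gap remained even under the AA-only scheme.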
The researchers aren't quite sure why the model behaves that way, but believe it could be due to how the data was collected. "For now it's hard to say where the remaining WA-AA prediction accuracy difference when [the] model was trained only on AA came from," Jingwei Li, a postdoctoral research fellow at the Institute of Neuroscience and Medicine, Brain and Behaviour from the Jülich Research Centre in Germany, told The Register.
"Several steps during neuroimaging preprocessing could have influenced the result. For example, during preprocessing, a convention is to align individuals' brains to a standard brain template so that individual brains can be comparable. But these brain templates were usually created from the White population."
"Same for the pre-defined functional atlases, where voxels in brain images can be grouped into regions based on their functional homogeneity … But the delineation of such functional atlases was again often based on datasets predominated by White or European population in terms of sample size."
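Li's point about functional atlases can be illustrated with a toy parcellation: an atlas assigns each voxel to a region, and downstream models work with the per-region average signal rather than raw voxels. The array sizes and the three-region atlas below are made up for illustration; real atlases have hundreds of regions, and – as Li notes – were often derived from predominantly White or European samples.

```python
import numpy as np

# Toy "fMRI" signal: 100 timepoints x 6 voxels.
rng = np.random.default_rng(1)
ts = rng.normal(size=(100, 6))

# A functional atlas labels each voxel with a region ID (here, 3 regions).
atlas = np.array([0, 0, 1, 1, 2, 2])

def parcellate(timeseries, labels):
    """Average voxel time series within each atlas region."""
    regions = np.unique(labels)
    return np.stack(
        [timeseries[:, labels == r].mean(axis=1) for r in regions], axis=1
    )

region_ts = parcellate(ts, atlas)
print(region_ts.shape)  # (100, 3)
```

If the atlas boundaries fit one population's functional organization better than another's, the averaged signals – and everything trained on them – inherit that mismatch before any model sees the data.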
Another reason could be that the data collected from the patients isn't quite accurate. "It is also a question whether the psychometric tests we use nowadays indeed capture the correct underlying psychological concept for minority groups," she added.
When the algorithms were applied to the Human Connectome Project dataset, they were more accurate at predicting whether WAs were likely to be angry or aggressive, or whether they had better reading skills; the same predictions were less successful for the AA cohort.
Li said the research doesn't confirm there are neurobiological or psychometric measures that differ in populations due to their ethnicities. Instead, she wants to highlight how having a more diverse dataset isn't enough to ensure AI algorithms are less biased and more fair.
"I would be very careful to not make any statement saying WA and AA are different in these neurobiological or psychometric measures simply because of their ethnicity. As we have also discussed in the paper, ethnicity or race is such a complex concept considering all the historical, societal, educational factors. We do not want to strengthen [racial] stereotypes or increase structural racism. [On the contrary], the aim of this paper is to advocate for more fairness across ethnic groups in the specific context of neuroimaging analysis."
Algorithmic bias is an issue the US government is trying to address. The National Institute of Standards and Technology published a report this week that came to similar conclusions.
"Current attempts for addressing the harmful effects of AI bias remain focused on computational factors such as representativeness of datasets and fairness of machine learning algorithms," the report [PDF] read.
"These remedies are vital for mitigating bias, and more work remains. Yet, human and systemic institutional and societal factors are significant sources of AI bias as well, and are currently overlooked." ®