Silicon Valley doesn't care about poor people: Top AI models kinda suck at ID'ing household stuff in hard-up nations

If it's not a 50in TV or a huge fridge of soylent, we don't want to know

Off-the-shelf object-recognition systems struggle, relatively speaking, to identify common items in hard-up homes in countries across Africa, Asia, and South America. The same software performs better at identifying stuff in richer households in Europe and North America.

Though initially shocking, and then not so much when you think about it for a second and consider who makes these systems, it's a great example and reminder of how income-related biases have knock-on effects across the world.

Five popular computer-vision models commercially available via Microsoft Azure, Google Cloud Vision, IBM Watson, Amazon Rekognition, and New York-based AI upstart Clarifai, plus a ResNet-101 model trained using the Tencent ML Images dataset, were given the task of identifying items in photos taken in households around the planet.

Said photos were sourced from The Dollar Street dataset, which documents “everyday life on different income levels.” This collection contains some 30,000 snaps taken in 264 homes by photographers across 54 countries.

Each image is labeled with one of 135 categories describing what's pictured, along with where it was taken and the household income of the home in which it was snapped. For this particular study, carried out by Facebook Research, the eggheads only studied 117 of those categories. Some were ignored because the labels were too abstract: for example, the category “most loved item” was not used. Typical categories included in the study had unequivocal labels like “refrigerator”, “soap”, “door”, and so on.

When the AI systems were instructed to identify objects in the photos, there was a clear difference in accuracy when identifying the items in homes in poorer countries compared to ones in richer countries. Objects such as spice jars were more easily recognized in kitchens in Europe or in the United States compared to those in the Philippines, for instance.


Map showing the average accuracy for the six different models tested ... Red indicates an accuracy of about 60 per cent, yellow is about 75 per cent, and green is about 90 per cent. Image credit: DeVries et al.

It's, basically, due to the fact these commercially available models just aren't familiar with objects found in poorer households. Their training data covers a lot of products found in richer homes and nations, and not quite so much for stuff bought and sold in broke households and countries.

And it's not just the machine-learning systems developed by geeks in rich-ass Silicon Valley that are at fault: all the tested computer-vision models, whether they were built on the west or east coasts of America, or in Tencent's China, were more comfortable identifying things in more well-off homes than in poorer homes.

“For all systems, the difference in accuracy for household items appearing in the lowest income bracket – less than US$50 per month – is approximately 10 per cent lower than that for household items appearing in the highest income bracket – more than US$3,500 per month,” the Facebook team noted in its write-up of its findings, emitted via arXiv this month.
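For the curious, here is a minimal Python sketch of how a per-bracket accuracy comparison like this could be tallied. It is not the researchers' code. The function names and the toy records are invented for illustration, and only the outer bracket cut-offs (US$50 and US$3,500 a month) come from the quote above; the middle cut-offs are made up. How a prediction gets judged "correct" is taken as a given input here.

```python
from collections import defaultdict

# Hypothetical bracket edges in US$ per month. Only the 50 and 3,500
# cut-offs appear in the article; the middle edge (500) is made up.
BRACKETS = [(0, 50), (50, 500), (500, 3500), (3500, float("inf"))]

def bracket_of(income):
    """Return the (low, high) bracket containing a monthly income."""
    for low, high in BRACKETS:
        if low <= income < high:
            return (low, high)
    raise ValueError(f"income out of range: {income}")

def accuracy_by_bracket(records):
    """records: iterable of (monthly_income, prediction_was_correct)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for income, correct in records:
        b = bracket_of(income)
        totals[b] += 1
        hits[b] += int(correct)
    # Fraction of correct predictions per bracket that has any records.
    return {b: hits[b] / totals[b] for b in totals}

# Toy records invented for illustration; not data from the study.
toy = [(30, True), (30, False), (45, True), (45, False),
       (4000, True), (4000, True), (5000, True), (5000, False)]

acc = accuracy_by_bracket(toy)
gap = acc[(3500, float("inf"))] - acc[(0, 50)]
print(acc, gap)
```

With real data, each record would pair a photo's household income with whether the model's answer matched the photo's label.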

Rich vs Poor

Digging into the results, we can see that more expensive hand soap, for example, is kept in a bottle with a hand pump, while rigid rectangular bars of soap are cheaper. If these commercial models are more likely to be trained on images from richer households that have liquid soap, they’re less likely to realize that bars of soap are soap, too.


An example photo of what soap looks like in a home in the UK compared to Nepal ... All the models mistook the bar of soap from Nepal as some kind of food. Image credit: DeVries et al.

Refrigerators in more developed countries have doors, and are made of stainless steel or are painted white, whereas in less developed countries where electricity is scarce, pots are used to store food. Image-recognition models, therefore, won’t know that these simple storage objects are, in fact, basic refrigerators simply because they haven’t been taught that during the training process beforehand.

The gap in accuracy is most stark for certain categories. Living rooms, for example, had an average accuracy difference of 40 per cent. Next were beds at 37 per cent, and then guest beds at 35 per cent. This is probably because living rooms in poorer homes in Africa, Asia or South America lacked certain items, such as massive TV sets, comfy sofas, or expensive cabinets. These homes are also less likely to have luxuries, such as guest beds.


Average accuracy for all models identifying objects from homes with different monthly incomes ... Image credit: DeVries et al.

The researchers didn’t break down the average accuracy scores for each individual model, however, so it’s difficult to see which one was best or worst. On average, the accuracy was about 85 per cent for identifying items in homes that had a monthly income of $10,097 (~£7,958) compared to about 71 per cent for homes that had a monthly income of just $55 (~£43).
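A gap like this can be read two ways: as an absolute difference in percentage points, or as a relative drop from the higher figure. The back-of-the-envelope Python below works out both from the rounded averages quoted above. The variable names are ours, and because the inputs are "about" figures, the results are approximate too.

```python
# Average accuracies as quoted in the article, in per cent (rounded).
high_income_acc = 85  # homes on about $10,097 a month
low_income_acc = 71   # homes on about $55 a month

# Absolute gap, in percentage points.
absolute_gap = high_income_acc - low_income_acc

# Relative drop, as a percentage of the high-income figure.
relative_drop = round(100 * absolute_gap / high_income_acc, 1)

print(absolute_gap, relative_drop)  # prints "14 16.5"
```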

But all of them struggled with identifying objects from poorer places. "The absolute difference in accuracy of recognizing items in the United States compared to recognizing them in Somalia or Burkina Faso is around 15−20%," the team noted. "These findings are consistent across a range of commercial cloud services for image recognition."

The Register has asked Facebook for more details. The social network briefly mentioned it tried the test on its own object-recognition engine, which also suffered the same biases (see figure 10 of the paper).

The problem with humans and machines

The paper is a stark reminder of the biases present in machine learning and its disparate impacts. The upshot is that these computer-vision models, as they stand today, aren’t, relatively speaking, effective for folks in poorer circumstances. The problem can be narrowed down further to a lack of culturally diverse training data: too much of it is focused on the English language, which leans the material toward richer households.

If you're working with AI technology, can speak English, and are building training datasets, you probably have a comfortable life and it may not occur to you to include stuff from less-well-off homes.

“The geographical sampling of image datasets is unrepresentative of the world population distribution, and most image datasets were gathered using English as the 'base language',” the researchers explained. Items that don’t have English labels are typically not included in training processes, which heavily skews which types of objects can be recognized.

All of this, however, points to a glaring technical barrier in neural networks: they’re simply too rigid. “Ultimately, the development of object recognition models that work for everyone will likely require the development of training algorithms that can learn new visual classes from few examples and that are less susceptible to statistical variations in training data,” the Facebook eggheads stated.

At the moment, models have to see thousands or even millions of examples of things before they can identify objects effectively, and subtle differences in the images can confuse or throw them.

“We hope this study will help to foster research in all these directions. Solving the issues outlined in this study will allow the development of aids for the visually impaired, photo album organization software, image-search services, that provide the same value for users around the world, irrespective of their socio-economic status,” the researchers concluded. ®

Biting the hand that feeds IT © 1998–2021