Once again, racial biases show up in AI image databases, this time turning Barack Obama white

Researchers used a pretrained off-the-shelf model from Nvidia


A new computer vision technique that helps convert blurry photos of people into fake, realistic images has come under fire for being racially biased towards white people.

The tool, known as PULSE, was introduced by a group of researchers from Duke University and presented at the virtual Conference on Computer Vision and Pattern Recognition last week. Given a pixelated portrait as input, PULSE searches through computer-generated images and picks the one it believes most closely matches the original photo.

All of those imaginary images are outputs of StyleGAN, a generative adversarial network developed by Nvidia back in 2018. The system, essentially, turns a small, fuzzy photo into a larger, higher-resolution image, a method known as upscaling.

PULSE cannot, however, be used to reveal the true identity of the person hidden in the blurry photo, since it only considers images that have been made up by a generative adversarial network. The people depicted in its results do not exist in real life.

Instead, the tool looks for features like hair length and skin colour in the blurry image and selects a new face dreamed up by StyleGAN that might resemble the person in the original image. Unfortunately, the tool struggles when the obscured photos feature people of colour, since it often chooses fake images of white people.
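To make the mechanism concrete, here is a minimal, hypothetical sketch of the general idea in PyTorch: optimise a latent vector so that the generator's output, once downscaled, matches the blurry input. This is not the Duke team's implementation; the generator placeholder, the function name, and the hyperparameters are assumptions for illustration only.

    import torch
    import torch.nn.functional as F

    def upscale_by_latent_search(lr_image, generator, latent_dim=512,
                                 steps=500, step_size=0.1, scale_factor=32):
        # lr_image: low-resolution target of shape (1, 3, h, w), values in [0, 1].
        # generator: a pretrained face GAN (placeholder) that maps a latent
        # vector to a high-resolution RGB image 'scale_factor' times larger.
        latent = torch.randn(1, latent_dim, requires_grad=True)  # random starting guess
        optimiser = torch.optim.Adam([latent], lr=step_size)

        for _ in range(steps):
            optimiser.zero_grad()
            hr_candidate = generator(latent)                      # fake high-res face
            downscaled = F.interpolate(hr_candidate, scale_factor=1 / scale_factor,
                                       mode='bicubic', align_corners=False)
            # The only supervision is agreement with the blurry input after
            # downscaling; everything else about the output face comes from
            # whatever the generator tends to produce.
            loss = F.mse_loss(downscaled, lr_image)
            loss.backward()
            optimiser.step()

        return generator(latent).detach()

The published method adds constraints to keep the search in regions where the generator produces realistic faces, but the core point stands: the blurry photo only narrows the search, it does not determine who the result looks like.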

Here's a good demonstration of the overall model: The original face on the left may be difficult to make out, but most people would be able to tell that it's former US president Barack Obama.

Given that image, PULSE has selected an image on the right using StyleGAN. The computer-generated face obviously doesn't look anything like Obama at all. In the original image Obama has dark skin, black hair, and brown eyes, but the result is, instead, someone that has white skin, blue eyes, and brown hair.

Robert Ness, a machine learning scientist who teaches workshops on his online platform Altdeep, also found other examples of racial bias when he was toying around with the software. When he fed in his own photo and one of Alexandria Ocasio-Cortez, the US Representative for New York's 14th congressional district, both results were skewed towards Caucasian-looking faces.

First, the model blurred both original images. Next, PULSE picked images generated by StyleGAN based on the pixelated inputs. "This model demonstrates the same bias issues we've seen in other more commonly used data-driven algorithms like search," he told The Register. "This instance of deep generative modeling just happens to make the issue glaringly obvious."

Do not test a biased model on a biased dataset!

The researchers have acknowledged the issue and believe it stems from existing biases within the StyleGAN model itself. They updated their research paper with a section addressing the issue of racial biases in their work.

The problem is that StyleGAN - trained on 70,000 images scraped from Flickr - tends to generate images of white people. In fact, a recent research paper [PDF] examining the demographics of StyleGAN images discovered that it spat out images of white people 72.6 per cent of the time, compared to just 10.1 per cent for black people, and 3.4 per cent for Indian people. PULSE, therefore, is also more likely to choose images of white people since those are the images being generated by StyleGAN.
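An audit that produces figures like those can be approximated by sampling the generator many times and classifying what comes out. The sketch below is illustrative only: the pretrained generator and face-attribute classifier are placeholders, not the models used in the cited paper.

    from collections import Counter
    import torch

    def audit_generator_demographics(generator, classifier, n_samples=10_000,
                                     latent_dim=512):
        # Draw random latent vectors, generate faces, and tally the labels the
        # classifier assigns (e.g. a perceived demographic group).
        counts = Counter()
        with torch.no_grad():
            for _ in range(n_samples):
                z = torch.randn(1, latent_dim)
                counts[classifier(generator(z))] += 1
        return {label: count / n_samples for label, count in counts.items()}

If those tallies come out heavily skewed, so will any downstream tool, such as PULSE, that can only ever return one of the generator's outputs.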

The team contacted the original researchers from Nvidia to notify them about the issue, but didn't get a response. "NVIDIA takes diversity and inclusion seriously," a spokesperson from the company told The Register.

"We're always striving to create better datasets and algorithms to overcome any existing bias in current models. We are also doing more research on the algorithmic bias in deep-learning models and methods to mitigate them."

Sachit Menon and Alex Damian, both co-authors and recent graduates of Duke University, didn't realise this when they decided to carry out the research. They also failed to spot the racial biases in StyleGAN because they tested their tool on CelebA, another dataset made up of images of celebrities.

"It turns out that 90 per cent of the photos in CelebA are white people," Menon told El Reg. By testing a biased model on a dataset that also contains the same biases, the issue was overlooked.

"We tried [the tool] on me, and I'm Indian," he added. "Sometimes it worked, but a lot of the times it would make me white. And then I found this other paper that showed StyleGAN only makes Indian people 3 per cent of the time. And when we saw this break down, that's when we realised it was biased."

The team used Nvidia's off-the-shelf model and did not train StyleGAN themselves. "In an ideal world, there would be a more balanced dataset, and we could have used a pretrained model that reflected this," said Damian. "If we used a different evaluation dataset that contained more images of people of colour then we would have spotted the issue. That's a big takeaway and a big lesson for us."
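One practical way to act on that lesson is to report reconstruction quality separately for each demographic group in the evaluation set, rather than as a single aggregate score. The sketch below is hypothetical; the upscale function, the quality metric, and the dataset layout are assumptions.

    def evaluate_per_group(grouped_dataset, upscale, quality_metric):
        # grouped_dataset: dict mapping a group label to a list of
        # (low_res, high_res) image pairs.
        scores = {}
        for group, pairs in grouped_dataset.items():
            results = [quality_metric(upscale(low_res), high_res)
                       for low_res, high_res in pairs]
            scores[group] = sum(results) / len(results)
        return scores

An aggregate number computed over a dataset that is 90 per cent white can look excellent while hiding exactly the failure mode Menon describes.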

The pair said they carried out the project when they were undergraduates, and that they selected StyleGAN for their tool simply because it provides state-of-the-art results, and CelebA because it was commonly used to benchmark super-resolution imaging tasks.

"We don't want to point fingers at anyone; implicit biases are systemic issues," they said. "It's important to be aware of these problems that we weren't initially thinking about. We still have a long way to go as a field of being aware of these things."

The research has kickstarted a heated discussion over whether the issue can simply be fixed by using a model that has been trained on a more diverse dataset. Facebook's chief AI scientist Yann LeCun believes so, but others, like Timnit Gebru, an expert in algorithmic bias, don't think it's as simple as that. ®
