Twitter says its AI that automatically crops images in tweets didn't exhibit racial or gender bias when it was developed – even though in production it may prefer to crop out dark-skinned people and focus on women's chests. The social network acknowledged it has more work to do to address concerns.
"Our team did test for bias before shipping the model and did not find evidence of racial or gender bias in our testing," a Twitter spokesperson said in a statement to The Register that was also posted on the social media service. "But it’s clear from these examples that we’ve got more analysis to do. We'll continue to share what we learn, what actions we take, and will open source our analysis so others can review and replicate."
Over the weekend, various Twitter users posited that the code Twitter uses to select the viewable portion of images displayed in Twitter feeds is biased against people with dark skin. This was based on a few publicly cited examples in which Twitter's photo framing, given images of both light-skinned and dark-skinned people, skewed toward the light-skinned person. Essentially, when you tweet a photo, how it's displayed depends on the viewer's device: Twitter selects a crop that fits the screen, whether that's a smartphone or a desktop PC. What tends to happen, it seems, is that, given a choice, Twitter crops the image so that a lighter-skinned person is in view, or a female chest (really NSFW).
In 2018, Twitter software engineers wrote about the web biz's image cropping algorithm, which was designed to zero in on "saliency" – the calculated importance of various image features – rather than faces, the previous focus for Twitter's image framing.
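To make the idea concrete, here is a minimal sketch of saliency-driven cropping. It is not Twitter's code: it assumes a saliency map is already available as a grid of scores (the hypothetical `saliency` argument) and simply centers a fixed-size crop on the highest-scoring cell, clamped to the image bounds. The function name and the toy map are illustrative assumptions.

```python
def crop_box(saliency, crop_w, crop_h):
    """Return (left, top, right, bottom) of a crop centered on the peak.

    `saliency` is a hypothetical 2D grid (list of rows) of scores;
    a real system would get these scores from a trained model.
    """
    rows, cols = len(saliency), len(saliency[0])
    # Locate the single most salient cell.
    peak_r, peak_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )
    # Center the window on the peak, then clamp it inside the image.
    left = min(max(peak_c - crop_w // 2, 0), cols - crop_w)
    top = min(max(peak_r - crop_h // 2, 0), rows - crop_h)
    return left, top, left + crop_w, top + crop_h


# Toy 4x6 map with one bright spot at row 1, column 4.
toy_map = [[0.0] * 6 for _ in range(4)]
toy_map[1][4] = 0.9
box = crop_box(toy_map, crop_w=2, crop_h=2)
```

Because everything outside the returned box is discarded, whichever region scores highest decides who stays in frame, which is why small differences in saliency scores matter.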
The claims of racism also elicited a reminder from Anima Anandkumar, professor of computer science at Caltech and director of Nvidia's machine-learning research group, that gender bias continues to be a problem. Last year, she pointed out that image cropping on Twitter and other platforms like Google News often focuses on women's torsos rather than their heads.
Issues of this sort turn out to be rather common. For example, Zoom's algorithm for replacing video backgrounds doesn't do a very good job of detecting the outlines of dark-complected faces.
Turns out @zoom_us has a crappy face-detection algorithm that erases black faces...and determines that a nice pale globe in the background must be a better face than what should be obvious.— Colin Madland (@colinmadland) September 19, 2020
Concerns about bias or unfair results in AI systems have come to the fore in recent years as the technology has infiltrated hiring, insurance, law enforcement, advertising, and other aspects of society. Prejudiced code may be a source of indignation on social media but it affects people's access to opportunities and resources in the real world. It's something that needs to be dealt with on a national and international level.
A variety of factors go into making insufficiently neutral systems, such as unrepresentative training data, lack of testing on diverse subjects at scale, lack of diversity among research teams, and so on. But among those who developed Twitter's cropping algorithm, several expressed frustration about the assumptions being made about their work.
Ferenc Huszár, a former Twitter employee, one of the co-authors of Twitter's image pruning research, and now a senior lecturer in machine learning at the University of Cambridge, acknowledged there's reason to look into the results people have been reporting, though he cautioned against jumping to conclusions about negligence or lack of oversight.
Some of the outrage was based on a small number of reported failure cases. While these failures look very bad, there's work to be done to determine the degree to which they are associated w/ race or gender. Sampling bias and confirmation bias can lead to premature conclusions— Ferenc Huszár🇪🇺 (@fhuszar) September 21, 2020
And Zehan Wang, engineering lead at Twitter, noted that bias research was conducted on the image-cropping algorithm back in 2017.
"We purposefully constructed pairs of images of faces from different ethnic backgrounds as well as gender and ran them through the saliency detection model, checking for differences in saliency scores," he wrote via Twitter. "No significant bias found."
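Wang did not say which statistic was used to judge the score differences. As one possible illustration, the sketch below applies an exact sign-flip (permutation) test to paired differences. The function name and the meaning of `diffs` (saliency score for one face minus the score for the other face in each pair) are assumptions, and no real Twitter data is involved.

```python
from itertools import product


def paired_sign_flip_p(diffs):
    """Exact two-sided sign-flip test on paired score differences.

    Under the null hypothesis of no systematic preference, each
    difference is equally likely to be positive or negative, so we
    enumerate every sign assignment and count how often the total
    is at least as extreme as the one observed.
    """
    n = len(diffs)
    observed = abs(sum(diffs))
    hits = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / 2 ** n
```

A small returned value would suggest the model systematically scores one group higher. The enumeration grows as 2^n, so this exact form only suits small numbers of pairs; larger sets would need random sampling of sign assignments.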
Meanwhile, Vinay Prabhu, chief scientist at UnifyID and a Carnegie Mellon PhD, ran a cropping bias test on a set of 92 images of White and Black faces and found a 40:52 White-to-Black ratio, which argues against bias for that particular set.
(Results update)
White-to-Black ratio: 40:52 (92 images)
Code used: https://t.co/qkd9WpTxbK
Final annotation: https://t.co/OviLl80Eye
(I've created @cropping_bias to run the complete the experiment. Waiting for @Twitter to approve Dev credentials) pic.twitter.com/qN0APvUY5f — Vinay Prabhu (@vinayprabhu) September 20, 2020
A different set of images with different characteristics, however, may not lead to the same results.
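For the 40:52 split Prabhu reported, a rough way to ask how far it sits from an even split is an exact two-sided binomial test. The sketch below assumes each of the 92 images is an independent trial in which the crop was equally likely to favor either face; that framing is an assumption made here, not something stated in the tweet.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed count k under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # tolerance for float rounding
    return min(1.0, sum(q for q in pmf if q <= threshold))


# 40 crops favoring the White face out of 92 images.
p_value = binomial_two_sided_p(40, 92)
```

Under these assumptions the p-value comes out well above the conventional 0.05 threshold, so this sample alone does not distinguish the cropper's behavior from a coin flip in either direction.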
"This is a universal problem with testing AI," said Anandkumar in a Twitter DM conversation with The Register. "There are so many edge cases (long tail) in the real world. Current deep learning methods make it impossible to easily discover those during testing because of its black-box nature. There is both bias in data and deep-learning methods tend to amplify and obfuscate the problem."
Anandkumar expressed skepticism about how thoroughly Twitter tests its AI models. "The question is who are the test subjects?" she said. "If they are predominantly straight white men who are ogling at women's chests and preferring to look at white skin, we have a huge problem that their gaze becomes universal. We are now all co-opting their gaze at a universal scale."
Twitter's chief design officer Dantley Davis admitted, "This is 100 per cent our fault. No one should say otherwise. Now the next step is fixing it."
Or at least fixing "the perception of bias." ®