Google says it won't release its photorealistic DALL·E 2 rival – this AI is too prejudiced for you to use

It has this weird habit of drawing stereotyped White people, team admit


DALL·E 2 may have to cede its throne as the most impressive image-generating AI to Google, which has revealed its own text-to-image model called Imagen.

Like OpenAI's DALL·E 2, Google's system outputs images of stuff based on written prompts from users. Ask it for a vulture flying off with a laptop in its claws and you'll perhaps get just that, all generated on the fly.

A quick glance at Imagen's website shows off some of the pictures it's created (and Google has carefully curated), such as a blue jay perched on a pile of macarons, a robot couple enjoying wine in front of the Eiffel Tower, or Imagen's own name sprouting from a book. According to the team, "human raters exceedingly prefer Imagen over all other models in both image-text alignment and image fidelity," but they would say that, wouldn't they.

Imagen comes from Google Research's Brain Team, who claim the AI achieved an unprecedented level of photorealism thanks to a combination of transformer and image diffusion models. Tested against similar models, such as DALL·E 2 and VQ-GAN+CLIP, Imagen blew the lot out of the water, the team said. The comparison used DrawBench, an in-house list of 200 prompts built to benchmark such systems.
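Google hasn't published Imagen's code, so the details of its implementation remain its own, but the diffusion half of that recipe follows a well-documented pattern: start from random noise and repeatedly denoise it, nudged toward the text prompt at each step. Below is a rough Python sketch of that sampling loop; the denoiser network, noise schedule, and guidance scale are illustrative assumptions, not Imagen's.

    # Illustrative sketch of classifier-free-guided diffusion sampling, the general
    # pattern text-to-image diffusion models use. The `denoiser` network and the
    # schedule/guidance values here are hypothetical stand-ins, not Imagen's.
    import torch

    @torch.no_grad()
    def sample(denoiser, text_emb, null_emb, steps=1000, guidance=7.0, shape=(1, 3, 64, 64)):
        betas = torch.linspace(1e-4, 0.02, steps)       # simple linear noise schedule
        alphas = 1.0 - betas
        alpha_bar = torch.cumprod(alphas, dim=0)

        x = torch.randn(shape)                          # start from pure noise
        for t in reversed(range(steps)):
            # Predict the noise twice, with and without the text conditioning,
            # then combine them (classifier-free guidance) to sharpen prompt adherence.
            eps_cond = denoiser(x, t, text_emb)
            eps_uncond = denoiser(x, t, null_emb)
            eps = eps_uncond + guidance * (eps_cond - eps_uncond)

            # Standard DDPM update: subtract the predicted noise for this step.
            coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
            mean = (x - coef * eps) / torch.sqrt(alphas[t])
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + torch.sqrt(betas[t]) * noise
        return x                                        # a low-res image tensor, later upscaled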

Imagen's work, with prompts ... Source: Google

Imagen's designers say that their key breakthrough was in the training stage of their model. Their work, the team said, shows how effective large, frozen pre-trained language models can be as text encoders. Scaling that language model, they found, had far more impact on performance than scaling Imagen's other components. 

"Our observation … encourages future research directions on exploring even bigger language models as text encoders," the team wrote.

Unfortunately for those hoping to take a crack at Imagen, the team behind it said it isn't releasing the code or a public demo, for several reasons.

For example, Imagen isn't good at generating human faces. In tests that included pictures of people, human raters preferred Imagen's output over reference images only 39.2 percent of the time; with faces removed, that figure rose to 43.9 percent.

Google didn't provide any Imagen-generated pictures of people, so it's impossible to tell how they compare to those produced by the likes of This Person Does Not Exist, which uses a generative adversarial network to generate faces.

Aside from technical concerns, and more importantly, Imagen's creators found that it's a bit racist and sexist even though they tried to prevent such biases.

Imagen showed "an overall bias towards generating images of people with lighter skin tones and …  portraying different professions to align with Western gender stereotypes," the team wrote. Eliminating humans didn't help much, either: "Imagen encodes a range of social and cultural biases when generating images of activities, events and objects." 

Like similar AIs, Imagen was trained on image-text pairs scraped from the internet into publicly available datasets like COCO and LAION-400M. The Imagen team said it filtered a subset of the data to remove noise and offensive content, though an audit of the LAION dataset "uncovered a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes."
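Google hasn't said exactly how that filtering worked, but the general shape of such a pass is simple enough: walk the caption/URL pairs and throw out the obviously noisy or unwanted ones. The sketch below is purely illustrative; the file name, columns, and blocklist are assumptions, and a real filter is far more involved than keyword matching.

    # Illustrative filter over scraped image-text pairs: drop noisy captions,
    # blocklisted terms, and malformed URLs. Not Google's actual pipeline.
    import csv

    BLOCKLIST = {"nsfw", "porn", "gore"}          # toy word list, far cruder than a real filter

    def keep(caption: str, url: str) -> bool:
        words = caption.lower().split()
        if len(words) < 3 or len(words) > 128:    # drop empty or runaway captions
            return False
        if any(w in BLOCKLIST for w in words):    # drop captions matching the blocklist
            return False
        return url.startswith("http")             # drop malformed URLs

    with open("pairs.tsv", newline="", encoding="utf-8") as src, \
         open("pairs_filtered.tsv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")        # expects 'url' and 'caption' columns
        writer = csv.DictWriter(dst, fieldnames=["url", "caption"], delimiter="\t")
        writer.writeheader()
        for row in reader:
            if keep(row["caption"], row["url"]):
                writer.writerow({"url": row["url"], "caption": row["caption"]})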

Bias in machine learning is a well-known issue: Twitter's image cropping and Google's computer vision are just two systems that have been singled out for playing into stereotypes coded into the data we produce.

"There are a multitude of data challenges that must be addressed before text-to-image models like Imagen can be safely integrated into user-facing applications … We strongly caution against the use of text-to-image generation methods for any user-facing tools without close care and attention to the contents of the training dataset," Imagen's creators said. ®


