Cerebras sets record for 'largest AI model' on a single chip

Plus: Yandex releases 100-billion-parameter language model for free, and more

In brief US hardware startup Cerebras claims to have trained the largest AI model on a single device, powered by its Wafer Scale Engine 2, the world's largest chip, which is the size of a plate.

"Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system," the company claimed this week. "Running on a single CS-2, these models take minutes to set up and users can quickly move between models with just a few keystrokes."

The CS-2 packs a whopping 850,000 cores, and has 40GB of on-chip memory capable of reaching 20 PB/sec memory bandwidth. The specs on other types of AI accelerators and GPUs pale in comparison, meaning machine learning engineers have to train huge AI models with billions of parameters across more servers.

Even though Cerebras has evidently managed to train the largest model on a single device, it will still struggle to win over big AI customers. The largest neural network systems contain hundreds of billions to trillions of parameters these days. In reality, many more CS-2 systems would be needed to train these models. 
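As a rough illustration of the scale gap described above, the following back-of-envelope sketch estimates how much memory the weights alone would occupy at different model sizes. It assumes 2 bytes per parameter (16-bit values), which is an assumption and not something Cerebras has stated here, and it ignores gradients, optimizer state, and activations. It says nothing about how the CS-2 actually stores or streams weights; the function name is hypothetical.

```python
# Back-of-envelope only: bytes needed just to hold a model's weights.
# Assumption (not from the article): 2 bytes per parameter (16-bit values).
BYTES_PER_PARAM = 2
ON_CHIP_MEMORY_GB = 40  # CS-2 on-chip memory figure quoted in the article


def weight_footprint_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the size of the weights alone in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    sizes = [("20B", 20 * 10**9), ("100B", 100 * 10**9), ("1T", 10**12)]
    for label, params in sizes:
        gb = weight_footprint_gb(params)
        ratio = gb / ON_CHIP_MEMORY_GB
        print(f"{label} parameters: {gb:,.0f} GB of weights, "
              f"{ratio:.0f}x the quoted 40GB of on-chip memory")
```

Under that assumption, a 20-billion-parameter model's weights come to about 40GB, while a trillion-parameter model's weights would be 50 times that, before counting anything else training needs.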

Machine learning engineers will likely run into similar challenges to those they already face when distributing training over numerous machines containing GPUs or TPUs – so why switch over to a less familiar hardware system that does not have as much software support?

Surprise, surprise: Robot trained on internet data was racist, sexist

A robot trained on a flawed dataset scraped from the internet displayed racist and sexist behaviors in an experiment.

Researchers from Johns Hopkins University, Georgia Institute of Technology, and the University of Washington instructed a robot to put blocks in a box. The blocks were pasted with images of human faces. The robot was told to pack the block it believed showed a doctor, homemaker, or criminal into a colored box.

The robot was powered by a CLIP-based computer vision model, a type often used in text-to-image systems. These models are trained to map images to the words that describe them. Given an image and a caption, the model scores how well the two match; it does not generate images itself. Unfortunately, these models often exhibit the same biases found in their training data.
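To make the matching step concrete, here is a minimal sketch of how a CLIP-style model picks the caption that best fits an image: both are turned into vectors, and the closest pair wins. The three-dimensional vectors below are made up for illustration, and the function names are hypothetical; a real model learns embeddings with hundreds of dimensions from its training data, which is exactly where skewed associations creep in.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose embedding is most similar to the image's."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


# Made-up embeddings, for illustration only. A trained model would
# produce these vectors from the caption text and the image pixels.
CAPTION_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.3, 0.1]

if __name__ == "__main__":
    print(best_caption(IMAGE_EMBEDDING, CAPTION_EMBEDDINGS))
```

The model always returns its closest match, even when no caption is a sensible fit, so whatever associations it picked up in training decide the outcome.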

For example, the robot was more likely to identify blocks with women's faces as homemakers, and to label Black faces as criminals more often than those of White men. The device also seemed to favor women and people with darker skin less than White and Asian men. Although the research is just an experiment, deploying robots trained on flawed data could have real-life consequences.

"In a home maybe the robot is picking up the white doll when a kid asks for the beautiful doll," Vicky Zeng, a graduate student studying computer science at Johns Hopkins, said. "Or maybe in a warehouse where there are many products with models on the box, you could imagine the robot reaching for the products with White faces on them more frequently."

Largest open source language model released

Russian internet biz Yandex published the code for a 100-billion-parameter language model this week.

The system, named YaLM, was trained on 1.7TB of text data scraped from the internet and required 800 Nvidia A100 GPUs for compute. Interestingly, the code was published under the Apache 2.0 license, meaning the model can be used for research and commercial purposes.

Academics and developers have welcomed efforts to replicate and open source large language models. These systems are challenging to build, and typically only big tech companies have the resources and expertise to develop them. They are often proprietary, and without access they're difficult to study.

"We truly believe global technological progress is possible only through cooperation," a spokesperson from Yandex told The Register. "Big tech companies owe a lot to the open results of researchers. However, in recent years, state-of-the-art NLP technologies, including large language models, have become inaccessible to the scientific community since the resources for training are available only to big tech."

"Researchers and developers all over the world need access to these solutions. Without new research, growth will wane. The only way to avoid this is by sharing best practices with the community. By sharing our language model we are supporting the pace of development of global NLP."

Instagram to use AI to verify users' age

Instagram's parent biz, Meta, is testing new methods to verify its users are 18 and older, including using AI to analyze photos.

Research and anecdotal evidence have shown that social media use can be harmful to children and young teenagers. Users on Instagram provide their date of birth to confirm they're old enough to be using the app. You have to be at least 13, and there are more restrictions in place for those under 18.

Now Meta is trying three different ways to verify that someone is over 18 if they change their date of birth.

"If someone attempts to edit their date of birth on Instagram from under the age of 18 to 18 or over, we'll require them to verify their age using one of three options: upload their ID, record a video selfie or ask mutual friends to verify their age," the company announced this week.
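The rule Meta describes can be sketched in a few lines: verification is triggered only when an edit moves an account from under 18 to 18 or over. This is a hedged illustration of the stated policy, not Instagram's implementation; the function names and the `today` parameter are hypothetical.

```python
from datetime import date

ADULT_AGE = 18
# The three options named in Meta's announcement.
VERIFICATION_OPTIONS = ("upload ID", "record a video selfie", "ask mutual friends")


def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def needs_age_verification(old_birth_date: date, new_birth_date: date, today: date) -> bool:
    """True only when the edit moves the user from under 18 to 18 or over."""
    return age_on(old_birth_date, today) < ADULT_AGE <= age_on(new_birth_date, today)


if __name__ == "__main__":
    today = date(2022, 6, 27)
    if needs_age_verification(date(2008, 1, 1), date(2000, 1, 1), today):
        print("Verification required via one of:", ", ".join(VERIFICATION_OPTIONS))
```

Edits that keep a user under 18, or that change the birthday of someone who was already 18 or over, would not trigger the check under this reading of the announcement.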

Meta said it had partnered with Yoti, a digital identity platform, to analyze people's ages. Images from the video selfie will be scrutinized by Yoti's software to predict someone's age. Meta said Yoti uses a "dataset on anonymous images of diverse people from around the world".

GPT-4chan was a bad idea, say researchers

Hundreds of academics have signed a letter condemning GPT-4chan, the AI language model trained on over 130 million posts from the infamously toxic internet message board 4chan.

"Large language models, and more generally foundation models, are powerful technologies that carry a potential risk of significant harm," the letter, spearheaded by two professors at Stanford University, began. "Unfortunately, we, the AI community, currently lack community norms around their responsible development and deployment. Nonetheless, it is essential for members of the AI community to condemn clearly irresponsible practices."

These types of systems are trained on vast amounts of text, and learn to mimic the data. Feed GPT-4chan what looks like a conversation between netizens, and it'll carry on adding more fake gossip to the mix. 4chan is notorious for having relaxed content moderation rules – users are anonymous and can post anything as long as it's not illegal. GPT-4chan, unsurprisingly, started spewing similarly toxic text. When it was set loose on 4chan, some users weren't sure whether it was a bot or not.

Now, experts have slammed its creator, YouTuber Yannic Kilcher, for deploying the model irresponsibly. "It is possible to imagine a reasonable case for training a language model on toxic speech – for example, to detect and understand toxicity on the internet, or for general analysis. However, Kilcher's decision to deploy this bot does not meet any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science," the letter concluded. ®
