Supercomputer to train 176-billion-parameter open-source AI language model

BigScience is a collaborative effort by developers volunteering to make ML research more accessible

GTC BigScience – a team made up of roughly a thousand developers around the world – has started training its 176-billion-parameter open-source AI language model in a bid to advance research into natural language processing (NLP).

The transformer architecture makes it possible to train very large neural networks efficiently. Built around the self-attention mechanism, a transformer can ingest large amounts of data in one pass rather than having to break it down into smaller chunks and process them sequentially.

Transformers are particularly useful in NLP. Instead of analyzing the words in a sentence one at a time, they process all of them at once, which makes them better at modelling relationships across longer spans of text. They outperform older architectures, such as recurrent neural networks and long short-term memory networks, on tasks like text summarization and text generation.
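
To make the self-attention idea concrete, here is a minimal sketch in plain NumPy – an illustration only, not code from the BigScience project – of scaled dot-product attention, in which every token is compared against every other token in a single matrix operation:

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model) token embeddings; w_*: learned projection matrices
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])          # every token scores every other token
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole sequence
        return weights @ v                               # weighted mix of all positions

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))                          # five toy tokens, 8-dimensional embeddings
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)               # shape (5, 8)

The attention weights span the whole sequence at once, which is why transformers handle long-range dependencies better than architectures that read a sentence token by token.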

These models have steadily grown in size and complexity, from tens of millions of parameters to hundreds of billions between 2018 and 2021. OpenAI's GPT-3, for example, has 175 billion parameters, and the Microsoft-Nvidia Megatron-Turing model has 530 billion.
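
As a rough back-of-the-envelope illustration of where those figures come from – an approximation, not a published breakdown – a decoder-only transformer's parameter count is dominated by its attention and feed-forward matrices, and so scales with layers × hidden-size squared:

    def approx_params(n_layers, d_model, vocab_size=50_000):
        # ~4*d^2 attention weights plus ~8*d^2 feed-forward weights per layer
        per_layer = 12 * d_model ** 2
        embeddings = vocab_size * d_model
        return n_layers * per_layer + embeddings

    # GPT-3's published shape: 96 layers, hidden size 12,288
    print(approx_params(96, 12_288) / 1e9)   # roughly 174 billion, close to the quoted 175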

"We keep having bigger and bigger large language models, which is very interesting to observe, but it's also slightly worrying when you consider that there are only very few places in the world that have the kinds of resources to facilitate training such large language models," Douwe Kiela, head of research at Hugging Face, a company leading the BigScience effort, said during a talk presented at this year's GPU Technology Conference hosted by Nvidia. 

BigScience is an open project and nearly a thousand developers have volunteered to help create and maintain the large datasets required to train language models. There are numerous groups focused on everything from building the 176-billion-parameter system to studying its social impacts. All the data and source code will be made available, making it easier for researchers to get under the hood to figure out how the technology works and its limitations.

The project's previous and latest open-source work can be found on GitHub.

Large language models developed by private companies – like OpenAI, Google, or Microsoft – are proprietary, making them difficult to probe. They all exhibit the same problematic behaviors, generating toxic speech, bias, and misinformation. But researchers can't understand these issues or fix them without access to the model and its training dataset, hence this open-science effort to create and share a large model.
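
When weights and tokenizers are published openly, that kind of probing becomes straightforward. As a sketch – the model name below is a placeholder, not a real checkpoint – Hugging Face's transformers library can pull down an open model so researchers can inspect its configuration, parameter count, and tokenization:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "some-org/open-model"                       # hypothetical identifier
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    print(model.config)                                # architecture details
    print(sum(p.numel() for p in model.parameters()))  # total parameter count
    print(tokenizer("Open models can be audited.").input_ids)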

"If we care about democratizing research progress as a community, and if we want to make sure that the whole wide world can make use of this technology, then we have to find a solution for that. And that is exactly what big science is trying to be," Kiela said. BigScience will be trained on data from 46 different languages. 

Backed by France's state-funded HPC company GENCI and its national supercomputer center IDRIS, the BigScience language model will be trained on the Jean Zay supercomputer. Its peak performance is over 28 petaFLOPS, and it contains multiple Nvidia V100 and A100 GPUs.
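
Training at that scale means splitting the work across many GPUs. The project's actual training stack is far more elaborate, but a minimal data-parallel sketch in PyTorch – assuming it is launched with torchrun, one process per GPU – shows the basic pattern:

    # Illustrative only; not the BigScience training code.
    # Launch with: torchrun --nproc_per_node=<gpus> train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])  # stand-in for a transformer
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 1024, device=rank)    # each process trains on its own shard of data
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                           # gradients are all-reduced across GPUs here
        opt.step()

    dist.destroy_process_group()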

The training process is expected to take roughly three to four months, Kiela said. "The main sort of side effect of this large effort is that it fosters a lot of discussion around the more pertinent research questions that we should not be afraid to ask as a scientific community.

"What are the capabilities and limitations of these models? How can we overcome biases and artifacts? What are the ethical considerations that we need to factor in what about the environment? And is this really something we need to be much more careful with when we train these models? What is the general role of these models in society? These sorts of important questions aren't often not publicly discussed. And definitely not discussed by the large industrial companies that are building these large language models," he said. ®
