Meta releases code for massive language model to AI researchers
Now they can experiment with the algorithms even if they don't have hundreds of GPUs
Meta is releasing a giant language model to academics, in the hope that a better understanding of how these systems work will make them less toxic and biased.
The Open Pretrained Transformer (OPT-175B) has 175 billion parameters, matching commercial language models like OpenAI's GPT-3. Systems of this kind give developers capabilities to build upon, such as automated copywriting, content moderation, or even code generation. But they can also produce text that's biased, toxic, and inaccurate, making them risky to use.
As Meta knows only too well from some of the human-generated texts it struggles to manage.
Proprietary tools are often out of reach for academic researchers who want to investigate the technology's issues – both in terms of access to a model's underlying code and the resources needed to build and train their own language models. Meta's latest code release, however, can help them study these systems in more detail.
"We are sharing Open Pretrained Transformer, a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology," researchers at the social media biz said on Tuesday. "For the first time for a language technology system of this size, the release includes both the pretrained models and the code needed to train and use them."
Meta has also released subsets of the full model – up to 66 billion parameters – for anyone to use. The complete and largest OPT-175B system, however, is only available to researchers on request for noncommercial applications. It was trained using 992 Nvidia 80GB A100 GPUs, reaching a performance of 147 TFLOPS per chip. Researchers won't need to build and train the model from scratch, because Meta is providing the code needed to deploy it on 16 Nvidia V100 GPUs.
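For the smaller checkpoints the barrier to entry is low. As a rough illustration, the sketch below loads one of the released subsets through the Hugging Face transformers library and samples a short continuation from a prompt – assuming the weights are mirrored on that hub under an ID such as facebook/opt-125m, which is our assumption rather than something Meta's announcement spells out.

# Minimal sketch: load one of the smaller OPT checkpoints and sample text.
# The checkpoint ID "facebook/opt-125m" is assumed here; check Meta's release
# notes for the officially supported download and loading path.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-125m"  # assumed ID for the smallest public subset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Encode a prompt and generate roughly thirty new tokens.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))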
Training such large models is tricky. Meta's team of researchers said they experienced numerous failures and had to restart the whole process 35 times over a two-month period, according to a paper [PDF] on arXiv.
A Meta spokesperson told The Register that releasing OPT-175B will help academics reproduce results from large language model (LLM) papers.
"It is important to improve transparency and openness around large-scale research so that the future we build with this technology is more equitable and fair. The future of LLM work cannot solely live in the hands of those with financial interests in keeping this research behind closed doors," the spokesperson stated.
The code for Meta's smaller pre-trained models can be found here. If you're an academic and want the full version you can request it by completing this form. ®