There’s a new giant AI language model in town: enter Microsoft’s Turing-NLG system, which apparently contains a whopping 17 billion parameters, making it the largest publicly known model of its class yet.
Turing-NLG joins the growing list of massive text-generating machine-learning systems. Google’s BERT model contains 340 million parameters, OpenAI’s GPT-2 tops out at 1.5 billion parameters, and Nvidia’s Megatron-LM packs 8.3 billion parameters.
All of them are smaller than Turing-NLG, Redmond claimed on Monday. As with its rivals, you give Turing-NLG a writing prompt, and it uses this to generate what it predicts are human-like follow-on sentences.
“Microsoft is introducing Turing Natural Language Generation (T-NLG), the largest model ever published at 17 billion parameters, which outperforms the state of the art on a variety of language modeling benchmarks and also excels when applied to numerous practical tasks, including summarization and question answering,” Microsoft claimed.
Like its predecessors, the 17-billion-parameter model is built out of transformers, an AI architecture that processes incoming text and outputs words in a parallel fashion that takes context into account. If the previous sentence or so was about the country France, this context is fed forward through the processing chain so that subsequent sentences are related to the Euro nation.
Meaningful or human-like text is tricky for machines to generate because there has to be an appreciation of context: sentences that flit wildly between subjects come across as nonsensical, so there has to be some kind of train of thought, no matter how artificial or vacuous it is.
Previous serial-like language models processed data in order, like an assembly line, and had to work hard to maintain a coherent theme in the output. Transformer-based models integrate context more directly in their training, and thus produce relatively high-quality prose, generally speaking.
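For the curious, here is a toy sketch of the attention step that lets a transformer weigh every earlier word when handling the current one. It is an illustration only, not Microsoft's code: the function names, the three-word example, and the two-number word vectors are all made up for the purpose, and a real model uses learned vectors with thousands of dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Every position is scored against the query at once, rather than
    being read one after another in sequence.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    width = len(values[0])
    blended = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(width)
    ]
    return weights, blended


# Hypothetical vectors: "lovely" is deliberately placed close to "France".
tokens = ["France", "is", "lovely"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

# Ask which earlier words matter most when processing "lovely".
weights, blended = attend(vectors[2], vectors, vectors)

for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

With these made-up vectors, "France" receives the largest weight, which is the sort of carried-over context described above.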
Microsoft has been coy with regard to the technical details of Turing-NLG. When asked if researchers were planning to publish a paper outlining T-NLG, a spokesperson told The Register that "no additional papers [beyond the announcements] are anticipated at this time."
Researchers used “a Nvidia DGX-2 hardware setup,” and split the model across four V100 GPUs. A total of 256 V100s were required to train the hefty model on 174GB of internet-scraped text, we're told. For comparison, Nvidia’s Megatron-LM model spun up 512 V100 GPUs to churn its way through the same amount of training data.
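Some back-of-the-envelope arithmetic shows why the model has to be carved up at all. The sketch below assumes two bytes per parameter (half-precision weights) and assumes each copy of the model occupies exactly four GPUs; Microsoft has confirmed neither detail in that form, and the function names are ours.

```python
# Rough arithmetic on the figures quoted in the article.
# Assumptions (not confirmed by Microsoft): fp16 weights at 2 bytes each,
# and every model copy spans exactly four GPUs.

PARAMS = 17_000_000_000     # T-NLG parameter count, per Microsoft
BYTES_PER_PARAM = 2         # assumed half-precision storage
GPUS_PER_MODEL_COPY = 4     # model split across four V100s
TOTAL_GPUS = 256            # V100s used for training


def weight_gigabytes(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def model_copies(total_gpus: int, gpus_per_copy: int) -> int:
    """How many full copies of the model fit across the cluster."""
    return total_gpus // gpus_per_copy


weights_gb = weight_gigabytes(PARAMS, BYTES_PER_PARAM)
per_gpu_gb = weights_gb / GPUS_PER_MODEL_COPY
copies = model_copies(TOTAL_GPUS, GPUS_PER_MODEL_COPY)

print(f"weights: {weights_gb:.1f} GB total, {per_gpu_gb:.1f} GB per GPU")
print(f"model copies across the cluster: {copies}")
```

Under those assumptions the weights alone come to 34GB, or 8.5GB per GPU once split four ways, and 256 GPUs would hold 64 copies of the model. Training needs considerably more memory than the weights alone, since gradients and optimizer state have to live somewhere too.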
T-NLG has allegedly achieved top scores on common benchmarks that test the ability of language models to generate coherent and complex sentences, answer questions, and summarize text. It’s difficult to verify the numbers since Microsoft hasn't outlined the methodology used to test its system.
It has also withheld the model. A “private demo” has been given to a small number of academics for testing and feedback purposes. It’s possible that Redmond won’t release the model at all, but it hinted that the state-of-the-art results provided “new opportunities for Microsoft and our customers.” “We don’t have any details to share about future plans,” a spokesperson told us.
T-NLG could be used in Microsoft Office to read and summarize text documents, or in Outlook to assist users in writing emails, or maybe even in Cortana, Microsoft’s voice assistant chatbot. Who knows, maybe even Clippy, Microsoft's beloved virtual talking paper clip, might make a comeback with AI superpowers or something. ®