Startups competing with OpenAI's GPT-3 all need to solve the same problems

Today we walk you through the fascinating world of upcoming text-generating rivals

Analysis Text-generating language models are difficult to control. These systems have no sense of morality: they can spew hate speech and misinformation. Despite this, numerous companies believe this kind of software is good enough to sell.

OpenAI launched its powerful GPT-3 to the masses in 2020; it also has an exclusive licensing deal with Microsoft. The upshot of this is that developers no longer have to be machine-learning gurus to create products that feature natural language processing. All the hard work of building, training, and running a massive neural network has been done for them, and is neatly packaged behind the GPT-3 API.
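
To give a flavor of what that looks like in practice, here's a minimal sketch of a GPT-3 API call using OpenAI's Python client as it existed at the time; the prompt and parameter values are illustrative, not taken from any particular product:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# Ask GPT-3 to complete a prompt; "davinci" was the largest
# generally available GPT-3 model of the era
response = openai.Completion.create(
    engine="davinci",
    prompt="Write a one-line product pitch for a smart coffee mug:",
    max_tokens=64,    # cap the length of the generated text
    temperature=0.7,  # higher values produce more varied output
)

print(response.choices[0].text)

The developer gets generated text back from a single call, without ever touching the underlying neural network.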

Last year, two startups released their own proprietary text-generation APIs. AI21 Labs, based in Israel, launched its 178-billion-parameter Jurassic-1 in August 2021, and Cohere, headquartered in Canada, released a range of models nicknamed small, medium, and large, three months later.

Cohere now also has an extra-large system, which is currently available only to beta testers. Cohere hasn't disclosed how many parameters its models contain; for comparison, OpenAI's GPT-3 has 175 billion.

Aidan Gomez, co-founder and CEO of Cohere, said he toyed with the idea of launching a generative language model startup before GPT-3 was announced. He was part of the team at Google Brain that came up with the transformer architecture at the heart of these systems. Gomez argued there are benefits to having a few centralized, powerful text-generation systems rather than a sprawl of individual deployments.

"We really shouldn't have a world where every single company is training their own GPT-3, it would be massively environmentally costly, compute costly, and we should be trying to share resources as much as possible," Gomez told The Register.

"I saw the opportunity for an independent player to come out and to basically centralize the cost of pre-training these massive models and then open up access and amortize those costs across a huge number of users. By reducing the cost you make it accessible to more people."

Competing against OpenAI isn't easy

Starting a language model company that can compete with the likes of OpenAI is a tall order because the barrier to entry is so high. New ventures must come armed with deep pockets to pay for the huge amount of computational resources required to train and run these models, and hire experts in cutting-edge research and machine-learning engineering.

Cohere raised $40m in its series-A funding round, and just announced $125m in series-B funding this month, while AI21 Labs has collected $54.5m over four rounds of funding.

Each startup has partnered with a different cloud provider: Cohere has entered a multi-year contract with Google, while OpenAI and AI21 Labs are supported by Microsoft and AWS, respectively.

"Training these large models is always expensive," Yoav Shoham, co-CEO of AI21 Labs and a retired Stanford computer-science professor, told The Register. "If you're not smart enough, you can easily run into tens of millions of dollars if you're not careful. You need to make sure that you know unit economics so that you don't lose money on every customer and only make it up in volume."

AI21 Labs and Cohere are also choosy about the customers they onboard. The tendency for language models to produce text that may be offensive or false makes the technology risky to deploy, and clients need to understand and be able to handle the dangers.

Like OpenAI, both upstarts have strict usage guidelines and terms of service to control what can and cannot be built using their APIs. For example, all three forbid applications that could mislead people into believing they're communicating with a human being rather than a machine.

Safety first

Enforcing these rules is a balancing act. If these API providers are too restrictive on what can and can't be done with their technology, they could drive customers away and lose out on business. If they are too lax, the software could generate undesirable text or conversations, triggering a PR disaster, lawsuits, and so on.

One of OpenAI's early flagship customers, Latitude – which built AI Dungeon, a popular online text adventure game – announced it had switched to AI21 Labs after OpenAI required the developer to implement a content filter to catch and stop NSFW language.

"We've been working on this for several weeks so that we could remove dependence on OpenAI for AI Dungeon users so that users would be minimally impacted by OpenAI's new content policy, which we are required to implement," Latitude said in December.

OpenAI's new policy required the games maker to roll out a content filter to screen players' adventures for risqué narratives. But the filter went awry: benign text such as "four watermelons" would be blocked, derailing people's games. Earlier this year, Latitude said it would stop offering its GPT-3-based model altogether, claiming the protective measures OpenAI insisted on were ruining the gameplay.

"Most users can't have a good experience with the new filters," Latitude said.

AI21 Labs has developed a toxicity filter, Shoham told us. The tool is used internally and will soon be offered to customers via the company's API. "We have a dedicated team to look at issues of quality, safety or ethics or bias, all the ways in which some people worry that AI could go wrong," he said.

Safety is an issue every language model business has to deal with, and it'll be interesting to see whether startups enforce a strong set of rules and controls despite the financial incentive to lower the bar and bring on more customers.

"I think we're competitors but we're all in the same boat," Shoham said. "We know safety is an important issue and we take it seriously." Gomez agreed, and said he was open to the idea of sharing some of Cohere's IP if it specifically improved safety and would encourage more companies to adopt the new measures.

Can we trust language models?

At the moment, Cohere and AI21 Labs offer more or less the same features and capabilities as OpenAI.

On top of text generation, Cohere and OpenAI's models can perform tasks such as search and classification. Cohere also supports embeddings: numerical representations that place similar words or concepts close together, making it easier for users to implement sentiment analysis or build recommendation systems.
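
As a rough sketch of why that's useful, consider the snippet below. The vectors are made up for illustration; a real service such as Cohere's returns high-dimensional embeddings produced by its models:

import numpy as np

# Hypothetical three-dimensional embeddings; real ones have
# hundreds or thousands of dimensions
reviews = {
    "great product, works perfectly":      np.array([0.9, 0.1, 0.2]),
    "fantastic, exceeded my expectations": np.array([0.8, 0.2, 0.3]),
    "broke after two days, awful":         np.array([0.1, 0.9, 0.4]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; near 0 means unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = reviews["great product, works perfectly"]
for text, vec in reviews.items():
    print(f"{cosine_similarity(query, vec):.2f}  {text}")

The two positive reviews score close to 1.0 against each other while the negative one scores far lower, which is the basis for sentiment analysis, semantic search, and recommendations.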

OpenAI followed suit and added similar capabilities to its GPT-3-based models last month. The models' performance is all pretty comparable, since they were trained on similar data scraped from the internet: Cohere and AI21 Labs also fed their models Wikipedia entries, books, and portions of the Common Crawl dataset used to teach OpenAI's GPT-3.

Cohere and AI21 Labs will have to differentiate their models somehow to win over customers. "For us, our product focus is on expanding the number of people who can build with this stuff. That's where we see our leverage," Cohere's Gomez told us.

"In order to do that we need to give those people the best possible models, so we invest a lot in research on making them more useful. There's three directions that I see: safety, efficiency, and quality."

AI21 Labs is trying to figure out how to give machines reasoning skills. Shoham said his team is developing fresh system architectures that combine older symbolic AI systems with modern neural networks.

"Current models are dumb as nails," he said. "Ask a language model how many teeth does a human have and it'll say 32. Now, that's right and very nice. But ask it how many teeth does a math teacher have and it'll say 47."

The lack of common sense and accuracy doesn't just make language models risky; it hampers technological innovation, too. The models are inappropriate for some uses, such as generating or summarizing medical or legal advice, or educational materials.

Transformative effect

OpenAI's GPT-3 API transformed Ryan Doyle's career. A former sales representative and self-taught developer, he built Magic Sales Bot, an application that used GPT-3 to help users write better sales pitches in their emails. Last year, Doyle told us around 2,000 users had signed up to use his program.

But Doyle stopped using it, he told us earlier this month, due to the model's tendency to just make up information: "GPT-3 presented a huge opportunity to apply AI to ideas I've always wanted to try, like creating sales emails. As the idea took shape, the reality showed that GPT-3 had quite a distance to go [before it could be] used in business writing. I ultimately had to pull it to move my business forward, but I intend on revisiting and integrating it when the tech improves."

Cohere and AI21 Labs' models must tackle these same problems. As competition heats up, the focus is on making these systems smarter and more trustworthy. How to keep them from generating misleading and false information remains an open problem, and people can demonstrably be duped by fake computer-generated text.

There are other up-and-coming startups looking to solve the same issues. Anthropic, the AI safety and research company started by a group of ex-OpenAI employees, hinted it might work on large commercial systems in the future. Several researchers have left Google Brain to join two new ventures started by their colleagues, according to people familiar with the matter. One outfit is named Character, and the other Persimmon Labs.

Startups arriving late to the party face an uphill battle: the longer they take to launch their services, the more features existing companies will ship, and the further behind the latecomers will fall. Potential customers won't be too impressed by newcomers merely offering the same capabilities as current APIs.

They could tailor their language models to specialize in a narrow domain to carve out a niche in the market, or demonstrate that their software can solve new types of language tasks that weren't possible before. The best way to succeed, however, is to show their systems can generate text that's less biased, less toxic, and more accurate. ®
