Ex-OpenAI staff launch new chatbot – yup, it's Anthropic with Claude 2.1
Half as many hallucinations, startup claims, and it admits when it's wrong
Anthropic has launched Claude 2.1, the latest version of its large language model. We're told it can process more text and generate responses that are more accurate than previous iterations, and it can interact with developer-defined APIs allowing it to be integrated with users' tech stacks.
On Tuesday the startup – formed in 2021 with a focus on ML safety and reliability by people who left OpenAI – said the Claude 2.1 model doubles up on capabilities and is now powering its web-based AI chatbot app, and is available for developer and enterprise use. Claude works like OpenAI's ChatGPT and its APIs: you give it prompts and requests in natural language, hold a conversation with it, and it'll attempt to produce answers.
"Claude 2.1 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and our new beta feature: tool use," the biz said in its release notes.
The token context window dictates the amount of text a user can include in their input prompt. Compared to its predecessor Claude 2, the latest model can handle double the number of tokens, which the upstart claims is an "industry first." Text is split into chunks called tokens, and a 200K token context window is equivalent to about 150,000 words, or over 500 pages of text.
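To get a feel for those numbers, here is a minimal back-of-the-envelope sketch in Python. The ratios are simply derived from the figures quoted above (150,000 words per 200K tokens, and an assumed 300 words per page to match the 500-page figure); real tokenizers vary with language and content, and the function names are made up for illustration.

```python
# Rules of thumb derived from the article's figures; these are assumptions,
# not properties of any particular tokenizer.
WORDS_PER_TOKEN = 0.75  # 150,000 words / 200,000 tokens
WORDS_PER_PAGE = 300    # assumed: 150,000 words / 500 pages


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate how many tokens a text of word_count words uses."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_pages(word_count: int) -> float:
    """Roughly estimate how many pages word_count words would fill."""
    return word_count / WORDS_PER_PAGE


def fits_in_context(word_count: int, context_window: int = 200_000) -> bool:
    """Check whether the estimated token count fits in the context window."""
    return estimate_tokens(word_count) <= context_window


if __name__ == "__main__":
    words = 150_000
    print(estimate_tokens(words), estimate_pages(words), fits_in_context(words))
```

In practice you would count tokens with the vendor's own tokenizer rather than a fixed ratio, and leave headroom for the model's reply.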
Increasing the token context window means that Claude 2.1 can complete larger natural language tasks, such as summarization, question answering, or translation on longer and more complex documents. Processing that much text, however, means the chatbot will take some minutes to respond.
Another property that is perhaps more useful is the model's ability to generate responses that are more truthful. Claude 2.1 hallucinates – makes stuff up – at half the rate of the old version, Anthropic claims. It is also more likely to admit it doesn't know the correct answer to a query, rather than fabricating an answer like some other systems it could mention.
In experiments, when given an incorrect fact, such as: "The fifth most populous city in Bolivia is Montero," the model is more likely to reply with something like: "I'm not sure what the fifth most populous city in Bolivia is."
For what it's worth, other bots can do the same: Google Bard, for instance, can double-check its answers against search results, and highlight confirmed facts and questionable assertions.
"Claude 2.1 demonstrated a 30 percent reduction in incorrect answers and a 3-4x lower rate of mistakenly concluding a document supports a particular claim," Team Anthropic said.
The San Francisco outfit's latest large language model can also interact with user-defined APIs and tools to carry out simple actions. Here is a list of things it can do, or so we're told:
- Using a calculator app for complex numerical reasoning
- Translating natural language requests into structured API calls
- Answering questions by searching databases or using a web search API
- Taking simple actions in software via private APIs
- Connecting to product datasets to make recommendations and help users complete purchases
Users can thus prompt Claude to perform a specific task, like retrieving information from private knowledge bases, or integrate it with their own APIs.
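As a rough illustration of the "structured API call" idea, here is a minimal Python sketch of the application side of tool use. It is not Anthropic's API: the JSON shape, the `TOOLS` registry, and the `run_tool_call` helper are all hypothetical, and simply assume the model has been prompted to emit a JSON object naming a tool and its arguments.

```python
import json
import operator

# Hypothetical calculator tool the application exposes to the model.
_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def calculator(op: str, a: float, b: float) -> float:
    """Apply a basic arithmetic operation to two numbers."""
    return _OPS[op](a, b)


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"calculator": calculator}


def run_tool_call(model_output: str):
    """Parse a JSON tool request (assumed format) and run the named tool."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    # Pretend the model turned "what is 12 times 7?" into this request.
    request = '{"tool": "calculator", "arguments": {"op": "mul", "a": 12, "b": 7}}'
    print(run_tool_call(request))
```

The application, not the model, executes the call; the result would then be fed back to the model so it can phrase an answer for the user.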
It also supports system prompts, a common feature among chatbots that allows developers to preface user prompts with specific context, such as telling the model to adopt a particular persona or generate responses in a structured and consistent way.
For example, let's say you want to build a chatbot into your website so that it answers queries from programmers about some database software you offer. It would be wise to set a system prompt to be something kinda like: "You are a friendly, upbeat, but not too informal or intimate, robot librarian that wishes to help developers look up information about the database we sell. You should answer the following query with a link to the relevant documentation."
That system prompt is concatenated with the user's request, processed by the model, and the result returned to the user. Defining the system prompt saves you having to do that concatenation yourself. When you see people trying to make LLMs do bad stuff, they are typically trying to override that system prompt.
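Conceptually, the concatenation described above looks something like the following Python sketch. The `build_prompt` helper and the blank-line separator are assumptions for illustration only; the exact format a given model expects is vendor-specific and normally handled for you by the API.

```python
def build_prompt(system_prompt: str, user_request: str) -> str:
    """Preface the user's request with the developer's system prompt.

    The separator (a blank line) is an illustrative assumption, not the
    format any particular model requires.
    """
    return f"{system_prompt.strip()}\n\n{user_request.strip()}"


if __name__ == "__main__":
    system = (
        "You are a friendly robot librarian that helps developers look up "
        "information about the database we sell."
    )
    user = "How do I create an index on two columns?"
    print(build_prompt(system, user))
```

Because the model ultimately sees one block of text, a user request that says "ignore the instructions above" is competing directly with the system prompt, which is why override attempts target it.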
Users can expect to pay $8 per million tokens processed in their input prompts, and $24 per million tokens generated in the model's output.
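To put that pricing in context, here is a small Python sketch that estimates the bill for a single request at the quoted rates. The function name and the example token counts are made up for illustration, and prices may change.

```python
# Rates quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 8.00
OUTPUT_USD_PER_MILLION = 24.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a full 200K-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(200_000, 1_000):.3f}")
```

By this estimate, filling the entire 200K context window costs about $1.60 in input tokens alone for each request.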
It's a good time for Anthropic to launch Claude 2.1, especially since its rival OpenAI had to temporarily pause new signups for its ChatGPT Plus subscriptions due to a lack of compute power to support higher usage. Not to mention that OpenAI is also currently facing an internal crisis following the shock firing of its CEO Sam Altman.
OpenAI's future is uncertain. Altman appears to want his old job back, despite the offer to lead a new AI research team at Microsoft, and is also considering starting a new company. Meanwhile, the majority of its employees have threatened to resign unless the current board quits and Altman is reinstated as leader.
Tech companies are now taking advantage of the situation with many trying to woo talent and customers away from OpenAI and, like Anthropic today, promoting competing systems.
Anthropic's co-founders include CEO Dario Amodei, a former veep of research at OpenAI; Daniela Amodei, once VP of safety and policy at OpenAI; Tom Brown, the lead GPT-3 engineer at OpenAI; and Jack Clark, formerly policy director at OpenAI (plus ex-Bloomberg and Register). It's taken billions of dollars in funding and support from Google, Amazon, and others. ®