Anthropic unlocks Claude 3, claims it's better than ChatGPT and Gemini
Boasts of 'near-human levels of comprehension and fluency'
AI startup Anthropic has released Claude 3, the latest iteration of its large language model, which it claims is more powerful than OpenAI's GPT-4.
Announced on Monday, Claude 3 comes in three different sizes: Opus, Sonnet, and Haiku. Opus is the most powerful of the three and is available to developers and users via Anthropic's API and Claude Pro subscription. Sonnet can be accessed by developers through an API and currently powers Anthropic's free web chatbot. The smallest model, Haiku, isn't available just yet.
In academic benchmark tests – assessing LLMs' ability to retain common knowledge, solve math problems, generate code, and show reasoning skills – Opus scored higher than OpenAI's GPT-4 and Google’s Gemini Ultra, Anthropic reports. The developer went so far as to boast that Opus "exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence."
Meanwhile, Sonnet and Haiku are more powerful than OpenAI's previous GPT-3.5 model, but less capable than Google's Gemini Ultra and Pro models.
Anthropic explained that Claude 3's context window – the amount of input the model can process at once – will be 200K tokens at first but is capable of going up to a million tokens.
Opus is pricey, and designed for users who need top levels of data comprehension and generation – for tasks like scientific research or analyzing long, complex reports. It costs $15 to process a million tokens of input, and $75 to generate a million tokens of output. By way of comparison, OpenAI charges $10 and $30 to process and generate a million tokens, respectively, on its GPT-4 Turbo model.
Sonnet is aimed at mainstream enterprise users that need a capable yet fast model that can do things like search and retrieve information, write marketing copy, or generate code. It has been optimized for large-scale deployments and costs $3 and $15 to handle a million tokens at input and output, respectively. Haiku will be even cheaper, costing $0.25 and $1.25 to process and generate a million tokens, respectively. It should be useful for things like content moderation, language translation, or customer service.
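To put those per-million-token prices in perspective, here is a rough back-of-the-envelope sketch in Python. It is not Anthropic's billing logic: the `Pricing` class, the `PRICES` table, and the `estimate_cost` helper are hypothetical names made up for illustration, and the only real inputs are the dollar figures quoted above. It assumes cost scales linearly with token count, with no minimums, discounts, or rounding.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for a model's quoted prices, in US dollars."""
    input_usd_per_million: float
    output_usd_per_million: float


# Prices as quoted in this article; the table itself is illustrative only.
PRICES = {
    "opus": Pricing(15.00, 75.00),
    "sonnet": Pricing(3.00, 15.00),
    "haiku": Pricing(0.25, 1.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear pricing."""
    price = PRICES[model]
    total = (input_tokens * price.input_usd_per_million
             + output_tokens * price.output_usd_per_million)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example: a prompt that fills the 200K-token window, with a 2,000-token reply.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 200_000, 2_000):.4f}")
```

On those assumptions, a single request that fills the 200K-token window and returns a 2,000-token answer works out to about $3.15 on Opus, $0.63 on Sonnet, and roughly five cents on Haiku.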
Amazon announced it will host Anthropic's Claude 3 models on its Bedrock cloud platform: Sonnet today, and Opus and Haiku sometime soon. It's a similar story for Google Cloud's Vertex AI Model Garden: Sonnet is available today in private preview, with API access to all three models arriving soon.
Claude 3 is also less cautious than its predecessor. Claude 2.1 would often refuse to comply with prompts that weren't necessarily harmful – like requests to write a fictional story. The developer's announcement assured users: "We've made meaningful progress in this area: Opus, Sonnet, and Haiku are significantly less likely to refuse to answer prompts that border on the system's guardrails than previous generations of models."
The biggest issue that plagues LLMs, however, is their tendency to generate inaccurate information, or straight-up make things up, with such confidence that users may well believe it. The errors – referred to as hallucinations – make it difficult to trust the output of AI software, let alone give computers more autonomy in tasks.
Anthropic promised Opus offers a "twofold improvement" in accuracy compared to Claude 2.1, and said it will introduce a feature that cites sources in the outputs generated by its latest models for users to inspect. That's similar to, say, Google Gemini, which also says where it got its info from in some of its answers to prompts.
"We do not believe that model intelligence is anywhere near its limits, and we plan to release frequent updates to the Claude 3 model family over the next few months. We're also excited to release a series of features to enhance our models' capabilities, particularly for enterprise use cases and large-scale deployments," Anthropic's announcement concluded.
Interestingly, Anthropic has chosen not to make Claude 3 a fully multi-modal system. Although it can process images, it cannot produce them and cannot handle audio or video inputs, unlike ChatGPT or Gemini. ®