Google launches Gemini AI systems, claims it's beating OpenAI and others - mostly
Gemini accepts text, images, audio, and video and comes in three flavors
Google has unveiled Gemini, its most powerful class of transformer-based models yet, which are capable of processing text, images, audio, and video.
Gemini is a multimodal model with a 32k context window that can take different types of data as input and generate images and text as output, and comes in three different sizes. The largest, Gemini Ultra, is the most powerful version designed for complex tasks that require "reasoning" or processing multiple types of data.
Gemini Pro is the medium-sized model, optimized to run more efficiently and perform a broader range of tasks. The smallest, Gemini Nano, comes in two variants designed to run on small devices: Nano-1 has 1.8 billion parameters, and Nano-2 has 3.25 billion. Google did not reveal how many parameters its more powerful Gemini Pro and Gemini Ultra models contain.
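To give a rough sense of why those parameter counts suit small devices, here is a back-of-the-envelope sketch of how much storage the weights alone would need. The parameter counts are Google's; the bit widths are illustrative assumptions, not figures from the announcement, and the helper name is hypothetical. Real memory use would also include activations and runtime overhead.

```python
def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights, in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Parameter counts as reported by Google for the two Nano variants.
NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

# Assumed precisions for illustration only; the article does not say
# which precision the on-device models actually use.
ASSUMED_BIT_WIDTHS = (16, 8, 4)

for name, params in NANO_PARAMS.items():
    for bits in ASSUMED_BIT_WIDTHS:
        size = weight_footprint_gb(params, bits)
        print(f"{name} at {bits}-bit weights: about {size:.2f} GB")
```

Under these assumptions, halving the bits per weight halves the footprint, which is why lower-precision weights matter so much on a phone.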
So, what is Google using Gemini for? As of today, its AI chatbot Bard has been updated to run Gemini Pro, meaning it should be better at understanding and summarizing text than the previous version powered by Google's PaLM 2 language model. The multimodal capabilities, however, aren't quite ready yet: the Gemini Pro version of Bard can only process and generate text, and only supports English for now.
Google also plans to bring Gemini Pro to some of its Search, Ads, Chrome, and Duet AI products, including Gmail and Google Docs, over the next few months.
Meanwhile, Google's latest Pixel 8 Pro will run Gemini Nano to support two new features: summarizing audio files in its Recorder app, and generating quick replies to text messages via the Gboard virtual keyboard app. Google said it will build more AI features on top of Gemini Nano for its smartphones, and plans to open up the software to third-party Android developers through its AICore service.
AICore runs on Android 14 and gives developers access to the model via open-source APIs, and will handle things like runtimes and safety.
Unfortunately, those waiting to test out Gemini Ultra will have to wait a little longer. "We're currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback before making it broadly available," Google explained.
The Chocolate Factory plans to make Gemini Ultra available next year, and will start experimenting with the model's capabilities with select customers and developers before it launches its Bard Advanced chatbot.
Vendors looking to build specialized AI tools powered by Gemini for specific applications, like those working in the legal, HR, medical, or finance industries, for example, will be able to access Gemini Pro as an API in the Google AI Studio or Google Cloud Vertex AI platforms from 13 December.
Google vs OpenAI
Google has come under fire for being slow to ship AI products despite being a leader in the technology's research and development.
OpenAI launched its viral web app ChatGPT a year ago and helped Microsoft release its own AI Bing chatbot shortly afterwards, leaving Google to play catch-up. Now, the latest ChatGPT and AI Bing versions powered by GPT-4 can process images too. Gemini is Google's push to stay competitive. So how does it compare to OpenAI's models?
The short answer is: Gemini Pro seems to be a bit better than GPT-3.5, whereas Gemini Ultra is a bit better than GPT-4, according to some benchmark tests Google released.
"Broadly, we find that the performance of Gemini Pro outperforms inference-optimized models such as GPT-3.5 and performs comparably with several of the most capable models available, and Gemini Ultra outperforms all current models," the Gemini team said in a paper [PDF].
The testers compared Gemini's abilities with various models from OpenAI, Anthropic, X, and Meta across ten different tests. These mostly involved text-based tasks such as solving math and Python coding problems, question answering for text comprehension, common sense checks, and machine translation.
Gemini Ultra performed better than GPT-4, Claude, Grok-1, and Llama-2 for eight out of ten tasks, whereas Gemini Pro surpassed GPT-3.5 and all the other models in seven out of nine tasks. These benchmark results, however, should be taken with a grain of salt.
Although AI technologies are improving, they aren't perfect and their behaviors are unpredictable. Gemini still has the same limitations as all large language models (LLMs) in generating factually incorrect information, a process known as hallucination.
"Despite their impressive capabilities, we should note that there are limitations to the use of LLMs. There is a continued need for ongoing research and development on 'hallucinations' generated by LLMs to ensure that model outputs are more reliable and verifiable," the Gemini team warned.
"LLMs also struggle with tasks requiring high-level reasoning abilities like causal understanding, logical deduction, and counterfactual reasoning even though they achieve impressive performance on exam benchmarks."
Still, Google is investing heavily in the technology. Under CEO Sundar Pichai, the search giant has reoriented itself as "an AI-first company" and is now scrambling to commercialize its efforts and remain competitive with the new wave of AI startups.
"Nearly eight years into our journey as an AI-first company, the pace of progress is only accelerating: Millions of people are now using generative AI across our products to do things they couldn't even a year ago, from finding answers to more complex questions to using new tools to collaborate and create," he said.
"At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with our AI tools. This is incredible momentum, and yet, we're only beginning to scratch the surface of what's possible." ®