Google thinks AI can Google better than you can

Multimodal models promise machines can do more – if you let them

In the future, "Google will do the Googling for you" – or so suggests Liz Reid, VP of search.

On Tuesday at Google I/O, the search ad firm's annual developer conference, executives made the case for a world where multimodal machine learning models connect the dots and fill in the blanks.

"With AI Overviews, Google does the work for you," said Reid. "Instead of piecing together all the information yourself, you can ask your question."

AI Overviews – offered previously as a Search Labs experiment – is rolling out to US search users today, with more countries to follow.

Google's vision for the future of search involves posing complex queries that combine multiple demands into a single directive.

For example: "Find the best yoga or Pilates studios in Boston and show me details on their intro offers, and walking time from Beacon Hill." Normally this would be three or more keyword search queries. But by plugging Gemini into the search process, the entire inquiry can be handled at once.

"Under the hood, our custom Gemini model acts as your AI agent using what we call multi-step reasoning," explained Reid. "It breaks your bigger question down into all its parts. And it figures out which problems it needs to solve and in what order."

The presentation of AI Overview results, however, made it look like Google's search page will keep much of the viewer's attention, rather than sending searchers to linked websites.

According to Reid, Gemini can also assemble a multi-day meal plan on demand or handle activity planning – at least for those happy to delegate decisions to an AI model. Meal and trip plans, available in Search Labs, can be exported to Gmail or Docs from icons on the search results page, and additional plan categories – like parties, date nights, and workouts – are being developed.

Another capability coming soon to Search Labs is the ability to search using videos as input.


Using 21st century tools to explain technology from the 19th

In an onstage demo, Rose Yao, VP of product at Google Search, showed how a video of a turntable with a moving tonearm could be submitted to Google Search to answer the question "Why will this not stay in place?" The underlying Gemini model is savvy enough to understand that "this" refers to the wobbly tonearm and to return a recommendation about how to fix the issue.

Astra, your AI pal who's fun to be with

Demis Hassabis, CEO of Google DeepMind, took a turn on stage to tease Project Astra – a "universal AI agent."

"For a long time, we've wanted to build a universal AI agent that can be truly helpful in everyday life," enthused Hassabis.

"Building on our Gemini model, we've developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall. We've also enhanced how they sound, with a wider range of intonations.

"These agents better understand the context you're in, and can respond quickly in conversation, making the pace and quality of interaction feel much more natural."

A prerecorded video – said to have been captured in a single take in real time – showed off the Chocolate Factory's progress toward this goal. It featured an employee walking around a Google office scanning it with a smartphone running Astra.

Asked to report when it sees something that makes sound, the agent on the phone responded in a human-sounding voice that it spotted an audio speaker. It then identified the speaker's tweeter once that part of the speaker was pointed out in the phone's live video of the room.

Astra knows its woofer from its tweeter

Thereafter, the agent composed an alliterative phrase to describe a bowl of crayons upon request. And asked to explain source code on a nearby monitor, the software helper replied that it defines encryption and decryption functions.

Finally, in response to a query about its location, the agent determined that it was in the King's Cross area of London – apparently based on the view from the office window. It also remembered where the user had left their glasses – which many users might find worth the price of admission on its own.

According to Google, this is an ongoing project, and some of these capabilities are expected to arrive in Gemini in future.

Gemini furthered its infiltration of Google Workspace, with Gemini 1.5 Pro now providing AI help for Workspace Labs and Gemini for Workspace Alpha users via the side panel of Gmail, Docs, Drive, Slides, and Sheets. It will be available next month to Google One AI Premium subscribers on desktop via Gemini for Workspace add-ons.

Gemini features are also coming to the Gmail mobile app – email summarization arrives in June, with contextual smart reply and Gmail Q&A to follow by July.

Creatives, meet your doom

Not to be outdone by OpenAI's text-to-video model Sora or its text-to-image model DALL·E, Google announced Veo and Imagen 3 for creating videos and images from text prompts.

"Veo creates high-quality, 1080p videos from text, image, and video prompts," declared Hassabis. "It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or a time lapse, and further edit your videos using additional prompts."

Imagen 3 looks interesting for its ability to generate legible text within images and its ability to edit areas of an image by selecting and re-describing what should appear in the selection area.
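The select-and-redescribe editing flow amounts to mask-constrained regeneration: only pixels inside the user's selection are replaced, guided by the new description. A minimal sketch of the masking mechanics, with a trivial fill standing in for the (unpublished) generative model:

```python
def inpaint(image: list[list[int]], mask: list[list[bool]],
            fill_value: int) -> list[list[int]]:
    """Replace cells where mask is True; leave the rest untouched.
    fill_value is a stand-in for content a generative model would
    produce from the user's new description."""
    return [
        [fill_value if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [[1, 1, 1],
         [1, 1, 1]]
mask  = [[False, True, True],
         [False, False, True]]  # the user's selected area
edited = inpaint(image, mask, 9)
print(edited)  # [[1, 9, 9], [1, 1, 9]]
```

The key property is that pixels outside the selection survive the edit byte-for-byte, which is what makes region-targeted prompting useful.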

Veo is available through a private waitlist via Google's VideoFX service, while Imagen 3 is similarly gated via ImageFX. MusicFX and TextFX are already available.

With these new AI capabilities, Google is also expanding its AI safety mechanisms. Specifically, the Chocolate Factory is making its SynthID watermarking technology available for text generated by the Gemini app and web client, and for Veo-generated video.

For more on Google's AI model enhancements, see our report on the megacorp's developer-oriented announcements. ®
