Don't be fooled: Google faked its Gemini AI voice demo
PLUS: The AI companies that will use AMD's latest GPUs, and more
AI in brief Google wowed the internet with a demo video showing off the multimodal capabilities of its latest large language model, Gemini – but some of the demo was faked.
In the demo below, Gemini appears to respond to a user's voice and interact with the user's surroundings, looking at things they have drawn or playing rock, paper, scissors. At one point, Gemini is asked to guess what the user is sketching on a Post-it note and correctly answers "duck."
A rubber duck is then placed on a paper atlas, and Gemini identifies where the object has been put. It does all sorts of things – identifying objects, tracking where things have been hidden and switched under cups, and more. Google was trying to show off Gemini's ability to process different forms of information and perform logical and spatial reasoning.
But in reality, the model was not prompted using audio and its responses were only text-based. They were not generated in real time either. Instead, the video was crafted "using still image frames from the footage, and prompting via text," a Google spokesperson told Bloomberg.
The person speaking in the demo was actually reading out some of the text prompts that were passed to the model, and the robot voice given to Gemini was reading out responses the model had generated as text. Still images taken from the footage – like the rock, paper, scissors hand shapes – were fed to the model alongside a text prompt asking it to guess the game. Google then cherry-picked its best outputs and narrated them over the footage to make it seem as if the model could respond flawlessly in real time.
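For the curious, this is roughly what prompting a multimodal model with a still frame plus text looks like in practice. The snippet below is a minimal sketch, assuming the google-generativeai Python SDK and the gemini-pro-vision model; the file name and prompt wording are illustrative, not the exact inputs Google used.

```python
# Minimal sketch: send one still frame plus a text prompt to a multimodal model.
# Assumes the google-generativeai Python SDK; file name and prompt are illustrative.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

model = genai.GenerativeModel("gemini-pro-vision")
frame = Image.open("rock_paper_scissors_frame.jpg")  # a single still frame, not live video

response = model.generate_content(
    ["Hint: it's a game. What do you think I'm doing?", frame]
)
print(response.text)  # e.g. a text guess such as "rock, paper, scissors"
```

The point is that each answer comes from one frame and one text prompt, not from a continuous audio-and-video feed as the demo implies.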
"For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity," the description for the video on YouTube reads. Oriol Vinyals, VP of research and deep learning lead at Google DeepMind, who helped lead the Gemini project, admitted that the video demonstrates "what the multimodal user experiences built with Gemini could look like" (our emphasis).
AMD is coming for Nvidia's lunch
Top AI developers have committed to using AMD's latest Instinct MI300-series accelerators as they look for more computational resources to support the training and running of their models.
At AMD's launch event this week, officials from Microsoft, Oracle, and Supermicro went on stage to back the chip shop, pledging to buy the accelerators and build AI servers around them, whether to power cloud platforms or as standalone machines. Microsoft will use the chips to build MI300X v5 virtual machine clusters for Azure, while Oracle will offer OCI bare metal compute instances.
Dell will integrate AMD's latest AI accelerators into its PowerEdge XE9680 servers, while HPE will start deploying them for its HPC business. Meanwhile, Meta promised to add the chips to its datacenters, and OpenAI is adding support for the Instinct MI300 to version 3.0 of its Triton compiler.
"AI is the future of computing and AMD is uniquely positioned to power the end-to-end infrastructure that will define this AI era, from massive cloud installations to enterprise clusters and AI-enabled intelligent embedded devices and PCs," AMD CEO Lisa Su declared in a statement.
Nvidia is at the forefront of AI compute, and its revenues have grown massively year over year as demand for its GPUs rises. But supply is short, and big customers are looking for other options. Some with the deepest pockets have even turned to building their own custom silicon – like Google, Amazon, and Microsoft.
It's a good time to try to steal some of Nvidia's lunch, and AMD's Instinct MI300 series is its best attempt so far. As more developers adopt the chips, the software ecosystem built around them will grow – making AMD's hardware easier for others to use.
- Google launches Gemini AI systems, claims it's beating OpenAI and others - mostly
- AMD slaps together a silicon sandwich with MI300-series APUs, GPUs to challenge Nvidia's AI empire
- Strike over? US actors may return to work with top-tier 'progressive AI protections'
- Meta trials Purple Llama project for AI developers to test safety risks in models
SAG-AFTRA members vote to approve union contract regulating AI
US actor union SAG-AFTRA has officially ratified its agreement with top TV and film production companies after reaching a deal over better working conditions and AI.
Members ended their months-long strike and returned to work after union leaders negotiated better contract terms with the Alliance of Motion Picture and Television Producers (AMPTP). A major sticking point was regulating the use of AI as the technology becomes increasingly capable and is adopted more widely by the entertainment industry.
Under the deal, media studios must obtain explicit consent from performers, and compensate them, for using their likenesses. Actors were concerned they could be replaced and lose out on work to companies turning to the technology to create synthetic but realistic-looking extras or voices for adverts, TV shows, or films.
The agreement was formally ratified after the majority of members voted in favor of it this week.
"SAG-AFTRA members demanded a fundamental change in the way this industry treats them: fairness in compensation for their labor, protection from abusive use of AI technology, strengthened benefit plans, and equitable and respectful treatment for all members, among other things," the union's national executive director & chief negotiator Duncan Crabtree-Ireland explained in a statement.
"This new contract delivers on these objectives and makes substantial progress in moving the industry in the right direction. By ratifying this contract, members have made it clear that they're eager to use their unity to lay the groundwork for a better industry, improving the lives of those working in their profession."
Meta releases text-to-image tool and promises to watermark its images
Meta released Imagine – a web-based text-to-image app – this week, and is planning to add a digital watermark to label synthetic content generated by its software.
Imagine is powered by Emu, which is a visual generative AI model capable of creating 2D- and short 3D-animated videos. It can be used by anyone with a Facebook account. Type in a short prompt, and Imagine will generate a panel of still images matching the input description that users can flick through and use.
Meta is planning to roll out technology that automatically adds a watermark to Imagine's outputs to make sure the AI-generated content can be detected.
"In the coming weeks, we'll add invisible watermarking to [Imagine] with Meta AI experience for increased transparency and traceability. The invisible watermark is applied with a deep learning model. While it's imperceptible to the human eye, the invisible watermark can be detected with a corresponding model," Meta confirmed in a blog post.
The social platform claimed that the watermark will remain intact even if users crop, alter, or take screenshots of Imagine's AI-generated images. ®