Amazon can't channel the dead, but its deepfake voices come a close second
Megacorp shows Alexa speaking like kid's deceased grandma
In the latest episode of Black Mirror, a vast megacorp sells AI software that learns to mimic the voice of a deceased woman whose husband sits weeping over a smart speaker, listening to her dulcet tones.
Only joking – it's Amazon, and this is real life. The experimental feature of the company's virtual assistant, Alexa, was announced at an Amazon conference in Las Vegas on Wednesday.
Rohit Prasad, head scientist for Alexa AI, described the tech as a means to build trust between human and machine, enabling Alexa to "make the memories last" when "so many of us have lost someone we love" during the pandemic.
In an explanatory video, Amazon showed a child asking: "Alexa, can Grandma finish reading me The Wizard of Oz?" at which point the assistant's normally artificial voice shifted gears into a softer, more natural timbre. The point: it's supposed to sound convincingly like the kid's grandma.
Though there was scant detail as to when or even if the technology would become publicly available, Prasad said Amazon was able to train the system to mimic a voice based on about a minute of recorded dialog, meaning users could potentially do this themselves at home.
On the face of it, it's an interesting use case, if a strange way to present it: Amazon has not found a way to "channel the dead" – it has merely developed an in-house deepfake for voices.
Deepfakes are manipulated videos, synthesized by an AI from a variety of footage, that show their subject – a celebrity, politician, anyone – doing or saying something they have never in reality done.
Until now, a deepfake of a politician saying something inflammatory typically had to be dubbed with a recording of someone who could do a decent impression of the subject. AI voice mimicry could make such footage all the more convincing.
The field of AI-manipulated voice is growing. Microsoft has its own take on voice mimicry, ostensibly to help restore the speech of people with impairments, though the company is wary of its potential for abuse. Like Amazon's, the software can reproduce a voice from a short sample, yet it has remained in limbo for years while company leaders wrestled with its ethical implications.
Meanwhile, a startup called Sanas recently emerged from stealth with $32 million in funding. The idea behind its technology is to transform a speaker's accent into something closer to home for the listener – for example, a sales droid in a call center in France speaking with an Alabama accent for their customer in that state. ®