Amazon can't channel the dead, but its deepfake voices come a close second

Megacorp shows Alexa speaking like kid's deceased grandma

In the latest episode of Black Mirror, a vast megacorp sells AI software that learns to mimic the voice of a deceased woman whose husband sits weeping over a smart speaker, listening to her dulcet tones.

Only joking – it's Amazon, and this is real life. This experimental feature for the company's virtual assistant, Alexa, was announced at an Amazon conference in Las Vegas on Wednesday.

Rohit Prasad, head scientist for Alexa AI, described the tech as a means to build trust between human and machine, enabling Alexa to "make the memories last" when "so many of us have lost someone we love" during the pandemic.

In an explanatory video, Amazon showed a child asking: "Alexa, can Grandma finish reading me The Wizard of Oz?" at which point the assistant's normally artificial voice shifted gears into a softer, more natural timbre. The point being that it's supposed to convincingly sound like the kid's grandma.

Though there was scant detail as to when or even if the technology would become publicly available, Prasad said Amazon was able to train the system to mimic a voice based on about a minute of recorded dialog, meaning users could potentially do this themselves at home.

It's an interesting use case on the face of it, and certainly a strange way to present it. That's because Amazon has not found a way to "channel the dead" – it has merely developed an in-house deepfake for voices.

Deepfakes are manipulated video images, strung together by an AI from a variety of footage, to show their subject – a celebrity, politician, anyone – doing or saying something they never actually did or said.

In most examples of a politician being deepfaked to say something inflammatory, the footage has been paired with audio recorded by someone who can do a decent impression of the subject. AI voice mimicry could make such footage all the more convincing.

The field of AI-manipulated voice is growing. Microsoft has its own take on voice mimicry, ostensibly to help restore the speech of people with impairments, but has been concerned about its potential for abuse. Like Amazon's, its software can reproduce a voice from a short sample, yet it has remained in limbo for years while company leaders wrestled with the ethical implications.

Meanwhile, a startup called Sanas recently emerged from stealth with $32 million in funding. The idea behind its technology is to transform the speaker's accent into something closer to home for the listener: for example, a sales droid at a call center in France could speak with an Alabama accent when talking to a customer in that state. ®

Biting the hand that feeds IT © 1998–2022