Humans flunk the Turing test for voices as bots get chattier

Coin-toss odds for spotting a deepfake, study finds. And that's before the machines learn to sing

Think you can tell a human voice from a robot's? Think again, because the numbers are starting to say otherwise.

Researchers at Queen Mary University of London and University College London found that people can no longer reliably distinguish between genuine speech and cloned AI voices. 

Their study, published in open-access journal PLOS One, found that when people were played recordings of real people together with AI-generated versions of the same voices, their judgments were little better than random chance.

The team, led by psychologist Nadine Lavan, tested 80 audio samples: half human, half synthetic. The fully synthetic AI voices – that is, those generated entirely by text-to-speech models rather than trained on recordings of a real person – were easier to spot, with "only" 41 percent mistaken for human.

But when the voices were cloned from real people using just a few minutes of recorded speech, 58 percent of them fooled listeners into thinking they were human. Genuine voices, meanwhile, were judged human only 62 percent of the time, leaving no meaningful gap between the two.

All told, the researchers found "no significant sensitivity" in people's ability to tell cloned voices from real ones – a polite way of saying that we're guessing.

"We show that, under certain conditions, it is not possible for human listeners to accurately discriminate between AI-generated voices and genuine recordings," the authors wrote, adding that the results "demonstrate how voices generated from limited amounts of input data can reach a similar level of human likeness to real recordings of human speakers."

... Uncle Leo?

The team used off-the-shelf software from ElevenLabs and fewer than five minutes of sample speech to build each synthetic voice. The researchers said the tech fell short of "hyperrealism," but even then, some of the clones were perceived as being more trustworthy and more dominant than the human voices.

That might be good news for accessibility tools and voice interfaces, but less so for consumers, as scammers and misinformation peddlers inevitably learn to exploit the tech.

The findings come as cloned voices increasingly feature in phone scams and fraud cases, with victims tricked into transferring money after hearing what sounded like relatives in distress. Consumer Reports recently found that four out of six popular voice-cloning providers offered little more than a tick-box self-attestation to stop impersonation, a loophole that makes the tech almost as easy to abuse as it is to use.

Indeed, the researchers conclude that the ability to tell who's talking (or presumably, whether anyone is) may require something stronger than a good ear as cloned speech keeps improving. 

"Hearing a voice humanizes the speaker from the perspective of the listener," the researchers note. "It is worth considering on a theoretical and conceptual level what it means that some real and natural human voices can sound less real than some AI-generated voices."

Call us cynics, but that sounds like academic shorthand for "the robots now sound more human than people do." ®
