You may have heard about AI defeating voice authentication. This research kinda proves it

Proof-of-concept study shows it's possible to bypass high levels of security, sometimes, sorta

The rise of off-the-shelf AI tools that can clone human voices may force developers of voice authentication software to build an extra layer of security to detect whether an audio sample appears to be human or machine-generated.

Voice authentication is commonly used by call centers, banks, government agencies, and many other orgs. But attacking such systems with AI – typically using machine learning to imitate someone and authenticate as them – has become easier, with researchers now claiming a 99 percent success rate for subverting such security, in the right circumstances.

Those researchers are a pair of computer scientists at the University of Waterloo in Canada who developed a technique to trick these authentication systems. Their paper, published in the proceedings of the 44th IEEE Symposium on Security and Privacy, describes tweaking AI-generated speech recordings to create "adversarial" samples that proved highly effective.

These samples, we're told, allowed the academics to bypass the voice authentication checks on the systems they tested.

Voice authentication relies on the fact that everyone's voice is unique, thanks to physical characteristics like the size and shape of the vocal tract and larynx, and social factors like accent.

These authentication systems capture those nuances in voiceprints. Although AI-generated audio can fairly realistically mimic people's voices, AI algorithms leave their own distinctive artifacts, which analysts can use to spot artificially created voices. The technique developed by the researchers tries to strip these artifacts away while preserving the overall sound.

"The idea is to 'engrave' the user's voiceprint into the spoofed sample," the compsci duo Andre Kassis and Urs Hengartner wrote in their paper. "Our adversarial engine attempts to remove machine artifacts that are predominant in these samples."

The researchers trained their system on samples of 107 speakers' utterances to get a better idea of what makes speech sound human. To test their algorithm, they crafted multiple adversarial samples to fool authentication systems – with a 72 percent success rate. Against some fairly weak systems, they achieved a 99 percent success rate after six attempts.

This doesn't mean voice authentication software is defunct just yet, though. Against Amazon Connect, Amazon's cloud contact center service, they achieved only a ten percent success rate in a four-second attack, and 40 percent in attacks lasting less than 30 seconds. Authentication software is also improving to defeat these kinds of attempts.

Miscreants hoping to carry out these types of attacks need access to recordings of their target's voice, and must be tech-savvy enough to generate their own adversarial audio samples if they're trying to crack a more secure system. A Vice reporter claimed in February that he was able to log into his own bank account using an AI clone trained on his voice.

Although the barrier is fairly high, the researchers urged companies developing voice authentication software to keep improving their defenses.

"The success rates of our attacks are concerning," they wrote, "primarily due to them being attained in the black-box setting and under the assumptions of realistic threat models." The findings "highlight the severe pitfalls of voice authentication systems and stress the need for more reliable mechanisms." ®
