Hastening the arrival of a world in which simulation is indistinguishable from reality, startup Lyrebird has announced plans to power up an online service that can imitate a person's voice.
Given roughly a minute of voice samples from a specific person, the upstart's system can, via an API, convert supplied text into spoken words that sound a lot like the human source.
As if to establish the technology's potential for spoofing political figures and spreading fake news, Lyrebird has provided audio clips that feature the voices of Donald Trump, Barack Obama, and Hillary Clinton, saying sentences they never said themselves.
The deception isn't perfect. The voice samples provided sound processed, and the phrasing often sounds off. But two years ago, University of Alabama at Birmingham researchers demonstrated that voice impersonation attacks could be crafted to fool automated systems between 80 and 90 per cent of the time and human listeners about half the time.
Lyrebird's simulated politicians already sound fairly convincing and could be more so with attentive post-processing and background noise added to mask audio artifacts. Further reinforcement may be possible using real-time video face manipulation.
The startup suggests there is a wide range of applications for the technology, such as speech synthesis for people who have lost their voices. And in what's sure to be a right-to-publicity litigation bonanza, it also suggests co-opting a celebrity voice to serve as a personal assistant, to read text aloud, or to play a character in video or gaming products.
The company, based in Montreal, Canada, was founded by three University of Montréal PhD students, Alexandre de Brébisson, Jose Sotelo and Kundan Kumar. Sotelo and Kumar, along with faculty advisors Aaron Courville and Yoshua Bengio, coauthored a research paper on using neural networks to generate audio from training samples. The team will be at the ICLR AI conference in France this week discussing their work.
On its website, Lyrebird highlights some of the ethical issues arising from its technology and API. The company says it wants people to understand that voice recordings aren't necessarily trustworthy.
"Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries," the company says. "Our technology questions the validity of such evidence, as it allows [someone] to easily manipulate audio recordings."
It may also provide plausible deniability for anything actually caught on tape.
Lyrebird is not alone in its effort to enable mimicry on demand. Adobe last November showed off Project VoCo, software it described as Photoshop for audio. VoCo, currently under development, lets users edit a spoken audio file by retyping the speech-to-text transcript associated with it, given about 20 minutes of audio training samples.
During the demonstration, Adobe developer Zeyu Jin offered reassurance that his company has been exploring safeguards against forgery using digital watermarks.
A French company called CandyVoice provides imitation-as-a-service through its eponymous app, which relies on Microsoft Azure for backend processing. Carnegie Mellon's speech group provides similar software under the name FestVox. Baidu and Google likewise have been making advances in speech synthesis.
The 1992 movie Sneakers imagined how a biometric system might be duped using audio samples rearranged to form a passphrase. Whenever Lyrebird gets around to releasing the beta version of its software, such feats will be possible with the touch of a button. ®