I'm sorry, Dave. I'm afraid I can do that: Microsoft unveils Custom Neural Voice – synthetic, but human-sounding speech

Out-of-work actors beware, Azure can take on voice-over duties

Thu 4 Feb 2021 // 17:32 UTC

Microsoft has pushed its Custom Neural Voice service to general availability, although you'll have to ask the company nicely if you want to use the vaguely unsettling text-to-speech service.

Unsettling, because unlike the usual text-to-speech we've come to know and love over the years, which requires a substantial amount of data (10,000 lines or more, according to Microsoft) to sound fluent, Custom Neural Voice requires far less in terms of training audio. The result is disturbingly human-like.

"This new technology allows companies to spend a tenth of the effort traditionally needed to prepare training data," explained Microsoft, which will come as a delight to out-of-work actors looking to do some voiceover jobs on the side (it probably won't).

There is also a real risk of abuse, hence the GA gates not being entirely thrown open.

Microsoft's own code of conduct for the technology warns against using "photo realistic avatars with synthetic voices to represent real people" or "using a synthetic voice with contents without editorial control." Sensible guidelines when choosing a use case, but unlikely to put off a determined miscreant.

As for the technology itself, three components are at play: Text Analyzer, Neural Acoustic Model, and Neural Vocoder. The trio take inputted text, convert it to a phoneme (a basic unit of sound) sequence, pass that through the model to predict acoustic features before finally spitting out audible speech.
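
For the curious, the three-stage flow described above can be sketched in a few lines of Python. This is a toy illustration only, not Microsoft's implementation: the function names, the tiny lexicon, and the arithmetic standing in for the neural acoustic model are all made up for the example, and the "vocoder" just emits sine tones rather than anything resembling speech.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration); a real text
# analyzer relies on far richer linguistic rules and dictionaries.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "dave": ["D", "EY", "V"],
}


def text_analyzer(text):
    """Stage 1: normalize the text and convert it to a phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        # Unknown words fall back to their upper-cased letters (toy behaviour).
        phonemes.extend(TOY_LEXICON.get(word, list(word.upper())))
    return phonemes


def acoustic_model(phonemes):
    """Stage 2: predict acoustic features for each phoneme.

    Stand-in for the neural model: the pitch is derived from the phoneme's
    characters, which has no acoustic meaning whatsoever.
    """
    features = []
    for phoneme in phonemes:
        pitch_hz = 100.0 + (sum(ord(c) for c in phoneme) % 100)
        features.append({"phoneme": phoneme, "pitch_hz": pitch_hz, "duration_s": 0.05})
    return features


def vocoder(features, sample_rate=16000):
    """Stage 3: turn acoustic features into waveform samples (plain sine tones)."""
    samples = []
    for frame in features:
        n_samples = round(frame["duration_s"] * sample_rate)
        for i in range(n_samples):
            samples.append(math.sin(2 * math.pi * frame["pitch_hz"] * i / sample_rate))
    return samples


def synthesize(text):
    """Chain the three stages: text -> phonemes -> features -> samples."""
    return vocoder(acoustic_model(text_analyzer(text)))
```

Calling `synthesize("Hello, Dave.")` returns a list of floating-point samples between -1 and 1. The point is the hand-off between stages; in the real service the second and third stages are trained neural networks rather than arithmetic.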

The Neural Model itself is trained using neural networks and actual voice recordings. Those recordings are where things get sticky, and "Microsoft requires every customer to obtain explicit written permission from the voice talent before creating a voice model." Verification is also performed.

After all, once that model is up to snuff, the voice could say all manner of things. Microsoft also insists the use of a synthetic voice be disclosed to users, which could make some of the relentlessly perky chatbot-style use cases presented potentially awkward.

Adopters have included AT&T, which had a voiceover artist churn out 2,000 phrases and lines in order to voice cartoon character Bugs Bunny with Custom Neural Voice. At least in that instance one knows that Bugs is a fictional character. ®
