Microsoft seeks patent for tech to put words into your mouth
Remember that satnav that spoke like Homer Simpson? Now imagine AI doing that in any language, on any device
Microsoft has filed a patent application for an "automatic dubbing" system that strips speech from media and inserts new voices in its place.
The patent describes a system made up of an audio processing module, supported by visual and text code, and married with voice tracking software. Speech is extracted from a file's audio and removed, and a new voice is generated in its place, matching the timing of the original speech and, if desired, translating audio and text into another language.
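To make the timing-matching idea concrete, here is a minimal sketch of how a dubbing tool might fit translated speech into the slot left by the original. It is not taken from the patent: the `Segment` type, the `translate` phrasebook lookup, the fixed speaking rate in `estimate_duration`, and `plan_dub` are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech in the original audio track (times in seconds)."""
    start: float
    end: float
    text: str


def translate(text, phrasebook):
    # Hypothetical stand-in for a machine-translation step:
    # look the phrase up, or leave it unchanged if it is unknown.
    return phrasebook.get(text, text)


def estimate_duration(text, words_per_second=2.5):
    # Crude assumption: synthetic speech runs at a fixed words-per-second rate.
    return len(text.split()) / words_per_second


def plan_dub(segments, phrasebook):
    """Return (start, new_text, speed) per segment.

    speed is the playback-rate factor needed for the new speech to fill
    the original slot: above 1 means speed up, below 1 means slow down.
    """
    plan = []
    for seg in segments:
        slot = seg.end - seg.start
        if slot <= 0:
            raise ValueError("segment must have positive duration")
        new_text = translate(seg.text, phrasebook)
        speed = estimate_duration(new_text) / slot
        plan.append((seg.start, new_text, round(speed, 2)))
    return plan
```

Under these assumptions, a two-second "turn left" dubbed as the four-word "gire a la izquierda" would be planned at 0.8 times normal speed, so the new line ends when the original did. A real system would presumably derive segment boundaries from speech detection and hand the plan to a voice synthesizer.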
Here's a schematic drawing portraying the system's components from the patent filing:
AI can be used to clone people's voices, and automatically translate them into different languages. The system detailed in the patent brings that functionality to personal devices, websites, DVD players, and more.
In one "implementation scenario", Microsoft describes how its automatic dubbing can be used to customize a character's voice in a film or game. It envisions a user replacing their own voice with an automatically generated synthetic version across multiple formats.
It's easy to imagine how such a system could be used in the future. The voices of actors, podcasters, or politicians could be preserved and translated across multiple languages using AI to reach a wider audience. Alternatively, fake AI-generated voices could be used instead, converting text to speech and bringing virtual personalities to life in apps or devices. Users could choose the voice in which they listen to an audiobook or chatbot, for example.
Microsoft’s filing doesn’t state the intended purpose of its invention but suggests it might reduce dubbing costs for entertainment and media companies. They could use the software to translate a voice actor's speech into different languages, instead of hiring human linguists, although actors' unions might have something to say about that – especially as in some countries voice actors specialize in dubbing particular Hollywood stars into local languages, and achieve a decent measure of fame for doing so.
The Register is also reminded of the TomTom satnav unit that in 2009 offered the voice of Homer Simpson, performed by actor Dan Castellaneta, as a paid upgrade for those seeking directions to the nearest Krusty Burger. Microsoft’s patent could make Castellaneta’s voice a tick-box option in any language.
The Register has asked Microsoft for comment but has not received a response at the time of publication. D’oh! ®