Apple actually opens up about something – the R&D behind Siri's voices
But you'd have to start working for them to find out any more
At an academic speech tech conference today (Thursday), Apple researchers will present some of the core building blocks behind the voice generation of the Siri assistant.
"Apple has been publishing research more or less continuously over the years," Torbjørn Svendsen, an electrical engineer studying speech processing at the Norwegian University of Science and Technology in Trondheim, told The Register.
But like most commercial companies "they do not say much about the technology or methods inside their products".
In 2012, Apple CEO Tim Cook said the firm would "double down" on product secrecy.
Researchers tell us that, in an effort to attract prized AI and machine-learning talent who want to publish, Cupertino may be cracking open the vault. Apple's AI director said at the Neural Information Processing Systems conference in Barcelona in November that the company would start publishing and, in July, the firm launched a machine-learning research journal.
The new work, which is being presented at the Interspeech conference in Stockholm, Sweden, could be another talent-attraction strategy. In previous papers, the relationship to Apple's commercial systems has "not been as clear", Korin Richmond, a speech tech specialist at Edinburgh University, told The Register.
"We've been a bit surprised in the speech field to see Apple start to disclose more about their technology," he added.
The paper describes the deep learning-based system that synthesises speech for Siri and Apple Maps, giving their voices "naturalness, personality and expressivity".
Like other text-to-speech systems, it builds audio waveforms of human-sounding synthetic speech from segments selected to match the text input. In particular, the paper explains the runtime engine that converts text to speech, how voices are built, and the optimisations chosen to improve vocal quality and run on-device. There's also an accompanying blog post.
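Systems that assemble speech from recorded segments typically pick, for each target sound, the candidate unit that minimises a combination of a target cost (how well the unit fits the sound) and a join cost (how smoothly it connects to its neighbour), searched with dynamic programming. As a rough, invented illustration of that general idea — not Apple's actual implementation, and all names and cost functions here are made up — a minimal sketch:

```python
# Illustrative sketch of unit-selection search: choose one recorded
# speech unit per target sound so that the total of target costs
# (unit vs. desired sound) and join costs (unit vs. its predecessor)
# is minimal. All data and cost functions are invented for illustration.

def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi-style dynamic programming over candidate units.

    targets:    list of target specifications (e.g. phone + prosody)
    candidates: candidates[i] = list of recorded units for targets[i]
    Returns the lowest-total-cost sequence of units.
    """
    # best[u] = (cumulative cost of the best path ending at u, that path)
    best = {u: (target_cost(targets[0], u), [u]) for u in candidates[0]}
    for i in range(1, len(targets)):
        nxt = {}
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            cost, path = min(
                (c + join_cost(prev, u), p) for prev, (c, p) in best.items()
            )
            nxt[u] = (cost + tc, path + [u])
        best = nxt
    return min(best.values())[1]


# Toy demo: two target phones, string "units", hand-picked costs.
targets = ["a", "b"]
candidates = [["a1", "a2"], ["b1", "b2"]]
tcost = lambda t, u: 0.0 if u.startswith(t) else 5.0
jcost = lambda u, v: 0.0 if (u, v) == ("a2", "b1") else 1.0
print(select_units(targets, candidates, tcost, jcost))  # ['a2', 'b1']
```

The demo picks `a2` followed by `b1` because that pair is the only join with zero cost; everything else about when to prefer one unit over another lives in the two cost functions, which in a real system would be driven by acoustic and prosodic models.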
Many other companies, such as the voice-imitating startup Lyrebird, can pull off humanlike text-to-speech synthesis and plenty of research labs are working on making the tech even better.
Richmond said Apple's paper is not breaking new ground and "there are no big surprises" but its strategy of releasing a "modest" number of papers could "work as advertising" for talent by hinting at higher quality research and collaboration happening behind "closed doors".
Apple is also presenting two other papers at Interspeech (here and here), although these are not as explicitly about its commercial technology. The company still has a way to go to match its AI competitors on publication count – by comparison, Google is presenting approximately 40 papers, IBM around 20 and Microsoft about 15.
"My guess is Apple isn't doing this out of the kindness of their heart," Richmond said, "to help advance the general state of the art in speech technology!" ®