From Hal, the malfunctioning computer in Stanley Kubrick’s ground-breaking film 2001: A Space Odyssey, through to last year’s Spike Jonze film Her, in which a man falls in love with a computer voiced by Scarlett Johansson, film makers and sci-fi writers have long been imagining a future where speech recognition is commonplace.
The reality has been somewhat different. The first landmark in technology's struggle to overcome the challenges of voice recognition came in 1939, when Bell Labs showcased the Voder at the New York World’s Fair.
The Voder attempted to synthesise human speech by breaking it down into its acoustic components. It was complex, required months of training to operate and had very limited success. It wasn’t until decades later that there was a real push to develop voice control and speech recognition technology.
The Voder - A Bell Labs presentation from 1939
The 1980s saw the formation of several high-profile voice businesses, including Dragon Systems, which developed speech recognition software.
Then there was the Flemish firm Lernout & Hauspie, formed in 1987. Around the turn of the millennium L&H acquired a number of competitors, including Dictaphone and Dragon, and in its heyday it had a market capitalisation of nearly $10bn. Investors loved the company and its technology earned rave reviews and awards.
L&H’s speech technologies were subsequently bought by ScanSoft, later to become Nuance Communications.
A number of companies have since been hailed as the great white hope of voice recognition.
Wildfire was a voice recognition service bought by Orange in 2000. It took messages, placed calls and stored messages for its devoted but small user base, but Orange shut the service down five years after buying it.
SpinVox, co-founded by one-time Psion marketer Daniel Doulton, provided automated transcription with human intervention when needed by using a “combination of artificial intelligence, voice recognition and natural linguistics”.
The idea was that the technology would get better at transcription and learn from its human operators. SpinVox signed deals with many carriers to use the technology to convert voicemail messaging to text, but ended up being perhaps over-reliant on its human operator-run call centres in developing countries. Eventually SpinVox ran out of money and was bought by Nuance in 2009.
Out of tune
It is not surprising that none of these lived up to the bright and shiny future predicted for them. Make no mistake, accurate voice recognition is difficult.
Vocalisation inevitably varies according to accent, pronunciation, pitch, volume, speed and so on, and it can also be distorted by background noise and echo.
Just take that famous Four Candles/Fork Handles wordplay sketch by the Two Ronnies as a silly example, and it is apparent that much of speech recognition is also about context and assumed knowledge.
Voice recognition remains an under-penetrated market, according to a 2012 report from researchers Global Industry Analysts, which cites accuracy as a key stumbling block obstructing wider acceptance of the technology.
That said, it predicts the global market for voice/speech recognition systems will reach $69bn by next year as the players in the space work to improve systems' ability to recognise and respond to natural human speech.
The improvement in accuracy is tangible. In recent years, voice recognition technology has moved on from conventional corporate uses such as interactive voice response systems to mass-market products like those in smartphones or car navigation systems.
Any iPhone user will spend time questioning Apple’s intelligent personal assistant Siri and trying to trip it up, and Microsoft’s intelligent personal assistant Cortana is available in beta on Windows phones in the US and UK. Global availability is due by the end of this year or early next.
Analyst Gene Munster at Piper Jaffray has been conducting occasional experiments with Google Now and Siri to test their accuracy, both in understanding the questions asked and finding the right answers.
He performed the latest experiment this summer by throwing a series of 800 questions at them and measuring their ability to respond to each query. Half the questions were asked indoors, half outdoors, and they covered five categories: local information, commerce, navigation, general information and operating system commands.
Munster’s team found that Google Now fared slightly better than Siri, accurately answering 86 per cent of all questions it heard correctly, while Siri's score was 84 per cent.
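Note what those percentages measure: they count only the questions the assistant heard correctly, not every question asked. A minimal Python sketch of the difference follows. The trial data is made up purely for illustration, and the names `Trial`, `conditional_accuracy` and `overall_accuracy` are hypothetical; none of it comes from Munster's study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One question put to an assistant (hypothetical record layout)."""
    heard_correctly: bool      # was the question understood as spoken?
    answered_correctly: bool   # was the answer given the right one?


def conditional_accuracy(trials):
    """Share of correctly heard questions that were answered correctly."""
    heard = [t for t in trials if t.heard_correctly]
    if not heard:
        return 0.0
    return sum(t.answered_correctly for t in heard) / len(heard)


def overall_accuracy(trials):
    """Share of all questions that were both heard and answered correctly."""
    if not trials:
        return 0.0
    good = sum(t.heard_correctly and t.answered_correctly for t in trials)
    return good / len(trials)


# Illustrative data only: 10 questions, 8 heard correctly, 6 of those answered correctly.
trials = (
    [Trial(True, True)] * 6
    + [Trial(True, False)] * 2
    + [Trial(False, False)] * 2
)

print(f"answered correctly, of those heard: {conditional_accuracy(trials):.0%}")
print(f"answered correctly, of all asked:   {overall_accuracy(trials):.0%}")
```

With this invented sample the first figure is 75 per cent and the second 60 per cent, which shows how a score based on questions heard correctly can sit well above the share of all questions that end in a right answer.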
Dragon’s most recent voice recognition software release, Dragon NaturallySpeaking 13, boasts an impressive 99 per cent accuracy rate and turns speech into text at up to 160 words a minute.
“The technology is adapting to the way we do things”
Along with accuracy has come improved ease of use. Jonathan Whitmore, UK, Ireland and Middle East sales manager at Nuance Communications, believes that the perceptions of voice recognition technology have changed.
“It’s easy now to create a user profile. Whereas 10 years ago it would require lots of training and reading before you could start using a product, now you can just read a sentence or two and the product will learn as it goes along,” he says.
“There’s the perception that the technology really does work now, so we don’t have to change the way we do things in order to benefit from speech. Instead, the technology adapts to the way we do things.