Look ma, no hands! The machines are speaking our language

Voice recognition grows up

From Hal, the malfunctioning computer in Stanley Kubrick’s ground-breaking film 2001: A Space Odyssey, through to last year’s Spike Jonze film Her, in which a man falls in love with a computer voiced by Scarlett Johansson, film makers and sci-fi writers have long been imagining a future where speech recognition is commonplace.

The reality has been somewhat different. The first landmark in technology’s struggle to overcome the challenges of voice recognition came in 1939, when Bell Labs showcased the Voder at the New York World’s Fair.

The Voder attempted to synthesise human speech by breaking it down into its acoustic components. It was complex, required months of training to operate and had very limited success. It wasn’t until decades later that there was a real push to develop voice control and speech recognition technology.

Video: The Voder, a Bell Labs presentation from 1939

The 1980s saw the formation of several high-profile voice businesses, including Dragon Systems, which developed speech recognition software.

Then there was the Flemish firm Lernout & Hauspie, formed in 1987. Around the turn of the millennium L&H acquired a number of competitors, including Dictaphone and Dragon, and in its heyday it had a market capitalisation of nearly $10bn. Investors loved the company and its technology earned rave reviews and awards.

L&H’s speech technologies were subsequently bought by ScanSoft, later to become Nuance Communications.

A number of companies have since been hailed as the great white hope of voice recognition.

Wildfire was a voice recognition service bought by Orange in 2000. It took messages, placed calls and stored messages for its devoted but small user base, and Orange ended up shutting the service down five years after buying it.

SpinVox, co-founded by one-time Psion marketer Daniel Doulton, provided automated transcription with human intervention when needed by using a “combination of artificial intelligence, voice recognition and natural linguistics”.

The idea was that the technology would get better at transcription and learn from its human operators. SpinVox signed deals with many carriers to use the technology to convert voicemail messages to text, but ended up being perhaps over-reliant on its human operator-run call centres in developing countries. Eventually SpinVox ran out of money and was bought by Nuance in 2009.

Out of tune

It is not surprising that none of these lived up to the bright and shiny future predicted for them. Make no mistake, accurate voice recognition is difficult.

Vocalisation inevitably varies according to accent, pronunciation, pitch, volume, speed and so on, and it can also be distorted by background noise and echo.

Take the Two Ronnies’ famous Four Candles/Fork Handles wordplay sketch as a silly example: it shows that much of speech recognition is also about context and assumed knowledge.

Voice recognition remains an under-penetrated market, according to a 2012 report from researchers Global Industry Analysts, which cites accuracy as a key stumbling block obstructing wider acceptance of the technology.

That said, it predicts the global market for voice/speech recognition systems will reach $69bn by next year as the players in the space work to improve systems' ability to recognise and respond to natural human speech.

The improvement in accuracy is tangible. In recent years, voice recognition technology has moved on from conventional corporate uses such as interactive voice response systems to mass-market products like those in smartphones or car navigation systems.

Any iPhone user will have spent time questioning Siri, Apple’s intelligent personal assistant, and trying to trip it up. Microsoft’s equivalent, Cortana, is available in beta on Windows phones in the US and UK, with global availability due by the end of this year or early next.

Analyst Gene Munster at Piper Jaffray has been conducting occasional experiments with Google Now and Siri to test their accuracy, both in understanding the questions asked and finding the right answers.

Fast talker

He performed the latest experiment this summer by throwing a series of 800 questions at them and measuring their ability to respond to each query. Half the questions were asked indoors, half outdoors, and they covered five categories: local information, commerce, navigation, general information and operating system commands.

Munster’s team found that Google Now fared slightly better than Siri, accurately answering 86 per cent of all questions it heard correctly, while Siri's score was 84 per cent.

Dragon’s most recent voice recognition software release, Dragon NaturallySpeaking 13, boasts an impressive 99 per cent accuracy rate and turns speech into text at up to 160 words a minute.

“The technology is adapting to the way we do things”

Along with accuracy has come improved ease of use. Jonathan Whitmore, UK, Ireland and Middle East sales manager at Nuance Communications, believes that perceptions of voice recognition technology have changed.

“It’s easy now to create a user profile. Whereas 10 years ago it would require lots of training and reading before you could start using a product, now you can just read a sentence or two and the product will learn as it goes along,” he says.

“There’s the perception that the technology really does work now, so we don’t have to change the way we do things in order to benefit from speech. Instead, the technology adapts to the way we do things.”

Biting the hand that feeds IT © 1998–2021