Look ma, no hands! The machines are speaking our language

Voice recognition grows up


From HAL, the malfunctioning computer in Stanley Kubrick’s ground-breaking film 2001: A Space Odyssey, to last year’s Spike Jonze film Her, in which a man falls in love with a computer voiced by Scarlett Johansson, filmmakers and sci-fi writers have long imagined a future where speech recognition is commonplace.

The reality has been somewhat different. The first landmark in technology's struggle to overcome the challenges of voice recognition came in 1939, when Bell Labs showcased the Voder at the New York World’s Fair.

The Voder attempted to synthesise human speech by breaking it down into its acoustic components. It was complex, required months of training to operate and had very limited success. It wasn’t until decades later that there was a real push to develop voice control and speech recognition technology.

Video: The Voder, a Bell Labs presentation from 1939

The 1980s saw the formation of several high-profile voice businesses, including Dragon Systems, which developed speech recognition software.

Then there was the Flemish firm Lernout & Hauspie, formed in 1987. Around the turn of the millennium L&H acquired a number of competitors, including Dictaphone and Dragon, and in its heyday it had a market capitalisation of nearly $10bn. Investors loved the company and its technology earned rave reviews and awards.

L&H’s speech technologies were subsequently bought by ScanSoft, later to become Nuance Communications.

A number of companies have since been hailed as the great white hope of voice recognition.

Wildfire was a voice recognition service bought by Orange in 2000. It took and stored messages and placed calls for its devoted but small user base, and Orange ended up shutting the service down five years after buying it.

SpinVox, co-founded by one-time Psion marketer Daniel Doulton, provided automated transcription with human intervention when needed by using a “combination of artificial intelligence, voice recognition and natural linguistics”.

The idea was that the technology would get better at transcription by learning from its human operators. SpinVox signed deals with many carriers to use the technology to convert voicemail messages to text, but ended up relying perhaps too heavily on its call centres, staffed by human operators, in developing countries. Eventually SpinVox ran out of money and was bought by Nuance in 2009.

Out of tune

It is not surprising that none of these lived up to the bright and shiny future predicted for them. Make no mistake, accurate voice recognition is difficult.

Vocalisation inevitably varies according to accent, pronunciation, pitch, volume, speed and so on, and it can also be distorted by background noise and echo.

Take the Two Ronnies’ famous Four Candles/Fork Handles wordplay sketch as a silly example: it shows that much of speech recognition is also about context and assumed knowledge.

Voice recognition remains an under-penetrated market, according to a 2012 report from researchers Global Industry Analysts, which cites accuracy as a key stumbling block obstructing wider acceptance of the technology.

That said, it predicts the global market for voice/speech recognition systems will reach $69bn by next year as the players in the space work to improve systems' ability to recognise and respond to natural human speech.

The improvement in accuracy is tangible. In recent years, voice recognition technology has moved on from conventional corporate uses such as interactive voice response systems to mass-market products like those in smartphones or car navigation systems.

Any iPhone user will have spent time quizzing Apple’s intelligent personal assistant Siri and trying to trip it up, while Microsoft’s equivalent, Cortana, is available in beta on Windows phones in the US and UK. Global availability is due by the end of this year or early next.

Analyst Gene Munster at Piper Jaffray has been conducting occasional experiments with Google Now and Siri to test their accuracy, both in understanding the questions asked and in finding the right answers.

Fast talker

He performed the latest experiment this summer by throwing a series of 800 questions at the two assistants and measuring their ability to respond to each query. Half the questions were asked indoors, half outdoors, and they covered five categories: local information, commerce, navigation, general information and operating system commands.

Munster’s team found that Google Now fared slightly better than Siri, accurately answering 86 per cent of all questions it heard correctly, while Siri's score was 84 per cent.

Dragon’s most recent voice recognition software release, Dragon NaturallySpeaking 13, boasts an impressive 99 per cent accuracy rate and turns speech into text at up to 160 words a minute.

“The technology is adapting to the way we do things”

Along with accuracy has come improved ease of use. Jonathan Whitmore, UK, Ireland and Middle East sales manager at Nuance Communications, believes that perceptions of voice recognition technology have changed.

“It’s easy now to create a user profile. Whereas 10 years ago it would require lots of training and reading before you could start using a product, now you can just read a sentence or two and the product will learn as it goes along,” he says.

“There’s the perception that the technology really does work now, so we don’t have to change the way we do things in order to benefit from speech. Instead, the technology adapts to the way we do things.”


Biting the hand that feeds IT © 1998–2022