Science

Want a medal? Microsoft 7.2% less bad at speech recognition than IBM

Clash of the machine-learning titans

Tue 22 Aug 2017 // 08:02 UTC

In a machine learning tug-of-war, Microsoft may have just barely slipped ahead of IBM for speech transcription accuracy.

Researchers are studying how to recognise human speech in a variety of settings – from real-time interactions to offline, pre-recorded voicemails. Boffins tell us that one application, particularly of offline transcription, could be government surveillance.

In March, IBM researchers claimed that they had achieved a word recognition error rate of 5.5 per cent for pre-recorded English telephone conversations between strangers on set topics such as sports. They're presenting their peer-reviewed research this week (PDF) at the INTERSPEECH 2017 conference in Stockholm, Sweden.
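For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the machine's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that textbook definition, not the scoring tool either company used; the function name and the sample sentences are made up for illustration, and real evaluations also normalise text (case, punctuation, hesitations) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length.

    Illustrative only; splits on whitespace and does no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in a seven-word reference.
print(word_error_rate("did you watch the game last night",
                      "did you watch a game last night"))
```

On this definition, an error rate of 5.5 per cent means roughly one word in 18 is wrong, missing or spurious.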

On Sunday, Microsoft published a blog post and technical whitepaper claiming it has achieved 5.1 per cent on the same task – a small improvement.

Like the IBM work, Microsoft's algorithms used deep learning architectures for acoustic and language modelling. Microsoft said it achieved a word error rate of 5.9 per cent last year and credits the improvement to "using the most scalable deep learning software available, Microsoft Cognitive Toolkit 2.1 (CNTK), for exploring model architectures and optimizing the hyper-parameters of our models. Additionally, Microsoft's investment in cloud compute infrastructure, specifically Azure GPUs, helped to improve the effectiveness and speed by which we could train our models and test new ideas."
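The gap between the error rates quoted above can be expressed as a relative reduction, which is how a 0.4-point difference becomes the percentage in the headline. A quick sketch of the arithmetic follows; the function name is ours, and the inputs are the figures reported in this article.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


# IBM's 5.5 per cent versus Microsoft's 5.1 per cent
print(round(relative_reduction(5.5, 5.1), 1))  # 7.3

# Microsoft's 5.9 per cent last year versus its 5.1 per cent now
print(round(relative_reduction(5.9, 5.1), 1))  # 13.6
```

The first figure is about 7.27 before rounding, so the headline's 7.2 is presumably the same number truncated rather than rounded.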

Eric Postma, a computer scientist at Tilburg University in the Netherlands who studies speech recognition, told The Register it is "a significant step forward" but "not a breakthrough" because the goal is to achieve human-level recognition – such as comprehending utterances when multiple voices are speaking simultaneously at a cocktail party, or handling cases that require common sense.

Microsoft admitted there's still tons of work to be done on recognising various accents, speaking styles and languages – not to mention comprehending conversations in crowded rooms with a distant mic.

And although Microsoft may claim that a 5.1 per cent error rate on this dataset would be human-level recognition, Postma said: "That's marketing, not science."

Phil Woodland, an information engineer at Cambridge uni who specialises in speech recognition and has worked on the same dataset before, told The Reg that "the error rates have come down significantly" since this problem was tackled in the early 1990s (using one 2004 telephone conversation dataset called RT-04, IBM researchers achieved an error rate of 15.2 per cent).

He pointed out that in addition to recognising speech between strangers, IBM's new paper also transcribed a dataset of speech between family members, who speak more casually (achieving an error rate of 10.3 per cent). By comparison, Microsoft's paper only tackled the "easier" problem – when strangers speak, their speech is more formal and easier to understand.

He said it's difficult to "pin down" a metric for human performance since it can vary from task to task. There's a chance the Microsoft algorithms might actually perform worse than IBM's on the harder dataset, or get similar numbers, he said.

It's also unclear if the Microsoft algorithms could apply to other datasets. It's possible that the researchers' algorithms might be tuned to work specifically on telephone conversations, and would not transfer to tasks such as voice search or transcribing broadcast data from media archives. ®
