This speech recognition code is 'just as good' as a pro transcriber

Transcriptionist, your days are numbered, it seems


Microsoft on Tuesday said that its researchers have "made a major breakthrough in speech recognition."

In a paper [PDF] published a day earlier, Microsoft machine learning researchers describe how they developed an automated system that can recognize recorded speech as well as a professional transcriptionist.

Using the NIST 2000 dataset of recorded calls, Microsoft's software posted an error rate slightly (0.4 per cent) lower than the 5.9 per cent the company attributes to professional transcriptionists for the Switchboard portion of the data, in which strangers discuss a specified topic.

It saw a similarly narrow margin of success with the CallHome portion of the data – in which family members converse without guidelines – where the human transcription error rate was 11.3 per cent.

A month ago, Microsoft's researchers reported that their software had achieved a 6.3 per cent word error rate. In May 2015, Google said it had achieved an 8 per cent error rate with its speech recognition technology. Such rapid progress underscores the intense interest in machine learning and artificial intelligence at technology companies.
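
For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn a system's output into the reference transcript, divided by the number of words in the reference. What follows is a minimal Python sketch of that calculation, not the scoring tool used by Microsoft or NIST. The function name `word_error_rate` and the sample sentences are made up for illustration, and real scoring pipelines typically also normalise the text before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper only: splits on whitespace and does no text
    normalisation (case, punctuation, hesitations), which real scorers do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of six in the reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")  # prints 16.7%
```

On this definition, a 5.9 per cent error rate means roughly six word-level mistakes for every hundred words in the reference transcript.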

"This marks the first time that human parity has been reported for conversational speech," the researchers said in their paper, attributing their success to the use of convolutional and LSTM (long short-term memory) neural networks, and to techniques such as spatial smoothing that improve the accuracy of data models. They also said that they relied on Microsoft's Computational Network Toolkit (CNTK), a machine learning framework the company has made available as an open source project.

Geoffrey Zweig, manager of Microsoft's speech and dialog research group, hailed the achievement as the culmination of over 20 years of effort.

To get there, Microsoft moved the goalpost a bit. The company's researchers dispensed with a 4 per cent error rate cited in a 1997 paper [PDF] for spontaneous conversations over a telephone line. That error rate estimate, they said, "is attributed to a 'personal communication,' and the actual source of this number is ephemeral."

When human transcribers evaluated the same audio files as Microsoft's software, their error rates were 5.9 per cent and 11.3 per cent for the Switchboard and CallHome portions respectively. Hence, the researchers deemed it inappropriate to use a single, anecdotal figure as the number to beat.

Microsoft expects its speech recognition advance will help improve its Cortana personal assistant software, among other products. And it emphasizes that achieving parity with human transcriptionists shouldn't be confused with perfection, because humans make mistakes too.

Cortana evidently can benefit from further improvement. Last month, security firm Sophos advised against relying on Cortana for making emergency calls, based on an account of a UK woman who used the software to dial the local police in order to report an accident and was directed to authorities in the US.

In the future, those in need of aid might consider calling out to idle transcriptionists. ®


Biting the hand that feeds IT © 1998–2022