Brit neural net pioneer just revolutionised speech recognition all over again

Deep learning with Dr Tony Robinson


Rip it apart, cut the chaff, put it back together

"Without wanting to contribute too much to the hype around neural networks here, we had a system that was live on the Speechmatics site and the volume we were doing was going up and up. We knew we needed to get more efficient at it. We took the whole system apart, this time last year, and I set about with two bright guys I work with a task: take it all apart, work out what we need to do, stick it together in the most efficient order. Really question everything we need to do, every assumption. How much can we put in the neural networks? How much can we take away from the CPU-intensive part of it? Get rid of it as much as we can.

"Between the three of us, we came up with a new architecture for doing speech recognition. It heavily relies on neural network acoustic models and language models. We brought the memory down, and the speed up, so it was good enough to go on the phone. We put it on without too much work. But it's using only one processor core."

Even against a noisy background, the demo on an Android phone was stunningly accurate.

"Last year we were putting languages out every two weeks. There's 27 right now and some more are coming. We're tackling the hardest languages in the world, like Icelandic. A year ago building a new language model was an overnight job, but we've got that down."

There are several reasons why companies would want to use a speech specialist like Speechmatics, rather than Google.

"It's fun. We have so many different people coming to us with so many different needs. Everyone has understandable concerns about who's using the data for what: you want to know it's not leaving the building. We do on-premises work, halfway between cloud and the embedded stuff," Robinson says.

"We can just say: here's a copy, it's the same thing running on Android but we know you're a bank, you cannot have data leave the building, so here's something you can install on the tin and it runs the speech recognition in exactly the same way as it does with the cloud. The cloud is in many ways a shop front for us."

Speechmatics' ability to transcribe and index huge volumes of speech quickly has been noticed in finance and legal circles.

"You need to be able to unwind a financial transaction, and explain that this is the sequence of things that led up to it. A recorded conversation by itself is not good to you. You need to make it searchable. We're just a little tool in their grand scheme of things."

How is Speechmatics able to add new languages so quickly?

"It's the neural networks! We need some data for a particular language, but much less than you normally need, because we can pick up what we've done from other languages. How I make sounds with my mouth is quite similar to a Japanese speaker – you've got the same vocal apparatus. You're making the same sort of sounds.

"So a lot of what we've got to do is, first going from wave form, that acoustic data, is get to the phonemes, the basic sounds of the language. It isn't completely language dependent. So we can have thousands of hours from one language and a smaller amount from another, and just say tweak it a little bit."
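
What Robinson describes here is generally known as transfer learning: train on plenty of data from one language, then adjust the result with a little data from another. Below is a minimal sketch of that idea, not Speechmatics' system. It assumes a toy problem with two "phoneme" classes and synthetic 2-D features, and every name in it (make_data, TinyAcousticModel, freeze_shared) is hypothetical, invented for illustration.

```python
import math
import random


def make_data(n, shift, rng):
    """Synthetic 'acoustic frames': 2-D points labelled 0 or 1 (assumed toy data)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        # `shift` stands in for the small differences between two languages.
        x = [centre + shift + rng.gauss(0, 0.5), centre + rng.gauss(0, 0.5)]
        data.append((x, label))
    return data


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyAcousticModel:
    """One shared tanh layer (W) plus a per-language output layer (v, c)."""

    def __init__(self, rng, hidden=3):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.c = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def predict(self, x):
        h = self.hidden(x)
        return sigmoid(sum(vj * hj for vj, hj in zip(self.v, h)) + self.c)

    def train(self, data, epochs, lr, freeze_shared=False):
        """Plain SGD on log loss; freeze_shared=True updates only the output layer."""
        for _ in range(epochs):
            for x, y in data:
                h = self.hidden(x)
                p = sigmoid(sum(vj * hj for vj, hj in zip(self.v, h)) + self.c)
                err = p - y  # gradient of log loss w.r.t. the pre-sigmoid output
                if not freeze_shared:
                    for j, row in enumerate(self.W):
                        back = err * self.v[j] * (1.0 - h[j] ** 2)
                        for i in range(len(row)):
                            row[i] -= lr * back * x[i]
                for j in range(len(self.v)):
                    self.v[j] -= lr * err * h[j]
                self.c -= lr * err


def accuracy(model, data):
    hits = sum((model.predict(x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
source = make_data(2000, shift=0.0, rng=rng)  # stand-in for a data-rich language
target = make_data(20, shift=0.3, rng=rng)    # stand-in for a data-poor language

model = TinyAcousticModel(rng)
model.train(source, epochs=5, lr=0.1)         # learn shared features from the big set

shared_before = [row[:] for row in model.W]
output_before = model.v[:]
model.train(target, epochs=5, lr=0.1, freeze_shared=True)  # "tweak it a little bit"

print("target accuracy after fine-tuning:", accuracy(model, target))
```

Freezing the shared layer is only one way to do the "tweak"; a real system might instead fine-tune every layer with a small learning rate. The sketch simply shows that the second language reuses what was learned from the first and needs far fewer examples of its own.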

Lowering the cost has some unexpected benefits.

"Icelandic has only about 400,000 speakers, and they're worried that their language will die out. But it's a country with only one-twentieth the population of London. If it was expensive to do, we'd never do Icelandic."

And the near future, and long-term goals?

"How can we ensure as many people as possible can use it? One of the things I like about commercial research is that people actually use it. You can publish four-page papers on your work, and people just fall asleep.

"We have released the API to our cloud version and the API to the real-time embedded one is almost ready. There are business problems to sort out – like licensing – but we want to stay the most accurate."

Even if the neural network hype bursts, "we've got a solid base of users." ®

Biting the hand that feeds IT © 1998–2022