Stealthy UK startup drops veil on next frontier of speech wizardry

The power's in the phone, not the cloud


If you've been amazed by Amazon's Alexa, Microsoft's Cortana and Google Assistant, you might think continuous speech recognition is done and dusted – and that there are no mountains left to climb. However, a young British company has developed a radical new approach with spectacular results, based on low-level signal processing.

Unlike speech-to-text products, Eloqute analyses speech habits in real time. The result is an educational tool designed to improve an English* speaker's pronunciation – something with a huge and growing market as business travellers seek to impress their clients, and more call centres use non-native English speakers.

The software notches up several technical firsts: it gives the user real-time, prioritised feedback as they speak, and it lets the speaker use any text they want, rather than stock phrases. Remarkably, it performs this magic on a client device such as a phone.

"It's difficult and expensive for a non-native English speaker to improve their pronunciation beyond a certain point. Repetitive home learning doesn't work very well, and beyond that, their only option is expensive private tuition," Speech Engineering Ltd's (SEL) Matthew Karas told us.

It's demoralising to be asked to repeat the phrases Linguaphone gives you, he said, so many who start using such software give up. Eloqute spots the most striking pronunciation errors first, and then, via simple targeted advice, prioritises the skills which have most impact on intelligibility. No other software, Karas said, focuses on identifying habits rather than individual errors.

Eloqute is the first commercial product from SEL, where Karas and Josh Greifer both have storied backgrounds. Karas built the world's first industrial-strength CMS for the BBC News skunkworks – a story we have told before – then founded a speech-recognition startup that was sold to Mike Lynch's Autonomy in 2003.

After a period as a games programmer in the '80s, Greifer went to work for Charlie Steinberg, writing the audio parts of Cubase. That might not seem relevant right away, but it is: some major technical breakthroughs came when Eloqute's creators started to work where computational linguists traditionally fear to tread – down in the waveform.

"Language technologists can be scared of low-level, real-time signal processing, so they usually get the OS to handle it," Karas explained. "To get something like Cubase to work, you have to guarantee very low latency – musicians need to hear what they're playing soon enough for it to feel instantaneous, while remaining in sync with the backing, and applying effects, mix automation etc."

Compute - scale by smartphone

Greifer's familiarity with complex low-latency processes turned out to be important when they realised the cost of server-based delivery.

"Streaming speech from 300 million learners to the cloud does not scale nicely."

So they started work on a platform which can optimise any combination of speech analysis algorithms on the phone, sometimes achieving 100-fold improvements on legacy techniques. The implications of what SEL's underlying platform does are far broader than computational linguistics.

"We can switch between different configurations of complex processes a hundred times a second. We did this to scale our language app, but now that we have the platform, it could be used for things like on-phone video processing – or even speech recognition."

In plain English, SEL is using the immense and untapped processing power of client devices such as phones to do more, and Eloqute is just the first example. Today's phones have eight or 12 cores sitting idle most of the time. What's exciting is that by applying that power selectively every few milliseconds, a humble phone can perform better than a company with a vast investment in server farms: an Amazon, Facebook or Google.

But for Karas, the most appealing feature is freeing the learner from soul-destroying rote learning.

"Eloqute will help you if you're rehearsing a conference speech, or reading a bedtime story to your kids. This has more benefits than staying motivated: learners don't expose their bad habits when being tested on a fragment of speech, and they don't form new good habits by parroting phrases."

SEL is launching Eloqute via traditional teachers first: "We are talking to large classroom-based operators, like Education First and Apollo English in Vietnam. They are serious about getting results because they come face-to-face with students, and they really know how to analyse learning outcomes." ®

* The product supports English for now, but should be able to adapt to other language models fairly easily. "The tech is completely language neutral - even tonal languages like Chinese and Vietnamese would be possible," Karas told us. "However the market for English is bigger than all the others put together. We will probably use tone and rhythm to teach better English pronunciation in future, before we'd ever get onto other languages."



Biting the hand that feeds IT © 1998–2021