Eee by gum! Aye up, Microsoft, what's tha y' got? Cloud for accents?
Sorry – t'cloud for accents?
Microsoft has built a cloud service for applications so that software can attempt to understand specialist vocabularies and cope with dialects and accents.
Speech recognition works better if the algorithms can pick from a limited range of possible words and phrases, rather than attempting to recognise everything. Microsoft's new Custom Speech Service, which is part of the corporation's Cognitive Services suite, lets you upload examples of what you expect users to say. You can also assign different weightings, so that when the system is choosing between two possible interpretations of someone's speech input, it can be guided about which is more likely. This is called a Custom Language Model.
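To make the weighting idea concrete, here is a minimal Python sketch of how a weighted phrase list might tip the balance between two similar-sounding interpretations. It is not Microsoft's implementation: the phrase weights, the `DEFAULT_WEIGHT` fallback, the acoustic scores and the `pick_interpretation` function are all invented for illustration.

```python
import math

# Hypothetical custom language model: phrases the application expects to hear,
# each with a relative weight. These values are invented for illustration.
PHRASE_WEIGHTS = {
    "book a flight to leeds": 5.0,
    "book a flight to leads": 0.1,
}

# Small fallback weight for phrases the model has never been told about.
DEFAULT_WEIGHT = 0.01


def pick_interpretation(candidates, phrase_weights, lm_scale=1.0):
    """Return the candidate text with the best combined score.

    candidates is a list of (text, acoustic_log_prob) pairs, where the
    acoustic score says how well the audio matches the text on sound alone.
    """
    total = sum(phrase_weights.values()) + DEFAULT_WEIGHT

    def score(candidate):
        text, acoustic_log_prob = candidate
        weight = phrase_weights.get(text, DEFAULT_WEIGHT)
        # Combine the acoustic evidence with the language model prior.
        return acoustic_log_prob + lm_scale * math.log(weight / total)

    return max(candidates, key=score)[0]


# On sound alone, "leads" narrowly beats "leeds".
candidates = [
    ("book a flight to leads", -4.0),
    ("book a flight to leeds", -4.2),
]

best_with_model = pick_interpretation(candidates, PHRASE_WEIGHTS)
best_without_model = pick_interpretation(candidates, {})
```

With the weights in place the travel-related reading wins; with an empty phrase list the decision falls back to the acoustic score alone.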
The service also supports custom acoustic models, which you create by uploading sound files together with their transcriptions. This is one way of training the system to deal with accents and dialects.
This kind of customisation makes a huge difference to the likely success of applications that support speech input.
Microsoft also announced that two services already in preview will be generally available in March. These are the Content Moderator, for detecting profanity and porn in text, video and images, and the Bing Speech API, for generic speech-to-text and text-to-speech services.
The Custom Speech Service uses the same API as the Bing Speech Service. Usage is a matter of first configuring your service on Microsoft Azure, and then calling it through a REST API or using a client library, available for .NET, Java (for Android) and Objective-C.
Importing the speech recognition library into an application in Visual Studio - note the reference to Project Oxford, the code name for Cognitive Services
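For those taking the REST route rather than a client library, the rough shape of a call might look like the Python sketch below, which builds a request but does not send it. Treat every specific as an assumption: the endpoint URL is a placeholder rather than a real service address, the `Ocp-Apim-Subscription-Key` header name and the WAV content type should be checked against the service documentation, and `build_recognition_request` is a hypothetical helper.

```python
import urllib.request


def build_recognition_request(audio_bytes, endpoint_url, subscription_key):
    """Build (but do not send) an HTTP POST request carrying audio data.

    endpoint_url and subscription_key come from your own Azure
    configuration; nothing here is a real service address.
    """
    headers = {
        # Assumed header name for the subscription key; verify in the docs.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumed audio format; the service may require other parameters.
        "Content-Type": "audio/wav",
    }
    return urllib.request.Request(
        endpoint_url, data=audio_bytes, headers=headers, method="POST"
    )


# Placeholder values only: swap in your own endpoint, key and audio.
request = build_recognition_request(
    b"placeholder audio bytes",
    "https://example.invalid/recognize",
    "YOUR-SUBSCRIPTION-KEY",
)
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and parsing whatever the service returns.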
Microsoft has been hyping its Cognitive Services, which now include 25 different APIs, for some time. In principle, there is plenty of potential for applications that support new kinds of interaction and automate tasks which would otherwise require human intervention.
The reality often falls short, though. At various events Microsoft has demonstrated a machine that guesses your age; I find I can take ten years off by removing my glasses. It has also shown a crude emotion detector, which is easily fooled by fake smiles or frowns.
Some customer support lines now use speech recognition to automate routing your call to the right person; it is often no better and sometimes worse than the old method of "press 1" for this and "press 2" for that.
The technology is improving, though, and voice-powered services such as Apple's Siri, Amazon's Alexa, Google Now and Microsoft's Cortana have done a lot to familiarise users with what is possible.
Supporting voice control in an application without the use of a cloud service would be impossibly hard. Microsoft's API makes it relatively easy.
The Custom Speech Service is free for one concurrent request and up to 5,000 requests per month. After that it costs from $11.29 per day. Full details of the service and pricing are here. ®