Hands-on: Microsoft has added audio transcription to the paid-for version of Word Online, the browser-based edition of its ubiquitous word processor.
Back in June 2018, Microsoft added dictation support to Word, Outlook, and OneNote online. Now the company has added transcription alongside it. Transcribing audio is a thankless but critical task in many professions, so the service is potentially of high value. A side-effect of the pandemic lockdown is that more meetings and events take place online and are recorded, which adds to the interest in automatic transcription.
Microsoft's new service is available now to subscription users of Word Online. The option also appears in the free version, but attempting to use it displays an invitation to "go premium". According to Redmond, the supported browsers are Edge and Chrome, but we were also able to use the feature with Firefox. Even so, officially supporting a feature only in Chromium-based browsers is a worrying development.
The company said: "You are completely unlimited in how much you can record and transcribe within Word for the Web," then added: "Currently, there is a five-hour limit per month for uploaded recordings and each uploaded recording is limited to 200MB."
The reason for the apparent contradiction is that you can also transcribe live by speaking into the microphone, without uploading a file. This differs from dictation: transcriptions appear in a side panel, and users selectively add snippets of text to the document as required, whereas with Dictate all the text goes straight into the main document.
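To make the quoted upload limits concrete, here is a minimal sketch of a pre-upload check. The function name `can_upload` and its parameters are hypothetical, not part of any Microsoft API. It assumes "200MB" means 200 × 1024 × 1024 bytes (Microsoft may mean decimal megabytes) and that a recording longer than the remaining monthly allowance is rejected outright; neither detail is confirmed by Microsoft. Live microphone transcription is not counted, since the company says that is unlimited.

```python
# Hypothetical pre-upload check against the limits Microsoft quoted for
# Word for the Web transcription: 5 hours of uploads per month, 200MB per file.
# Assumption: "200MB" is binary (MiB); Microsoft may use decimal megabytes.
# Assumption: an upload longer than the remaining allowance is rejected.

MAX_FILE_BYTES = 200 * 1024 * 1024      # per-recording size limit
MONTHLY_UPLOAD_SECONDS = 5 * 60 * 60    # five hours of uploaded audio per month


def can_upload(file_bytes: int, duration_seconds: float,
               used_seconds_this_month: float) -> tuple[bool, str]:
    """Return (allowed, reason) for a recording the user wants to upload."""
    if file_bytes > MAX_FILE_BYTES:
        return False, "recording exceeds the 200MB per-file limit"
    remaining = MONTHLY_UPLOAD_SECONDS - used_seconds_this_month
    if duration_seconds > remaining:
        return False, f"only {remaining / 60:.0f} minutes of upload allowance left this month"
    return True, "ok"


if __name__ == "__main__":
    # A 150MB, 30-minute interview after four hours of uploads this month.
    print(can_upload(150 * 1024 * 1024, 30 * 60, 4 * 3600))
    # A 90-minute recording at the same point would not fit.
    print(can_upload(150 * 1024 * 1024, 90 * 60, 4 * 3600))
```

In this sketch the size check runs first because it applies per file regardless of how much of the monthly allowance remains.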
The service is powered by Azure Cognitive Services, and the company stated: "Your audio files will be sent to Microsoft and used only to provide you with this service. When the transcription is done your audio and transcription results are not stored by our service."
English is the only supported language. This is one case where the online versions are ahead of desktop Office. Transcription is due to reach Office Mobile by the end of the year, and Dictate already works in Word on Windows and Mac, but Microsoft has made no promise about bringing transcription to the desktop.
There are many existing options for voice recognition, from Microsoft and others. Windows 10 has dictation and voice control built in as an "ease of access" feature. This can be used in Word and elsewhere, making it possible to compare Windows' offline voice recognition with the newer, cloud-based equivalent powered by Cognitive Services.
Using a high-quality microphone, I tried dictating a few lines of poetry with each, and it was a clear win for the cloud AI. Nuance's excellent Dragon Dictate would likely have done better, and results can be improved with voice training, but the out-of-the-box accuracy of Microsoft's cloud AI was impressive.
[Screenshot: a Word document comparing Windows 10's built-in voice recognition with the cloud-based recognition used by Word's Dictate feature and by the new transcription service. Caption: "An easy win for the cloud"]
Transcription is more challenging than dictation because it is harder to ensure high-quality audio, and no voice training is possible. Humans speak with a great variety of accents and intonations, and perfect accuracy will never be possible; it can be difficult even for human transcribers. We tried transcribing a couple of interviews in Word Online, and the results were nevertheless pretty good.
The key question is whether the accuracy is good enough to save time, and in our case it was. There are still plenty of errors to fix. For example, "AWS" was transcribed as "a WS", and "I'm going to take some flak for this" ended up as "I'm going to take some plans for this." We recommend checking carefully before assuming that an automatic transcription represents what a speaker actually said.
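One common way to put a number on errors like these is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of reference words. Neither Microsoft nor this review reports WER figures; the sketch below is a standard Levenshtein-distance calculation over words, applied to the two errors quoted above. The function name `word_error_rate` is illustrative, and the case-insensitive comparison is an assumption.

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein distance
# divided by the length of the reference. Comparison is case-insensitive
# (an assumption; scoring conventions vary).


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution in eight words: 0.125
    print(word_error_rate("I'm going to take some flak for this",
                          "I'm going to take some plans for this"))
    # One substitution plus one insertion against a one-word reference: 2.0
    print(word_error_rate("AWS", "a WS"))
```

The "AWS" case shows why WER can exceed 1.0: splitting a single specialist term into two words costs more edits than there were reference words.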
In Word Online, the transcription appears in short sections labelled "Speaker 1", "Speaker 2", and so on. The user can play back each section to check the transcription, make edits, and optionally add it to the main document. Playback speed can also be varied from half speed to double speed, though it is doubtful how much this helps accuracy. There is no way to control the speed of the automatic transcription itself, which is slow: a 30-minute recording took over an hour, though times may vary with the clarity of the audio.
The transcript is preserved when the document is saved. Reopen it, and the uploaded recording is shown as an audio file, and the transcript reappears when the Transcription pane is shown. This is important because editing is much easier when you can play back the original audio. Even so, Word's UI for this falls short of that offered by the specialist service Otter AI, which in our experience also transcribes faster. Otter has a freemium business model, but no longer offers audio upload in its free version.
Our initial impression is that Microsoft's new service will be useful, though results will vary greatly depending not only on source quality, but also on the vocabulary, with less accuracy when specialist terms (like AWS above) are used. ®