Lost voices, ignored words: Apple's speech recognition needs urgent reform
A plea for improvements for disabled people who rely on accessibility features
Opinion As someone who relies on Apple's Voice Control application to dictate, navigate, and interact with my iPhone and Mac via my voice due to a severe physical disability, I can't help but feel both grateful for its existence and frustrated by its shortcomings.
Apple has made commendable progress this year with accessibility updates across iOS 17 and macOS Sonoma, introducing features like the new Personal Voice app, Siri improvements, and Live Speech. I am trying these features in beta now, and they are set to launch in the autumn.
While these advancements are impressive, I can't ignore one area where the company is failing its disabled users – Voice Control and its struggle with accuracy, proper noun recognition, and grammar.
Voice Control: More than just an app?
Voice Control was unveiled to much fanfare on stage at Apple's WWDC developer conference in 2019. It was fortunate timing as Mac users had just been burned after Nuance, the leading player in voice dictation, dropped its Dragon speech to text software from the Mac, frustrated – it was rumored – with the strict limits Apple sets on its APIs.
Peter Hamlin is a rehabilitation engineer with the NHS and one of his jobs is to help set up disabled people with voice dictation hardware and software. He told me:
"Although it has never been confirmed why Nuance discontinued Dragon Professional Individual for Mac, the Apple Mac platform has been poorer without its own version of Dragon capable of installing directly on macOS.
"The situation has particularly affected users with a severe physical disability, and some of those Mac users who relied on Dragon prior to 2018 have voted with their feet by continuing to use Dragon in Windows running on Parallels on a Mac.
"Although using Dragon on Parallels is not an entirely 'seamless' experience, it does clearly demonstrate that Apple has thus far failed to create a satisfactory voice control environment within macOS such that users with a severe physical disability feel they no longer require access to Dragon."
Voice Control is more than just another app for those with severe disabilities. Some rely on voice tech for everything they do on a computer or smartphone. For people like me, being able to dictate accurately, and control my Apple devices by voice, can make or break my day. It's much more than a convenience – voice dictation tech is my line to the outside world.
Back in 2019, when Voice Control launched, iPhone and Mac users were optimistic that Apple finally had its own baked-in voice app that could only get better. Sadly, that early optimism has been eroded, as Apple has shown the app little attention since, apart from two minor updates to its editing features. Ironically, those updates may be as sure a sign as any that Apple knows you will need editing tools to clear up the app's frequent dictation failures.
Proper noun and grammar frustrations
As I dictate messages, emails, and even whole documents using Voice Control today, I often encounter the annoyance of seeing proper nouns disregarded when it comes to capitalization. Adding custom vocabulary with capital letters doesn't solve the problem, and I find myself constantly having to correct avoidable errors.
Imagine you have a friend with a foreign name, or you come across a company the app doesn't recognize. You add these proper nouns to custom vocabulary with capital letters, but when you dictate them mid-sentence, Voice Control always ignores the capitalization.
Here are some more examples of how Voice Control struggles with proper nouns. The verb "will" is often mistaken for the proper noun "Will" in Voice Control. Similarly, when referring to the fiery orb in the sky, the app may mistakenly recognize it as the UK tabloid newspaper "The Sun."
Further frustration arises from Voice Control adding unnecessary proper nouns to sentences. For instance, dictating "I will go outside" might be transcribed as "I Will go outside," or dictating "the sun is shining" could be transformed into "The Sun is shining" without your intention.
Ian Gilman is a computer programmer who has had a chronic repetitive strain injury in his wrists for decades. He told me: "As a programmer, I do my work by dictating to a human apprentice who does all my typing for me. For personal correspondence, however, I use voice recognition.
"For many years I used Dragon for Mac, but when it got discontinued, I switched to Apple's Voice Control. I use it instead of Apple's keyboard dictation because it allows me to not only dictate but also correct and edit my words with my voice. It also allows me to navigate my computer, limiting my need to use my hands for mousing or typing. I do still have use of my hands, so some mousing and typing is OK. Because of this, when I encounter recognition errors, I may use the mouse or keyboard to help me fix them. I can easily see, however, if I didn't have use of my hands at all, these glitches would be extremely frustrating!"
Gilman highlighted one of his biggest annoyances with Voice Control: "While you can dictate into any app, you can only edit text in certain apps (like TextEdit). Ideally, it should work everywhere. I generally compose things in a Voice Control compatible app called Drafts, and then copy and paste it where I need it. This is easier for me because I'm OK doing a little bit of keyboard/mouse, but I can see it being a real drag for someone who doesn't have use of their hands trying to carry on a conversation in Slack, for instance."
Shaun Preece is blind and a co-host of the acclaimed Double Tap radio show on AMI Audio. He told me that, for some people with certain disabilities, Apple's Voice Control may be the only way to use a device: "A smartphone or computer offers the ability to shop online, check on your finances, access information or entertainment, and so much more. And for many people, it could be the only way to achieve these tasks. The importance of accessibility features such as Voice Control can't be overstated.
"Using dictation to fire off a quick email or message is convenient and most of us have come to accept those obvious dictation typos as humorous. But what if dictation was the only way you could enter text on your device? Suddenly all those dictation mistakes don't seem so funny. Instead, it can be a frustrating and time-consuming process."
Why Voice Control needs to be more accurate
The importance of these seemingly minor annoyances becomes evident when we consider the primary users of Voice Control – individuals with severe physical disabilities. Do they truly have the physical energy and breath to correct these avoidable errors? I believe not. Many can hardly take to the keyboard to clear up errors. Accessibility should aim to simplify tasks, not complicate them.
As someone with a disability who can't take to the keyboard, I understand the significance of having Voice Control as a reliable and effortless tool that adheres to the fundamental rules of English grammar. Having to exert additional energy and effort to rectify these avoidable mistakes defeats the very purpose of accessibility. It is crucial that Voice Control caters to our needs without adding unnecessary challenges.
Mystery behind Apple's lack of dictation improvement
I have often wondered why Apple has failed to improve dictation with Voice Control over the past four years.
It's worth noting at this point that there are in fact two ways to dictate on an iPhone and Mac. Apple says keyboard dictation with Siri has been improved this year with iOS 17 and macOS Sonoma. In my testing it seems more accurate and doesn't have as many problems with misrecognition or improper capitalization.
Unfortunately, keyboard dictation with Siri, where what you dictate gets sent to Apple servers, appears not to be integrated with Voice Control dictation, which stays on device and is what accessibility users rely on. You can't use your voice to edit with keyboard dictation, or otherwise control your computer with it (for example, to send a message in an app like WhatsApp, or to open and close an app). In fact, you can't use them together at all.
It has long been rumored that Apple licenses the voice tech that powers Siri and dictation. I don't know if that is true, but it could explain why the company struggles to improve dictation accuracy in Voice Control if there is a link between keyboard dictation with Siri and the dictation technology Voice Control relies on. We also heard earlier this year about how Siri is based on old technology and how even a small update can take months to implement, frustrating Apple engineers.
There has to be a reason why Apple has failed to make any meaningful improvements in dictation in its Voice Control app with its limitations so obvious to those who rely on it. However, I am at a loss to explain why.
Personal Voice: A potential game changer for voice dictation
However, despite this unfortunate background, I believe that hope may be on the horizon in the form of the company's latest accessibility app Personal Voice and the AI (or machine learning as Apple prefers to call the technology) that underpins it.
I appreciate the potential of Personal Voice more than most as one day my disability may make me reliant on it for all my communication.
Personal Voice is a new accessibility feature in iOS 17 that allows users to create a digital copy of their voice that sounds just like them. This feature uses on-device AI to analyze your voice from 15 minutes of recorded audio clips and generates a custom voice model on your iPhone. It is aimed at people with conditions like motor neurone disease where users often lose their voice completely.
Garbled grammar issues aside, I can't help but wonder whether Personal Voice and machine learning will lead to significant improvements in voice dictation accuracy through the development of personalized speech recognition on Apple devices. If Personal Voice, with machine learning, gets to know every nuance of your voice well enough to replicate it convincingly to others, it ought to recognize your voice far better than the one-size-fits-all recognition of Voice Control and the Siri speech engine.
Preece also sees the potential of machine learning for improving the accuracy of voice dictation, saying:
"I really hope that the promised machine learning improvements in iOS 17 and other Apple operating systems can give dictation the jump in accuracy it desperately needs."
While Personal Voice aims to help those who have lost their voices completely, I believe the next logical step for Apple should be to do more to support individuals who still have their voices but have non-standard speech and struggle to be understood due to weak speech, speech impediments or breathing difficulties.
It's not a small market. According to Google, an estimated 250 million people have non-standard speech, meaning they may have trouble making their words understood.
I've experienced the benefits of more personalized speech recognition through trying out the Google Project Relate app on my Pixel phone. It has offered me the most accurate voice dictation I have ever been able to achieve on a mobile device – and that's while dictating wearing a somewhat noisy ventilator mask over my nose. The app is still a beta research app, and comes with some shortcomings, but in terms of understanding my voice and transcribing my speech into text accurately it is almost perfect.
Comparing Voice Control's dictation accuracy with the Google Project Relate app and Microsoft's Dragon product (Microsoft agreed to buy Nuance, which makes Dragon, in 2021), I can't help but notice the yawning gap that still exists. As a user with unique needs, I yearn for Voice Control to be on par with the best available options, providing users with the most accurate and efficient voice dictation experience on Apple devices like the iPhone, iPad, and Mac. At the moment, dictating with Voice Control feels akin to piloting a supertanker in a sea of thick treacle.
Promise of personalized speech recognition
I believe it's time for Apple to prioritize improving dictation with Voice Control through the development of personalized speech recognition, which will help people with non-standard speech make their voices heard. By improving accuracy through enhancing recognition generally and improving Voice Control's recognition of proper nouns and overall grammar, Apple can significantly enhance the accessibility and usability of its products for people of all speech abilities.
Apple should prioritize dictation improvements
I urge the company to continue striving for better accuracy and grammar, ensuring that Voice Control becomes an indispensable tool empowering users with severe physical disabilities to communicate effortlessly and confidently with the world, whether that's a message to Mum, a school essay or an important work project.
Gilman says he understands solving these problems can be challenging: "My wish would simply be that Apple devotes energy to improving Voice Control with each new version of macOS. With steady progress, it'll get better and better, and eventually we won't have these impediments. Without improvement, the problems will pile up until it is unusable.
"Furthermore, I would ask Apple to communicate with the Voice Control user community, so they are focusing on the highest priority issues."
Accessible technology levels the playing field and in the 21st century makes everything else possible. It's why apps like Voice Control need to progress so disabled people don't get left behind.
If keyboard dictation accuracy has been improved this year and Voice Control dictation hasn't, that would be a bizarre set of circumstances: Voice Control is mission-critical for people like me, whereas keyboard dictation is just a nice add-on for others who can take to the keyboard to clear up errors.
Talking of getting left behind, I am sure Apple itself will want to trump anything Google and Microsoft are able to offer their disabled users through the billions they are spending on AI at the moment.
There are no risks for Apple in making voice dictation smarter and more accurate for disabled people. There are only benefits for the company, and everyone who wants or needs to type with their voice. It's the right thing to do.
To echo what Preece told me, while as disabled people we appreciate the work and thought that goes into features such as Voice Control, we shouldn't be afraid to talk about the areas where something just isn't working as it should.
Colin Hughes is a former BBC producer who nowadays campaigns for greater access to technology for disabled people. He has lived with the genetic muscle wasting disease muscular dystrophy all his life. He writes about issues around accessibility and technology here.