One man and his dog and his laptop
Voice recognition in the field
Review Quocirca's man in the field, Jon Collins, takes a hands-on look at voice recognition. Armed with nothing more than a laptop in a rucksack and a dog, he gamely battles to bring you the following:
This is an attempt to dictate a document using voice recognition. I am doing this slightly differently to how I would normally. For a start, I am also walking the dog. Therefore, it is not possible to look at my computer in the normal way, so I have put it in my rucksack. This makes it impossible to see the screen so I have had to resort to other means. I am therefore viewing my laptop screen via a wireless network connection from my Dell Axim PDA.
From there, I am using a VNC (Virtual Network Computing) software client to create a remote display of the laptop screen on the PDA. When I require some mouse control, I have a Hand Track portable trackerball from Trust, or I can use the touch screen of the PDA to control where I am on the laptop screen. The voice recognition software is Dragon NaturallySpeaking XP, running on Windows 98.
And, guess what. It all works - well, mostly.
This is by no means a perfect solution. The laptop is one of the first Sony PictureBooks, running only a 400 MHz processor with 128 Meg of RAM and Windows 98. All the same, it would appear to be adequate for the job. I would rather not have to walk around staring at a PDA, but it isn't that much trouble; the occasional glance is enough.
The microphone I am using works well, as long as there is no wind. I confess that for this unplanned test I am not using a noise-reducing headset, but rather a cheaper Plantronics model that clips to my ear. For some reason, when the breeze picks up, the recognition starts to favour the words inward, wooden and women. Why exactly this is, I leave to your imagination, but I do know that I have achieved better results with a noise-reducing model.
Not all of us would want to be staggering along with a laptop in our backpacks in order to dictate an article, but this is clearly a possibility, and it does the job. It wasn’t easy to set things up – getting a peer-to-peer network going between two wireless cards took an age (until I found an undocumented checkbox), and then there was the mucking around with the temperamental VNC client to make the screen viewable on the PDA. Everything came together in the end, but it wasn’t a job for the faint-hearted.
Mobilizing the masses
Perhaps what all of this illustrates is the power of integration, or the lack of it with mainstream vendors. If I could get things set up with old technology, why exactly have the big IT companies been unable to bring such capabilities to market? While Microsoft and Intel still struggle to deliver the perfect tablet PC with integrated voice recognition, an old PC with an old operating system and an old version of a software package were perfectly capable. Equally, while network operators and equipment vendors try to tackle the concept of “mobility”, turning it into some distant target that will make a great deal of money for whoever can crack the code, they have missed the point.
For the past five years, there have existed opportunities to mobilize the masses, and they didn’t require multi-billion, high-bandwidth infrastructures. Not everyone is going to want to use voice recognition, but let’s face it – the idea of people walking about chattering into space is no longer as unnerving as it was. And what if – just imagine – voice recognition turns out to be the missing piece in the entire mobility puzzle? Not that we should all be lugging laptops around, but many of us are doing this anyway.
Ultimately, if it all boils down to integration, the biggest problem is that nobody is doing the integrating. There are lots of options out there – IBM has a version of its own recognition package ViaVoice that runs on Linux, so there would be nothing to stop someone porting it to a Sharp Zaurus PDA (though, truth be told, users of ViaVoice in general have met with varying levels of success). The PDA I have in my hand has a processor as powerful as the one in the laptop in my backpack, at least if the clock speed is anything to go by. Perhaps a smaller laptop (there are some great ones available in Japan), with a Bluetooth-integrated remote screen and microphone, rather than VNC over Wi-Fi? Great theory, but as anyone who’s tried to connect a Bluetooth headset to a computer will tell you, it just ain’t happening at the moment. There are lots of options, but each has to be tried and tested. Even if things did work as they should, the mass market of punters won’t be spending the time using computer equipment like Lego sets, and nor should they have to.
Perhaps things have been moving too fast for even the vendors to stop and think. In our struggle to look for the latest and greatest gadgets (and I confess, I have reverted from my new Nokia 6600 phone to my old 6310i because it was better at the basic job of making calls), it is possible to take our eyes off the ball. Or perhaps – but surely not – there is something more insidious going on here – the big guns don’t want us to have such capabilities just yet? A bit like dodgy accounting practices, maybe they prefer to spread out innovations over a number of years?
Before the conspiracy theorists pick this up and run with it, they should recognize that the truth is a little more mundane – driven by fear and greed, even the biggest companies are still insisting on following technological rainbows rather than making existing products work together as they should. Networking with Bluetooth is a good example – rather than trying to fix existing “standards”, they are already pursuing the next generation. Ultra Wide Band (UWB) will begin to appear next year (100Mb/s bandwidth to start, rising to 400Mb/s), not to mention the short-distance Wi-Fi version that's just been announced – hopefully somebody will treat the issue of compatibility at the outset, rather than leaving it down to the consumer to fix yet again.
That’s not to say that new developments won’t be very welcome. Meanwhile, as I walk along watching the sunset, my faithful mutt off in some bushes, I think to myself how this was, without doubt, one of the most enjoyable experiences I have ever had writing an article. If this is the future of portable computing, I can’t wait.
Making the effort
So, given a bit of effort and a few old components, it is possible to start using technology in new ways. Clearly, however, just because something works, that doesn’t mean anyone will want it, or be prepared to pay for it. There are a number of marketing criteria that the big guns like to apply, which boil down to basic questions such as – is it useful, is it usable, is it affordable and is it desirable?
Before considering usefulness, I thought I’d tackle usability. The solution previously described had a large number of dependencies (you’d need both a PC and a PDA, for example) so I wanted to shorten the list a little. Picking up the mantle of integrator could be fun, I thought to myself as I browsed for head-mounted displays on the Web. A couple of phone calls later and I was heading to London-based high-tech reseller Inition for a visit. What had particularly caught my eye was a tiny screen (from a company called MicroOptical) that could be mounted on a pair of glasses. Inition had some other products that would slip comfortably into the “cool stuff” category, such as laptops with 3D displays, VR gloves and so on, but I tried not to get too distracted.
The micro screen plugged into a standard VGA port on my laptop, and was self-powered from a camcorder battery, so within a short period of time I was ready to go. Frankly I was worried that the experience might be an anticlimax (“two hours on the train to London – for this?”) until, like Joe 90, I put on the glasses and my world was transformed. There, in the corner of my vision, was a computer screen. Small but adequate, it floated in space, like real life with a picture-in-picture setting, which I suppose was exactly what it was. Five seconds later I had clipped on my microphone and I was dictating into the computer in my hand. A few seconds more and I could be browsing the Web, sending and receiving email, checking the traffic news or buying a pizza – I knew this as I had already played with the voice commands available in Dragon NaturallySpeaking, and I’d found them comprehensive enough.
The little neon sign flickered on in my head – you know the one, bearing the words: “I want one”. I was sold. The whole experience gave me the impression that nothing would ever be the same again – once I could afford such a gadget, that is. But, was it really useful? The good people at Inition told me some of the reasons people used their displays – orchestral conductors reading music, surgeons consulting manuals – but the display/recognition combination seemed to have a more profound value.
As I used the voice/display combination, it felt immediately apparent that this was not some niche application, but a core productivity tool. Consider, for example, auditors and surveyors who create reports containing their observations. Surveyors already use voice recognition; however, they usually dictate into some intermediary recorder, and the recording then has to be played back and edited. How much faster could things be done if the report could be created, edited and delivered within minutes of the observations being noted?
Indeed, there are plenty of workers who combine a dependency on the written word with the reality that they are not always in front of a keyboard-driven computer. Meeting rooms and the corridors of power, not to mention airport lounges, planes, trains and automobiles: all so much dead time spent in transit. Couldn’t this be better spent? To give you an example – following one meeting I used a twenty-minute walk back to the station to collate my thoughts and send some immediate feedback. Had I not had such a facility, the feedback would have taken a couple of days, if it had happened at all.
Alternative input mechanisms
While it is clear that the computer keyboard will not be going away, equally, other input mechanisms remain largely unexploited. There are potential issues – is it safe to dictate while driving, for example, what of the eye strain, and perhaps people need empty time to keep on top of stress – but few would deny there are moments when moving from one place to another that we would love to be doing something more useful. I once told someone I had been writing a report while sitting on the beach at Nice. They said to me, of all the things to do on the beach at Nice, you write a report. I replied, of all the places to be writing a report, where would you rather be than on the beach at Nice?
That’s usefulness, usability and even desirability covered to an extent, but then comes the question of cost. At £1,200 a pop, head-mounted displays are not going to hit the mainstream anytime soon. There are cheaper versions, but this is just one component: it is the integrated package that needs to be delivered at a reasonable price. iPod sales would suggest this needs to be below the £500 mark before any such package would register on people’s radar.
If integration is the answer, then somebody needs to start integrating, and getting products out to the early adopters. This is of course the model applied by consumer networking companies such as Linksys, as well as credit card companies such as MBNA. The aim is not to produce the best product, but to get as many potentially useful products to market as cheaply as possible. In this way, the market can decide which are worth having and which are not. It should be possible to do it with old technology – indeed, given the bloated size of Windows XP, newer technology would push the hardware requirement back into the unaffordable, so we’d be better off with the old.
Meanwhile, Microsoft has tried to achieve something similar with its tablet PC specification, but clearly something went wrong there. If this article illustrates anything, it is that we do not need some new and improved spec; instead, design shops should be concentrating on integrating what exists, and delivering it in a package that thinks more about function than form. Once this delivers a package at a price people can afford, then we might see a major advance in voice recognition use, and with it significant gains in productivity. All it needs is for the industry to get its act together.