What bugs me the most? World+dog just accepts crap software resilience

Flawless applications are for time-rich people with endless cash

Opinion With Boeing's 737 Maxes grounded and its MCAS anti-stall software being patched, a high-intensity spotlight has been shone on the issue of software reliability. But putting aside whether Boeing's software is ultimately shown to be a risk factor, for some years now the industry has been sleepwalking into a tacit acceptance of unreliable software.

I am forced to work around dozens of software errors every day. My new Audi A5 has a passenger window that goes down at unexpected times, leaving the car vulnerable unless I go through a complex sequence of engine starts, locks and unlocks.

To submit an international payment, my HSBC electronic banking application requires a variable number of clicks on the submit button. My Sony Bravia TV takes 10-20 seconds to respond to a keypress on its remote control. My Intuit QuickBooks accounting software will often refuse to accept a bill payment because it pre-clears my customer field. "Did you know", the hold message on its support line tells me enthusiastically, "that 50 per cent of problems can be solved by clearing cookies?"

I could fill the article with a list of my regulars, but let's just pause to think about that last one. Users should never have to clear a cookie: cookies are internal state that the software itself is responsible for managing. Any problem that is solved by a blanket clear-all-cookies means that the software has mis-set a cookie, misinterpreted it, or failed to update or clear it (or, at the very least, failed to give the user a way to clear that one cookie individually).
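To make that concrete, here is a minimal sketch of software cleaning up after itself rather than asking the user to do it. Everything specific in it is an assumption made for illustration: the cookie name `customer`, the `version:id` value format and the function names are hypothetical and have nothing to do with how any real product stores its state. It uses only Python's standard `http.cookies` module.

```python
from http.cookies import CookieError, SimpleCookie

COOKIE_NAME = "customer"   # hypothetical cookie name
CURRENT_VERSION = "2"      # hypothetical format version written by this release


def expire_header():
    """Build a Set-Cookie value that deletes only our own cookie."""
    out = SimpleCookie()
    out[COOKIE_NAME] = ""
    out[COOKIE_NAME]["max-age"] = 0
    out[COOKIE_NAME]["path"] = "/"
    return out[COOKIE_NAME].OutputString()


def read_customer_cookie(cookie_header):
    """Return (customer_id, set_cookie_value).

    customer_id is None when no usable value was found. set_cookie_value
    is None unless the stored cookie is stale or malformed, in which case
    it tells the browser to drop that one cookie.
    """
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        # The header could not be parsed: reset our own cookie only.
        return None, expire_header()

    morsel = jar.get(COOKIE_NAME)
    if morsel is None:
        return None, None  # nothing stored, so nothing to repair

    version, _, customer_id = morsel.value.partition(":")
    if version == CURRENT_VERSION and customer_id.isdigit():
        return int(customer_id), None

    # Written by an older release or corrupted: clear it ourselves.
    return None, expire_header()
```

The point is not the parsing, which is trivial, but the last branch: a value the software no longer understands gets diagnosed and cleared by the software, one cookie at a time, with no call to tech support.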

What Intuit is saying in effect is: "OK, there's a problem, but we can't afford the effort to diagnose it and we'd far prefer to just get you back up and running without making a fuss. And preferably without spending any tech support man-hours either."

The pervasiveness of that kind of attitude creates a steady drip of bad influence in the head of every software developer and development manager. Don't bother about finding that last 5 per cent of the bugs, it says. As long as you keep delivering the features that marketing want, and as long as tech support headcount is being kept under control, that's good enough. And by now, consumers of software are so used to the results that we barely even complain.

Don't blame the techies!

The key problem here isn't a technical one: rather, it lies in the economic and contractual world. Software developers know how to create and deploy software for which extremely low bug counts are guaranteed. The problem is that it's time consuming to do so, so the reliable product is significantly more expensive to produce – to the point of development costs being doubled, trebled or more.

Add in the training costs of bringing all your developers to that level of competence and this constitutes major expenditure. But what is the payback?

What we don't have is a way for our customers to assess how much care we have actually taken to avoid bugs. Sure, there are industry standards for the steps that should be taken to ensure high software reliability: code review (every line of code is stared at by an expert who isn't the author) and regression testing (new code is run through a rigorous process to ensure that it's free from unexpected side-effects).
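For readers who have never seen one, a regression test can be as small as the sketch below. It is an illustration only: `prepare_bill_payment` is a hypothetical form handler invented for this example, and the tests simply pin down the behaviour that a user-entered customer field must survive whatever defaults the code fills in.

```python
import unittest


def prepare_bill_payment(form):
    """Hypothetical handler: add defaults without touching user-entered fields."""
    prepared = dict(form)                   # work on a copy of the input
    prepared.setdefault("currency", "GBP")  # fill in only what is missing
    return prepared


class BillPaymentRegressionTest(unittest.TestCase):
    def test_customer_field_is_not_cleared(self):
        form = {"customer": "Acme Ltd", "amount": "120.00"}
        self.assertEqual(prepare_bill_payment(form)["customer"], "Acme Ltd")

    def test_existing_currency_is_not_overwritten(self):
        form = {"customer": "Acme Ltd", "currency": "EUR"}
        self.assertEqual(prepare_bill_payment(form)["currency"], "EUR")

    def test_input_form_is_left_untouched(self):
        form = {"customer": "Acme Ltd"}
        prepare_bill_payment(form)
        self.assertEqual(form, {"customer": "Acme Ltd"})
```

Saved in a file, this can be run with `python -m unittest`. Once a test like this is in the suite, any later change that clears the field again fails the build before it reaches a customer.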

If one implements the spirit of those standards with skill and flair, that results in a good product. But it's also possible to treat the standards as a box-ticking exercise. If the software is being written in response to a contract tender, the box-tickers will look the same on paper as the good guys, and they will win the tender because they will be cheaper.

We need to get ourselves into the situation where normal practice is for consumers who suffer bugs to report them to the developers, whose normal practice is to diagnose and fix them.

If zero tolerance of bugs becomes the new normal, software will initially become more expensive. But it will be massively worth it. And in the long run, costs will come down as we spend less time on field support and as the incentives increase for rewriting some of our older legacy code.

There's one area, though, where genuine technical advances are needed, and that's in the diagnosis of errors made by AI algorithms. The combination of recent machine-learning systems and new high-performance hardware yields some amazing results, not least in the field of self-driving vehicles.

The problem is that the underlying algorithms often lack any means of providing an audit trail of how they arrived at their decision, at least not in any form that can be picked apart by a human. If the car swerves and hits a pedestrian because (for example) the shape of some brightly lit debris in the road provided a better match for its "this looks like a person" profile than the dimly lit real person by the side of the road, we want, desperately, to ask the algorithm: "Why did you decide this?"

But if the answer lies in thousands of iterations of matrix arithmetic with thousands of coefficients, themselves derived from an even lengthier training process, it's extremely difficult to know how to turn that answer into something one can act on to reduce the likelihood of a recurrence. Or even to explain to the victim's family how it happened.
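A toy example shows why. The sketch below is a two-layer network in plain Python, with a handful of coefficients that I have made up by hand; a real perception system would have millions of learned ones, and the three input features are equally hypothetical. Even at this scale, the only "audit trail" the code can offer is a list of intermediate numbers.

```python
import math

# Hypothetical, hand-picked coefficients. A real model learns these from
# training data and has vastly more of them.
W1 = [[0.8, -0.5, 0.3],
      [-0.2, 0.9, 0.4]]
B1 = [0.1, -0.3]
W2 = [1.2, -0.7]
B2 = 0.05


def forward(features):
    """Return (probability, hidden_activations) for one input vector."""
    # Layer 1: weighted sums followed by a ReLU (negative values become 0).
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(W1, B1)
    ]
    # Layer 2: one more weighted sum, squashed into the range 0..1.
    score = sum(w * h for w, h in zip(W2, hidden)) + B2
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability, hidden


# Three made-up features, e.g. brightness, height and motion of an object.
probability, hidden = forward([0.9, 0.2, 0.7])
print("looks like a person:", round(probability, 3))
print("intermediate values:", hidden)
```

Every step is fully deterministic and can be logged, yet nothing in that log says "I matched the debris because it was brightly lit". The reasons are smeared across the coefficients.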

These two problems, the inscrutability of machine-learning algorithms and the unwillingness or inability of those who pay for software to distinguish between proper reliability and box-ticking, will only get worse. The software industry and the organisations that purchase software – and that means almost all of us, small businesses as much as big public sector organisations – need to clean up our act drastically. That includes putting zero tolerance of bugs ahead of our desire for ever snazzier features. ®

David Karlin's technology career started with semiconductor engineering in Silicon Valley. Since then he has held engineering roles (at Sinclair Research and Sage UK, where he was CTO), managing director roles (also at Sage and at audio makers Harman International) as well as long spells running his own startup ventures. David currently spans both sides of the divide – software writing and general management, as well as writing about music at classical music website Bachtrack, which he and his wife Alison Karlin co-founded in 2008.

Biting the hand that feeds IT © 1998–2021