StorageBod Blog
Every now and then, I write a blog article that could probably get me sued, sacked or both; this started off as one of those, and I have heavily edited it to avoid naming names.
Software quality sucks. The "release early, release often" model appears to have permeated into every level of the IT stack; from the buggy applications to the foundational infrastructure, it appears that it is acceptable to foist beta quality code on your customers as a stable release.
Running a test team for the past few years has been eye-opening; by the time my gang gets its hands on your code, there should be no priority-one ("must fix") and very few priority-two ("should fix") bugs left to squash, but the amount of fundamentally broken code that has made it through to us is scary.
And also, running an infrastructure team is beyond scary - it's heading into realms of terror. Just to make things nice and frightening I "like" to, every now and then, search vendor patch and bug databases for terms such as "data corruption", "data loss" and other such cheery things. Don’t do this if you want to sleep well at night.
Recently I have come across wonderful phenomena such as a performance-monitoring tool that slows your system down the longer it runs; clocks that drift for no explicable reason and can lock out authentication; reboots that can take hours; non-disruptive upgrades that are only non-disruptive if run at a quiet time; errors that you should ignore most of the time but that might sometimes be real; files that disappear on renaming; and even fixes that are fraught with risk to install.
Obviously no one in their right mind ever takes a vendor's new code release straight into production; certainly your sanity needs questioning if you put into production a new product that has had less than two years of QA. Yet we are often ordered to do so.
It does leave me wondering: has software quality gone downhill? It certainly feels like it. So what are the possible reasons, especially in the realms of infrastructure?
Could it be increased complexity? Yes, infrastructure devices are trying to do more; nowhere is this more obvious than in the world of storage where capabilities and integration points have multiplied significantly. It is no longer enough to support the Fibre Channel protocol; you must support SMB, NFS, iSCSI and integrate with VMware and Hyper-V. And with VMware on pretty much a 12-month refresh cycle, it is getting tougher for vendors and users to decide which version to settle on.
The internet: how could this cause a reduction in software quality? Actually, the web as a distribution method has made it a lot easier and cheaper to release fixes. Before, if you had a serious bug, you would find yourself having to distribute physical media and often, in the case of infrastructure, mobilising a force of engineers to upgrade software. This cost money and took time, and generally you did not want to do it; it was a big hassle. Now, you send out an advisory notice with a link to a download and let your customers get on with it.
Are users to blame? We are a lot more accepting of poor-quality code. We are used to patching everything from our PCs to our consoles, cameras and TVs, especially those of us who work in IT and find it relatively easy to do so.
Perhaps it is time to start a Slow Software Movement that focusses on delivering things right first time?