Hardware has never been better, but it isn't a licence for code bloat
Have as many lines as you want, just make it efficient
My iPhone 6 recently upgraded itself to iOS 11. And guess what – it's become noticeably slower. This is no surprise, of course, as it's the same on every platform known to man. The new version is slower than the old.
It's tempting to scream "code bloat" but that's not necessarily fair because new stuff usually has extra features added, which can also mean more code.
Now, I think it's fair to believe that companies such as Apple and Microsoft do try their damnedest to make their products perform as fast and efficiently as they can – although we all remember Windows Vista, 11 years old in November. Microsoft's infamous operating system went from 40 million lines of code in Windows XP to 50 million, contributing towards slow performance and bulking up the underlying hardware requirements to run it. Microsoft was back down to 40 million with Windows 7.
The same can't be said, though, for the average company doing software development – particularly for internal projects.
Code bloat can be blamed on a number of factors, including deficiencies of the language or the compiler, or the actions of the programmer.
I believe that one overriding reason for the latter is fairly simple: there's no longer a compulsion to write super-efficient code. These days we measure computer RAM in gigabytes, not kilobytes, and CPU clock speeds are in gigahertz, not megahertz. Back in the day you had to write extremely economical code if you were to make it work at all on the hideously constrained hardware available. Algorithms had to be elegant: processors were so slow that a brute-force algorithm just wasn't an option, and with tiny amounts of RAM you had to be fastidious with data structures.
Today you've generally got more RAM and CPU cycles than you can shake a stick at. But is it acceptable to take that for granted and ease off on the efficient coding?
Well... sometimes.
And the reason's simple: everything in IT life is a compromise.
Look back to the 1990s, for instance, when Java appeared. Java brought automated garbage collection that meant you didn't have to worry about managing memory yourself, but at the expense of an increase in memory usage until the garbage collector decided something was no longer needed. On balance, though, it was worth it (and still is), as it makes development easier and software less buggy with only a modest performance hit.
But relying on an automated garbage collector isn't sloppy programming: you're just using features that make your life easier and your program less likely to break.
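To make that trade-off concrete – and this is purely an illustrative sketch, using Python's collector as a stand-in for Java's, since the idea is the same – objects that refer to each other hang around in memory until the garbage collector gets round to them:

```python
import gc

# Illustrative only: Python's cycle collector standing in for Java's GC.
# Two objects that refer to each other can't be freed by reference counting
# alone; they sit in memory until the collector runs and decides they're garbage.

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a   # reference cycle

gc.disable()                  # pause automatic collection so the effect is visible
make_cycle()                  # the pair is now unreachable, but still in memory
print("unreachable objects found:", gc.collect())
gc.enable()
```

That's the deal: a little memory held for a little longer, in exchange for never having to free anything by hand.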
Code bloat comes when those writing the software – and I include here everyone from the architect down – aren't suitably diligent or don't think in terms of efficiency.
How do we think, act and program with diligence and efficiency?
Let's take an example: many will have solved (or tried to solve) a Sudoku puzzle. But how did you do it? A fiver says you don't simply try every possible permutation of digits in each box until you get to the right answer – the number of Sudoku solution grids has been calculated as 6,670,903,752,021,072,936,960. No, you apply logic and deduction to identify which numbers go where, and it takes just a few minutes to solve the puzzle.
I've used Sudoku for programming tests and competitions, and the candidates who use a "brute-force" mechanism (try every combination in turn until you find the right one) always fail because I limit the run time to five minutes. A brute-force algorithm will run for hours, whereas a half-decent logical one will complete in a few seconds.
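For the curious, here's roughly the shape of the thing, sketched in Python (the grid layout and helper names are mine, not any candidate's actual code): prune with the row, column and box rules before recursing, rather than generating every combination and checking at the end.

```python
# Minimal Sudoku sketch: grid is a 9x9 list of lists, 0 means empty.

def candidates(grid, r, c):
    """Digits not already used in the cell's row, column or 3x3 box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """Fill the first empty cell with a legal digit and recurse; backtrack on dead ends."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                    grid[r][c] = 0   # undo and try the next digit
                return False         # no legal digit fits here: backtrack
    return True                      # no empty cells left: solved
```

Strip out the candidates() pruning, try all nine digits in every empty cell and only check validity at the end, and you're back to the combinatorial explosion that blows the five-minute limit.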
But take a step back for a second. I've defined "success" in a particular way – in this case based on the time taken to produce the right answer. This is a common measure, because it's what users notice most. Think: "Argh, the CRM system's really slow this morning." In modern cloud-based setups, wasted compute time really is money.
If, on the other hand, the requirement is for the program to be ready to run as soon as possible but processing time isn't critical, it may have been more appropriate to run up a nice, simple brute-force algorithm and leave it to run overnight. And the brute-force algorithm would have been easier to debug (it's fewer lines of less complex code than a clever algorithm) as well as being quicker to write. Think Hadoop or MapR-style batch processing.
Much of the issue with inefficient software is a lack of understanding about what computation is taking place. It's common to find a lack of knowledge of complexity theory in software designers and developers, because it tends to be taught only in the more theory-oriented courses.
Now, although applying a big dollop of common sense is sometimes enough, a robust understanding of how to analyse the complexity of an algorithm is essential to making one efficient – and particularly to making it scale to large data sets.
The word "complexity" isn't the best, actually. When you think of a "complex" algorithm the first thought is of how many lines of code there are, how readable it is, and so on. Computational complexity is really about the efficiency of the algorithm, not how complicated the code is that you've used to implement it. It's about how much work is done by the algorithm in producing a result, and whether there's a more efficient way you could do it.
How does it work?
The other sin that I've come across time and time again is a lack of understanding of how the underlying systems work – and I've seen it even among the most senior software designers and teachers of software engineering. And that's because theory and practice don't always play well together.
For example, if you design a relational database "by the book" you'll normalise it to death so there's no duplication anywhere. In reality, though, performance will often be improved by having the occasional duplicate field to avoid a string of table joins the length of the M1. If you write a .NET program with SQL Server at the back end the "right" way to pass variables into queries is to parameterise them, but every so often you find circumstances where that performs less well than just banging together a string and firing it at the query engine.
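For illustration only – using Python and SQLite as a stand-in for the .NET/SQL Server case above, since the principle is the same – the "right" way looks like this, with the value passed as a parameter rather than glued into the SQL text:

```python
import sqlite3

# Illustrative only: an in-memory SQLite database standing in for SQL Server,
# and invented table/column names. Parameterised queries are the sensible
# default; the point above is simply that "the right way" occasionally loses
# a benchmark to a crudely concatenated string, and you only find out by measuring.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'ACME', 99.0)")

customer = "ACME"

# The default: pass the value as a parameter, not as part of the SQL text.
rows = conn.execute(
    "SELECT id, total FROM orders WHERE customer = ?", (customer,)
).fetchall()
print(rows)
```

Parameterisation should still be your starting position – it protects against SQL injection and lets the engine reuse query plans – but it isn't a law of nature.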
I once had to debug a script that took hours to run despite being apparently quite trivial: it did some straightforward calculations, dumped the answer to a text file and added that text file to a Zip archive. Except that the Zip implementation would secretly, when told to add a file, unzip the archive, add the file to the resulting folder of stuff, zip everything up again, and delete the temporary files. 60,000 calculations meant 60,000 unzips and 60,000 zips of an increasingly large archive. Changing it to do a single Zip at the end took the time from two hours to five minutes: the underlying problem was that the original developer hadn't thought to consider how the Zip program worked.
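A sketch of the fix, using Python's zipfile module as a stand-in for whatever the original tooling was (the filenames and the "calculation" are invented): open the archive once and stream each result into it, so nothing gets unzipped and rezipped along the way.

```python
import zipfile

# The slow script effectively rebuilt the whole archive on every one of the
# 60,000 additions. Opening it once and appending each member directly does
# the same job in a single pass.

results = (f"result {i}\n" for i in range(60_000))   # stand-in for the real calculations

with zipfile.ZipFile("results.zip", "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for i, line in enumerate(results):
        # writestr adds a new member to the already-open archive;
        # nothing is extracted or recompressed along the way.
        archive.writestr(f"calc_{i:05d}.txt", line)
```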
Size isn't necessarily important
When we think of "fat code", or "code bloat", we instinctively believe we're talking about size – how much disk space and memory it takes. But code bloat comes in other forms, too. It might simply be too long or coded in a way that wastes.
The user on the other side of the UI won't care about the root cause. What they'll notice, and care about, is the fact that it's running slowly.
This, then, is a call to shape up, not merely to shed a few pounds. It's a call to think, design, architect and program differently.
Many is the 18-stone individual who would feel better down at 15 stone. Similarly, a company with the equivalent of an 18-stone program would quite like to slim it down, but it's all relative and you aren't solving the root problem. Unpicking that code will likely cost ridiculous amounts of cash without making any significant improvement in RAM and disk usage.
Time to start again. Time to replace an out-of-shape piece of software with something that's been built to be suitably responsive. It will take time but be worth it – to the user and to you, because when the time comes to maintain that code as part of the DevOps lifecycle, it's easier to work with something that's been built efficiently and designed to be extensible. Much better than swaddling that burdened piece of software with yet more wrappers, add-ons and workarounds.
No. Putting your fat code on a bit of a diet, just to try to nudge it a little closer to the sub-second response times you'd prefer, isn't the way out of this. ®