Breaking compression, one year at a time
DIY is hard
Sysadmin Blog Computers physically last a lot longer than vendors would like. The idea of the three-year refresh cycle is considered sacred amongst a certain crowd, but when pressed most will admit that refreshes of that nature are exceptionally rare. While we can keep equipment running for a decade or beyond, there are hidden issues in doing so of which we should all be aware.
Data centres are organic; equipment is added and removed as needed. For the small business it isn't odd to see units that are five, six or even ten years old. Even enterprises will often have an "if it ain't broke, don't fix it" policy for various workloads. This can lead to some truly astonishing finds when picking through inventory reports.
Most sysadmins who operate in a post-refresh economy will be able to quote some well-known headline issues with keeping old gear around. Let's consider storage as a quick example.
Storage arrays are often the most tempting devices to push past refresh. The big guys force us back onto the treadmill by charging obscene amounts for replacement drives when out of support (if they supply them at all), preventing the use of standard enterprise drives, and making support contracts past the three-year mark punitive.
The storage wars changed some of this. Oh, the big storage vendors still treat their customers poorly, but there are plenty of alternatives to those out-of-date fossils. Software to make whitebox servers into storage superheroes now comes in many flavours, and a lot of the nanoNAS vendors have stepped up to provide resilient storage for the mid market.
Problems abound here too. RAID cards and HBAs for your whitebox servers are notorious for only supporting drives up to a certain size. Once the drives you want to use for your (usually archival) whitebox storage array get too big, it's refresh time all over again, only for different reasons. The nanoNAS market has the same problem.
All of the above is fairly well known and regularly griped about. In response, software vendors have focused on making it even easier to scale up and scale out whitebox solutions. RAID cards are no longer welcome: the simpler the HBA the better. This has led to fewer hardware restrictions, only to see sweating your IT assets run up against software barriers.
Data gets fat quickly
Consider Windows Server 2008 R2. This operating system came out seven years ago. Despite this, it's a great operating system for any number of tasks, including storage. Server 2012 and later have deduplication (which Server 2008 R2 lacks), but 2008 R2 remains a stable and reliable sysadmin favourite.
Back in the day it was fairly simple to build Server 2008 R2 storage units around chassis like Supermicro's 6047R-E1R72L (72 SATA drives) or the 6047R-E1R36L (36 drives). I have a number of these boxes of madness out there and at first blush they make great archival units. With 2TB drives you can get 60TB out of a 6047R-E1R36L (3x 12-drive RAID 6 arrays) or 120TB out of a 6047R-E1R72L (6x 12-drive RAID 6 arrays).
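Those capacity figures are easy to sanity-check: RAID 6 gives up two drives' worth of space per array to parity. A quick back-of-the-envelope sketch (raw figures, before filesystem overhead eats a little more):

```python
def raid6_usable_tb(drives_per_array: int, arrays: int, drive_tb: int) -> int:
    """Raw usable capacity: RAID 6 sacrifices two drives per array to parity."""
    return (drives_per_array - 2) * arrays * drive_tb

# 6047R-E1R36L: 3x 12-drive RAID 6 arrays of 2TB drives
print(raid6_usable_tb(12, 3, 2))  # 60
# 6047R-E1R72L: 6x 12-drive RAID 6 arrays of 2TB drives
print(raid6_usable_tb(12, 6, 2))  # 120
# Swap in 4TB drives and both figures double
print(raid6_usable_tb(12, 6, 4))  # 240
```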
Seven years ago that was a decent amount of storage for the price and rack space. (Remember: back then, Windows Server wasn't ruinously expensive.) Today, I could easily double that just by swapping in 4TB drives, and I'd cheerfully get another five years out of those systems. However, the operating system would stop receiving updates long before the hardware gave out.
The wrench in the works is that what I am storing on those servers today isn't the same as what I was storing on them seven years ago. Seven years ago it would have been pretty abnormal to, for example, be storing backups of 500GB+ VMs. Today I find myself periodically archiving off a copy of an entire production file server: a 16TB VM broken into 2TB drives lashed together at the OS level. These "virtual file servers" work wonders in a hyperconverged environment.
Seven years ago the 200GB "operating system drive" of that 16TB VM would have been considered a large VM in its own right. Today, I don't even think about that 200GB disk when talking about the file server; 200GB virtual drives are functionally irrelevant.
In an effort to stave off the purchase of those 4TB drives, software compression was enabled a few years ago. The archival servers do nothing all day, and their CPUs sit idle. Compression on cold storage wasn't thought to be a problem.
It turns out that you really shouldn't enable compression on folders in which you intend to dump gigantic VMs. It leads to interesting scenarios where, for example, attempting to copy over files simply stops at some arbitrary point and neither fails nor proceeds.
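One defensive habit this taught us: audit a folder for oversized files before flipping compression on. A minimal, hypothetical sketch (the 100GB threshold is my own arbitrary cut-off for "gigantic", not a documented filesystem limit, and `oversized_files` is an illustrative helper, not a real tool):

```python
import os

# Assumption: anything this large is a poor candidate for filesystem
# compression. Tune the threshold to taste.
THRESHOLD_BYTES = 100 * 1024**3  # 100 GB

def oversized_files(root: str, threshold: int = THRESHOLD_BYTES):
    """Walk a directory tree, yielding (path, size) for files over the threshold."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= threshold:
                yield path, size

# Usage: list the VM images too big to live happily on a compressed folder
# for path, size in oversized_files(r"D:\archive"):
#     print(f"{path}: {size / 1024**3:.1f} GB")
```

Running something like this before enabling compression is a lot cheaper than discovering mid-copy that a 2TB VMDK has wedged the filesystem.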
We could have this same conversation about weirdness regarding deduplication in Server 2012 R2, and the ways in which every operating system/file system behaves when you start hitting a few hundred million files is worth a weekend symposium to discuss.
Over the course of seven years what the archive nodes were used for hasn't changed. Deep and cheap archival storage, a place to store an on-site copy of backups, and a place to stage VMs when converting them, copying to testing clusters, etc. In those seven years the hardware has held up, and could easily keep going for years more.
What changed over those seven years is the magnitude of what's being done. The same systems started throwing wobblies on the same tasks not because the gear gave up, but because we have, inch by inch, been changing what we're asking them to do.
In our drive to avoid the cost and complexity of refreshes we must remember that we cannot avoid the burden of continual testing. There are no vendors quietly testing the limits of each new version for us. In the post-refresh world, the burden of making sure our solutions keep up with our own velocity of change is on us. ®