It's all got complicated: The costs of data recovery
Do the sums, folks. You won't be disappointed
Data protection companies have multiplied in the past 10 years and they are locked in a bitter battle for market share.
Before Amazon arrived on the scene, the main options were owning all links in the data protection chain or renting or leasing appliances, with or without attached offsite services.
Thanks to Amazon-style public clouds we have a third option: storage owned by someone else. This comes in three varieties: the cloud provider offers a software/service combo; you provide the software and rent the cloud storage; or a third-party provider offers both software and storage.
While the basic "how" of data protection has evolved little beyond these three options, there has been a proliferation of different ways to license it.
Mine, all mine
The easiest and simplest type of data protection is the one you own from start to finish. You supply two sites, you buy the hardware that fills them, you purchase the software that makes the backups do their thing and you supply the bandwidth to connect them. When your storage is full, you buy more, whether that is working storage or your data protection array.
Simple enough – except for the part where it is not. All the little details of everything intersect with all the details of everything else. And trying to figure out what you need to buy can be a mind-altering omnishambles.
How often does everything need to be backed up? Different documents from different sources have different relevance half lives, and not all databases are created equal.
The local file share with all the safety PDFs that nobody reads gets updated once a year before the auditor shows up, so you are probably fine if you don't back that up every night. But the financials database needs to be as close to real time as possible lest you irritate the people who sign your pay cheques.
What does your second site look like? If you are a small company it might look a lot like a drawer in the sysadmin's garage. If you are big enough, that second site could be a shiny data centre with all the fixings and its own guard detail. Or it could be anything in between.
What your second site is determines how you get the data there. Do you have to send a tape or hard drive by courier or have someone take it home? Can you schlep it across the internet? What's the bandwidth like between the sites?
Are you rotating tapes or using a fixed storage device? Are you using deduplication or compression at either end? What is providing this service – the storage device or the data protection software?
And how do these considerations affect the licensing of the data protection software? Is software charged per server? Per TB? Per feature, for example extra for deduplication?
Could your supplier charge you several times per server – say, one licence for an SQL instance on a server and another to back up Windows files?
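To see how quickly the licensing questions above change the answer, here is a minimal sketch that prices the same estate two ways. Every fee, the `Estate` class and both function names are made up for illustration; plug in the numbers from your own quotes.

```python
from dataclasses import dataclass


@dataclass
class Estate:
    servers: int          # number of protected servers
    protected_tb: float   # total data under protection, in TB
    sql_instances: int    # servers that need a separate database licence
    wants_dedup: bool     # whether deduplication is licensed as an extra


def per_server_cost(estate: Estate, server_fee: float, sql_agent_fee: float) -> float:
    """Per-server licensing, with a second charge for each SQL instance."""
    return estate.servers * server_fee + estate.sql_instances * sql_agent_fee


def per_tb_cost(estate: Estate, tb_fee: float, dedup_fee_per_tb: float) -> float:
    """Per-TB licensing, with deduplication as an optional per-TB extra."""
    cost = estate.protected_tb * tb_fee
    if estate.wants_dedup:
        cost += estate.protected_tb * dedup_fee_per_tb
    return cost


# Hypothetical estate and hypothetical price list.
estate = Estate(servers=20, protected_tb=30.0, sql_instances=4, wants_dedup=True)
print(per_server_cost(estate, server_fee=500.0, sql_agent_fee=800.0))  # 13200.0
print(per_tb_cost(estate, tb_fee=300.0, dedup_fee_per_tb=100.0))       # 12000.0
```

Add one more SQL instance or another 10TB and the cheaper option can flip, which is exactly the cascade problem.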
Change one thing and it cascades. The costs have to be redone, the calculations rerun from the beginning. Owning the whole stack makes it easy to know what the costs are and that they won't change – but it can also lead to nervous breakdowns.
Appliances to let
And so the appliance was born. Sure, you could roll your own software, but why bother? With an appliance you can just buy a widget from a data-protection provider and get unlimited protection, features, servers and what have you.
That is until the appliance is full, and then you need another appliance, the cost of which bears little relationship to the hardware it runs on.
The appliance will almost certainly be a re-badged Supermicro affair and the actual cost per raw TB is materially insignificant. Unfortunately, data protection companies need to pay their staff too so they must add margin on top of the hardware.
The margin can be as high as five times the cost of the hardware, and that makes the incremental upgrade cost hard to swallow for smaller businesses.
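A back-of-the-envelope sketch of what that margin does to the cost per raw TB. It assumes the margin is added on top of the hardware cost, so a margin of five times the hardware means a price of six times the hardware; the hardware cost and capacity are invented for illustration.

```python
def appliance_price(hardware_cost: float, margin_multiple: float) -> float:
    """Price when the vendor adds a margin of margin_multiple x hardware cost."""
    return hardware_cost * (1 + margin_multiple)


def cost_per_raw_tb(price: float, raw_tb: float) -> float:
    """Spread a purchase price over the raw capacity it buys."""
    return price / raw_tb


# Hypothetical box: $10,000 of hardware holding 100TB raw.
hardware = 10_000.0
raw_tb = 100.0
price = appliance_price(hardware, margin_multiple=5)

print(cost_per_raw_tb(hardware, raw_tb))  # 100.0 per TB for the bare hardware
print(cost_per_raw_tb(price, raw_tb))     # 600.0 per TB for the appliance
```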
In addition, you usually still have to worry about that pesky second site. You might choose to put the appliance on the second site and stream the data over the internet as the backups occur.
You might back up to a local appliance and sync with an offsite one, or you might have a data protection gateway that you back up to locally, which then deduplicates and compresses the data before firing it off to a second site.
And that second site could again be anything. It could be a second location your company owns, some space in a colocation facility, or perhaps a copy of the data is kicked up to a public cloud provider, either by you as a target or as part of a service offered by the appliance vendor.
Decisions, decisions. Appliances take a lot of the complexity out of calculating what you will ultimately pay but they do not eliminate all of it, and they are not necessarily cheaper than rolling your own.
Cloud provider appeal
In this context, the cloud provider solution starts to make a lot of sense, even if it is likely to cost you more than you bargained for.
In a scenario in which the cloud provider provides the data protection software, you are rarely nickel-and-dimed for features. You get software (or a cloud gateway appliance) that does X and it will cost you Y per TB per month. There is not a lot of wiggle room on the interpretation side, nor on the implementation side.
At first blush, cloud-provider-operated data protection services are surely a sanity-saving salve for swamped sysadmins? If only.
The problem is that you still have to get the data there. That means you are still provisioning the bandwidth to schlep everything to your second site, which in this case happens to be the cloud. And when a restore event does occur will you get back the relevant data in time to be useful?
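Whether the data comes back in time is simple arithmetic, and worth doing before you sign anything. The sketch below estimates how long a restore takes over a given link. The `efficiency` factor (how much of the nominal bandwidth you really get) and the example figures are assumptions, not measurements.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb terabytes over a link of link_mbps megabits/s.

    efficiency is the assumed fraction of nominal bandwidth actually usable.
    Uses decimal units: 1TB = 10**12 bytes, 1Mbps = 10**6 bits per second.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical restore: 2TB over a 100Mbps line at 80 per cent efficiency.
hours = transfer_hours(2, 100, 0.8)
print(f"{hours:.1f} hours")  # 55.6 hours
```

If the business expects the financials back within a working day, more than two days on the wire is the wrong answer, however cheap the storage was.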
There is also that awkward part where all your backups are now just a cracked password away from going bye-bye, or from having someone download everything and read through all your preciouses.
Two-factor authentication is a good start, but for the critical stuff you might consider more than one cloud provider.
Also factor in that data protection storage never gets smaller. If you pay $5 this month, next month it will be $5 plus whatever was added. The month after that it will be $5 plus whatever was added plus whatever was added after that. So on and so forth forever.
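The ever-growing bill is easy to model. This sketch assumes a flat price per TB and a constant amount of new data each month, both of which are simplifications; the figures are chosen only so that the first bill is $5.

```python
def monthly_bills(start_tb: float, growth_tb_per_month: float,
                  price_per_tb: float, months: int) -> list[float]:
    """Bill for each month when stored data only ever grows."""
    bills = []
    stored = start_tb
    for _ in range(months):
        bills.append(stored * price_per_tb)
        stored += growth_tb_per_month  # nothing is ever deleted
    return bills


# Hypothetical: 1TB to start, 0.5TB added every month, $5 per TB per month.
bills = monthly_bills(start_tb=1.0, growth_tb_per_month=0.5,
                      price_per_tb=5.0, months=4)
print(bills)       # [5.0, 7.5, 10.0, 12.5]
print(sum(bills))  # 35.0
```

The monthly bill climbs in a straight line, so the running total grows roughly with the square of the number of months.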
Oh, and then there is the cost of restores. Cloud service providers can be somewhat asymmetrical about the cost of restoring data. Right when the chips are down and you are at your most desperate, they hit you with a restore fee that is substantially more than the cost of storage or bandwidth to get the stuff up there. Ouch.
Of course, the data protection companies saw a way here of not being driven out of business by the public cloud providers, and thus we have our last category of data protection to consider.
Public cloud storage gets cheaper when you buy in bulk and are prepared to make longish-term commitments. A file storage company like my preferred Sync.com can buy its storage from cloud providers in bulk and manage it with its own software.
Sync uses this purchasing power to provide a secure Dropbox-like service and do so at ridiculous prices such as $50 a year for 500GB of storage.
Why can it do this? Because nobody uses that full 500GB, because it can compress data as it goes into the storage and because one person not using up all the bandwidth factored in to that $50 a year means someone else can go a little over.
It is sort of like the way we used to buy insurance before some git legalised tracking us 24/7, so that insurers no longer actually carry any risk or share the costs among subscribers.
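The pooling trick can be sketched in a few lines. Only the $50 a year and 500GB figures come from the plan described above; the subscriber count, average usage and compression ratio are invented, and this is not a description of how Sync actually provisions storage.

```python
def provisioned_gb(subscribers: int, plan_gb: float,
                   avg_use_fraction: float, compression_ratio: float) -> float:
    """Storage the provider must really buy for the whole pool."""
    used_per_subscriber = plan_gb * avg_use_fraction
    return subscribers * used_per_subscriber / compression_ratio


# Hypothetical pool: 1,000 subscribers using 20 per cent of a 500GB plan
# on average, with data compressing 2:1 on the way in.
subscribers = 1_000
needed = provisioned_gb(subscribers, plan_gb=500.0,
                        avg_use_fraction=0.2, compression_ratio=2.0)
revenue = subscribers * 50.0

print(needed)            # 50000.0 GB bought, against 500,000GB sold
print(revenue / needed)  # 1.0 dollar per provisioned GB per year
```

On those made-up numbers the provider earns $1 per GB it actually buys, ten times the headline rate of 10 cents per GB sold, which is where the room for heavy users comes from.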
It should be clear where I am going with this: the world's data protection companies have glommed onto public cloud storage as a means of eliminating a big whack of complexity for the customer, but are finding they can provide value by engaging in a price war. No, the data protection industry never did learn anything from history.
At the moment, most data protection companies are not quite so far along. They charge you based on the cost of storage to them, with a surcharge on top for software and support services. They typically bury the recovery costs charged by the service provider by assuming that everyone will have to recover 100 per cent of their data X number of times a year.
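That pricing approach boils down to one line of arithmetic. The sketch below assumes the provider's restore fee is charged per TB and is simply spread across twelve months; all the numbers are hypothetical.

```python
def bundled_price_per_tb_month(storage_cost: float, restore_fee_per_tb: float,
                               assumed_full_restores_per_year: float,
                               surcharge: float) -> float:
    """Monthly price per TB with recovery costs buried in the rate.

    Assumes every customer restores 100 per cent of their data
    assumed_full_restores_per_year times, whether they do or not.
    """
    restore_component = restore_fee_per_tb * assumed_full_restores_per_year / 12
    return storage_cost + restore_component + surcharge


# Hypothetical: $10 raw storage, $24 per TB restore fee, two full restores
# a year assumed, $6 for software and support.
print(bundled_price_per_tb_month(10.0, 24.0, 2, 6.0))  # 20.0
```

A customer who never restores anything pays the $4 restore component anyway, subsidising those who restore often.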
It's a battle
In the end, data protection companies eke out only a fixed margin above the costs of the raw cloud storage. Since they all long ago gave up on ease of use as a differentiator, and everyone more or less has the same features, you end up with a huge swathe of data protection providers that all cost about the same and deliver roughly the same product.
Since the public cloud providers don't offer a whole lot of flexibility in how they price storage, bandwidth or restore (especially if you are using "slow" storage, like Amazon's Glacier), the world's data protection companies have decided that the new battleground is to be around the methods by which they tweak that risk-sharing algorithm.
Among these providers, another Canadian company, Asigra, catches my eye. Asigra has decided to take a middle road between the sort of raw pricing you get from a cloud provider directly and the fully shared cost model used by its competitors.
Asigra examines your backup and recovery events and assigns your organisation a recovery performance score (RPS), which it measures on a scale of 0-10. The RPS is basically a ranking of what percentage of your total data you needed to recover that year.
The less you recover, the lower your cost per TB of backup storage. Only successful recoveries count towards the RPS and the single largest recovery event in each term is excluded from the calculation.
The comparison between data protection and insurance makes a lot more sense now. Asigra's approach sounds rather a lot like the grid system my province in Canada adopted to prevent some pretty ruthless discrimination by our insurance companies. It is an approach that worked out rather well in the end.
Now, some companies might freak out a bit about this model because they do recovery drills. These may require restoration of 100 per cent of the data, and wouldn't that push a company into the highest RPS bracket?
Apparently, Asigra has that covered too, with recovery drills being priced separately from production storage and recovery events and not contributing to the RPS position at all.
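Here is a toy version of that scoring, to make the rules concrete. The exclusions follow the description above (only successful recoveries count, the single largest event is dropped, drills are ignored), but the linear mapping from percentage recovered to a 0-10 score is purely my assumption, as are the class and function names. Asigra's actual formula is not given here.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEvent:
    tb_recovered: float
    successful: bool = True
    is_drill: bool = False


def recovery_score(events: list[RecoveryEvent], protected_tb: float) -> float:
    """Toy 0-10 score: share of protected data recovered in the term.

    ASSUMPTION: a linear scale where recovering 100 per cent scores 10.
    """
    counted = [e.tb_recovered for e in events
               if e.successful and not e.is_drill]
    if counted:
        counted.remove(max(counted))  # the single largest event is excluded
    score = 10 * sum(counted) / protected_tb
    return min(score, 10.0)


# Hypothetical term for an organisation protecting 100TB.
events = [
    RecoveryEvent(40.0),                   # largest event, excluded
    RecoveryEvent(10.0),
    RecoveryEvent(5.0),
    RecoveryEvent(20.0, successful=False), # failed, does not count
    RecoveryEvent(100.0, is_drill=True),   # drill, priced separately
]
print(recovery_score(events, protected_tb=100.0))  # 1.5
```

Without the exclusions the same term would max out the scale, which is why the drill rule matters so much to anyone who tests their restores.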
So the pricing model war of the cloud storage era has officially begun. I suspect that cloud backups are about to evolve from a one-size-fits-all model into an industry where you can find a financial model that is actually tailored to your company's needs and to the business value. ®