Want to spoil your favourite storage vendor's day? Buy cloud
Leaving the premises might just work
Organisations continue to buy storage. In fact, I was talking to a storage salesman not so long ago who was telling me that one of his customers regularly calls asking for a quote for “a couple more petabytes”.
However, on-premises storage is not the end of the story. Yes, you need to have storage electronically close (with minimal latency) to your servers, but procuring on-premises storage needs more than cash. It needs power, space and support.
You can't keep buying more and more storage because power and data centre space are extremely limited.
And, even if your data centre does have the space, you often can't get the new cabinets next to your existing ones so you end up dotting your kit all over the building (with the interconnect fun that implies).
If part of your data storage requirement can live with being offline then you have the option of writing it to tape – which, in turn, brings the problem of managing a cabinet or two full of tapes.
Leaving aside the fact that they degrade over time if not kept properly, there's always the issue with tape technology marching on (which means you have to hang onto your old tape drives and keep them working, just in case).
Throw it somewhere else?
So is there mileage in putting your data somewhere else – specifically in the cloud? In a word, “yes”. To take just one of many possible examples, Amazon's Glacier storage costs one US cent per GB per month, which means you can keep 100TB for a shade over £7,500 per annum.
Now, in my book that's pretty good value, even if you do have to live with the performance and accessibility downsides. Don't forget that you also pay for shoving data in and out, though with Glacier the charges are pretty much negligible.
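For anyone who wants to check the sums, the arithmetic can be sketched in a few lines of Python. The one-cent price and the 100TB capacity come from the Glacier example; the decimal 1,000GB-per-TB convention and the exchange rate are assumptions picked purely for illustration, and the function name is made up for this sketch.

```python
def annual_storage_cost(capacity_tb, price_per_gb_month, gb_per_tb=1000):
    """Yearly storage charge, in whatever currency the price is quoted in.

    gb_per_tb=1000 is an assumption; pass 1024 for binary units.
    """
    return capacity_tb * gb_per_tb * price_per_gb_month * 12


# One US cent per GB per month, 100TB, as in the Glacier example.
glacier_usd = annual_storage_cost(100, 0.01)
print(f"Glacier, 100TB for a year: USD {glacier_usd:,.0f}")

# ASSUMPTION: illustrative exchange rate only, not a quoted figure.
usd_per_gbp = 1.55
print(f"In sterling at that rate: GBP {glacier_usd / usd_per_gbp:,.0f}")
```

Transfer and retrieval charges are left out, so treat the result as a floor rather than a quote.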
That's all very well for archive storage, but what about cloud-based disk you can access interactively – so it's a true disaster-recovery option that lets you access your storage at will in instances where (for example) your primary storage has turned up its toes?
Well, for the same 100TB of storage you'd be looking at a smidge over £18,000 a year on Amazon for their reduced-redundancy option – which, presumably, is fine as it's your secondary and you have a live copy.
During a three-year period that's likely to work out to a fair chunk more than a decent quality on-premises offering, but remember of course that with the latter you have the additional overhead of feeding, watering, monitoring, maintaining and upgrading it.
I reckon the cloud's just about affordable with regard to storage space, then, so what about actually connecting your world to it? I'm not going to go into physical connectivity – put simply, you'll probably use a VPN – but instead will focus on the mechanisms that you can layer on top once you've got A connected to B.
Sellers can't ignore new markets...
Vendors of on-premises storage are unsurprisingly also looking to sell you stuff that will enable you to use cloud storage: after all, given that they're not getting revenue from flogging disks to you, they may as well find ways of extracting your cash by selling cloud-enabling products.
One of the most famous, primarily through its early entry into the market, was Riverbed's Whitewater, now called the AltaVault cloud storage optimiser. These days there's a wider range of offerings – more about one of those later.
What do these systems actually do? Well, the two primary functions are: (a) local caching of recently used data travelling to and from the cloud, so that the average access time is way shorter than it would be for direct interaction with the cloud; and (b) reducing, either by compressing or de-duping, the volume of data flying over the WAN between on-prem and the cloud.
Of course, if you're doing compression or de-duping then both ends of the link need to know how to chant the right incantations; these days the storage APIs for the leading cloud services make this a relatively straightforward thing for the vendors to achieve, which means they can concentrate their research and development effort on optimising the caching aspects of the product.
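To make the two functions concrete, here is a toy sketch in Python. It is not any vendor's design: the class and its names are hypothetical, a plain dictionary stands in for the cloud store, and fixed-size chunking with SHA-256 digests is just one simple way of de-duping. A real appliance would keep its own local index of chunks rather than peeking at the remote store.

```python
import hashlib
import zlib
from collections import OrderedDict


class CloudGatewaySketch:
    """Toy stand-in for a cloud storage gateway (hypothetical, illustrative)."""

    def __init__(self, cache_chunks=4, chunk_size=4096):
        self.remote = {}            # digest -> compressed chunk (pretend cloud)
        self.cache = OrderedDict()  # digest -> raw chunk, least recent first
        self.cache_chunks = cache_chunks
        self.chunk_size = chunk_size
        self.bytes_sent = 0         # bytes that crossed the pretend WAN

    def _remember(self, digest, chunk):
        # Keep the chunk in the local cache and mark it most recently used.
        self.cache[digest] = chunk
        self.cache.move_to_end(digest)
        while len(self.cache) > self.cache_chunks:
            self.cache.popitem(last=False)  # evict the least recently used

    def write(self, data):
        """Store data; return the list of chunk digests needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.remote:      # de-dupe: only send new chunks
                packed = zlib.compress(chunk)  # compress before "sending"
                self.remote[digest] = packed
                self.bytes_sent += len(packed)
            self._remember(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild data from its recipe, using the cache where possible."""
        parts = []
        for digest in recipe:
            chunk = self.cache.get(digest)
            if chunk is None:                  # cache miss: go to the cloud
                chunk = zlib.decompress(self.remote[digest])
            self._remember(digest, chunk)
            parts.append(chunk)
        return b"".join(parts)


gateway = CloudGatewaySketch()
payload = b"A" * 8192 + b"B" * 4096
recipe = gateway.write(payload)
assert gateway.read(recipe) == payload
print(len(payload), "bytes written,", gateway.bytes_sent, "bytes sent")
```

In this sketch the gateway does both the packing and the unpacking, so the cloud end only has to store opaque objects.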
What do we mean by “secondary”?
Secondary storage might simply mean a duplicate copy of your core data, which you retain in the cloud in case the primary entity is corrupted, deleted or destroyed. You have choices of how you get the data to the cloud, depending on how immediately accessible you want it:
- Backups: instead of using local disk or tape drives you point your backup software or appliances at the cloud storage area. This is fine if you'll only need to pull back lost files occasionally and you don't mind having to do file restores on an ad-hoc basis via the backup application's GUI
- File-level copies: you replicate data to the cloud storage using a package that spots new files and changes and replicates in near real time (if you've ever used Google Drive on your desktop, you'll know the kind of thing I mean, but we're talking about the fileserver-level equivalent in this context)
- Application-level: you run your apps in active/passive mode using their inherent replication features – for instance a MySQL master on-prem and an equivalent slave in a VM in the cloud. Actually, this isn't really storage replication, as the data flying around is application data, not filesystem data
The second of these three is the most common requirement: a near-real-time remote copy of large lumps of data. Yes, you'll often have a bit of the other two but these (particularly app-level replication) tend to represent a minority of the data you're shuffling.
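The “spots new files and changes” part of file-level replication can be sketched as a comparison of two snapshots of a directory tree. This is a minimal illustration with hypothetical function names: it hashes every file on every pass, which is simple but slow, and a real package would more likely react to filesystem change notifications.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    root = Path(root)
    state = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            state[path.relative_to(root).as_posix()] = digest
    return state


def changes(previous, current):
    """Return (to_upload, to_delete) between two snapshots."""
    # New files and files whose contents differ need copying to the cloud.
    to_upload = sorted(p for p, d in current.items() if previous.get(p) != d)
    # Files that have vanished locally need removing from the remote copy.
    to_delete = sorted(p for p in previous if p not in current)
    return to_upload, to_delete
```

Run `snapshot()` on a schedule, pass the previous and current results to `changes()`, and hand the two lists to whatever does the actual copying.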
Secondary = temporary?
The other side of secondary storage in the cloud is where you're one of those companies that genuinely uses the cloud for short-term, high-capacity compute requirements.
One of the benefits the cloud vendors love to proclaim from atop a convenient mountain, of course, is the idea of pay-for-what-you-use scenarios: running up loads of compute power for a short-term, high-power task then running it down again.
Does the average business care? Nah – all this stuff about: “Oh, you can hike up your finance server's power at year-end then turn it down again” is a load of old tosh in most cases. But there are in fact plenty of companies out there with big, occasional requirements – biotech stuff, video rendering, weather simulation, and so on – so real examples are far from non-existent.
So, actually, your secondary storage might only be a holding pen for occasional processing tasks: you chuck the source data onto the cloud, run the processing, then hoover the results back before blowing away the remote storage volumes.