For years we have been told that consolidation and centralisation are good things, especially in the realm of data storage.
Not only do they mean fewer boxes to manage, but with the utilisation rate of direct-attached storage running at maybe 30 per cent, moving to shared storage can allow us to dramatically prune back slack space – space that is allocated merely to leave headroom in case it is needed in the future.
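The arithmetic behind that saving is simple: the raw capacity you need is the data you hold divided by the utilisation you can safely run at. The sketch below assumes 30TB of live data and a 75 per cent target utilisation for a shared pool. Both figures are illustrative assumptions, not numbers from any vendor.

```python
def capacity_needed(data_tb: float, utilisation: float) -> float:
    """Raw capacity (TB) required to hold data_tb at a given utilisation (0-1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return data_tb / utilisation

data_tb = 30.0                            # assumed amount of live data
das = capacity_needed(data_tb, 0.30)      # direct-attached, running at ~30 per cent
shared = capacity_needed(data_tb, 0.75)   # assumed safe target for a shared pool
print(f"DAS: {das:.0f} TB, shared: {shared:.0f} TB, reclaimed: {das - shared:.0f} TB")
```

Under those assumptions the shared pool needs 40TB rather than 100TB. The higher the utilisation you can safely run at, the more slack space you reclaim.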
But should we be asking now if consolidation has gone too far?
Is there such a thing as too much centralisation, when the risks it introduces – a single point of failure, larger failure domains – outweigh the benefits? Does the consolidated emperor still have clothes on?
Fear of consolidation is natural. When one hard disk can hold what not so many years ago required an entire rack, aggregating those multi-terabyte drives can put petabytes into a storage system – and several petabytes of data in a single box is a lot of eggs in one basket.
Face the fear
Much of this fear is visceral but the risks are very real, as some of the innovation and development under way in the storage industry demonstrate.
For example, look at the new RAID-like techniques designed to spread data across ever-increasing numbers of hard drives, as developers strive to keep rebuild times within sensible bounds; or the use of replication and mirroring-type technologies to spread the load and remove single points of failure.
“As storage devices hold higher capacity, there is also more risk of a problem within the device,” says Mike Vildibill, vice-president of emerging technologies at DataDirect Networks, which builds storage for big data and high-performance computing.
“An obvious one is the rebuild time on a 6TB or 8TB drive – and there are larger drives coming.”
The problem is, of course, that while bigger drives mean more and cheaper storage capacity, they also mean you have to copy a lot more data onto your hot spare should a drive in a RAID group fail. That takes time, during which your no-longer-redundant array is vulnerable.
To make matters worse, the bandwidth onto the disk is not increasing anything like as fast as disk capacity. Not for nothing has getting data onto a multi-terabyte drive been compared to trying to suck the ocean through a straw.
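A back-of-the-envelope estimate shows why. In the best case, copying a whole drive takes its capacity divided by its sustained write rate. The 150MB/s figure below is an assumed rate for a nearline hard drive. Real rebuilds take longer because the array keeps serving application I/O at the same time.

```python
def rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Best-case hours to rewrite a whole drive at a sustained write rate.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), as drive
    vendors do. Ignores competing application I/O, so real rebuilds are slower.
    """
    seconds = capacity_tb * 1e12 / (write_mb_per_s * 1e6)
    return seconds / 3600

# 150 MB/s is an assumed sustained write rate for a nearline hard drive.
for tb in (2, 6, 8):
    print(f"{tb} TB: {rebuild_hours(tb, 150):.1f} h")
```

At that assumed rate an 8TB drive needs roughly 15 hours of flat-out writing. Every doubling of capacity without a matching rise in bandwidth doubles the window during which the array is exposed.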
Spread the load
“A common response is de-clustered RAID, which is basically taking the RAID group and distributing it across dozens or even hundreds of disks,” says Vildibill.
De-clustered RAID is analogous in many ways to some of the parallel-striped RAID techniques, but without the data stripes tied to particular drives. In some versions, what is in effect a single RAID set could be striped across every disk in an array.
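A toy simulation makes the difference concrete. It compares two layouts across 48 disks. The first uses fixed eight-disk RAID groups. In the second, every stripe lands on its own pseudo-random set of eight disks. For each layout, the simulation counts how many stripe units each surviving disk must read to rebuild a failed one. The disk count, stripe width and random placement are illustrative assumptions. Real de-clustered RAID implementations use carefully balanced placement rather than plain random sampling.

```python
import random
from collections import Counter

def traditional_layout(n_disks: int, width: int, n_stripes: int) -> list:
    """Fixed RAID groups: each stripe lives on one group of `width` disks.

    Assumes n_disks is a multiple of width.
    """
    groups = [list(range(g, g + width)) for g in range(0, n_disks, width)]
    return [groups[s % len(groups)] for s in range(n_stripes)]

def declustered_layout(n_disks: int, width: int, n_stripes: int, seed: int = 0) -> list:
    """Each stripe lands on its own pseudo-random set of `width` distinct disks."""
    rng = random.Random(seed)
    return [rng.sample(range(n_disks), width) for _ in range(n_stripes)]

def rebuild_reads(layout: list, failed: int) -> Counter:
    """Count the stripe units each surviving disk must read to rebuild `failed`.

    With single parity, every surviving member of an affected stripe is read.
    """
    reads = Counter()
    for stripe in layout:
        if failed in stripe:
            for disk in stripe:
                if disk != failed:
                    reads[disk] += 1
    return reads

for name, layout in [("traditional", traditional_layout(48, 8, 6000)),
                     ("de-clustered", declustered_layout(48, 8, 6000))]:
    reads = rebuild_reads(layout, failed=0)
    print(f"{name}: {len(reads)} disks help, busiest reads {max(reads.values())} units")
```

In the fixed layout only the seven surviving members of the failed disk's group share the rebuild reads. The de-clustered layout spreads a similar total across nearly all 47 survivors, so each one does a fraction of the work.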
Another interesting way to deal with the problem is to stop thinking about hot-spare hard drives and start thinking about hot-spare reserved disk capacity, says Frank Reichart, senior director of product marketing at Fujitsu.
“Customers are facing challenges because hard drives are getting much bigger. We will have 6TB and 8TB drives in production systems soon,” he says.
“With these bigger drives, the array rebuild times are getting so long that it can even take days, depending on the workload. Customers are worried that if they have a second failure during that rebuild they will have big problems.
“So we are looking at new technologies. For example with our Fast Recovery technology we go away from having a dedicated spare disk. Instead each disk has a spare area with parallel rebuild on that spare space.”
Reichart adds that writing to all these reserved spare stripes in parallel, instead of writing to a whole new drive, can dramatically reduce rebuild times. For example, he says, a 2TB disk can be rebuilt in 90 minutes instead of six hours. Once the failed drive is replaced, the hot-spare data is copied back to it to restore redundancy.
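Reichart's figures can be turned into average write rates. The sketch below divides 2TB by each quoted rebuild time, assuming decimal units. It says nothing about how Fast Recovery works internally. It only shows that, in his example, spreading the writes across many drives' spare areas roughly quadruples the effective rate.

```python
def implied_mb_per_s(capacity_tb: float, minutes: float) -> float:
    """Average write rate (decimal MB/s) implied by rebuilding capacity_tb in `minutes`."""
    return capacity_tb * 1e12 / (minutes * 60) / 1e6

dedicated = implied_mb_per_s(2, 6 * 60)   # six hours onto one dedicated hot spare
distributed = implied_mb_per_s(2, 90)     # 90 minutes into spare areas on many drives
print(f"{dedicated:.0f} MB/s vs {distributed:.0f} MB/s, "
      f"{distributed / dedicated:.1f}x faster")
```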
Beyond the challenges of disk capacity and performance come the risks of centralisation.
“The challenge at the system level is if you have so many petabytes in a system, what happens if the whole thing fails? Some customers are putting storage virtualisation on top to combat this, but that's adding management complexity,” says Reichart.
A simpler route, he suggests, is a high-availability storage cluster where a second mirrored storage system on a remote site provides application-transparent failover if the primary system fails or is lost.
In Fujitsu's version, the failover can also be manually triggered, for example to allow routine maintenance or planned upgrades.
As well as adding complexity, the virtualisation route also reduces visibility, points out Steve Willson, EMEA chief technical officer at enterprise Flash specialist Violin Memory.
“In the old days, I knew which volumes were on which RAID group – there was simplicity of mapping,” he says.
“Over the last 10 years though, the abstraction layers have grown monumentally more complicated. I can't identify the physical failure domain of a storage volume in the new hybrid virtualised world, with caching, thin provisioning and so on.
“So a rebuild might impact only one LUN or it might impact every LUN on the system. You can't tell. And if you can't map logical to physical in real time, you can't take evasive action.
“You have to assume that the failure domain is the entire system, which means you either have to decide how to deal with that or simply replicate everything.”
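Willson's point can be illustrated with a hypothetical logical-to-physical map. In a classic layout each LUN sits on its own RAID group, so one failed disk touches one LUN. In a wide-striped, thin-provisioned pool, any LUN may have blocks on any disk, so the same failure can touch all of them. The LUN names and disk numbers below are made up for illustration.

```python
def affected_luns(lun_map: dict[str, set[int]], failed_disk: int) -> list[str]:
    """Return the LUNs that have data on the failed disk."""
    return sorted(lun for lun, disks in lun_map.items() if failed_disk in disks)

# Hypothetical classic layout: each LUN carved from its own four-disk RAID group.
classic = {"lun_a": {0, 1, 2, 3}, "lun_b": {4, 5, 6, 7}, "lun_c": {8, 9, 10, 11}}
# Hypothetical wide-striped, thin-provisioned pool: every LUN may touch every disk.
pooled = {lun: set(range(12)) for lun in classic}

print(affected_luns(classic, 5))  # ['lun_b']
print(affected_luns(pooled, 5))   # ['lun_a', 'lun_b', 'lun_c']
```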
Just to complicate things further, there is also the somewhat counterintuitive aspect that while techniques such as de-clustered RAID give you faster rebuilds, they may actually reduce the mean time to data loss.
That is because when you spread the data over a large number of disk drives, it becomes statistically more likely that another participating disk will fail during the rebuild.
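A simple model shows the trade-off. Suppose drives fail independently with exponentially distributed lifetimes. This simplifying assumption ignores correlated failures and ageing. Under it, the chance that at least one of n surviving drives fails during a rebuild lasting t hours is 1 - exp(-n × t / MTBF). Wider spreading raises n while faster rebuilds cut t, so which effect wins depends on their product. The drive counts and rebuild times below are assumptions. The one-million-hour MTBF is a commonly quoted ballpark for hard drives.

```python
import math

def p_failure_during_rebuild(n_disks: int, rebuild_hours: float,
                             mtbf_hours: float = 1_000_000) -> float:
    """Probability that at least one of n_disks fails within the rebuild window.

    Assumes independent drives with exponentially distributed lifetimes.
    """
    return 1 - math.exp(-n_disks * rebuild_hours / mtbf_hours)

# Assumed scenarios: (description, surviving drives in the rebuild, rebuild hours)
scenarios = [
    ("8-disk RAID group, slow rebuild", 7, 11.0),
    ("de-clustered over 48 disks", 47, 1.5),
    ("de-clustered over 200 disks", 199, 1.0),
]
for name, n, hours in scenarios:
    print(f"{name}: {p_failure_during_rebuild(n, hours):.1e}")
```

With these assumed numbers, de-clustering across 48 disks slightly lowers the exposure. Spreading across 200 disks raises it, even though each rebuild is shorter. Real systems also reduce the risk with double parity, which this model ignores.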
And, of course, it is not just drive failures that make a large storage system a single point of risk.
“In the old days, people were nervous about losing the controllers on the front because they weren't just the interface to the storage but also did the caching,” says Willson.
“On the VNX and similar systems, if you lost a controller, performance could be hideous; applications could be almost unusable. People had to design around that, for example by putting more controllers in or sizing the solution so that if need be a single controller could run your whole system – if you can afford that.”
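Willson's sizing advice amounts to an N+1 check: after losing one controller, the survivors must still carry the full load. The function below expresses that check. The IOPS figures in the example are hypothetical. In practice you would leave headroom rather than plan to run the survivors at 100 per cent.

```python
def survives_controller_loss(total_iops: float, iops_per_controller: float,
                             n_controllers: int) -> bool:
    """True if the controllers left after one failure can absorb the whole load."""
    if n_controllers < 2:
        return False  # with a single controller, its loss takes the system down
    return total_iops <= iops_per_controller * (n_controllers - 1)

# Hypothetical array rated at 100,000 IOPS per controller.
print(survives_controller_loss(150_000, 100_000, 2))  # False: one survivor can't cope
print(survives_controller_loss(90_000, 100_000, 2))   # True
print(survives_controller_loss(150_000, 100_000, 3))  # True: two survivors share it
```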
Assuming that you want to keep the benefits of storage consolidation, such as having less to manage and doing more work with less hardware (and these days, how many bean counters will let you do less with more?), avoiding these risks is paramount.
One option, given that you should already have replicated your storage to protect against disasters, is to make those replicas earn their keep, according to Vildibill.
“In some cases it makes sense to rely on your geographical data protection, not just the in-box protection,” he says.
“For instance, it may be possible to use a remote replica as part of the rebuild. We think of that as having both local and global parity. It’s dual-parity disks in effect.”
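Here is a toy model of combined local and global protection. Single XOR parity, as in RAID 5, rebuilds any one lost chunk from the survivors. When more chunks are lost than local parity can cover, the missing ones are fetched from a remote replica instead. The function names and the fallback logic are illustrative assumptions, not DataDirect Networks' design.

```python
from functools import reduce
from typing import Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def local_parity(chunks: list[bytes]) -> bytes:
    """Single XOR parity over equal-length data chunks (RAID 5 style)."""
    return reduce(xor_bytes, chunks)

def recover(chunks: list[Optional[bytes]], parity: bytes,
            remote_replica: list[bytes]) -> list[bytes]:
    """Fill in missing (None) chunks locally if possible, else from the replica."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) == 1:
        # XOR of the parity with every surviving chunk yields the lost chunk.
        survivors = [c for c in chunks if c is not None]
        rebuilt = reduce(xor_bytes, survivors, parity)
        return [rebuilt if c is None else c for c in chunks]
    # No loss, or more losses than one parity chunk can cover: use the remote copy.
    return [remote_replica[i] if c is None else c for i, c in enumerate(chunks)]

data = [b"ABCD", b"EFGH", b"IJKL"]
p = local_parity(data)
print(recover([b"ABCD", None, b"IJKL"], p, remote_replica=[]))  # rebuilt locally
print(recover([None, None, b"IJKL"], p, remote_replica=data))    # needs the replica
```

Passing an empty replica in the single-failure case shows that the local path never touches the remote copy.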
He warns, however, that just as storage technology needed to be redesigned for RAID 5, the same is true to a certain extent for local/global parity.
“You have to be mindful of which failure modes you want protected locally and which globally. That's a design issue,” he says.
“And you need to be mindful of storage amplification. If you have one gigabyte of data, how many gigabytes do you need to protect it?
“For example, a mirror makes 2GB; add a replica and it's 3GB. But you can also look at 1.4, 1.6, 1.8 and so on.”
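Those fractional ratios are what erasure coding produces. A layout with k data fragments and m protection fragments stores (k + m)/k bytes for every byte of data. With a maximum-distance-separable code such as Reed-Solomon, it survives the loss of any m fragments. Vildibill does not say which schemes he has in mind. The 5+2, 5+3 and 5+4 layouts below are simply one way to reach his 1.4, 1.6 and 1.8 figures.

```python
from fractions import Fraction

def amplification(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data for k data + m protection units."""
    return Fraction(k + m, k)

layouts = {
    "two-way mirror": (1, 1),
    "mirror plus remote replica": (1, 2),
    "erasure code 5+2": (5, 2),
    "erasure code 5+3": (5, 3),
    "erasure code 5+4": (5, 4),
}
for name, (k, m) in layouts.items():
    print(f"{name}: {float(amplification(k, m)):.1f}x, tolerates {m} lost units")
```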
Another way to use replication while maintaining the advantages of storage consolidation is active-active storage clustering. It can be pricey, but major vendors such as EMC, IBM and NetApp have all implemented it in their systems. It can also be implemented in software such as DataCore's SANsymphony.
Again, though, you need to think carefully about which failure modes this can and can't protect against and what other processes you will need for proper data protection.
And, of course, there is the question of what does and doesn't need data replication. For example a group of universities working on genetic research might decide not to replicate the basic gene sequences because they can be re-created if needed.
However, analysing those sequences can take large amounts of compute time and power, so the analysis results definitely need protection and call for a different policy.
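A per-dataset policy table makes the saving visible. The dataset sizes and the two overhead factors below are hypothetical. Re-creatable raw sequences get local protection only, while analysis results get the same protection at a second site.

```python
# Hypothetical raw-capacity multipliers per protection policy.
POLICY_OVERHEAD = {
    "recreatable": 1.4,  # e.g. local 5+2 erasure coding, no remote copy
    "critical": 2.8,     # e.g. the same local protection at two sites
}

# Hypothetical datasets: (name, size in TB, policy)
DATASETS = [
    ("raw gene sequences", 800.0, "recreatable"),
    ("analysis results", 50.0, "critical"),
]

def raw_capacity_tb(datasets, overhead=POLICY_OVERHEAD) -> float:
    """Total raw capacity needed when each dataset follows its own policy."""
    return sum(size * overhead[policy] for _, size, policy in datasets)

tiered = raw_capacity_tb(DATASETS)
everything_critical = raw_capacity_tb([(n, s, "critical") for n, s, _ in DATASETS])
print(f"tiered: {tiered:.0f} TB, replicate everything: {everything_critical:.0f} TB")
```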
Flash to the rescue
Could enterprise Flash reduce the physical risks associated with storage consolidation or will it make them worse?
Certainly it can help with the issue of data bandwidth, especially in areas such as high-performance computing and on parallel file systems where data must be aligned onto the disk stripes.
Don't assume, though, that it will be vastly more reliable. Yes, of course, Flash has no moving parts, whereas the engineering inside a hard drive has been likened to flying a jumbo jet a few feet off the ground.
But while Flash is still evolving, disk drive technology is now a mature 50 years old. It is pretty well understood and remarkably reliable, with drives having maybe a million hours mean time between failures.
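A million hours sounds like more than a century, but MTBF is a statistic about a population of drives. Assuming a constant failure rate, it converts to an annualised failure rate as shown below. Multiplying by the number of drives shows that a consolidated array of a thousand drives (an assumed figure) should expect a failure roughly every six weeks.

```python
import math

HOURS_PER_YEAR = 8760

def annualised_failure_rate(mtbf_hours: float) -> float:
    """Probability that a single drive fails within a year (constant failure rate)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def expected_failures_per_year(n_drives: int, mtbf_hours: float) -> float:
    """Expected number of drive failures per year across n_drives."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours

print(f"AFR: {annualised_failure_rate(1_000_000):.2%}")
print(f"1,000 drives: {expected_failures_per_year(1_000, 1_000_000):.1f} failures a year")
```

That works out at an annualised failure rate of just under 0.9 per cent per drive, or close to nine failures a year across a thousand drives.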
There is also the fact that until quite recently enterprise Flash was usable only for a certain subset of application needs because it did not have the remote replication or clustering capabilities that equivalent disk arrays have had for some time.
That is changing, though, as enterprise Flash developers led by Violin Memory integrate these data services.
“People don't trust arrays, even high-availability ones. They want synchronous mirrored copies on separate assets for specific applications,” says Willson.
He adds that until recently this limited Flash to applications that needed performance so much that the customer was willing to put up with – or build around – the potential risks.
“The next wave of Flash users wants those Flash performance benefits but with the availability they have been used to, so we need replication to open up the market to Flash,” he says.
Inspired by the cloud
As storage systems grow bigger, two other emerging areas of IT are experiencing some of the same challenges and could be sources of useful ideas: the cloud and big data.
“DataDirect Networks has been in very high-performance block storage, running over 1TBps into simple parallel file systems, for years,” says Vildibill.
“Then we extended our portfolio to the cloud, which implies geographical distribution and forced us to think about geographically distributed error correction too.
“Now we are realising that many of the methods we use in block arrays and in the cloud are coming together and our engineering teams are learning from each other. As arrays get bigger it's almost like continents within the box. It’s distributed data within the box.”
“As size gets so huge, physics comes into play, especially at the network level,” adds Markus Pleier, vice-chairman of the Storage Networking Industry Association Europe and chief technical officer at EMC EMEA.
“For example, moving a terabyte over a SAN is OK, but moving hundreds of petabytes is another question. Then it becomes a continuous migration.
“New technologies are coming up where we store vast amounts of data that is neither structured nor unstructured, and where our traditional technologies will no longer be enough. For example, object storage is totally distinct from the underlying physics.
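Pleier's contrast between moving a terabyte and moving hundreds of petabytes is easy to quantify. The sketch below assumes a 16Gbit/s link running at 80 per cent of line rate; both figures are assumptions. It computes how long each transfer takes. One terabyte moves in about ten minutes, while 200PB would take roughly four years, by which time the migration has effectively become permanent.

```python
SECONDS_PER_DAY = 86_400

def transfer_seconds(data_bytes: float, link_gbit_per_s: float,
                     efficiency: float = 0.8) -> float:
    """Time to move data_bytes over a link running at a fraction of its line rate."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return data_bytes / bytes_per_second

TB, PB = 1e12, 1e15
print(f"1TB: {transfer_seconds(TB, 16) / 60:.0f} minutes")
print(f"200PB: {transfer_seconds(200 * PB, 16) / SECONDS_PER_DAY / 365:.1f} years")
```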
Free to roam
“If you have cloud storage, a query can be answered by 15 different services if necessary. There is no longer a physical infrastructure behind it, and from an application point of view it's no longer interesting to see how the physics work.”
This is particularly evident with the likes of Ceph, adds Reichart. This open-source distributed object store automatically distributes replicas of data across many disks and storage nodes. It gives cloud and other service providers a way to make storage more reliable and fault tolerant without the usual vendor lock-in.
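Ceph decides where replicas go with its CRUSH algorithm. CRUSH computes placement from a map of the cluster and its failure domains rather than looking it up in a central table. The sketch below swaps in much simpler rendezvous (highest-random-weight) hashing to illustrate the same general idea. Any client can compute which distinct nodes hold an object's replicas, and adding or removing a node only moves replicas onto or off that node. Node and object names are hypothetical.

```python
import hashlib

def _score(obj: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place_replicas(obj: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for an object using rendezvous hashing."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    return sorted(nodes, key=lambda n: _score(obj, n), reverse=True)[:replicas]

nodes = [f"node{i}" for i in range(8)]
print(place_replicas("genome-chunk-0042", nodes))
```

Unlike CRUSH, this sketch does not know about racks or hosts. Two replicas could share a failure domain, which is exactly what a production placement algorithm is designed to avoid.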
“These new software-defined approaches bring high availability by design, without the pitfalls and cost of vendor-specific HA,” he says.
“Ceph-based appliances could offer an alternative to today's RAID systems in environments that need extreme scalability. Here we could have no RAID, just replicated data, distributed within the system.” ®