Two tiers to stop storage weeping: It's finally happening
Flash + object strategies
Enterprises are storing much more data than they did in the past (no news here), and they are going to be storing much more again in the near future (no news here either). About a year ago I wrote an article about the need for enterprises to consider a new two-tier strategy based on flash and object storage technologies.
You can see the first signs of this happening, even if we are at the beginning of a long trail. There are aspects we need to take into serious account to make it really successful.
Flash, hybrid and other storage creatures
Flash memory, in all its nuances and implementations, isn't a niche technology any more and every primary data storage deal in 2015 will contain a certain amount of flash. Some will be all-flash, others will be hybrid but it can no longer be avoided.
The economics of traditional primary workloads (IOPS and latency-sensitive) running on flash memory are undeniable when compared to spinning media. But the opposite is also true: when it comes to space ($/GB), the hard disk still wins hands down.
Furthermore, another point of excellence for the hard disk is throughput or, at least, $/MB/sec. That doesn’t mean HDD is better than flash but, when data is correctly organised, you can stream data out of a disk very quickly and at a lower cost than flash. HDFS is a good example: its blocks are huge, of the order of 128MB or 256MB.
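To see why block size matters so much for disk streaming, here is a minimal sketch that models each read as one seek followed by a sequential transfer. The seek time and transfer rate are illustrative assumptions, not measurements of any particular drive, and the function name is made up for this example.

```python
# All drive figures below are illustrative assumptions, not measurements.
SEEK_MS = 10.0           # assumed average positioning time per random access
SEQUENTIAL_MB_S = 150.0  # assumed sustained sequential transfer rate


def effective_throughput_mb_s(block_mb, seek_ms=SEEK_MS,
                              sequential_mb_s=SEQUENTIAL_MB_S):
    """Throughput when every block read costs one seek plus one transfer."""
    transfer_s = block_mb / sequential_mb_s
    return block_mb / (seek_ms / 1000.0 + transfer_s)


if __name__ == "__main__":
    # 0.004 MB stands in for a small random I/O; 128 and 256 MB are the
    # HDFS-style block sizes mentioned in the text.
    for block_mb in (0.004, 1, 128, 256):
        rate = effective_throughput_mb_s(block_mb)
        print(f"{block_mb:>8} MB blocks: {rate:7.1f} MB/sec")
```

Under these assumed figures, small random reads deliver well under 1MB/sec, while 128MB blocks get within two per cent of the drive's sequential rate, because the seek cost is amortised over a very long transfer.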
In the next few years flash will become more and more relevant and we will see it growing to 10 or 20 per cent of total data storage capacity in most enterprises. This is why the right integration between flash and disk tiers will bring a lot of advantages in terms of simplification, and will definitely drive down total cost of ownership as well.
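The capacity split can be turned into a rough blended cost per gigabyte. The sketch below is only an illustration: the two prices are hypothetical placeholders chosen to show the arithmetic, and the function name is invented here.

```python
# Hypothetical prices per GB, chosen only to illustrate the arithmetic.
FLASH_COST_PER_GB = 1.00
DISK_COST_PER_GB = 0.05


def blended_cost_per_gb(flash_fraction, flash_cost=FLASH_COST_PER_GB,
                        disk_cost=DISK_COST_PER_GB):
    """Capacity-weighted average cost of a two-tier flash plus disk estate."""
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    return flash_fraction * flash_cost + (1.0 - flash_fraction) * disk_cost


if __name__ == "__main__":
    # 10 and 20 per cent are the flash shares mentioned in the text.
    for fraction in (0.10, 0.20, 1.00):
        cost = blended_cost_per_gb(fraction)
        print(f"{fraction:.0%} flash: {cost:.3f} per GB")
```

With these placeholder prices, a 10 or 20 per cent flash tier keeps the blended cost at a fraction of the all-flash figure, which is the economic argument for integrating the two tiers rather than picking one.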
Flash and disk need to talk
Let's suppose that we are talking about a large infrastructure. In this case it wouldn't be about a single large hybrid system but a hybrid storage infrastructure made of different systems.
Primary storage could be part of a hyper-converged infrastructure or external arrays, and it has all the smart data services we are now used to seeing (thin provisioning, snapshots, remote replicas and so on). On the other side we could have huge object-based, scale-out distributed infrastructures capable of managing several petabytes of data for all non-primary (or better, non-IOPS/latency-sensitive) workloads: in practice everything ranging from file services and big data to backup and cold data (like archiving).
As you are probably aware, some vendors are already proposing systems to de-stage snapshots to a secondary storage system. For example, SolidFire has the ability to copy snapshots and data volumes directly to S3-compatible storage and manage their retention. Something similar can be found on HP 3PAR systems (even if it only works with HP StoreOnce VTLs).
These kinds of mechanisms lead to better overall efficiency in terms of space used and simplification, but can also bring more automation at the infrastructure layer without needing separate or additional software such as traditional backup servers. Even though some backup software can already leverage data services available in the array to do backups, I would like to see more arrays directly supporting object storage APIs to move data between primary and secondary systems.
I know that other array vendors are working on similar functionality and I hope we will see more object-enabled primary storage systems on the market soon.
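As a rough illustration of what de-staging snapshots with managed retention involves, here is a minimal sketch of the bookkeeping side only. It is not any vendor's implementation: the Snapshot type, the object key layout and the retention rule are all assumptions made for this example, and the actual upload to an object store is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of a snapshot taken on the primary array."""
    volume: str
    created: datetime


def object_key(snap):
    """Assumed key layout for the copy in the object store: volume/timestamp."""
    return f"{snap.volume}/{snap.created:%Y%m%dT%H%M%S}.snap"


def split_by_retention(snapshots, now, keep):
    """Return (retained, expired) lists using a simple age-based rule."""
    cutoff = now - keep
    retained = [s for s in snapshots if s.created >= cutoff]
    expired = [s for s in snapshots if s.created < cutoff]
    return retained, expired


if __name__ == "__main__":
    now = datetime(2015, 6, 30)
    snaps = [Snapshot("vol1", now - timedelta(days=d)) for d in (1, 10, 45)]
    retained, expired = split_by_retention(snaps, now, keep=timedelta(days=30))
    print("keep:  ", [object_key(s) for s in retained])
    print("expire:", [object_key(s) for s in expired])
```

An array that speaks an object storage API natively could run this kind of policy itself, which is the point of removing the separate backup server from the picture.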
They need to be more intelligent
Smart cloud-based analytics is becoming more and more common among primary storage vendors (vendors like Nimble are giving analytics a central role in their strategy, and rightly so). We can’t say the same for secondary systems (which are becoming not so secondary after all). If object storage becomes the platform to store all the rest of our data, then analytics will become considerably more important in the future.
More types of data and workloads will be managed concurrently by single, large, distributed systems. With this in mind, it’s quite obvious we need to have a clear view of what is happening, when and why. And, of course, predictive analytics will be fundamental too.
Fortunately the first signs of change are visible this year. Cloudian, an object storage startup, has launched a new version of its product that continuously collects information from installed systems and feeds it to an analytics tool to help customers. This is the first release (and I haven’t had the chance to look at it personally yet) but it is going in the right direction, for sure.
In the future I would like to get more insight from my storage analytics than I do now. What is happening in primary storage systems should also happen in secondary storage systems, where it is even more important.
Technology already shown by companies like Data Gravity would be even more interesting if applied to huge data repositories (call them “expanded data lakes” if you like) … and I can’t wait to see some startups, still in stealth mode, showing up with their software to analyze storage content.