Two tiers to stop storage weeping: It's finally happening

Flash + object strategies

They need to be application-aware and data-aware

One more important aspect is application-awareness. Some primary storage systems know when they are working with a particular database or hypervisor (to name just a couple of examples), and they enable specific performance profiles or features that take some heavy tasks off the servers (the VMware vSphere Storage APIs are a vivid example here).

We need similar functionality on secondary object-based storage too, but in this case it is necessary to climb up the stack. It’s not only about being aware of the application but also about how the application works with data. Fortunately, object-based storage systems provide rich metadata capabilities, which can be leveraged to build new features and make gateways or native interfaces much cleverer than they are today.

For example, in a big data environment, during the collection-preparation-analysis process, metadata could be useful to tag data and offload some basic operations to the storage system during the preparatory stage.
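To make the idea concrete, here is a minimal sketch of tagging data at ingestion and letting the storage layer do a basic selection before analysis. It is an illustration only: `ToyObjectStore`, its `put` and `select` methods, the tag names and the sample keys are all hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store with user metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Enrich the object at ingest time: derived tags are stored with it.
        tags = dict(metadata or {})
        tags["size_bytes"] = len(data)
        tags["format"] = "csv" if key.endswith(".csv") else "unknown"
        self._objects[key] = StoredObject(key, data, tags)

    def select(self, **criteria):
        # Storage-side filter: return only the keys whose metadata match,
        # so the analytics cluster never reads objects it does not need.
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ToyObjectStore()
store.put("logs/2015-06-01.csv", b"ts,host\n1,a\n", {"source": "web"})
store.put("logs/2015-06-02.csv", b"ts,host\n2,b\n", {"source": "batch"})
store.put("images/cat.jpg", b"\xff\xd8", {"source": "web"})

web_csv = store.select(source="web", format="csv")
print(web_csv)
```

In a real system the tagging would be done by the gateway or by the storage nodes themselves, and the selection would run against a metadata index rather than a Python dictionary.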

This is not easy to achieve, but it could enrich data during ingestion and open new opportunities for other applications. Search and big data analytics are the first applications that come to mind, and it is very interesting to see some object storage vendors evolving from a cold (big) data archive/repository into an active storage archive that can now be used instead of HDFS in conjunction with a (diskless) Hadoop cluster.

Scality demonstrated this recently and Cloudian raised the bar yesterday with the announcement of its HyperStore 5.1, certified to work with Hortonworks’ HDP. This leads back to what I wrote a while ago about the use of containers, flash and objects to build next-generation data-driven infrastructures.

Closing the circle

This post stands somewhere between predictions and wishful thinking. Many of the pieces needed to build the complete picture are ready or in development, but connecting all the dots will take time, probably somewhere in the range of four to five years.

Fast, application-aware primary storage on one side and highly reliable, distributed and data-aware secondary storage on the other, capable of talking to each other: isn’t it exciting?

In the meantime, you can already check out startups like Primary Data, which is working on products that will make most of what I described in the first part of this article possible, up to a certain level at least. Its approach adds a virtualisation layer on top (and some complexity), but it also has the big advantage of abstracting the entire storage infrastructure and making it software-defined for real. ®

[Disclaimer: I recently did some work for Scality and Cloudian.]
