
Gazing at two-tier storage systems: What's the paradigm, Doc?

Cloud’s fundamental role in primary storage analytics assessed

Embedded functionalities

Trash covers a lot of things. It's not just waste you have to store in a landfill forever; in many cases it is recyclable and can bring value. The problem is having the right tool to extract that value.

Scale-out storage systems are becoming much more common, and the trend is clear: they are embedding functionality to manage, analyse and run automated operations on large amounts of data without the need for external compute resources. Many of these systems have recently started to expose an HDFS interface so they can be integrated easily with Hadoop for in-place data analytics.
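To give an idea of what in-place analytics looks like, here is a minimal sketch that points Spark at a storage cluster's HDFS interface. The host name and dataset path are hypothetical placeholders, not any specific vendor's endpoint.

```python
# A minimal sketch of in-place analytics: Spark reads directly from the
# scale-out system's HDFS-compatible endpoint, so no data is copied to a
# separate Hadoop cluster first. The host "storage-cluster" and the
# /logs dataset are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("in-place-analytics")
    .getOrCreate()
)

# Read JSON records straight off the storage cluster's HDFS interface.
df = spark.read.json("hdfs://storage-cluster:8020/logs/")

# Run the aggregation where the data lives.
df.groupBy("status").count().show()
```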

In other cases, HDS's HSP for example, we can see the evolution of this model, with the analytics component already embedded in the product, rather like a specialised hyper-converged platform. I'm sure similar solutions will be available from other vendors in the future.

What I also find noteworthy is the ability shown by Coho Data to run code, triggered by events, directly on cluster nodes. This could become very helpful for data preparation, and it could lead, again, to systems capable of running specific fully fledged analytics tools.
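As a rough illustration of the idea (not Coho Data's actual interface, which I am sketching from the outside), an event-triggered handler for data preparation might look something like this:

```python
# A hypothetical sketch of event-triggered data preparation running on a
# storage node. The event shape and the way the handler gets registered
# are invented for illustration; the vendor's real interface may differ.
import gzip
import json

def on_object_written(event):
    """Invoked by the (hypothetical) storage runtime after each write."""
    path = event["path"]
    if not path.endswith(".json.gz"):
        return

    # Decompress and normalise the records so downstream analytics tools
    # read clean, uniform JSON lines.
    with gzip.open(path, "rt") as f:
        records = [json.loads(line) for line in f]

    cleaned = [
        {"ts": r.get("timestamp"), "msg": r.get("message", "")}
        for r in records
    ]

    out_path = path.replace(".json.gz", ".cleaned.jsonl")
    with open(out_path, "w") as out:
        for r in cleaned:
            out.write(json.dumps(r) + "\n")
```

The point is that the preparation step runs next to the data, on spare cluster resources, rather than on an external compute farm.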

Different solutions

In any case, solutions capable of analysing the data and/or metadata of huge file repositories are already available (Qumulo is one example), and some of these are now starting to demonstrate extensive search capabilities too.

At the same time, others are taking full advantage of the high-resiliency characteristics of these distributed systems to implement backup and copy data management features in support of primary storage (Cohesity is very promising here).

Swiss Army knives without clouds

To allow interaction with these systems, APIs and simple query languages will become far more common over time than we might expect, enabling the development of powerful, data-driven vertical applications.
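To make that concrete, a data-driven application querying a storage system's metadata might be as simple as the following; the endpoint, query syntax and field names are hypothetical, not a specific vendor's API.

```python
# A sketch of a simple metadata query against a storage system's REST
# API. The endpoint, query language and field names are all hypothetical.
import requests

resp = requests.get(
    "https://storage.example.com/api/v1/files",
    params={"query": "size > 1GB AND modified < 2015-01-01"},
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()

# Act on the results: for instance, candidates for tiering or archival.
for f in resp.json()["results"]:
    print(f["path"], f["size"], f["modified"])
```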

At the same time, because of the nature of the operations performed by these systems and the sheer size of the storage infrastructure, the analytics component is always implemented on-premises. In fact, we are talking about large clusters that can devote part of their resources to specific analytics tasks.

Closing the circle

Storage is changing very quickly; traditional unified storage systems are no longer the answer to every question (which is why companies like NetApp are no longer growing).

We are seeing increasing demand for performance, very predictable behaviour, and specific analytics features that help attain maximum efficiency and simplify the job of IT operations.

On the other side of the fence, we need the cheapest and most durable options to store as much data as we need, but with the potential to re-use that data or make it quickly available elsewhere when required.

Analytics is rapidly becoming the common denominator for building smarter, data-driven infrastructures.

The scope differs between secondary and primary storage, but the basic concepts are similar, and they are all devised to squeeze the most out of the resources we manage (performance and capacity).

It's not at all surprising that companies such as Nutanix, which have the right technology and potential, will soon be targeting scale-out storage with specialised products. ®
