Gazing at two-tier storage systems: What's the paradigm, Doc?
Cloud’s fundamental role in primary storage analytics assessed
Comment I’ve been talking about two-tier storage infrastructures for a while now. End users are targeting this kind of approach to cope with capacity growth and performance needs.
The basic idea is to leverage flash memory characteristics (all-flash, hybrid, hyperconvergence) on one side and implement huge storage repositories, where they can safely store all the rest (including pure trash) at the lowest possible cost, on the other. The latter is lately also referred to as a data lake.
We are finally getting there, but there is something more to consider: the characteristics of these storage systems.
In either case we are moving beyond classic storage paradigms. In fact, some of these systems are starting to understand how they are actually used and what is stored in them.
With the help of analytics, they are now building a new set of functions that can make a huge difference to how they are used and implemented, improving both total cost of ownership (TCO) and business outcomes.
Smarter primary storage
When it comes to primary storage, analytics is primarily used to improve TCO and to make life simpler for sysadmins. The array continuously collects large amounts of sensor data, which is then sent to the cloud, aggregated and organised with the goal of giving you information and insights about what is happening to your storage.
Thanks to predictive analytics, these tools can open support tickets or send alarms before issues become evident. Such tools can be very helpful in a wide range of situations, from troubleshooting to capacity planning.
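To make the capacity planning case concrete, here is a minimal sketch of the kind of trend-based alert such a tool might raise. It is not how any particular vendor does it: the function names, the daily sampling interval, the simple straight-line fit and the 30-day warning threshold are all assumptions made for illustration.

```python
from statistics import mean


def days_until_full(used_tb, capacity_tb):
    """Estimate days until the array is full from daily usage samples.

    Fits a least-squares straight line to the samples (one per day, an
    assumption of this sketch) and extrapolates from the latest sample.
    Returns None when usage is flat or shrinking.
    """
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    days = list(range(n))
    day_avg = mean(days)
    used_avg = mean(used_tb)
    # Least-squares slope: growth in TB per day.
    slope = sum((d - day_avg) * (u - used_avg) for d, u in zip(days, used_tb)) / sum(
        (d - day_avg) ** 2 for d in days
    )
    if slope <= 0:
        return None
    return (capacity_tb - used_tb[-1]) / slope


def should_open_ticket(used_tb, capacity_tb, warn_days=30):
    """Return True when the projected time to full is within warn_days."""
    remaining = days_until_full(used_tb, capacity_tb)
    return remaining is not None and remaining <= warn_days
```

Called with five daily samples growing by 1TB a day on a 100TB array, `should_open_ticket([50, 51, 52, 53, 54], 100)` stays quiet, because the projection is still beyond the 30-day window. Real products presumably use far richer models, but the principle of acting on a projection rather than a breached threshold is the same.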
Sometimes, the analytics tool crosses over the storage boundary. A good example of this comes from Nimble Storage, where InfoSight is now capable of analyzing data coming from the array, the network and the hypervisor.
From a certain point of view, this is becoming the most interesting feature to look at when it is time to buy a new storage system and efficiency is at the top of the requirements list.
The role of cloud
Cloud has a fundamental role in primary storage analytics. It has three major advantages. The first is that the array doesn’t need to spend system resources on analytics, so it can concentrate all its power on IOPS, latency and predictability.
Secondly, cloud allows the aggregation of data coming from all over the world, enabling comparisons which would otherwise be impossible to make.
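As a rough illustration of what such a comparison could look like, the sketch below ranks one array's average latency against figures gathered from a wider fleet. The function name, the metric and the sample values are hypothetical, chosen only to show the idea.

```python
from bisect import bisect_left


def percentile_rank(value, fleet_values):
    """Return the percentage of fleet values strictly below `value`."""
    if not fleet_values:
        raise ValueError("fleet_values must not be empty")
    ordered = sorted(fleet_values)
    # bisect_left gives the count of entries strictly smaller than value.
    below = bisect_left(ordered, value)
    return 100.0 * below / len(ordered)


# Hypothetical average latencies (ms) reported by comparable arrays.
fleet_latency_ms = [0.5, 0.7, 0.9, 1.2, 2.0]
my_rank = percentile_rank(1.2, fleet_latency_ms)
```

A single array cannot compute this on its own, because it only ever sees its own numbers. The fleet-wide data set is what makes the ranking possible.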
And, last but not least, cloud helps to simplify the infrastructure because there is no need for a local console or analytics server.
There is, however, one notable exception. DataGravity, which is developing enterprise storage for the mid-market, has a peculiar architecture capable of running analytics directly in the system.
In contrast to other primary storage systems, this array focuses not on infrastructure management but primarily on stored-data analytics.
The technology developed by this company allows end users to dig into their data and produce many different kinds of insight, with applications ranging from data discovery/recovery to auditing, policy compliance and security.
It’s a totally different approach, one that is quite difficult to find even in bigger systems, and it can have a great impact on both business and operations.