Today's organisations must store ever-increasing volumes of digital information arriving in an ever-wider range of formats. At the same time, more and more data-hungry applications are coming on stream.
It is harder than ever for your storage to provide each application with the appropriate quality of service (QoS).
How can you best organise your storage resources? How can you give users and applications different levels of access to different types of data without letting the demanding, noisy ones drown out their important but unassuming neighbours?
Or could it be that we need to fundamentally change the way we think about information retrieval?
Suppliers like to say that it is all about enforcing QoS. Quality is a rather subjective and personal term, but in business and technology it typically translates as fitness for purpose. Other key aspects feed into that, including consistency, predictability, traceability and so on.
As far as data storage is concerned, the basic measure of fitness for purpose is simple: it must deliver the I/Os per second (IOPS) that the application needs.
That is relatively easy to measure at the host layer, says Adam Carter, vice-president of product development at SolidFire, which specialises in enterprise flash storage.
He warns, though, that you need to make sure when you make your measurements that your apps are actually requesting data. “So you need to monitor queue depth as well,” he says.
The complexity comes when several applications are requesting IOPS, because applications are typically unaware of each other. This is especially true in multi-tenanted storage, where multiple users independently buy or book services.
To make matters worse, many applications were built to consume as many IOPS as they could because they were designed with direct-attach storage (DAS) in mind, not shared storage.
Indeed, some applications such as databases could not run on shared storage because its performance could not be guaranteed.
Yet the ability to host multiple applications on a single storage platform or subsystem is essential for a storage consolidation strategy.
Storage QoS is equally essential for consolidation because you need to prevent the actions of one set of users from affecting the QoS experienced by other users. In effect, it enables applications that could not safely run within a multi-tenant environment to do so.
Most storage providers and virtualisation developers have now implemented techniques to enforce QoS and prevent those ever-hungry applications from turning into noisy neighbours.
There are three main ways to do this: rate limiting; storage tiering; and prioritisation. Some companies use one and others use a combination of two or more.
Rate limiting sets hard limits for the amount of I/O or bandwidth that an application or customer can consume. These could be set in terms of MBps for sequential workloads or IOPS for transactional workloads.
Many systems let you set the maximum amount allowed, also known as capping. As well as quietening noisy neighbours it can be used to protect the storage from overloading.
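Capping of this kind is often implemented as a token bucket: tokens refill at the permitted rate and each I/O spends one, so sustained throughput can never exceed the cap. A minimal sketch in Python (the class name and rate are illustrative, not any vendor's API):

```python
import time

class IOPSCap:
    """Token-bucket limiter: tokens refill at the capped IOPS rate
    and each admitted I/O consumes one token."""
    def __init__(self, max_iops):
        self.max_iops = max_iops
        self.tokens = float(max_iops)   # allow a one-second burst at start
        self.last = time.monotonic()

    def try_io(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond one second's worth.
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last) * self.max_iops)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # I/O admitted
        return False       # over the cap: queue or reject the request

limiter = IOPSCap(max_iops=500)
admitted = sum(limiter.try_io() for _ in range(1000))
```

A tenant that issues 1,000 requests in a tight loop gets roughly 500 admitted; the rest must wait for the bucket to refill, which is what keeps a noisy neighbour within its allowance.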
Some systems, such as the QoS implementations in Microsoft's Hyper-V Manager and EMC's Navisphere QoS Manager, also let you set minimum IOPS values and will alert you when the storage performance falls below this defined threshold.
Others, such as SolidFire's, additionally let you define a burst level for when there is spare bandwidth available, so even a noisy application can shout if there is no one else around to deafen.
Prioritisation means applications are ranked according to their importance, from mission critical to moderate or low. However, simple prioritisation does not guarantee performance and assigning priorities can be somewhat arbitrary. It can also make matters worse if a noisy application is specified as mission critical.
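In practice, prioritisation without hard guarantees often reduces to weighted sharing of whatever IOPS are available, which is why it cannot promise a floor to anyone. A sketch, with entirely hypothetical application names and weights:

```python
def share_iops(total_iops, weights):
    """Split available IOPS in proportion to priority weights.
    No tenant gets a guaranteed floor: a noisy application marked
    'mission critical' simply takes a larger slice of the pool."""
    total_weight = sum(weights.values())
    return {app: total_iops * w / total_weight
            for app, w in weights.items()}

# Hypothetical ranking: mission critical = 4, moderate = 2, low = 1
shares = share_iops(70_000, {"erp": 4, "reporting": 2, "batch": 1})
```

Here `erp` receives four-sevenths of the pool. If it is also the noisy neighbour, the high priority amplifies the problem rather than containing it.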
Storage tiering is when several types of storage media – for example tape, various types of hard disk and perhaps the cloud – are combined to offer different levels of performance and capacity.
Workflows, usually driven by predictive algorithms and historical usage information, move data from one tier to another.
The downside of tiering is that performance can be adversely affected while data is being moved. Also tiering cannot control noisy neighbours or guarantee a minimum QoS.
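A tiering workflow of the kind described can be reduced to a placement decision driven by access history. The thresholds, window and tier names below are invented for illustration; real systems use far richer predictive algorithms:

```python
def pick_tier(daily_accesses, hot=100, warm=10):
    """Very simplified tiering policy: place a volume according to
    how often it was accessed over the most recent window.
    hot/warm thresholds and the seven-day window are assumptions."""
    recent = sum(daily_accesses[-7:])   # last seven days of access counts
    if recent >= hot:
        return "flash"
    if recent >= warm:
        return "disk"
    return "archive"    # tape or cloud cold storage

tier = pick_tier([40, 35, 50, 20, 30, 25, 45])   # a busy volume
```

Note what the policy cannot do: it only reacts to history, so a sudden burst from a noisy neighbour still lands on whatever tier the data happens to occupy, which is why tiering alone cannot guarantee a minimum QoS.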
Nevertheless, it can be a useful part of an overall QoS mix – not least because it can be automated, says Frank Reichart, senior director of product marketing at Fujitsu.
“Customers are really looking for more automation. Storage administrators are having to study application performance, and that's really difficult,” he says.
“The result is either you go back to silos, which is exactly what people don't want, or you tune the storage to the application. But if the tuning and retuning effort is endless, you probably go back to silos.”
Mix it up
Reichart thinks the ideal is to automate a mix of mechanisms, as Fujitsu has done in its Eternus SF storage management software.
“QoS is traditionally about capping your non-important applications, but now we have unified QoS with automated tiering. There can be a delay during the copy mechanism but you no longer have to figure out which processes need tuning, so the workflow is no longer static,” he says.
According to Carter, the big issue with QoS arises when changes in storage usage patterns collide with the lack of visibility and cross-tenant awareness that comes with consolidation.
“People are zoomed into their volumes. They are not looking at the other volumes on the system and what they are doing. But purpose changes, applications and uses are not static,” he says.
“A lot of QoS is just massaging resources with no concrete guarantees. For example, tenants have no awareness of each other, so it's all very well to sell me a priority level, but as one tenant I don't know who else is on the system or who has a higher priority.”
Another factor is over-provisioning. The ability to share spare capacity by over-provisioning is one of the key attractions of storage consolidation, and the same applies to performance.
Instead of provisioning each server with its own high-performance storage, which it will max out only on a few occasions, we can share that performance across several applications so long as we can safely assume that they will not all call for their maximum allowance at the same time.
“People can over-provision. There are risks to that, but good monitoring and control means you should be able to avoid them,” says Carter.
“Suppose I have 200,000 IOPS available: I could sell a minimum guarantee of 100k each to two clients. The service-level agreement [SLA] is usually built on the minimum.
“Suppose one client now wants 150k. I could add more performance to my system or I could look at my monitoring and see that the other customer never really used more than 20k and decide to over-provision.
“If all the tenants did now come for their minimum, the system would allocate shares proportionately. It turns into prioritisation.
“As a service provider you might want to monitor the maximum allowance too, because if users are regularly running into their limits you could try to upsell them to a higher SLA.”
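Carter's arithmetic, together with the proportional fallback he describes, can be sketched in a few lines (function and tenant names are hypothetical):

```python
def allocate(capacity, minimums):
    """Grant each tenant its guaranteed minimum IOPS; if the guarantees
    are over-provisioned and every tenant demands them at once, scale
    all grants down proportionately, so the minimums in effect become
    priorities rather than hard floors."""
    demanded = sum(minimums.values())
    if demanded <= capacity:
        return dict(minimums)
    scale = capacity / demanded
    return {tenant: m * scale for tenant, m in minimums.items()}

# A 200,000 IOPS system that has sold minimum guarantees of
# 100k and 150k: it is over-provisioned by 50k.
grants = allocate(200_000, {"tenant_a": 100_000, "tenant_b": 150_000})
```

If both tenants demand their minimum simultaneously, each grant is scaled by 200/250, so the 100k guarantee yields 80k. Good monitoring is what tells the provider how likely that collision actually is.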
Storage consolidation clearly has advantages, yet it also brings considerable complexity, which increases as time goes by. Could it be safer instead to have your data estate span different systems, perhaps even to go back to the comparative simplicity of a DAS-type approach?
“We've been educating people for the last 15 years to move to consolidated storage, first with SAN fabrics, then tiering and thin provisioning. But all those virtualisation layers create major problems with visibility,” says Nigel Houghton, regional sales manager EMEA for storage management and reporting specialist Aptare.
“From a QoS perspective, DAS has advantages. Dedicated storage has its own QoS but the downside is the management and administration overhead that it would require. I don't think anyone could justify the cost these days.”
Yet he notes a common response to a problem that arises when a mixture of virtual machines runs on the same storage volume: storage performance suffers because of their differing block sizes and read/write patterns.
The response is to group virtual machines into sets with similar demands, for example by putting all the Oracle virtual machines together. This virtual DAS model can result in some storage being under-used, though, just like physical DAS.
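The grouping step itself is simple: cluster VMs by their I/O profile so that similar workloads share a volume. A sketch, where the profile fields and VM names are purely illustrative:

```python
from collections import defaultdict

def group_vms(vms):
    """Group virtual machines onto volumes by workload profile,
    here reduced to block size and read/write pattern, so VMs with
    similar I/O behaviour share storage (the 'virtual DAS' model)."""
    groups = defaultdict(list)
    for vm in vms:
        key = (vm["block_kb"], vm["pattern"])
        groups[key].append(vm["name"])
    return dict(groups)

vms = [
    {"name": "oracle1", "block_kb": 8,  "pattern": "random"},
    {"name": "oracle2", "block_kb": 8,  "pattern": "random"},
    {"name": "media1",  "block_kb": 64, "pattern": "sequential"},
]
groups = group_vms(vms)
```

The trade-off is visible in the output: the sequential-media group may sit on a volume sized for peaks it rarely reaches, which is exactly the under-use the physical DAS model suffered from.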
Hot under the collar
“The other aspect we get asked about is to tell them when things are running hot, because at the moment the first time they hear about it is when users start complaining that their apps are running slow,” Houghton says.
“The application guys have tools to tell them how the applications are running, for example BMC, but the storage guys don't.”
By the same token, enforcing QoS within the storage system guarantees it only there. Managing service delivery across your data-hungry applications requires end-to-end visibility into storage performance.
“The problem could be in your host server virtualisation, in the SAN fabric, or it could be the back-end storage. In larger organisations those could all be run by different teams and as soon as you get them together it becomes a finger-pointing session,” Houghton says.
“So you need a report on what's hot and on the main problems and reasons. Then auto-tiering can migrate hot applications, which means that centralising and automating are part of the same thing.
“Then we add the end-to-end visibility, for example who is asking for this high-performance storage and why? Application designers may specify a storage profile but get it wrong – there's no intelligence in the storage to say so.
“But we can compare the storage profile with the application's actual profile. For example, we could show the sleepy applications and the ones that don't need so much performance and should be on a lower tier.”
Plan for tomorrow
Even that might not be enough to avoid the same problems arising in the future. We really need to design workflows that allow us to move from prediction to planned anticipation, suggests Matt Starr, chief technical officer at backup and archiving specialist Spectra Logic.
He points out that being always-on has changed our attitudes to data. We assume our data will always be there when we need it, and that is not necessarily the case.
“What can happen is that the data is not at the right level in the stack. In fact, in a tiered environment, it's almost always not at the right level. It goes back to the fact that the guy running the workflow and the creation of data has no idea of how data gets tiered,” he says.
Houghton believes the solution is to change how the workflow starts. “If you treat data like a physical asset, the job comes down to a warehouse where it is picked and then delivered. But while most people are good at figuring out how to move data to an archive, they are not so good at figuring out the workflow to get it back,” he says.
“If you know you will need certain storage at a certain time, why not touch it up a day before? I have my diary booked out all day, I know I will need certain information at 1pm, so that data should be waiting on my laptop.
“Or take aerial imagery. There are hotspots where there's news interest, so track the news and use that to drive up the data. Or maybe you want images of the same spot over six years, so the software fires off the request.”
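The diary-driven staging idea above can be sketched as a simple scheduler that recalls data from the archive ahead of a known appointment. The schedule format, dataset names and one-day lead time are all assumptions for illustration:

```python
import datetime

def stage_ahead(schedule, now, lead=datetime.timedelta(days=1)):
    """Return the datasets that should be recalled from the archive now,
    because a calendar entry says they will be needed within the lead
    time (the 'touch it up a day before' idea)."""
    return [item["dataset"] for item in schedule
            if now <= item["needed_at"] <= now + lead]

schedule = [
    {"dataset": "q3_figures",  "needed_at": datetime.datetime(2015, 6, 2, 13, 0)},
    {"dataset": "site_images", "needed_at": datetime.datetime(2015, 6, 10, 9, 0)},
]
to_stage = stage_ahead(schedule, now=datetime.datetime(2015, 6, 1, 13, 0))
```

Run a day before the 1pm meeting, only `q3_figures` is recalled; the imagery job stays in the archive until its own window approaches. News-driven staging would simply replace the calendar with an external signal feeding the same queue.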
What we need is a kind of ERP of storage, or an enterprise edition of Google Now.
“It’s such a change from the days of paper records. We were used to recalling records ahead of time but we no longer do that,” Houghton observes. ®