Once upon a time pretty much the only thing a storage manager had to worry about was running out of capacity.
Disk space was expensive, and a slew of products offered data compression, migration of disused files to tape and so on.
Disk space is pretty cheap now, with multi-terabyte drives readily available alongside petabyte-class storage arrays, so the challenge has shifted first to storage I/O performance and then to application response time.
The performance challenges arise because no matter how fat a disk drive is, it is fundamentally limited in two areas: how fast the read/write head can get data on and off the physical disk platter; and how fast the disk interface can then get it to and from the host.
Need for speed
There are ways to increase these, of course. Faster spin speeds let the head read data faster, for example, and second-generation SAS offers 6Gbps, roughly double the throughput of Ultra-320, the final iteration of parallel SCSI.
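The mechanical limit is easy to see with some back-of-the-envelope arithmetic: each random I/O costs roughly one seek plus half a platter rotation. A minimal sketch, using illustrative seek times rather than any vendor's specs:

```python
# Rough model of why spin speed caps a hard drive's random I/O rate.
# Seek times and rpm figures below are illustrative assumptions, not specs.

def hdd_iops(rpm, avg_seek_ms):
    """Approximate random-read IOPS: one seek plus half a rotation per I/O."""
    half_rotation_ms = 0.5 * 60_000 / rpm   # time for half a revolution, in ms
    return 1000 / (avg_seek_ms + half_rotation_ms)

for rpm, seek in [(7200, 8.5), (10_000, 4.5), (15_000, 3.5)]:
    print(f"{rpm:>6} rpm: ~{hdd_iops(rpm, seek):.0f} IOPS")
```

Even doubling the spin speed from 7,200 to 15,000 rpm only moves a drive from roughly 80 to roughly 180 random IOPS, which is why spindle counts, not spin speeds, became the main performance lever.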
However, neither of these two factors has grown as fast as either the demand for storage or the capacity of the drive.
Solving the I/O performance issue was not too hard, although it often resulted in great inefficiency because increased performance was linked to increased capacity.
To increase performance you added more spindles – and therefore more read heads and disk interfaces. But because no one made low-capacity drives for clustering, that meant adding far more capacity than you actually needed. The result in some cases was arrays that used less than 30 per cent of their available capacity.
What is making things even more complex is the collision of data growth – running as high as 50 to 100 per cent a year in some industries – and the desire for big-data analysis.
It is increasingly impractical to offload huge production databases for analysis, yet trying to run business intelligence (BI) analytics on your production system risks fundamental conflicts between two data access patterns.
"They are two very different classes of application," says Frank Reichart, senior director of product marketing at Fujitsu.
"Production applications are more random access, the growing world of business analytics is more sequential. In the past you divided the storage hardware between the two to avoid conflicts, although that meant over-provisioning.
"But now you often can't separate the data like that because offloading terabytes of data for BI takes too long. So more people are doing BI on production systems, which means huge investments in storage hardware."
Cut the spin
Faster storage systems can help to a degree. That means 12Gbps SAS, 16Gb and 32Gb Fibre Channel, InfiniBand and so on, as well as faster controllers that take full advantage of today's multithreaded and multicore processors.
It also means faster storage devices – and the big one here is of course the solid-state drive (SSD) based on Flash.
"In the old days, the only way to improve performance was to throw hundreds of disks at it," says Kevin Brown, CEO of Coraid.
"Today, a simple Flash drive can do what 100 hard disks did in terms of performance. Some vendors have even built guaranteed storage performance, but that's all Flash and it's expensive."
And if you don't go all-Flash and opt for layers of SSD on top of hard disk, at some point your bottleneck is still likely to be the speed of getting data to and from a rotating disk, and that has not grown much in the last few years.
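The reason the rotating tier still dominates can be sketched with a simple weighted-average model of a Flash cache in front of disk (the 0.1ms and 8ms latencies are assumed round numbers, not measurements):

```python
# Illustrative model of SSD layered over hard disk: average service time
# is still dominated by the slow HDD tier once the cache misses.
# Latency figures are assumptions for the sake of the arithmetic.

def effective_latency_ms(hit_rate, ssd_ms=0.1, hdd_ms=8.0):
    """Mean I/O latency for a given cache hit rate between 0 and 1."""
    return hit_rate * ssd_ms + (1 - hit_rate) * hdd_ms

for hit in (0.50, 0.90, 0.99):
    print(f"{hit:.0%} hit rate -> {effective_latency_ms(hit):.2f} ms average")
```

Even a 90 per cent hit rate leaves the average close to a millisecond, almost all of it spent waiting on the disk tier, which is why the cache hit rate, not the SSD itself, ends up defining the system's performance.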
"Many storage developers are now using SSDs as cache, not just as a target," says Reichart. "They are adding more processing power too, but current systems make relatively poor use of multiple cores and multithreading, and they don't use the latest Xeons."
But even if the storage array controller is faster than a very fast thing and the resulting array stomps all over the industry-standard benchmarks, there is a lot more to storage performance in today's complex shared environments than just the storage arrays.
Seeing is believing
To make things worse, the visibility that is expected elsewhere in the infrastructure is all too often missing on the storage side.
"Storage performance has been a problem for a long time because a lot of factors are involved," says Brown.
"Server and network performance is fairly linear, but storage depends on server performance and network performance. Then how you lay out the disks can also have a profound effect, and you need to know what your applications are doing too.
"Processor and network capacity is cheap, so you can chuck it at a problem. That doesn't take care of storage, though. It could be a noisy neighbour on the server, it could be a flapping switch, the storage array head could be out of gas or a LUN could be overloaded.
"To work out what's going on you need a deep vision into the packets at both ends of the connection, and also visibility into the VLUN."
Alex D'Anna, director of solutions consulting EMEA at Virtual Instruments, agrees.
"You need visibility from the storage to the application. You can have the fastest array in the world but quality of service in a shared environment is about more than that – it is about your hosts," he says.
"What is important as you share the infrastructure is that bottlenecks arise in the HBAs and elsewhere at host level."
This also means you need a deeper understanding of and visibility into the relationships between storage and applications.
"Storage admins understand LUNs and so on but don't always understand applications. The application owner says 'The storage is slow', but what's really slow?" says D'Anna.
Getting that understanding is a challenge, notes Reichart. "In theory, storage managers could work more closely with the applications and database people, because if the applications and databases aren't optimised they will consume too much overhead," he says.
"The problem is that the storage people usually don't have much input into application development. They just get asked for capacity."
This is where infrastructure performance management (IPM) tools come in, argues D'Anna. He says that where his company's IPM work used to be primarily about troubleshooting, "now it's all about providing a quality-of-service view to application owners who have previously simply seen storage as a black box".
This tendency to regard storage as a nebulous resource that is expected to "just work" has also led some users to expect too much for too little, he adds.
"Are you spending hundreds of millions on applications, but only millions on storage?" he asks.
So where should you be looking to detect and eliminate storage performance bottlenecks? Gaining end-to-end visibility from applications to the storage is an obvious element, although it is only part of the solution.
Jesper Matthiesen is CTO at Debriefing Software, whose cloud-based storage resource management software takes in data from the likes of IBM's Tivoli Storage Manager and Storwize and analyses it to generate automated reports and alerts.
"The mid-market is still concerned with ensuring there are enough gigabytes and keeping up with growth, but in large environments, performance management and analysis become more of a topic, especially if you have large numbers of virtual hosts and volumes," Matthiesen says.
"So you need automation. You just can't wait for your users to call and complain."
Flash of inspiration
Storage virtualisation is also likely to become more important. Not only is it a good place to do tiering, but when it is combined with server virtualisation it allows capacity planning to look at the performance factors that matter in shared storage – throughput and I/O rates – not just at terabytes.
Look too for more Flash storage, used in more ways – and in the hosts as well as the arrays – says John Rollason, NetApp's director of product, solutions and alliances marketing.
"Flash is transforming everything. The key now is usable performance, with the right mix of Flash at whatever layer alongside the highest-capacity drives," he says.
"The other thing for Flash is server-side caching. We have brought out technology there in partnership with Fusion-io and others to do cache coherency. It's early days, but people are working out how best to use that."
And while a relatively modest investment in a tier of Flash is a great way to get immediate benefit without the expense of going all-Flash, try to think of it as a stopgap – something that helps paper over the performance cracks and preserve your investments while you rethink your storage architecture for the long term.
What will that architecture look like? Flash is sure to be there but is unlikely to yield its full performance potential if it is simply used as a faster version of what was in use before.
So as Reichart says: "Look for new system designs, not simply more capacity." ®