Little top tech tip: Take care choosing your storage drives
Flash, rust, SCM, SMR, PMR, TDMR, HAMR and QLC – oh my!
Sysadmin Blog RAID is dead. Or maybe it's not. I think it might be off having a conversation with a cat in a box. Regardless of whether you use hardware RAID cards or HBAs paired with some kind of software RAID, the idea of big boxes full of drives that store lots of things isn't going away any time soon. The drives you put in those boxes, however, are becoming a problem.
Big boxes of storage generally serve one of two purposes: make things go fast or store lots of things that don't need to go fast. Hybrid solutions try to do both. All of this used to be fairly straightforward, but it's gotten a lot more complex.
Way back in the beforetime, fast drives spun faster and archive drives spun slower. Those really desperate for speed could short-stroke their 10K and 15K rpm drives so they were only using the very fastest possible chunk of the disks. Archival storage spun at 5,400 or 7,200 rpm. Life was simple.
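The short-stroking trade-off is easy to sketch in a few lines of Python. Everything below is an illustrative assumption rather than a measurement of any particular drive, and the seek model is deliberately crude: it just scales average seek distance with the fraction of the platter you actually use.

```python
# Rough illustration of the short-stroking trade-off (all figures are assumptions).
# Restricting a drive to its outermost tracks sacrifices capacity, but the heads
# never travel beyond that slice of the platter, so average seeks shrink and you
# stay on the fastest (outer) tracks for sequential work.

full_capacity_gb = 600        # assumed 15K rpm drive
full_stroke_seek_ms = 3.5     # assumed average seek across the whole platter
used_fraction = 0.25          # only the outer quarter of the drive is partitioned

usable_gb = full_capacity_gb * used_fraction
# Crude model: average seek distance scales roughly with the used fraction.
approx_seek_ms = full_stroke_seek_ms * used_fraction

print(f"Usable capacity: {usable_gb:.0f} GB of {full_capacity_gb} GB")
print(f"Approximate average seek: {approx_seek_ms:.2f} ms vs {full_stroke_seek_ms} ms")
```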
Must go faster
Along came flash. New and shiny and with a lot of FUD around write life, flash made headway in the consumer market and we slowly learned to trust it for enterprise workloads. Despite setbacks to confidence (such as OCZ), flash eventually won the enterprise performance market over.
The result has been predictable: flash is eating into every segment of the disk market, but the impact on consumer and enterprise performance disk drives has been nothing short of catastrophic.
On the consumer side, once you've gone flash, you won't be going back. On the enterprise side it's actually worse. I have personally been able to replace a 24-disk 15K rpm RAID 10 array with a pair of Micron 9100 NVMe SSDs and see performance increase dramatically, and latency drop by more than an order of magnitude. In other words, at a bare minimum, for every two enterprise SSDs I buy, I'm not buying 24 spinning disks. Ouch.
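The back-of-the-envelope numbers make the point. The per-device figures in this sketch are rough assumptions for illustration, not benchmarks of the hardware mentioned above.

```python
# Back-of-the-envelope comparison (all per-device figures are assumptions).
# A 15K rpm disk manages on the order of a couple of hundred random IOPS;
# an enterprise NVMe SSD manages hundreds of thousands.

hdd_count, hdd_iops_each, hdd_latency_ms = 24, 200, 5.0
ssd_count, ssd_iops_each, ssd_latency_ms = 2, 300_000, 0.1

hdd_array_iops = hdd_count * hdd_iops_each   # ~4,800 IOPS (RAID 10 reads, roughly)
ssd_pair_iops = ssd_count * ssd_iops_each    # ~600,000 IOPS

print(f"24 x 15K HDD: ~{hdd_array_iops:,} IOPS at ~{hdd_latency_ms} ms")
print(f"2 x NVMe SSD: ~{ssd_pair_iops:,} IOPS at ~{ssd_latency_ms} ms")
print(f"Latency improvement: ~{hdd_latency_ms / ssd_latency_ms:.0f}x")
```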
Today we're on the verge of another sea change in performance. Storage class memory is starting to emerge from vendors like Xitore and Diablo, as well as more traditional silicon houses like Intel and Micron. It promises lower latencies and faster storage than flash, and is competing with technologies such as battery/flash-backed DIMMs that sit a tier faster still.
Most of us still don't even push our flash arrays all that hard.
Bulk bit bins
For all the excitement and innovation in the "making things go faster" space, the truth of the matter is that the need for extreme speed is still fairly niche. I still run my company's server workloads off of spinning disks and don't feel a huge pressure to change that. My concerns, as with so many other businesses, center more around the ever-growing demand for capacity.
Here again there is a lot of innovation in the market. Traditional spinning rust, which uses perpendicular magnetic recording (PMR), is being complicated by technologies like shingled magnetic recording (SMR), heat assisted magnetic recording (HAMR), two-dimensional magnetic recording (TDMR), green (variable rotation speed) drives and helium-filled drives. Adding to the confusion, flash is at it again with quad level cell (QLC) flash drives that have high capacities but low write life.
Mixing and matching drive types in RAID arrays is very, very bad. Do not mix helium drives with SMR drives in the same array, for example. Helium drives are rather a lot faster than SMR (especially for writes), and your RAID array will probably kick the SMR drives out as bad.
Similarly, green drives can cause all sorts of problems in RAID arrays. Some RAID controllers know how to cope with variable rotation speeds, but even those often won't cope with green drives of different models or from different manufacturers. They're just different enough in responsiveness that some drives look bad when compared to the others.
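If you run a whitebox fleet, a sanity check before a rebuild or expansion pulls a mismatched drive into an array can save a lot of grief. Here's a minimal sketch, assuming you can export a drive inventory as model/recording/rpm records; the field names and example entries are made up for illustration, not pulled from any real tool's output.

```python
# Sanity-check a proposed RAID set for mixed drive types before building it.
# The inventory format is hypothetical; populate it from your own asset data,
# for example from smartctl output parsed elsewhere.

from collections import Counter

proposed_array = [
    {"dev": "/dev/sda", "model": "HUH721212ALE604", "recording": "PMR", "rpm": 7200},
    {"dev": "/dev/sdb", "model": "HUH721212ALE604", "recording": "PMR", "rpm": 7200},
    {"dev": "/dev/sdc", "model": "ST8000AS0002",    "recording": "SMR", "rpm": 5900},
]

def check_homogeneous(drives):
    """Return a list of warnings if drives mix recording tech, spindle speed or model."""
    problems = []
    for field in ("recording", "rpm", "model"):
        values = Counter(d[field] for d in drives)
        if len(values) > 1:
            problems.append(f"mixed {field}: {dict(values)}")
    return problems

issues = check_homogeneous(proposed_array)
if issues:
    print("Refusing to build array:")
    for issue in issues:
        print("  -", issue)
else:
    print("Drive set looks homogeneous.")
```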
Modern object storage solutions can mitigate some of this, assuming the HBAs can talk to the new drives. Software tends to be a bit more flexible, and you can usually smooth out some of the performance quirks with caching or tiering.
Stick with what you know
It's bad to make assumptions. If you plan to mix and match drive types, a long discussion with your object storage vendor is called for. I have some horror stories about just how badly things can go when you start mixing PMR, SMR, helium and green drives across multiple nodes that are all part of the same object store. Do not do this.
Storage software can smooth some of the rough edges, but only by so much. With 3-year refresh cycles largely a thing of the past, and considering how long storage arrays and especially whitebox object clusters can last, it's worth doing the maths on just how long you plan to keep nodes in service.
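A minimal sketch of that maths, with every figure below an assumption to replace with your own: node count, drives per node, an annualised drive failure rate and the number of years you expect the cluster to stay in service.

```python
# Rough estimate of how many replacement drives a cluster will consume over
# its planned life. All inputs are assumptions; substitute your own numbers.

nodes = 12
drives_per_node = 24
annual_failure_rate = 0.03   # 3% AFR is a commonly quoted ballpark, not a promise
planned_years = 7

total_drives = nodes * drives_per_node
expected_failures = total_drives * annual_failure_rate * planned_years

print(f"Fleet size: {total_drives} drives")
print(f"Expected replacements over {planned_years} years: ~{expected_failures:.0f}")
print("If that drive model won't be on the market that long, plan spares now.")
```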
More importantly, it's worth determining whether the drive types you're using will still be available in quantity towards the end of the planned storage cluster's life. Things in storage are changing, and for the next few years, they'll be changing quite fast. ®