The network: Your next big storage problem
Thinking about a single point of access for storage is just crazy!
A few days ago I had an interesting chat with Andy Warfield at Coho Data, and the network/storage relationship came up several times. (Quick disclaimer: I'm currently doing some work for Coho.) In a couple of my latest articles (here and here), I talked about why many large IT organizations prefer PODs to other topologies for their datacenters, but I forgot to talk about networking (I also have to admit that networking is not my field at all). So this article is the natural follow-up to those posts.
Network is storage
The problem is simple. In the past, primary enterprise networked storage was fibre channel (FC) only. Topology, connectivity and protocol management were easier to deal with, if not non-issues altogether. Yes, at scale everything becomes more complex, but FC was still more manageable than anything else.
Things got more and more complicated with the introduction of Ethernet fabrics. At the beginning it was just secondary storage, NAS repositories and other non-critical applications ... but year after year, Ethernet-based protocols matured, switches became faster and faster, and hardware was commoditized. Now most modern storage systems are going Ethernet first ... long story short? Ethernet won hands down.
NVMe over fabrics? It's on Ethernet! And operations have changed too, reflecting the opportunity given by this powerful new commodity hardware and the continuous search for simplification. In fact, there are far more generalist sysadmins than in the past: "jacks of all trades" who can understand and manage all the infrastructure components, but are not true specialists. And again, consolidating on a single protocol/technology/wire has been much more convenient for everyone.
Storage is network
But storage has also evolved a lot and has changed drastically in the last few years. Up to a couple of years ago, the biggest problem in storage was performance, wasn't it? The fastest (15K RPM) hard disk was (and still is) capable of about 180 – 200 IOPS (and 4 – 7 ms of latency), and any trick that squeezed a little more out of it was welcome ... do you remember short-stroking, for example?
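To see how those two numbers hang together, here is a back-of-the-envelope sketch. It assumes the disk serves one request at a time, with no queueing, caching or command reordering, so the figure is only a rough ceiling; the function name is mine, not something from a real tool.

```python
def iops_from_latency(latency_ms: float) -> float:
    """Rough IOPS ceiling for a device that serves one I/O at a time.

    Assumption: each I/O takes `latency_ms` milliseconds and nothing
    overlaps, so the device completes 1000 / latency_ms I/Os per second.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# The 4 - 7 ms latency range quoted above for a 15K RPM hard disk.
for latency_ms in (4.0, 5.5, 7.0):
    print(f"{latency_ms} ms per I/O -> about {iops_from_latency(latency_ms):.0f} IOPS")
```

The 180 – 200 IOPS figure sits comfortably inside the range this gives for 4 – 7 ms, which is why no amount of tuning could get a spinning disk much further.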
CPU and networking were already much faster, and many software/hardware mechanisms were in place to mitigate the real bottleneck of the entire infrastructure: storage. Now you have all sorts of flash memory, and there's more coming in the form of memory class storage (Intel 3D XPoint, for example). Storage is no longer the bottleneck; now it can be outrageously fast, and will be even faster tomorrow!
And it's not only about 3D XPoint: in a very few years, 3D TLC (QLC?) flash will have a $/GB comparable to SAS/SATA HDDs, but with an IOPS rate as high as today's MLC (for reads at least).
In practice, this means that capacity and performance could easily come from the same primary storage system, leading to massive throughput and IOPS from a single system.
Is networking the next bottleneck for storage?
It has already happened in large-scale infrastructures (Big Data for example), and it's already happening in other contexts. If you consolidate capacity and performance in a single system, throughput can become unsustainable from the networking perspective, throwing the whole infrastructure off balance.
A traditional array, connected to a limited number of Ethernet ports concentrated in a few switches, could become a major problem. Even a scale-out storage array, if concentrated in a single rack and accessed through top-of-rack (TOR) switches, would have the same problem. With traffic consolidation on a single wire, traditional three-tier network topologies are becoming more and more difficult to manage.
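A quick bit of arithmetic shows how fast this imbalance appears. The sketch below is only illustrative: the per-device throughput, device count and port count are hypothetical values I picked for the example, and protocol overhead is ignored.

```python
import math


def link_capacity_gbytes(link_gbits: float) -> float:
    """Convert a link speed in Gbit/s to GB/s (8 bits per byte, no overhead)."""
    return link_gbits / 8.0


def devices_to_saturate(link_gbits: float, device_gbytes: float) -> int:
    """Smallest number of devices whose combined throughput fills one link."""
    if link_gbits <= 0 or device_gbytes <= 0:
        raise ValueError("speeds must be positive")
    return math.ceil(link_capacity_gbytes(link_gbits) / device_gbytes)


def oversubscription(num_devices: int, device_gbytes: float,
                     num_ports: int, link_gbits: float) -> float:
    """Ratio of what the media can deliver to what the ports can carry."""
    if num_ports <= 0 or link_gbits <= 0:
        raise ValueError("ports and link speed must be positive")
    media = num_devices * device_gbytes
    network = num_ports * link_capacity_gbytes(link_gbits)
    return media / network


# Hypothetical array: 24 flash devices at 0.5 GB/s each, 4 x 10 Gbit/s ports.
print(devices_to_saturate(link_gbits=10, device_gbytes=0.5))
print(oversubscription(num_devices=24, device_gbytes=0.5,
                       num_ports=4, link_gbits=10))
```

Any ratio above 1 means the media can deliver more than the ports can carry, so the network, not the storage, sets the limit.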
Each compute node can now host tens or hundreds of VMs, and thousands of containers per server is not that far-fetched in the future. Add storage traffic on top, and huge amounts of data can easily be transferred to and from each server every second.
VM and workload mobility make things even worse, and thinking about a single point of access for storage (a single switch in a single rack) is just crazy. Some scale-out storage solutions (like HDFS for example) have already addressed this problem, with data localisation putting chunks of data close to the compute resources that will probably use them. At the same time, new flatter network topologies like leaf-spine are being preferred to minimize the number of hops.
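The idea behind data localisation can be sketched in a few lines: given several copies of a chunk, read from the one that is closest on the network. This is a simplified illustration, not the actual HDFS code; the `Location` type, the function names and the distance weights (0 for the same node, 2 for the same rack, 4 for another rack) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Location:
    """Where a client or a replica lives (hypothetical model)."""
    rack: str
    node: str


def distance(a: Location, b: Location) -> int:
    """Illustrative network distance: smaller means fewer switches crossed."""
    if a == b:
        return 0  # same node: no network traffic at all
    if a.rack == b.rack:
        return 2  # same rack: stays behind the top-of-rack switch
    return 4      # different rack: has to cross the spine


def pick_replica(client: Location, replicas: Sequence[Location]) -> Location:
    """Return the replica with the smallest distance from the client."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda replica: distance(client, replica))


client = Location(rack="r1", node="n1")
replicas = [Location("r2", "n5"), Location("r1", "n2"), Location("r3", "n9")]
print(pick_replica(client, replicas))
```

Here the client ends up reading from the copy in its own rack, so that traffic never touches the spine switches.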
In some cases, like with Coho, SDN is another key technology that can be implemented to boost overall infrastructure efficiency by putting storage nodes in different racks and distributing traffic transparently while having the advantages of data localization and a single logical point of access.
Closing the circle
Is it history repeating itself? In the 90s, NUMA (non-uniform memory access) got a lot of attention in computer design. Now, at the infrastructure level, it looks like we are facing similar problems with very similar solutions. It's always the same problem: how to design efficient and balanced systems (infrastructures in this case). Now storage is no longer the bottleneck and, thanks to different types of non-volatile memory, CPUs and network are no longer wasting time while waiting for data; in fact, we are now starting to experience the opposite problem.
Here you can find a really interesting read that goes much deeper into analyzing the problem and paints possible future scenarios ... enjoy the read!
If you want to know more about this topic, I'll be presenting at the next TECHunplugged conference in Austin on 2/2/16. It's a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us! ®