Interview An NVMe over Fabrics controller-less array is not a SAN because it can’t share data. That was the essence of Datrium CTO Hugo Patterson’s view.
Jeff Sosa, head of products at stealthy startup Pavilion Data Systems, has views on this topic and what to do about it.
El Reg: What do you think of the Datrium CTO’s view on an NVMe JBOF not being a SAN?
Jeff Sosa: I agree nearly 100 per cent with what he is saying. E8 (and Excelero when it is deployed with a disaggregated storage shelf) implement the JBOF (Just a Bunch Of Flash) architecture where they scale by remotely accessing NVMe drives directly and running software on the hosts/clients to manage it, but you can’t natively share block storage volumes across hosts as a result.
They also replace the NVMeF client driver with their own software stack that probably does some level of volume management, etc.
El Reg: Can you say more about the disadvantages of this approach?
Jeff Sosa: [There is] no sharing across servers, as Hugo points out.
Since these JBOF products leverage the host tier to scale storage performance, you can’t really separate compute and storage in a way that allows you to scale them independently, which is one of the reasons that you would want to disaggregate the storage from the server tier in the first place.
They require customers to replace the community NVMeF driver with their own thick software stack, which is a non-starter for many customers in high-scale cloud environments these days. It is kind of like the old days when vendors like Fusion-io and Virident (I worked at both) ran thick drivers for accessing direct-attached PCIe Flash Cards in servers.
Customers were willing to do it because there was no alternative available that would deliver the performance and latency of these products. Now the world has moved on to using the community NVMe driver to access PCIe-connected flash, so you no longer need to install a vendor’s custom driver to access flash over PCIe.
NVMeF offers the same benefit over a network, but these vendors are taking customers back to the proprietary host software world that NVMe was supposed to eliminate.
El Reg: How does Pavilion approach this issue?
Jeff Sosa: Pavilion implements what Hugo describes as a true NVMeF SAN array, where blocks/volumes can be shared out to multiple hosts and all data services are self-contained in the array.
However, we eliminate the controller bottleneck he speaks of by putting up to 20 storage controllers and 40 network ports in a single 4U array that you can manage like a traditional SAN array. We have designed the product to allow customers to run the standard NVMeF client driver in their hosts, so no custom software from Pavilion is required in the application servers. We do, however, currently add multi-pathing support, which is not yet part of the community NVMeF driver.
El Reg: What about latency?
Jeff Sosa: The software stack we run in our array has been built from the ground up for PCIe and RDMA to eliminate latency, so we can deliver very low latency even though we provide traditional storage services, like RAID-6, thin provisioning, space-efficient snapshots, and full HA including active-active controllers.
Again, we can achieve all of this without requiring users to install anything but the community NVMeF client driver in their application servers.
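For illustration, this is roughly what attaching storage over the community NVMe-oF driver looks like on a Linux host using the standard nvme-cli tooling, with no vendor software involved. The transport, address, port, and subsystem NQN below are placeholders, not Pavilion-specific values:

```shell
# Load the in-box NVMe-oF RDMA host transport (upstream since Linux 4.8)
modprobe nvme-rdma

# Ask the target which subsystems it exports (address/port are examples)
nvme discover -t rdma -a 192.168.1.100 -s 4420

# Connect to a discovered subsystem by its NQN (example name)
nvme connect -t rdma -n nqn.2017-01.com.example:subsys1 \
    -a 192.168.1.100 -s 4420

# The remote namespace now appears as an ordinary local block
# device, e.g. /dev/nvme1n1
nvme list
```

Because the namespace surfaces as a standard block device, the host-side experience is the same as local PCIe flash, which is the point Sosa is making about avoiding proprietary host stacks.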
Pavilion Data Systems appliance
Architecturally, a pavilion is a subordinate structure that is separate from, or attached to, a main building. In IT terms, envisage the accessing servers as the main building and a SAN as the attached storage pavilion.
The box Pavilion Data Systems is developing is a 4U rackmount appliance containing 20 active-active controllers and 40 x 40GbitE ports. It holds up to 72 SSDs offering 500TB of capacity. The access latency is equivalent, Pavilion claims, to local direct-attached SSDs in a server.
You can get a 2-page briefing doc from Pavilion Data Systems here if you wish (registration needed).
We think we can expect Pavilion to emerge from stealth and launch its product later this year. ®