First XPoint, then Z-NAND: Oh dear, server-makers. SCM is happening
Storage-class memory Nirvana 1.0 could be a 2019 event, says our man
Analysis Storage-class memory (SCM), in the shape of Optane, is already here and, with Samsung's Z-SSD, set to become available for use by servers. What does this mean and when will it actually happen?
SCM, also known as persistent memory (PMEM), is a faster class of non-volatile memory, built using Intel/Micron's 3D XPoint media or Samsung's Z-SSD – tweaked SLC (1 bit/cell) NAND. It is faster but more expensive than flash, and slower but cheaper than DRAM.
SCM is fast enough to be used as an adjunct to DRAM and addressed as memory by applications and system software, but it is also persistent, so apps don't need traditional IO stack code to read and write data from/to SCM.
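The difference is easiest to see in code: persistent memory is accessed with plain load/store semantics through a memory map, not read()/write() system calls. Here is a minimal Python sketch in which an ordinary file stands in for a DAX-mapped SCM region (the file name and size are our own illustration, not any vendor's layout):

```python
import mmap
import os

# Illustrative stand-in: on real hardware this would be a file on a
# DAX-mounted filesystem backed by SCM, mapped so that CPU loads and
# stores hit the media directly.
PATH = "scm_region.bin"
SIZE = 4096

# Create and size the backing region.
with open(PATH, "wb") as f:
    f.truncate(SIZE)

fd = os.open(PATH, os.O_RDWR)
region = mmap.mmap(fd, SIZE)

# Store: a plain memory write, with no write() call through the IO stack.
region[0:5] = b"hello"
# On real PMEM a flush/fence sequence (e.g. CLWB + SFENCE) would make
# this durable; here mmap's flush stands in for that step.
region.flush()

# Load: a plain memory read, with no read() call.
loaded = bytes(region[0:5])

region.close()
os.close(fd)
os.remove(PATH)
```

Because persistence comes from the media itself, the only extra step versus ordinary DRAM use is the flush to guarantee durability.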
OK: but how might this affect servers and storage?
A diagram can lay out the land for us:
SCM in a server (with thanks to Dave Hitz)
Looking at this set of boxes we see a set of applications (App) and applications in virtual machines (App/VM) in the top of the diagram. The Apps run in a physical server with an operating system. The App/VMs run in a virtualised server with a hypervisor.
There are no containers in this diagram; we'll consider them as another form of server virtualisation for the time being. Today the App and the App/VMs use DRAM and then run IO operations to local direct-attached or external storage (red arrows).
If SCM is installed, it is conceptually placed alongside DRAM in this diagram. Some software is needed to present DRAM + SCM as a single memory address space/entity to the Apps and App/VMs (blue arrows).
The diagram has this as a software shim sitting between the OS and hypervisor boxes and the Apps. It looks like memory to these pieces of running code, bulks out the DRAM, and enables the Apps and App/VMs to run a lot faster if they are IO-bound, as most are.
Think of it as a transparent cache with cache management software: the shim.
It is for hot data, and needs initial loading with data and newly created data shifting to longer-term storage. So blue arrows link it to direct-attached storage in the server or across a network to external storage.
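In outline, the shim behaves like an LRU cache in front of slower storage: hot data sits in the SCM tier, and the least recently used data gets evicted to the longer-term store. A toy sketch of the idea – the class name and dict-backed "tiers" are our own illustration, not any vendor's implementation:

```python
from collections import OrderedDict

class ScmCacheShim:
    """Toy model of the SCM shim: an LRU hot tier in front of
    longer-term storage. Both tiers are plain dicts here; in a real
    system the hot tier would be SCM and the cold tier a local SSD
    or a networked array."""

    def __init__(self, capacity, cold_store):
        self.capacity = capacity
        self.hot = OrderedDict()   # stands in for the SCM tier
        self.cold = cold_store     # stands in for disk/array storage

    def write(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._evict()

    def read(self, key):
        if key in self.hot:        # hit: served at memory speed
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[key]     # miss: traditional IO fetches it
        self.write(key, value)     # promote into the hot tier
        return value

    def _evict(self):
        # Shift cool (least recently used) data to longer-term storage.
        while len(self.hot) > self.capacity:
            key, value = self.hot.popitem(last=False)
            self.cold[key] = value
```

Reads that hit the hot tier avoid the IO stack entirely; only misses and evictions generate traditional IO to the backing store, which is what makes IO-bound apps run faster.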
The cache management software or shim might be a system-level application, such as NetApp's Plexistor, or it could be part of the server's OS or hypervisor. In this case we ask, firstly, how will physical server operating systems, such as Windows, Unix and Linux, support SCM?
Secondly, we ask, how will hypervisors – such as vSphere, Xen and KVM – support SCM?
But there is a more basic question. The SCM media is fitted inside a server. How? Is it a PCIe interface drive in a standard drive bay or PCIe slot? Or is it fitted direct to the memory bus by using the NV-DIMM form factor?
The latter is the fastest connection but there needs to be an NV-DIMM standard for this so that any industry-standard x86 server can use it – or any server, period. We're thinking IBM POWER, Fujitsu/Oracle SPARC, and ARM processors here.
Another question in this area is: who fits it? The obvious answer is the server vendor. OK, that means taking a 1U blade-format server, or a 2U x 24-slot workhorse model; there is less space in the enclosure for other components, such as SSDs. What should be the right balance between the amounts of DRAM, SCM and local storage?
That is a tricky problem for server vendors to solve, meaning Cisco, Dell, Fujitsu, HPE, Huawei, Lenovo, SuperMicro, Vantara, etc.
Who will make the SCM DIMMs? Independent NV-DIMM suppliers such as Diablo Technologies have so far been fighting an uphill battle, and it seems clear that it must be the SCM media manufacturers – Intel/Micron and Samsung thus far. Both Micron and Samsung make DRAM as well as NAND, so are DIMM-aware, and they have the relationships with server vendors to supply DRAM and NAND.
We can see how the physical supply chain favours the SCM media manufacturers and server vendors. Anybody outside this ecosystem is going to have a hard time selling SCM media drives in whatever form they might exist – 2.5-inch drive, add-in-card or NV-DIMM.
The software side of this house comes down to asking if the OS and hypervisor vendors provide the needed SCM software or do independents such as NetApp's Plexistor, the in-memory people like Hazelcast, or some as-yet-unknown open source initiative building on, for example, Memcached, provide a separate shim?
Whatever the result, there need to be no consequent changes to applications if SCM adoption is to proceed at a fast lick.
Let's throw a curve-ball in here and ask how hyper-converged infrastructure (HCI) appliance suppliers will react to SCM adoption. It's likely that they (a) want to use it so as not to be left behind performance-wise, and (b) want to aggregate it across HCI nodes.
That's a problem for Cisco, Dell EMC (VxRack/Rail), HPE, NetApp (now), Nutanix, Pivot3, Scale, and software HCIA suppliers such as DataCore, Maxta and so forth.
It seems to us that Nutanix is in a good place here because it bought PernixData, whose technology provided hypervisor-level caching. Can SCM be aggregated across servers (or HCI nodes) to provide a single logical SCM resource pool? Should it?
We have no answer to these questions.
Looking at the SCM backend, as it were, it has to eject cool data to a longer-term storage device, meaning an IO in the traditional IO stack sense, unless an NVMe over Fabrics (RDMA) link is used. So code could be needed to accomplish this, as part of the shim. Target devices can be local to the server or remote (filer or SAN or, conceivably, public cloud).
Our diagram shows the network list including HCI and cluster nodes and this is a matter, we feel, for the HCI and cluster suppliers. But network links to storage arrays are a matter for the storage array industry together with the storage array interconnect ecosystem, meaning Ethernet/iSCSI/NFS, Fibre Channel and InfiniBand suppliers.
The network interconnect people need to have a mature NVMe over Fabrics standard so that servers at the front end and arrays at the back-end can use a standard NVMeoF HBA or adapter at either end of the link. It doesn't seem necessary for suppliers like Brocade or Mellanox to do anything special for SCM; NVMeoF will do as the necessary network plumbing.
Storage array angle
The storage array people may think they do need to do something special, and contribute actively to keeping the SCM cache optimally populated with hot data. What they might also do, though, is avoid an end-to-end NVMe link between servers and their array. Such a link means data traffic bypasses their controllers, which are consequently ignorant of the state of the drives they no longer control, and can't sensibly apply data services to the data in those drives unless told to do so by applications in the servers.
What the array suppliers could do is serve incoming requests across an NVMe fabric from controller cache, and so get NVMeoF-class speed without turning their arrays into effective flash JBODs, which pure controller-bypass, end-to-end NVMe would do.
In fact, array controllers could use SCM for such caching, and NetApp is looking into this.
There may be differences between how SANs and filers react to and use SCM, and also object storage, but we won't look into that here.
Nirvana is coming – just not yet
A bunch of servers, fitted with SCM and talking NVMe to longer-term storage, will be vastly more productive than today's servers. Machine learning, database responsiveness and analytics will, we think, be revolutionised by the amount of data that can be held in a server and processed in real time.
But the fitting of SCM media alone will not lead to this nirvana. A whole series of technology areas and technology suppliers have to be integrated and work together to make this happen.
It no longer seems that SCM-using servers are a remote possibility. They seem a definite possibility now, but we'll probably have to wait a couple of years. SCM Nirvana 1.0 may be a 2019 or later event. ®