First XPoint, then Z-NAND: Oh dear, server-makers. SCM is happening

Storage-class memory Nirvana 1.0 could be a 2019 event, says our man


Analysis Storage-class memory (SCM), in the shape of Optane, is already here and, with Samsung's Z-SSD, set to become available for use by servers. What does this mean and when will it actually happen?

SCM, also known as persistent memory (PMEM), is a faster class of non-volatile memory, built using Intel/Micron's 3D XPoint media or Samsung's Z-SSD – tweaked SLC (1 bit/cell) NAND. It is faster but more expensive than flash, and slower but less expensive than DRAM.

SCM is fast enough that it can be used as an adjunct to DRAM and addressed/used as memory by applications and system software. But it is also persistent, so apps don't need traditional IO stack code to read and write data from/to SCM.
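
To see why the IO stack drops out of the picture, here is a minimal C sketch, assuming a Linux box where the SCM is exposed through a DAX-capable filesystem; the /mnt/pmem path is hypothetical. Data is accessed with ordinary loads and stores on a mapped region, with only a flush needed to make it durable.

    /* Minimal sketch: using SCM as memory-mapped, persistent storage.
     * Assumes a Linux system with an SCM/pmem region exposed through a
     * DAX-capable filesystem; the path below is hypothetical. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;
        int fd = open("/mnt/pmem/example.dat", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, len) != 0)
            return 1;

        /* Map the SCM-backed file directly into the address space: from
         * here on it is used with ordinary loads and stores, not
         * read()/write() calls through the block IO stack. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        strcpy(p, "hello, persistent memory");

        /* Force the update out of volatile CPU caches so it survives power
         * loss; real pmem code would typically use a library such as libpmem. */
        msync(p, len, MS_SYNC);

        munmap(p, len);
        close(fd);
        return 0;
    }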

OK: but how might this affect servers and storage?

A diagram can lay out the land for us:


SCM in a server (with thanks to Dave Hitz)

Looking at this set of boxes we see a set of applications (App) and applications in virtual machines (App/VM) in the top of the diagram. The Apps run in a physical server with an operating system. The App/VMs run in a virtualised server with a hypervisor.

There are no containers in this diagram; we'll consider them as another form of server virtualisation for the time being. Today the App and the App/VMs use DRAM and then run IO operations to local direct-attached or external storage (red arrows).

If SCM is installed, it is conceptually placed alongside DRAM in this diagram. Some software is needed to present DRAM + SCM as a single memory address space/entity to the Apps and App/VMs (blue arrows).

The diagram has this as a software shim sitting between the O/S and hypervisor boxes and the Apps. It looks like memory to these pieces of running code, bulks out the DRAM and enables the Apps and App/VMs to run a lot faster if they are IO-bound, as most are.
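
As a purely conceptual illustration of that shim, here is a toy C sketch of a two-tier allocator that hands out small, hot allocations from DRAM and larger, cooler ones from a pre-mapped SCM region. The threshold and the scm_base mapping are assumptions made up for the example, not anyone's actual implementation – a real shim would live in the OS, the hypervisor or a system-level service.

    /* Toy illustration of the "shim" idea: one address space, two tiers. */
    #include <stdlib.h>
    #include <stddef.h>

    #define HOT_THRESHOLD (64 * 1024)   /* arbitrary cut-off for the sketch  */

    static char  *scm_base;             /* assumed to be mmap()ed SCM earlier */
    static size_t scm_used, scm_size;

    void *tiered_alloc(size_t n)
    {
        if (n <= HOT_THRESHOLD)
            return malloc(n);           /* small/hot: keep it in DRAM        */

        if (scm_base == NULL || scm_used + n > scm_size)
            return NULL;                /* SCM tier missing or exhausted     */

        void *p = scm_base + scm_used;  /* large/cooler: place it in SCM     */
        scm_used += n;
        return p;                       /* caller sees one flat memory space */
    }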


Think of it as a transparent cache with cache management software – the shim.

It is for hot data, and needs initial loading with data as well as having newly created data shifted to longer-term storage. So blue arrows link it to direct-attached storage in the server or, across a network, to external storage.

The cache management software or shim might be a system-level application, such as NetApp's Plexistor, or it could be part of the server's OS or hypervisor. In this case we ask, firstly, how will physical server operating systems, such as Windows, Unix and Linux, support SCM?

Secondly, we ask, how will hypervisors – such as vSphere, Xen and KVM – support SCM?

Hardware angle

But there is a more basic question. The SCM media is fitted inside a server. How? Is it a PCIe interface drive in a standard drive bay or PCIe slot? Or is it fitted direct to the memory bus by using the NV-DIMM form factor?

The latter is the fastest connection, but there needs to be an NV-DIMM standard so that any industry-standard x86 server can use it – or any server, period. We're thinking IBM POWER, Fujitsu/Oracle SPARC, and ARM processors here.

Another question in this area is: who fits it? The obvious answer is the server vendor. But take a 1U blade-format server, or a 2U x 24-slot workhorse model: fitting SCM leaves less space in the enclosure for other components, such as SSDs. What is the right balance between the amounts of DRAM, SCM and local storage?

That is a tricky problem for server vendors to solve, meaning Cisco, Dell, Fujitsu, HPE, Huawei, Lenovo, SuperMicro, Vantara, etc.

Who will make the SCM DIMMs? Independent NV-DIMM suppliers such as Diablo Technologies have so far been fighting an uphill battle, and it seems clear that it must be the SCM media manufacturers – Intel/Micron and Samsung thus far. Both Micron and Samsung make DRAM as well as NAND, so are DIMM-aware, and they have the relationships with server vendors to supply DRAM and NAND.

We can see how the physical supply chain favours the SCM media manufacturers and server vendors. Anybody outside this ecosystem is going to have a hard time selling SCM media drives in whatever form they might exist – 2.5-inch drive, add-in-card or NV-DIMM.

Software angle

The software side of this house comes down to asking whether the OS and hypervisor vendors will provide the needed SCM software, or whether independents – such as NetApp's Plexistor, in-memory players like Hazelcast, or some as-yet-unknown open-source initiative building on, for example, Memcached – will provide a separate shim.

Whatever the result, there must be no consequent changes to applications if SCM adoption is to proceed at a fast lick.

Hyper-converged play

Let's throw a curve-ball in here and ask how hyper-converged infrastructure (HCI) appliance suppliers will react to SCM adoption. It's likely that they (a) want to use it so as not to be left behind performance-wise, and (b) want to aggregate it across HCI nodes.

That's a problem for Cisco, Dell EMC (VxRack/Rail), HPE, NetApp (now), Nutanix, Pivot3, Scale, and software HCIA suppliers such as DataCore, Maxta and so forth.

It seems to us that Nutanix is in a good place here because it bought PernixData, whose technology provided hypervisor-level caching. Can SCM be aggregated across servers (or HCI nodes) to provide a single logical SCM resource pool? Should it?

We have no answer to these questions.

Network angle

Looking at the SCM backend, as it were, it has to eject cool data to a longer-term storage device – meaning an IO in the traditional IO stack sense, unless an NVMe over Fabrics (RDMA) link is used. So code could be needed to accomplish this, as part of the shim. Target devices can be local to the server or remote (filer or SAN or, conceivably, public cloud).
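
As a rough sketch of that destaging step, the fragment below copies a cool block from the SCM mapping to a backing file through an ordinary pwrite(). The backing path is made up for illustration; a real shim might instead push the data to an NVMe-oF target over RDMA.

    /* Sketch of the destaging step the shim would need: copy a cool block
     * out of the SCM tier to longer-term storage via the traditional IO path.
     * The backing-store path is hypothetical. */
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int destage_block(const char *scm_block, size_t len, off_t offset)
    {
        int fd = open("/mnt/backing/coldstore.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return -1;

        /* An ordinary buffered/block write -- exactly the IO stack traffic
         * the front-end loads and stores avoided. */
        ssize_t n = pwrite(fd, scm_block, len, offset);
        close(fd);
        return (n == (ssize_t)len) ? 0 : -1;
    }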

Our diagram shows the network list including HCI and cluster nodes, and this is a matter, we feel, for the HCI and cluster suppliers. But network links to storage arrays are a matter for the storage array industry together with the storage array interconnect ecosystem, meaning Ethernet/iSCSI/NFS, Fibre Channel and InfiniBand suppliers.

The network interconnect people need to have a mature NVMe over Fabrics standard so that servers at the front end and arrays at the back-end can use a standard NVMeoF HBA or adapter at either end of the link. It doesn't seem necessary for suppliers like Brocade or Mellanox to do anything special for SCM; NVMeoF will do as the necessary network plumbing.

Storage array angle

The storage array people may think they do need to do something special, and contribute actively to keeping the SCM cache optimally populated with hot data. What they might resist, though, is an end-to-end NVMe link between servers and their array. That would mean data traffic bypassing their controllers, leaving those controllers ignorant of the state of the drives they no longer control and unable to sensibly apply data services to the data on those drives unless told to do so by applications in the servers.

We note that Datrium and Excelero and others have a server presence and could do this in principle.

What the array suppliers could do is to serve incoming requests across an NVMe fabric from controller cache, and so get NVMeoF-class speed without turning their arrays into effective flash JBODs, which pure, controller-bypassing, end-to-end NVMe would do.

In fact, array controllers could use SCM for such caching, and NetApp is looking into this.

There may be differences between how SANs and filers react to and use SCM, and also object storage, but we won't look into that here.

Nirvana is coming – just not yet

A bunch of servers, fitted with SCM and talking NVMe to longer-term storage, will be vastly more productive than today's servers. Machine learning, database responsiveness and analytics will, we think, be revolutionised by the amounts of data that can be held in a server and processed in real time.

But the fitting of SCM media alone will not lead to this nirvana. A whole series of technology areas and technology suppliers have to be integrated and work together to make this happen.

It no longer seems that SCM-using servers are a remote possibility. They seem a definite possibility now, but we'll probably have to wait a couple of years. SCM Nirvana 1.0 may be a 2019 or later event. ®

