Hopping the flash stepping stones to DIMM future

How levels, layers, stacks and DIMMs are boosting speeds


Analysis Until very recently the main thrust of data access and storage technology development was to make the media faster - hence the move from disk to flash - and to develop flash technology to increase capacity.

But the opportunity for further media advances is narrowing, and attention is turning to removing the final obstacle to fast stored-data IO: the operating system's IO code stack. Removing that really will enable stored data to be accessed at memory or near-memory speed, and revolutionise stored-data IO forever.

The road there starts here, with the current transition from storing data on disk to storing it on flash, in SSDs. The obvious reasons for this are that flash access latency is sub-millisecond whereas disk access takes tens of milliseconds, that flash takes up less physical space, and that it uses less electricity and needs less cooling.

As its reliability and endurance have risen - and cost has come down, helped initially by deduplication and compression reducing the amount of data that needs to be stored - flash is taking over the duty of storing primary data and beginning to move into secondary data storage.

Flash is a constantly developing storage medium. The initial planar or 2D NAND, involving a single layer of flash cells, stored one bit per single-level cell. A cell is set by applying a write-level voltage and is read by testing its resistance level with a threshold voltage. If current flows, the applied threshold voltage has beaten the cell's resistance level and indicates a binary 1; if it does not flow, the cell holds a binary 0.

The way to get more capacity from a NAND die was to shrink the cell size, using a smaller lithography, and so cram more cells into the die. Thus 80nm-class cells gave way to 70nm ones, then 60nm, 50nm and so on down to 14nm, below which too few electrons remain in a cell to produce a reliable signal reading. Cell shrinkage was therefore heading for a dead end that had to be avoided, or at least delayed, if capacity was to keep rising and cost falling.

The answer was twofold, involving levels and then layers.

More levels (bits)

It was possible to add more bit values to a cell by effectively sub-dividing its resistance level range (basically all or none in an SLC cell, and hence two states) into four states for a 2 bits/cell product. That doubles capacity, so a 50GB SSD becomes a 100GB one produced on the same manufacturing process, at roughly half the cost per GB.

This was called MLC, for multi-level cell NAND - before manufacturers realised they could sub-divide further, into eight states for 3 bits/cell, which is called TLC (Triple-Level Cell) NAND and gives a 50 per cent capacity increase over MLC. And now manufacturers are contemplating 4 bits/cell with QLC (Quad-Level Cell) technology, for another 33 per cent capacity increase over TLC.

Here is how these multi-value cells work, with the possible binary values for SLC, MLC, TLC and QLC cells being (a short sketch after the list works through the arithmetic):

  • SLC = 0 or 1 - meaning two states and one threshold voltage,
  • MLC = 00, 01, 10, or 11 - four states and so three threshold voltages,
  • TLC = 000, 001, 010, 011, 100, 101, 110, 111 - eight states and thus seven threshold voltages,
  • QLC = 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111 - 16 states and so 15 threshold voltages.
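
To make the pattern explicit, here is a minimal Python sketch - an illustration, not vendor code - of the multi-level cell arithmetic: n bits per cell means 2^n charge states separated by 2^n - 1 threshold voltages, with each extra bit delivering a shrinking relative capacity gain.

    # Multi-level cell arithmetic: n bits per cell needs 2**n states and
    # 2**n - 1 read-threshold voltages; capacity scales linearly with n.
    for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
        states = 2 ** bits
        thresholds = states - 1
        gain = "baseline" if bits == 1 else f"+{100 / (bits - 1):.0f}% over the previous level"
        print(f"{name}: {bits} bits/cell, {states} states, {thresholds} thresholds, {gain}")

Running it reproduces the list above and the capacity steps in the text: MLC doubles SLC, TLC adds 50 per cent over MLC, and QLC adds 33 per cent over TLC.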

We can't go on adding bits indefinitely, because SSD controller technology can't cope with the read precision and error correction required. Each extra bit per cell also slows the SSD and shortens its endurance, so the drives neither perform as well nor last as long. This is where extra layers come in.

More layers

It's possible, with careful manufacturing, to have more than one layer of cells, and so increase capacity by increasing the vertical height of the flash die. So 24-layer chips were tried, and then 48-layer ones, with Samsung taking the lead, SanDisk/Toshiba following in its footsteps, and Intel and Micron pondering whether it was more cost-efficient to stay with planar NAND for longer.

It was easier to produce 3D NAND with larger cell sizes than the then-current 2D NAND, which meant each cell had more electrons and thus performed faster and for longer at any bits-per-cell level - SLC, MLC or TLC. Samsung is now producing 64-layer TLC 3D NAND (or V-NAND) SSDs with huge capacities - 7TB and 15.3TB - more than any 2.5-inch disk drive. Planar TLC NAND SSDs are stuck at the 2TB-3TB level, and so now every manufacturer is moving to 3D NAND technology.

They are all looking at the 64-layer level and hoping to go to 96 layers and beyond as a way of continuously increasing capacity. But problems loom. Building a multi-layered chip means stacking layers incredibly precisely atop one another and then etching holes down through the layers for connecting wires. These holes have to be precisely vertical, have a precise diameter, and intersect with precisely the right features in each layer as they descend through the chip. It is an extraordinary feat of manufacturing technology. A 0.01 degree per-layer alignment error in any direction accumulates to a 0.96 degree error at the bottom of a 96-layer stack, and that could mean features are missed and the die is a dud.

Now add another 24 layers and/or shrink the cell lithography, and the error tolerance is tighter still.
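
As a back-of-the-envelope illustration - assuming, purely for simplicity, that the 0.01 degree error is constant and always in the same direction - here is how that per-layer tilt compounds:

    # Accumulated tilt in a 3D NAND stack at 0.01 degrees of error per layer.
    PER_LAYER_ERROR_DEG = 0.01

    for layers in (48, 64, 96, 120):
        print(f"{layers} layers: {PER_LAYER_ERROR_DEG * layers:.2f} degrees of accumulated tilt")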

We will probably get to 96 layers. Whether we'll be able to shrink the cell lithography while doing that is a moot point. Either way, another capacity dead end seems to be looming, and a different, post-NAND technology will be needed to get past it.

Post-NAND media technologies

The most prominent post-NAND technology is Intel and Micron's 3D XPoint (CrossPoint), which uses a bulk-change mechanism in a chalcogenide glass medium to alter its resistance. This technology is claimed to be both denser than NAND (more capacity in less space) and faster. But the actual mechanism is a secret, and XPoint is only just being sampled in Optane SSD form, with faster-access DIMM-mounted XPoint due later this year and close Intel CPU coupling (possibly via the Purley platform and Omni-Path interconnect) in 2018.

Samsung has an intermediate, competing Z-SSD technology, an as-yet-secret development of NAND that increases access speed to near-DRAM levels. Then there is a WD (SanDisk) resistive RAM (ReRAM) technology under development, plus an HPE-only variant of ReRAM, the memristor. We also have IBM researchers looking at Phase-Change Memory, and startups such as Everspin focussing on Spin-Transfer Torque RAM (STT-RAM).

IBM Phase-Change Memory chip (photo: IBM)

All these technologies will provide what's called Storage-Class Memory (SCM): persistent (non-volatile) storage accessed at memory speed. In a computer, memory is accessed directly, being byte-addressable via load and store instructions, for example. Traditional storage (not storage-class memory) is accessed via an IO code stack that reads and writes files or blocks of data to a block-access drive full of non-volatile media: paper tape, magnetic drums, rotating disks, magnetic tape and, of course, latterly NAND flash.

In CPU terms, traversing the IO stack takes thousands of cycles; it wastes CPU time and delays application execution, but it's necessary because the storage devices are zillions of times slower at doing their thing than CPUs are at doing theirs. So when an application issues an IO, the CPU initiates a context switch and moves on to another app or thread while the IO completes.

If storage operated at memory or near-memory speed, then data could be read and written directly from and to the medium, and there would be no need for the time-consuming IO stack at all.

Keep this thought in mind as we context switch ourselves and look at non-volatile storage’s access speed. These two threads will ultimately come together.

The need for speed

Meanwhile, just as one part of the industry has focussed on increasing SSD capacity, another part has focussed on increasing its access speed. When SSDs first came in they were built as quasi-disks to fit in disk drive slots in host systems, hence 3.5-inch and 2.5-inch form factors, and SATA and SAS interfaces. Then suppliers figured that sticking these fast drives on the PCIe bus made data access faster and latency lower, which meant more IOs per second (IOPS). A PCIe card also had more real estate on which to mount flash chips, and so more capacity.

But each supplier's PCIe flash card had to have its own driver for every different host system, which was idiocy compared with the standard SAS and SATA drivers available. The industry quickly realised a standard driver was needed, and the result is NVMe (Non-Volatile Memory Express). We're now seeing a flood of NVMe SSDs hitting the market, with more coming.

Are you getting the picture here? The NAND media, device and interconnect areas are in perpetual motion as suppliers try constantly and simultaneously to increase device capacity and get data access speed closer to raw NAND access speed. So SAS was improved from 3Gbit/s to 6Gbit/s and on to today's 12Gbit/s (SATA topped out at 6Gbit/s), while PCIe is faster still - but even that is a hop away from a host system's CPU-memory bus, the gold, DRAM-speed interconnect standard. Netlist and Diablo Technologies have pioneered putting NAND directly on memory DIMMs, connecting it straight to the memory bus.
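
To put rough numbers on those hops, here is a sketch of each interconnect's theoretical bandwidth ceiling, derived from line rate and encoding overhead (8b/10b for SATA and SAS, 128b/130b for PCIe 3.0); real drives land below these figures, and the four-lane PCIe slot is an assumed typical configuration:

    # Usable-bandwidth ceilings: line rate x encoding efficiency x lanes / 8.
    links = [
        ("SATA 3Gbit/s", 3e9, 8 / 10, 1),
        ("SATA 6Gbit/s", 6e9, 8 / 10, 1),
        ("SAS 12Gbit/s", 12e9, 8 / 10, 1),
        ("PCIe 3.0 x4", 8e9, 128 / 130, 4),  # 8GT/s per lane
    ]
    for name, rate, efficiency, lanes in links:
        print(f"{name}: ~{rate * efficiency * lanes / 8 / 1e9:.1f} GB/s")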

Software obstacles to faster data access

Although this provides, in theory, the very fastest NAND data access, server manufacturers have been slow to adopt it, because realising that speed requires operating system and application software changes.

Let's step back a moment and consider that continually making the media faster brings diminishing returns, because other elements contribute to access latency and faster media does not fix them. Here is a diagram showing this:

Data access latency contributions, from Jim Handy of Objective Analysis

The IO stack is becoming the single biggest obstacle to faster data access - and now, with SCM, it can be sidestepped.

IO stack sidestep

SCM implemented as memory- or near-memory-speed NVDIMMs can be accessed in a way that delivers a partial IO stack latency reduction, by adding an SCM driver to the host operating system. The driver receives file-level IO semantics (read file, write file, etc) from the OS and translates them into memory-level (load, store) semantics for the storage medium on the DIMM. Using a DRAM DIMM with non-volatility provided by battery- or capacitor-backed NAND (NVDIMM-N), this can be about ten times faster than IO stack-level access to an NVMe SSD. This is termed block-level access to the NVDIMM.

If we instead use byte-level access to the NVDIMM - having the OS treat it as a direct-access volume (DAX, in Microsoft terminology) so that modified application code can issue memory-level load and store instructions directly against it - then the block-level time-wasting is avoided and the NVDIMM is accessed eight times faster again, meaning 80 times faster than the NVMe SSD.
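
To make the two access styles concrete, here is a minimal Python sketch; the /mnt/pmem path and file name are illustrative assumptions, not details from Microsoft's tests. The first write is a system call through the kernel's block IO path; the second is an ordinary store into a memory mapping, which on a DAX volume reaches the persistent medium with no IO stack involved:

    import mmap
    import os

    # Open (or create) a file on what we assume is a DAX-mounted volume.
    fd = os.open("/mnt/pmem/example.bin", os.O_CREAT | os.O_RDWR)
    os.ftruncate(fd, 4096)           # reserve one page

    # Block-style access: every transfer is a syscall through the IO stack.
    os.pwrite(fd, b"hello", 0)

    # Byte-style (DAX) access: map once, then use plain loads and stores.
    buf = mmap.mmap(fd, 4096)
    buf[0:5] = b"world"              # a direct store into the mapping
    buf.flush()                      # push the store out to the medium
    buf.close()
    os.close(fd)

Python adds interpreter overhead, of course; the point is the difference in kernel involvement, which is what the numbers below measure.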

Microsoft DAX architecture for Windows Server 2016

These comparative numbers are based on Microsoft tests which revealed:

  • In a simple 4K random write test using a single thread against an NVMe SSD, latency was 0.07ms (70 microseconds) and bandwidth about 56MBps,
  • Running the same test against a block-mode NVDIMM-N gave 0.01ms (10 microseconds) latency and 580MBps bandwidth - 10 times faster,
  • Running the same test in byte-addressable (DAX) mode was eight times faster still: more than 8GBps at around 820ns latency.


The benefit of all this is not just that tens of thousands of CPU cycles can be saved per app per unit of time, but that host system CPUs stop spending 30, 40, 50 or more per cent of their time on IO handling, for which there is now no need, and can do more useful work. Put very crudely, it's as if a 40 per cent recovery of CPU resources in a server means up to 40 per cent more virtual machines can be run, or a server up to 40 per cent cheaper deployed, as well as having existing VMs run faster.

It is like having layers of OS code friction removed that currently slow everything down.

Takeaways

  1. MLC and TLC 3D NAND SSDs are now the norm for storing primary data
  2. NVMe access protocols are taking over from SATA and SAS
  3. Enhanced layering of 3D NAND SSDs will enable flash to cost-effectively start storing secondary data
  4. DRAM/Flash NVDIMMs will enable IO code stacks to be side-stepped to gain near-memory, byte-addressable storage access speed
  5. 3D XPoint and/or other media will enable NVDIMMs to substitute for DRAM with greatly increased capacities
  6. Non-volatile SCM DIMMs with direct access volumes will radically increase server (and ultimately client) stored data access speed

All this will be - and is being - achieved by adding layers and levels to NAND, by mounting the media on DIMMs, and then by getting rid of the IO code stacks. It's a rosy future and one that is only a few years away. Close your eyes and it will be here in a flash. ®

