Western Digital CTO Martin Fink refused to answer El Reg's questions, but did write this sweet essay

On storage-class memory and leaving HP Labs


Interview When "retired" HP Labs head Martin Fink surprisingly joined Western Digital as CTO, we were interested in how this memory-driven computing proponent would affect WD's solid state tech strategy.

After a few weeks we sent him some questions, to which he responded: "Some of your questions need a bit of context around them. In some cases, your questions imply something that's not a valid assumption."

(Presumably he is also bound by various confidentiality agreements.)

He added: "Rather than do a point-by-point response, let me try to answer your questions with a bit more color around them. Hopefully that will help."

Here are the questions we asked:

  • Do you think the server vendors will adopt memory-driven computing and for what reasons?
  • Why is Storage-Class Memory (SCM) important?
  • Can SCM replace DRAM and NAND and become a universal memory?
  • If we view SCM as filling a price/performance gap between DRAM and NAND, then how big is this gap and how many technology segments might there be in it?
  • How would you characterise Memristor, ReRAM, STT-RAM, MRAM, Carbon Nanotube Memory, PCM and 3D XPoint as SCM technologies in terms of performance, technology maturity and productisation likelihood?
  • Why isn't NVRAM a viable SCM candidate?
  • How would you contrast and compare NVMe over Fabrics-accessed storage arrays, using say NVMe flash drives, with SCM technologies?
  • Are current NVDIMM technologies, such as those from Diablo Technologies and Netlist, anything other than short-term stopgaps before SCM arrives? What is their role in servers?
  • Has time run out for the Memristor technology?
  • What format will SCM use, such as NVDIMM, a PCIe card of some sort (such as M.2), or some other format?
  • Will SCM be a server OEM-supplied technology and/or a retrofit technology from VARs and system houses?
  • Did you leave HP Labs to retire?
  • What attracted you to the Western Digital CTO position?

Reg pulpit

Next is Martin Fink's response but first we'll try to answer a question you, dear reader, might have: why are we providing a pulpit for a supplier's CTO?

It's justified in this special case, we'd argue, because of Martin Fink's former position as Memristor and memory-driven computing evangelist at HPE Labs, the second most important supplier-run research lab in the business after IBM Research.

It's also justified because disk manufacturer WD has, over several years, become one of the most important solid state chip-to-component-to-systems vendors in the business, and Fink will provide an interesting view on its strategy.

In other words, the quality and relevance of the information we are getting is worth the exception. If you disagree then get in touch and bust my balls.

Western Digital CTO Martin Fink

What Martin Fink thinks

Let's start with your last questions. Yes, I really did retire from HPE. My plan was to move back to Northern Colorado and enjoy my new house there and my new grandbaby boy. But I was presented with an opportunity too hard to resist and had to put that retirement plan on hold.

More than a decade ago, I started the early work on a memory-first paradigm. I still very much believe in that future, which gets to why I joined Western Digital. I believe there is a value shift happening in our industry: value is shifting from compute to data.

Where we've historically looked at data stores (memory, rotating media, etc.) as the commodity and the compute engine as the value, I think the model is reversing: the data is the value and the compute engine is the commodity. Data is where we derive information, which gives us knowledge, then insight.

Western Digital is the only industry player that participates everywhere data has been stored, data is being stored, and data will be stored. That combination puts Western Digital in a unique place in the industry ecosystem to be able to derive insight from all sources of data. That's why I joined!

I'm now at the center of where future value comes from, with the only player that can holistically look at all your data. It's a world where we bring the compute to the data, rather than move the data to the compute.

Many of your questions attempt to focus on the details of the core underlying technologies of SCM and imply a competitive position between each of them. This is looking at SCM the wrong way. You're thinking of SCM as a future commodity much like DRAM is today (buy on price alone).

The reality is that more than one type of SCM is likely to succeed and play a role. But, unlike DRAM, each will have a unique combination of characteristics that will make it ideal for one workload or another. Those characteristics include cost, latency, throughput, power consumption, endurance, durability, and so on. Each SCM will be optimized for a few of these, but is unlikely to be best-in-class in all of them.

Martin Fink talking about HPE's The Machine project

You should also think about ReRAM and Memristor as being very closely related. ReRAM is Western Digital's choice for SCM and we have a close partnership with HPE to deliver ReRAM. You'll have to ask HPE if they still plan on using the Memristor brand going forward.

This gets to your question on Universal Memory. Yes, it's still a dream of mine that we eventually find the right technology to deliver Universal Memory. This memory would need to achieve best-in-class latency, cost/bit, endurance, and durability. We're likely still a long way from this, but it is useful for us to have that as the end-target of where we want to get to as an industry.

Here's the next critical point. Your questions are very device focused.

This is not unusual because I run into this everywhere I go. We have a tendency to want to compare individual device characteristics. For example, a common question is: "What is the latency of ReRAM relative to DRAM?" I don't want to imply that device characteristics don't matter, but it's the wrong question to ask.

The whole point of dreaming about a memory-first world is about collapsing the storage/memory hierarchy and eliminating/shortening all the code paths to data. The end point, I think, is much less about device latency, and much more about system latency. Thus, my device-level latency may be higher than a competing device (or DRAM), but at the system level – if I've done the right software work – I achieve much better results.

Many (if not most) have a tendency to look at SCM as just another layer in the storage/memory hierarchy. If that's all we ever do, I will truly retire a very disappointed person.

The next point you touch on is Fabrics as they relate to NVMe and storage.

Here we get to another critical point: memory (byte addressable) semantics versus storage (block addressable) semantics. In the world we live in today, memory semantics implies DRAM, which implies volatility. Storage semantics implies HDD/SSD interfaces and persistence (non-volatility). What SCM does is break down that model. It now becomes possible to have memory semantics and non-volatility at the same time.

That's what is so exciting, but it's also very difficult for our industry to fully internalize the implications and the opportunity. Many (if not most) have a tendency to look at SCM as just another layer in the storage/memory hierarchy. The real opportunity is to rethink the programming models from the ground up and take full advantage of what SCM offers us (lower latency, high-density, lower cost, non-volatile).
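
To make that byte-versus-block distinction concrete, here's a minimal sketch from us (not from Fink or WD), in C, assuming a Linux machine with a DAX-capable persistent-memory filesystem mounted at a hypothetical /mnt/pmem and an ordinary SSD-backed file at /mnt/ssd. The paths, file names, and region size are illustrative only.

    /* Storage (block) semantics vs memory (byte) semantics.
     * Assumes Linux with glibc 2.28+ for MAP_SYNC, and that both files
     * already exist and are at least REGION bytes long. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION 4096

    int main(void)
    {
        /* Storage semantics: data is copied through the I/O stack into a
         * DRAM buffer with explicit read/write system calls. */
        char buf[REGION];
        int sfd = open("/mnt/ssd/record.bin", O_RDONLY);
        if (sfd >= 0) {
            pread(sfd, buf, sizeof buf, 0);
            close(sfd);
        }

        /* Memory semantics: the persistent region is mapped into the
         * address space and updated with ordinary loads and stores;
         * MAP_SYNC (where supported) keeps the mapping direct to the
         * media, with no page-cache copy in between. */
        int pfd = open("/mnt/pmem/record.bin", O_RDWR);
        if (pfd < 0)
            return 1;
        uint64_t *pmem = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                              MAP_SHARED_VALIDATE | MAP_SYNC, pfd, 0);
        if (pmem == MAP_FAILED)
            return 1;

        pmem[0] = 42;                  /* byte-addressable, persistent store */
        msync(pmem, REGION, MS_SYNC);  /* conservative portability flush     */

        munmap(pmem, REGION);
        close(pfd);
        return 0;
    }

Same data, two very different code paths: the block version goes through system calls and buffer copies, while the mapped version is a single store instruction away from the media, which is the collapse of the hierarchy Fink is describing.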

Now comes the pragmatic/realistic side. This will take time. It is a journey. I fully expect us to see initial use cases for SCM that do in fact add a layer to the storage/memory hierarchy. That's OK. It's part of the learning and discovery cycle. To one of your other questions, that's how I think about NVDIMM. It's a great way for us to learn about the true potential of non-volatility. It's not likely to win the density/cost award, but it's still a very valuable capability in the learn/discover cycle. Like any other tool, it will have a few use cases where it's a great choice.

Now, let's get to what's really important and what you didn't ask.

Your questions attempt to create some sort of controversy where none exists. As an industry, we all are working toward SCM and we'll all make progress in different ways. But there's really nothing too controversial about any of it. Here's where you can help make a contribution to our industry by focusing a lens on the issues that really matter:

SCM would allow us to have extremely high-density connections to processors.

WD is interested in SCM

But, today, processor vendors limit the physical connections, or how much memory can be attached. While there are practical reasons for this (like the number of pins you can put on a processor), we can work as an industry to overcome those limits together (e.g. industry-standard optical interconnects).

There's also a limit today to the maximum memory that can be attached to a processor (anywhere from 16TB to 64TB, even for the beefiest of processors). In a world where I might want exabytes of memory (SCM attached with memory semantics), these limits are artificial and need to be overcome.
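
For a rough sense of the gap he's pointing at, here's a back-of-envelope sketch from us, using only the capacities quoted above: 64TB is 2^46 bytes, while an exabyte is 2^60 bytes, so exabyte-scale memory pools need roughly 14 more bits of physical addressability than today's ceilings provide.

    /* Back-of-envelope address-bit arithmetic using the capacities quoted
     * above; illustrative only. Build with -lm. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double max_today = 64.0 * pow(2, 40);  /* 64TB, the upper bound quoted above */
        double exabyte   = pow(2, 60);         /* 1EB                                */

        printf("Address bits for 64TB: %.0f\n", ceil(log2(max_today)));  /* 46 */
        printf("Address bits for 1EB:  %.0f\n", ceil(log2(exabyte)));    /* 60 */
        return 0;
    }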

The other issue related to this is the set of connection standards by which all the memory and storage attach together (combinations of fabrics), so that we can bring the compute to the data via the fabric.

In order for us to have fabric-attached storage and memory, the industry needs to galvanize around standards that allow us to connect everything to these fabrics. That's how we bring compute to data rather than the other way around.

There are too many: GenZ, OpenCAPI, C6, RapidIO, etc. None of these competes directly with the others, but they do have overlapping characteristics and target different use cases. It's probably unlikely that we'll get down to just one, but we need to get together as an industry and get as close to one as possible, and then get everyone aligned around an open industry standard.

This is a place where Western Digital will work with all the industry stakeholders and bring us together to as much commonality as possible. We also need to be clear and vocal that proprietary connections are harmful to the industry overall and ask everyone to push back on any proprietary attempts. You can help by raising the visibility of this challenge and help drive the industry to a common industry standard that helps everybody.

I think this answers your questions with some context, and hopefully helps you expand the industry's thought process around SCM. We can use your help. ®

