Interview When "retired" HP Labs head Martin Fink surprisingly joined Western Digital as CTO, we were interested in how this memory-driven computing proponent would affect WD's solid state tech strategy.
After a few weeks we sent him some questions, to which he responded: "Some of your questions need a bit of context around them. In some cases, your questions imply something that's not a valid assumption."
(Presumably he is also bound by various confidentiality agreements.)
He added: "Rather than do a point-by-point response, let me try to answer your questions with a bit more color around them. Hopefully that will help."
Here are the questions we asked:
- Do you think the server vendors will adopt memory-driven computing and for what reasons?
- Why is Storage-Class Memory (SCM) important?
- Can SCM replace DRAM and NAND and become a universal memory?
- If we view SCM as filling a price/performance gap between DRAM and NAND, then how big is this gap and how many technology segments might there be in it?
- How would you characterise Memristor, ReRAM, STT-RAM, MRAM, Carbon Nanotube Memory, PCM and 3D XPoint as SCM technologies in terms of performance, technology maturity and productisation likelihood?
- Why isn't NVRAM a viable SCM candidate?
- How would you contrast and compare NVMe over Fabrics-accessed storage arrays, using say NVMe flash drives, with SCM technologies?
- Are current NVDIMM technologies, such as those from Diablo Technologies and Netlist, anything other than short-term stopgaps before SCM arrives? What is their role in servers?
- Has time run out for the Memristor technology?
- What format will SCM use, such as NVDIMM, a PCIe card of some sort (such as M.2), or some other format?
- Will SCM be a server OEM-supplied technology or/and a retrofit technology by VARs and system houses?
- Did you leave HP Labs to retire?
- What attracted you to the Western Digital CTO position?
Next is Martin Fink's response but first we'll try to answer a question you, dear reader, might have: why are we providing a pulpit for a supplier's CTO?
It's justified in this special case, we'd argue, because of Martin Fink's former position as Memristor and memory-driven computing evangelist at HPE Labs, the second most important supplier-run research labs in the business after IBM Research.
It's also justified because disk manufacturer WD has over several years become one of the most important solid state chip-to-component-to-systems vendors in the business and Fink will provide an interesting view on its strategy.
In other words, the quality and relevance of the information we are getting is worth the exception. If you disagree then get in touch and bust my balls.
Western Digital CTO Martin Fink
What Martin Fink thinks
Let's start with your last questions. Yes, I really did retire from HPE. My plan was to move back to Northern Colorado and enjoy my new house there and my new grand baby boy. But I was presented with an opportunity too hard to resist and had to put that retirement plan on hold.
More than a decade ago, I started the early work on a memory-first paradigm. I still very much believe in that future, which gets to why I joined Western Digital. I believe that there is a value-shift that is happening in our industry. That the value is shifting from compute to data.
Where we've historically looked at data stores (memory, rotating media, etc.) as the commodity, and the compute engine as the value, I think the model is reversing – where the data is the value and the compute engine is the commodity. Data is where we derive information, which gives us knowledge, then insight.
Western Digital is the only industry player that participates everywhere data has been stored, data is being stored, and data will be stored. That combination puts Western Digital in a unique place in the industry ecosystem to be able to derive insight from all sources of data. That's why I joined!
I'm now in the center of where future value comes from with the only player that can holistically look at all your data. It's a world where we bring the compute to the data, rather than move the data to the compute.
Many of your questions attempt to focus on the details of the core underlying technologies of SCM and imply a competitive position between each of them. This is looking at SCM the wrong way. You're thinking of SCM as a future commodity much like DRAM is today (buy on price alone).
The reality is that more than one type of SCM is likely to succeed and play a role. But, unlike DRAM, each will have a unique combination of characteristics that will make it ideal for one workload or another. Some of the characteristics include: cost, latency, throughput, power consumption, endurance, durability, etc. Each SCM will be optimized for a few of these, but is unlikely to be able to be the best-in-class in all of these.
Martin Fink talking about HPE's The Machine project
You should also think about ReRAM and Memristor as being very closely related. ReRAM is Western Digital's choice for SCM and we have a close partnership with HPE to deliver ReRAM. You'll have to ask HPE if they still plan on using the Memristor brand going forward.
This gets to your question on Universal Memory. Yes, it's still a dream of mine that we eventually find the right technology to deliver Universal Memory. This memory would need to achieve best-in-class latency, cost/bit, endurance, and durability. We're likely still a long way from this, but it is useful for us to have that as the end-target of where we want to get to as an industry.
Here's the next critical point. Your questions are very device focused.
This is not unusual because I run into this everywhere I go. We have a tendency to want to compare individual device characteristics. For example, a common question is: "What is the latency of ReRAM relative to DRAM?" I don't want to imply that device characteristics don't matter, but it's the wrong question to ask.
The whole point of dreaming about a memory-first world is about collapsing the storage/memory hierarchy and eliminating/shortening all the code paths to data. The end point, I think, is much less about device latency, and much more about system latency. Thus, my device-level latency may be higher than a competing device (or DRAM), but at the system level – if I've done the right software work – I achieve much better results.
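To put some numbers on that argument (ours, not Fink's), here is a minimal Python sketch that treats system latency as device latency plus the software code path needed to reach the data. The `AccessPath` class and every nanosecond figure are hypothetical and chosen only to illustrate the point; they are not measurements of any real device or stack.

```python
from dataclasses import dataclass


@dataclass
class AccessPath:
    """One way of reaching a piece of data (all figures are hypothetical)."""
    name: str
    device_ns: int    # latency of the media itself
    software_ns: int  # latency added by the code path (drivers, file system, copies)

    def total_ns(self) -> int:
        # System latency as seen by the application: device plus software.
        return self.device_ns + self.software_ns


def faster_path(a: AccessPath, b: AccessPath) -> AccessPath:
    """Return whichever path has the lower system-level latency."""
    return a if a.total_ns() <= b.total_ns() else b


# Illustrative numbers only: a quicker device behind a long code path,
# versus a slower device reached through a much shorter code path.
long_code_path = AccessPath("faster device, long code path", device_ns=100, software_ns=5000)
short_code_path = AccessPath("slower device, short code path", device_ns=500, software_ns=200)

winner = faster_path(long_code_path, short_code_path)
print(winner.name, winner.total_ns())  # prints: slower device, short code path 700
```

With these made-up figures the device that is five times slower still wins at the system level, because the software path in front of it is so much shorter.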
Many (if not most) have a tendency to look at SCM as just another layer in the storage/memory hierarchy. If that's all we ever do, I will truly retire a very disappointed person.
The next point you touch on is Fabrics as they relate to NVMe and storage.
Here we get to another critical point: memory (byte addressable) semantics versus storage (block addressable) semantics. In the world we live in today, memory semantics implies DRAM, which implies volatility. Storage semantics implies HDD/SSD interfaces and persistency (non-volatility). What SCM does is break down that model. It now becomes possible to have memory semantics and non-volatility at the same time.
That's what is so exciting, but it's also very difficult for our industry to fully internalize the implications and the opportunity. Many (if not most) have a tendency to look at SCM as just another layer in the storage/memory hierarchy. The real opportunity is to rethink the programming models from the ground up and take full advantage of what SCM offers us (lower latency, high-density, lower cost, non-volatile).
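For readers who want to see the byte-versus-block distinction in code, here is a rough Python sketch of our own, not anything supplied by Fink or WD. It changes one byte in a file in two ways: a block-style read-modify-write, and a single store through a memory mapping. The helper names and the 4096-byte block size are assumptions made for illustration. Note too that the mapping here is backed by an ordinary file, so the operating system still does block I/O underneath; the sketch shows only the difference in programming model, not real SCM behaviour.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for the storage-style path


def make_demo_file(num_blocks=4):
    """Create a zero-filled temporary file spanning several blocks."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(BLOCK_SIZE * num_blocks))
    return path


def update_byte_block_style(path, offset, value):
    """Storage semantics: read a whole block, change one byte, write the block back."""
    block_start = (offset // BLOCK_SIZE) * BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(block_start)
        block = bytearray(f.read(BLOCK_SIZE))
        block[offset - block_start] = value
        f.seek(block_start)
        f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the block out to the device
    return len(block)  # bytes the application moved in each direction


def update_byte_memory_style(path, offset, value):
    """Memory semantics: map the file and store directly to one byte address."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:
            mapped[offset] = value  # a single byte-addressed store
            mapped.flush()          # ask the OS to make the change durable
    return 1  # bytes the application touched


if __name__ == "__main__":
    demo = make_demo_file()
    print(update_byte_block_style(demo, 5000, 7))   # 4096
    print(update_byte_memory_style(demo, 5001, 9))  # 1
    os.remove(demo)
```

In the first function the application handles 4,096 bytes to change one; in the second it touches exactly the byte it cares about, which is the property that persistent, byte-addressable memory would offer without the file system sitting in between.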
Now comes the pragmatic/realistic side. This will take time. It is a journey. I fully expect us to see initial use cases for SCM that do in fact add a layer to the storage/memory hierarchy. That's OK. It's part of the learning and discovery cycle. To one of your other questions, that's how I think about NVDIMM. It's a great way for us to learn about the true potential of non-volatility. It's not likely to win the density/cost award, but still a very valuable capability in the learn/discover cycle. Like any other tool, it will have a few use cases where it's a great choice.
Now, let's get to what's really important and what you didn't ask.
Your questions attempt to create some sort of controversy where none exists. As an industry, we all are working toward SCM and we'll all make progress in different ways. But there's really nothing too controversial about any of it. Here's where you can help make a contribution to our industry by focusing a lens on the issues that really matter:
SCM would allow us to have extremely high-density connections to processors.
But, today, processor vendors limit the physical connections, or how much memory can be attached. While there are practical reasons for this (like the number of pins you can put on a processor), we can work as an industry to overcome those limits together (e.g. industry-standard optical interconnects).
There's also today a limit to the maximum memory that can be attached to processors (anywhere from 16TB to 64TB, even for the beefiest of processors). In a world where I might want exabytes of memory-semantically attached SCM, these limits are artificial and need to be overcome.
The other issue related to this is the connection standards where all the memory and storage attach together (combinations of fabrics), and we can bring the compute to the data via the fabric.
In order for us to have fabric-attached storage and memory, the industry needs to galvanize around standards that allow us to connect everything to these fabrics. That's how we bring compute to data rather than the other way around.
There are too many: Gen-Z, OpenCAPI, CCIX, RapidIO, etc. None of these are competing directly, but they do have overlapping characteristics and target different use cases. It's probably unlikely that we'll get down to just one, but we need to get together as an industry and get as close to one as possible, and then get everyone aligned around an open industry standard.
This is a place where Western Digital will work with all the industry stakeholders and bring us together to as much commonality as possible. We also need to be clear and vocal that proprietary connections are harmful to the industry overall and ask everyone to push back on any proprietary attempts. You can help by raising the visibility of this challenge and help drive the industry to a common industry standard that helps everybody.
I think this answers your questions with some context, and hopefully helps you expand the industry's thought process around SCM. We can use your help. ®