Flashboys: HEELLLP, we're trapped in a process size shrink crunch
How can we escape the dreaded NAND-woes?
The NAND flash industry is facing a process size shrink crunch and no replacement technology is ready. Unless 3D die stacking works, we are facing a solid state storage capacity shortage.
The NAND flash foundries are pumping out more and more sub-20nm NAND. Previously they'd mostly produced 2Xnm dies – that is NAND dies with a 29-20nm process geometry. Shrinking the die size means that a standard wafer can hold more dies and these each cost less to produce than the previous generation, leading to price falls.
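To see why a shrink pushes prices down, here is a rough sketch using a widely used gross-die approximation: wafer area divided by die area, minus an edge-loss term for the partial dies around the rim. The 300mm wafer diameter is the common production size. The die areas and the wafer cost are hypothetical illustration values, not figures from any foundry, and yield is ignored.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross-die approximation: full-wafer area over die area, minus an
    edge-loss term for partial dies around the wafer rim. Ignores yield."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

WAFER_COST = 5000.0  # hypothetical processed-wafer cost in dollars

# Hypothetical die areas for the same design before and after a shrink.
for area in (170.0, 120.0):
    n = dies_per_wafer(300, area)
    print(f"{area:.0f} mm^2 die: {n} dies, ${WAFER_COST / n:.2f} per die")
```

With the wafer cost held constant, the smaller die yields more dies per wafer and therefore a lower cost per die, which is the price-fall mechanism described above.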
They are also smaller in physical size, so more of them can be used in the same space than the previous generation, leading to capacity increases. A standard 2.5-inch solid state drive (SSD) may store 256GB of data with 35nm NAND and 400GB with 25nm flash. Moving to, say, 15nm flash could mean that SSD could store 600GB, and a sub-10nm die could mean 800GB or more. Everything looks rosy, only it isn't.
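The capacity figures above are illustrative. As a sanity check, here is a minimal sketch of an idealised upper bound, assuming cell area scales with the square of the process geometry and the drive keeps the same amount of silicon. Both are simplifying assumptions; real drives lose some of that gain to other overheads.

```python
def ideal_capacity_gb(capacity_gb: float, old_nm: float, new_nm: float) -> float:
    """Idealised capacity after a shrink, assuming cell area scales with
    the square of the process geometry and the silicon area is unchanged."""
    return capacity_gb * (old_nm / new_nm) ** 2

# 256GB at 35nm moved to 25nm: the ideal bound is about 502GB.
print(round(ideal_capacity_gb(256, 35, 25)))
```

The 400GB quoted for 25nm flash, and the 600GB suggested for 15nm, both sit below this idealised bound, so the article's figures are on the conservative side of pure geometric scaling.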
Flash foundries take HOW MANY years to build?
The enterprise flash market is set for boom times as all primary data is set to move from spinning disk to flash to avoid the latency tax of disk access times. There are numerous hybrid flash/disk array start-ups, more all-flash array start-ups, lots of server PCIe flash card start-ups and many companies producing SSDs – yet there is no rush to build more flash foundries.
Yes, Toshiba is opening a fifth flash fab and Samsung expanding its foundry capacity, but there is no general rush to build new flash fabs. Why not?
Gartner storage analyst Valdis Filks said: "A flash foundry takes five years to build and costs $5bn. By the time any new foundries are ready flash could be facing replacement by better technology."
The problem is down at the electron level. With each flash generation, the number of electrons available to hold the binary data in each cell decreases. That means the cell's ability to hold data reliably goes down, error rates go up, and the flash controller's error detection and correction technology gets more and more complicated and, ultimately, cannot cope.
In NAND's floating gate technology, the number of electrons in a gate decreases as the process geometry shrinks, as the chart above indicates. Park says that below 10nm the number of critical electrons in a gate can be as few as 10 – and that losing 10 electrons could seriously affect the gate's functioning. He says such very small cells suffer from a variety of issues, such as bit-line loading, interference and leakage, leading to signal retention and reliability problems for which there are currently no solutions.
These problems may make 10nm NAND technology impractical and sub-10nm NAND impossible. Park suggests that 3D stacking, putting dies one on top of another, could be a way out of this trap. He charts various approaches and identifies issues with each one, citing yield and retention as overall concerns.
Park's conclusion is that new memory types are needed; a post-NAND era is beckoning with replacement memory technology offering DRAM-like speed and addressability and NAND non-volatility.
One implication of this is the effect on NAND fab owners' foundry investment plans. Why spend $5bn plus and take five years to build a flash foundry when, in five years' time, NAND will be at the end of its life and a post-NAND technology will be in development, or even in production?
Another is the effect on flash array vendors whose software manages NAND and looks after wear-levelling, garbage collection and write-cycle reduction. If the NAND the software looks after changes to some other technology, then their software has to change as well. If they have hardware looking after this, like an SSD FPGA or ASIC, then that has to change too. These are big changes, and big deals for the companies concerned, such as array vendors Nimbus Data, Pure Storage, Skyera, Violin Memory and Whiptail, and SSD and PCIe flash card vendors like Intel, Hitachi GST, LSI, Micron, OCZ, Samsung, SanDisk, Seagate, STEC and others.
What replacement technologies are possible?
Changing phase, electron spin and resistive RAM
Park focused on three potential NAND and DRAM replacement technologies that Hynix is working on: Phase-Change Memory (PCRAM), Spin-Transfer Torque RAM (STT-RAM) and Resistive RAM (ReRAM) – HP's memristor.
His presentation reviewed all three and noted that each had issues. PCRAM has cost and power concerns, as a relatively large reset current is needed. STT-RAM has cost-competitiveness issues, and there are problems with AC timing, active power needs and bandwidth.
ReRAM (Resistive RAM) for Hynix means the memristor; Park identified the issues here as "[v]arious and unclear switching mechanisms."
Understanding memristor switching
Violin CTO and co-founder Jon Bennett says there is an academic pissing contest going on over the memristor, with some boffins asserting that HP has got its memristor ideas wrong. For example:
We have shown by means of a thorough analysis in terms of electrochemistry that HP’s “memristor” model is misleading. Our arguments are based upon textbook electrochemistry and can be easily reproduced. There are no real devices which would operate in accordance with HP’s model because the model is by itself in conflict with fundamentals of electrochemistry. There seems to be no way out; otherwise, somebody would have tried to refute our argumentation in the meantime. Thus, HP’s memristor research group does not have found a realistic physical model for a working memristive device. Probably, that is one reason why SK Hynix seems still to be in search for the underlying switching mechanism.
Previously, HP senior fellow Stan Williams has said:
Our partner, Hynix, is a major producer of flash memory, and memristors will cannibalise its existing business by replacing some flash memory with a different technology. So the way we time the introduction of memristors turns out to be important. There's a lot more money being spent on understanding and modeling the market than on any of the research.
Meanwhile, Bennett had this to say of this boffin fight-fest over the memristor:
"If it works who cares?"
Gartner's Filks believes the memristor could be HP's salvation. He said he believes HP has no really high-growth technology apart from the memristor, and that it could do for HP now what printers did in the past. El Reg's storage desk reckons CEO Meg Whitman is probably going to have to do what her predecessor Léo Apotheker tried to do: get rid of low-margin, low-growth hardware businesses (like PCs) and go for growth to rescue HP. Apotheker just tried to move too quickly and without consultation; the board and other HP execs took fright.
If memristors become the post-NAND technology then HP stands to earn gigantic revenues.
Three technologies – three roles
Park says Hynix currently sees potentially different roles for the three technologies: PCM could be used for a CPU's working memory, STT-RAM could serve as both working and cache memory, and ReRAM could be storage memory replacing NAND and disk.
Park's overall conclusion is that all three NAND/DRAM replacement technologies still need work – there is no clear winner, which leaves the solid state storage industry in a hole. NAND development is going to hit a wall, capacity demands are going up, yet there will be no way to meet them – we'll run out of fab capacity.
3D will rule okay
Jon Bennett disagrees with the idea that we face a flash scaling trap. He reckons 3D stacking will be the answer, and says: "What's coming after process shrinkage runs out is high-rise (3D). I have seen wafers of 3D flash. It's not slide-ware. Once you go to 3D you have lots of room to work with."
He cautions: "You don't go vertical at the current small cell size; you step back, and then you can shrink the process size ahead and so shrink the 3D size."
El Reg supposes that we could see 3D being implemented in 2X NAND and then, in two or three years' time, shrinking to 1X NAND. Or, we guess, it could be done in 3X NAND now, giving us headroom for two shrinks ahead: first to 2X NAND and then to 1X.
Bennett concludes: "Then, in five years time we'll have PCM, IBM's Racetrack or some other technology. ... So the fab run-out story is wrong." ®