Has NetApp solved the TLC for biz puzzle? We look at its MYSTERY SSD Mars sauce
They MUST know something others don't. Right?
Analysis NetApp is pinning FlashRay's future on TLC flash, which everyone else says is not fit for enterprise use. We talked to NetApp to find out how and why it's apparently got hold of secret TLC sauce that no one else has.
Let's reiterate that TLC NAND is a crap enterprise storage technology. It stores 3 bits per cell instead of MLC's 2 bits/cell and SLC's 1 bit/cell.
The problem is to do with its write cycles: the more bits stored per cell, the fewer write cycles the cell can endure.
SanDisk has introduced an Ultra II TLC client SSD while Samsung has actually made a TLC SSD for data centre use, the PM835T. Sammy did not reveal its product's write endurance in its announcement but did say it "expects the adoption of 3-bit [per cell] SSDs in data centres to advance rapidly in replacing the 2-bit SSD market."
SanDisk's SMART Storage unit estimates TLC SSDs with the 1X cell geometry that Samsung is using have an approximate 500 program/erase (P/E) cycle limit. That is, you write to them 500 times but after that, game over, the cell's dead.
In 2012 Anandtech reckoned 2X geometry SLC NAND had approximately 100,000 P/E cycles, MLC flash had 3,000 and TLC flash had 1,000 to 1,500. TLC might manage 500-750 cycles with 1X-class NAND.
This could be okay for cold data storage - an almost WORM (Write Once and Read Many)-like way of using flash - but NetApp isn't talking about such a niche use case. It's saying TLC flash will replace MLC flash in the general FlashRay use case.
TLC flash is cheaper than MLC flash. The usual way of extending poor flash endurance is to reduce the number of write cycles through data management and to over-provision, keeping (say) 10 per cent of the flash cells aside and using them to replace failed cells and so extend the SSD's life.
With such low endurance in TLC flash, then, to go from 500 basic cycles to 50,000 would mean over-provisioning to such a level as to render TLC flash's cost advantage null. This is the main reason why SSD and array vendors, other than NetApp, say TLC flash is not for enterprise use.
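For a feel of the arithmetic, here is a back-of-the-envelope sketch in Python. It is our own simplified model, not anything from NetApp or an SSD vendor: it assumes perfect wear levelling, so that the effective number of full-drive writes scales with the ratio of raw to usable flash and shrinks with write amplification (the number of flash writes the SSD performs per host write). The function and parameter names are made up for illustration.

```python
def effective_cycles(pe_cycles, raw_to_usable=1.0, write_amplification=1.0):
    """Full-drive host writes survivable before the flash wears out.

    Assumption: ideal wear levelling, so every cell absorbs an equal share.
    raw_to_usable is raw flash capacity divided by usable capacity
    (1.1 means 10 per cent over-provisioning).
    """
    return pe_cycles * raw_to_usable / write_amplification


def raw_capacity_needed(target_cycles, pe_cycles, write_amplification=1.0):
    """Raw-to-usable ratio needed to reach target_cycles by over-provisioning alone."""
    return target_cycles * write_amplification / pe_cycles


if __name__ == "__main__":
    # 500-cycle TLC with 10 per cent spare flash and no write amplification
    print(effective_cycles(500, raw_to_usable=1.1))
    # Same flash, no spare, but the SSD rewrites each host write four times over
    print(effective_cycles(500, write_amplification=4.0))
    # Raw flash multiple needed to stretch 500 cycles to 50,000
    print(raw_capacity_needed(50_000, 500))
```

Under these assumptions, reaching 50,000 effective cycles from 500-cycle flash by over-provisioning alone would take 100 times as much raw flash as usable capacity, which is the cost problem described above. The other lever in the model is write amplification, and that is where write reduction through data management comes in.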
NetApp's secret sauce
NetApp says that FlashRay uses SSDs and that its Mars OS can make use of TLC flash and other non-volatile technologies that will come along after NAND. But that means, by default, NetApp hands off SSD management, meaning write reduction and over-provisioning inside the SSD, to the SSD's controller.
Violin Memory would say you have to have expertise at the flash die level to really understand what is going on with NAND and affect it. Not so fast, says NetApp.
Back in June 2012, Duc Nguyen, associate director at Samsung Semiconductor Europe, said: "The whole stack has to be involved in minimising TLC writes," meaning the host operating system, the application, the file system, and the controllers and then the TLC NAND itself.
This links to what NetApp's Val Bercovici of the Office of the CTO said to El Reg about FlashRay's Mars operating system. He was speaking at NetApp's Insight event in Las Vegas: "We're able to do a ton of stuff [in Mars] that talks through deep, deep interfaces to the SSD ... We have access through the firmware of our SSD suppliers to control flash at a very granular level," and "We don't have to use stock SSD firmware to control over-provisioning."
WAFL underlies SSD FTL
Bercovici says that because the Mars OS designers were long-time NetApp ONTAP engineers, they knew that its disk-based WAFL file system and its log-structured software are the foundation of all the Flash Translation Layer (FTL) firmware in SSDs. That's why NetApp is so confident in going its own way with TLC flash and its use by enterprises.
NetApp's Ty McConney, corporate veep for flash solutions, said the Mars OS sends write patterns to FlashRay's SSDs that minimise what the FTL in the SSD has to do.
Steve Strange, a member of the Mars OS development team, said the FTL won't constantly have to do segment cleaning of its own because of this Mars write patterning.
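Why would the write pattern matter to the FTL? The toy simulation below is a rough sketch of the general idea. It is not NetApp's Mars code or any real SSD firmware, and every name and size in it is an assumption chosen for illustration. It models a tiny flash device whose FTL appends writes to an active block and, when erased blocks run short, cleans the block with the fewest valid pages by copying those pages elsewhere. It then compares a host that overwrites its data in large sequential sweeps with one that overwrites pages at random.

```python
import random

# Hypothetical toy geometry, not taken from any real SSD
NUM_BLOCKS = 64
PAGES_PER_BLOCK = 32
SPARE_BLOCKS = 8  # over-provisioned blocks hidden from the host
LOGICAL_PAGES = (NUM_BLOCKS - SPARE_BLOCKS) * PAGES_PER_BLOCK


def write_amplification(lpns):
    """Replay host page writes through a toy FTL; return flash writes / host writes."""
    valid = [set() for _ in range(NUM_BLOCKS)]  # valid logical pages held by each block
    used = [0] * NUM_BLOCKS                     # pages programmed since last erase
    where = {}                                  # logical page -> block with its latest copy
    free = list(range(1, NUM_BLOCKS))           # erased blocks
    active = 0                                  # block currently being filled
    flash_writes = 0

    def program(lpn):
        nonlocal active, flash_writes
        old = where.get(lpn)
        if old is not None:
            valid[old].discard(lpn)             # the previous copy becomes stale
        if used[active] == PAGES_PER_BLOCK:
            active = free.pop()                 # active block is full, open a new one
        valid[active].add(lpn)
        used[active] += 1
        where[lpn] = active
        flash_writes += 1

    def clean_one_block():
        # Greedy cleaning: pick the full block with the fewest valid pages
        full = [b for b in range(NUM_BLOCKS)
                if used[b] == PAGES_PER_BLOCK and b != active]
        victim = min(full, key=lambda b: len(valid[b]))
        for lpn in list(valid[victim]):
            program(lpn)                        # relocation costs extra flash writes
        used[victim] = 0                        # erase
        free.append(victim)

    host_writes = 0
    for lpn in lpns:
        while len(free) < 2:                    # keep an erased block in reserve
            clean_one_block()
        program(lpn)
        host_writes += 1
    if host_writes == 0:
        raise ValueError("need at least one host write")
    return flash_writes / host_writes


def sequential_writes(n):
    """Host overwrites its whole address space in order, log-style."""
    return [i % LOGICAL_PAGES for i in range(n)]


def random_writes(n, seed=0):
    """Host overwrites uniformly random pages."""
    rng = random.Random(seed)
    return [rng.randrange(LOGICAL_PAGES) for _ in range(n)]


if __name__ == "__main__":
    n = 20 * LOGICAL_PAGES
    print("sequential:", write_amplification(sequential_writes(n)))
    print("random:    ", write_amplification(random_writes(n)))
```

In this model the sequential host always leaves the FTL a completely stale block to erase, so nothing has to be copied and every host write costs exactly one flash write. The random host leaves valid pages scattered through every block, so the FTL has to relocate them before erasing and burns extra P/E cycles doing so. Real FTLs are far more sophisticated, but this is the kind of segment cleaning work that a host sending friendly write patterns could spare the SSD.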
McConney added that NetApp has close relationships with Samsung, SanDisk and Toshiba, and that FlashRay and Mars are designed to work with their next-generation flash products.
If NetApp is right, and so far it is on its own, then it will be able to leapfrog other flash array suppliers. Its Mars OS secret sauce will be able to use cheaper TLC chips more effectively than its competitors can, and so undercut them on cost and outstrip them on performance, especially if they stay with MLC technology; FlashRay will be fleeter.
The Mars OS software techniques may well be retro-fitted to ONTAP and so give all-flash FAS arrays a boost as well.
Jim Handy of Objective Analysis said: "Once NetApp gets TLC, then so will everyone else, so I can't imagine why that would provide any competitive edge for FlashRay. Same goes for 3D. It's a level playing field."
Is NetApp right? We'll only be able to judge that when suitable, next-generation TLC SSDs come along from Samsung, SanDisk and Toshiba. Late next year is the time for that, we think. Until then, though, FlashRay may not have as much advantage as NetApp thinks it eventually will over competing all-flash arrays. ®