Scale-out storage: Proprietary? Commodity? Or both?

Depends what the object of your quest is...

Scale-out or object – or both?

On the NAS front, there are several open-source options for scale-out storage, including GlusterFS, Lustre, CephFS and OpenStack, plus several proprietary ones based on open-source technologies. Many of these build on ZFS, the file system developed at Sun Microsystems for Solaris and subsequently ported to many other platforms. “File systems take a long time to build; that's why you see so many ZFS-based projects,” comments Tarkan Maner, the CEO of Nexenta Systems, which sells and supports software-based storage appliances that derive from and build upon several open-source technologies, including ZFS, OpenSolaris and Ubuntu.

The mention there of Ceph – which has also been used as a foundation by commercial developers, for example it is at the core of Fujitsu's Eternus CD10000 hyperscale arrays – reminds us again that scale-out is the norm for object storage. Object platforms such as Ceph operate in somewhat different ways from scale-out NAS, though. They distribute and replicate objects (that is, file data plus associated metadata) across the storage nodes available to them, usually using fault-tolerant technologies known as forward error correction (FEC) or erasure coding.
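
To make the erasure-coding idea concrete, here is a toy Python sketch of single-parity protection across storage nodes: an object is split into data chunks, a parity chunk is computed, and any one lost chunk can be rebuilt from the survivors. This illustrates the principle only – production object stores such as Ceph use Reed-Solomon-style schemes that tolerate multiple simultaneous failures, and the chunk count and object here are arbitrary.

```python
# Toy sketch of single-parity erasure coding across storage nodes.
# Real object stores tolerate multiple failures; XOR parity tolerates one.

def split_into_chunks(obj: bytes, k: int) -> list:
    """Pad the object and split it into k equally sized data chunks."""
    chunk_len = -(-len(obj) // k)                  # ceiling division
    padded = obj.ljust(chunk_len * k, b"\0")
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]

def xor_parity(chunks: list) -> bytes:
    """Byte-wise XOR of all chunks; doubles as the rebuild operation."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

# "Write": distribute k data chunks plus one parity chunk to k+1 nodes.
obj = b"file data plus associated metadata"
k = 4
data_chunks = split_into_chunks(obj, k)
nodes = data_chunks + [xor_parity(data_chunks)]    # last node holds parity

# "Read" after losing node 2: XOR the survivors to rebuild the missing chunk.
lost = 2
survivors = [chunk for i, chunk in enumerate(nodes) if i != lost]
rebuilt = xor_parity(survivors)
assert rebuilt == data_chunks[lost]
```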

What CephFS does is provide a file system interface to the Ceph object store – and given that most objects are really just files with extra metadata, and that an object store can in many ways be regarded as merely an enhanced file system, the potential for using one as the other should come as no surprise. Similarly, Ceph RBD presents a block interface to the object store. This all hints at a degree of layered inefficiency, of course, but if the result solves a problem that inefficiency can often be worked around or absorbed.
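
As a rough illustration of that layering, the hypothetical Python sketch below puts a file-style and a block-style view on top of a flat object store. The class and method names are invented for this example and are not Ceph's actual API; the point is the shape of the mapping – paths and block numbers become object names, and file attributes ride along as object metadata.

```python
# Hypothetical layering of file and block interfaces over a flat object store.

class ObjectStore:
    """Flat namespace: object name -> (data, metadata)."""
    def __init__(self):
        self._objects = {}

    def put(self, name, data, metadata):
        self._objects[name] = (data, metadata)

    def get(self, name):
        return self._objects[name]

class FileOverObjects:
    """File view: the path is the object name; size and mode are metadata."""
    def __init__(self, store):
        self.store = store

    def write_file(self, path, data, mode=0o644):
        self.store.put(path, data, {"size": len(data), "mode": mode})

    def read_file(self, path):
        data, _meta = self.store.get(path)
        return data

class BlockOverObjects:
    """Block view: a virtual device striped as fixed-size blocks, one per object."""
    def __init__(self, store, name, block_size=4096):
        self.store, self.name, self.block_size = store, name, block_size

    def write_block(self, block_no, data):
        assert len(data) == self.block_size
        self.store.put(f"{self.name}.{block_no}", data, {"block": block_no})

    def read_block(self, block_no):
        data, _meta = self.store.get(f"{self.name}.{block_no}")
        return data

# The same object store serves both a file view and a block view.
store = ObjectStore()
fs = FileOverObjects(store)
fs.write_file("/reports/q3.txt", b"quarterly numbers")
dev = BlockOverObjects(store, "vm-disk-0")
dev.write_block(0, b"\0" * 4096)
```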

In any case, inefficiency of this kind is hardly new: in the early days of unified block/file storage, there were devices which were originally designed as filers, with file systems running on block storage. In order to also provision block storage on the SAN, an extra software layer presented a file as a virtual block device. If another tool then used those SAN blocks to provide file storage – as was perfectly possible – it created absurd levels of nested performance overhead.

That depth of overhead was rare, but to have one storage system running over another is not. Indeed, Mike King argues that one reason why the open-source options for scale-out will always suffer on the performance front is that in order to achieve hardware independence they rely on there being a file system underneath to look after and abstract the hardware. In contrast, if you develop and qualify the hardware, you can address it directly. “It's a question of whether your software knows your storage. Open source can only work because it has the file system underneath and that deals with the hardware,” he says.

The hardware is always present

Alex McDonald, who sits on the SNIA (Storage Networking Industry Association) technical council and also works in NetApp's Office of the CTO, tends to agree. “You can't beat the hardware – at the end of the day, the hardware always pokes through,” he says. “There's virtualisation in everything. It can fix some problems such as load management, and it can manage the environment better, but it can't create CPU cycles that aren't there, and sometimes I think that's what we try to do.”

He continues, “A lot of the discussion is to do with packaging – the industry is having packaging issues. But it's like when people notice Tetrapak has come to dominate the orange juice market – they focus too much on the Tetrapak and not enough on the juice inside. A white box is just packaging. I'm more interested in the contents – the bits in between.”

As McDonald points out, as long as the storage system provides the relevant APIs and standards and delivers the services asked of it, it shouldn't matter what is actually going on inside the box. And while the acquisition costs might be higher for proprietary hardware and software, he argues that their quality will shine through.

Not surprisingly though, others firmly disagree. “The key is to be hardware-independent, because hardware is a commodity,” declares Nexenta's Tarkan Maner. He argues that the cost advantages of commodity hardware considerably outweigh the performance advantages of proprietary hardware.

“We can run on very low-cost hardware, and where you might pay $1000 per TB for enterprise storage, our software with everything included gets up to $250 or $300,” he says. “Our storage system is free and open source, we then sell products and services on top, for instance management, global de-duplication, encryption. We have two exabytes under management.”

Proprietary and commodity?

But proprietary storage software can also take advantage of commodity hardware, says Alex Best, director for technical business development with DataCore Software, and he argues this approach can provide the best of both worlds – less expensive hardware and stronger integration.

“You can be hardware-agnostic and still address raw drives. For example, we have evaluated SanDisk's InfiniFlash all-flash hardware, which has no controller software,” he says. “Our SANsymphony software can scale out to 64 nodes, with a mix of hyperconverged and dedicated storage nodes working together.”

So what is an independent developer of storage software like DataCore doing linking up with a hardware supplier such as Fujitsu in Germany to deliver a packaged and productised Storage Virtualisation Appliance? (DataCore also sells through Cisco, Dell, Lenovo and several others, incidentally, while Fujitsu also has FalconStor-based storage virtualisation packages.) Part of it is of course that, even with commodity hardware, there is considerable potential for differing levels of performance and quality, and as mentioned earlier, the hardware will always poke through.

But in addition, as attractive as the DIY white-box approach might be to some users, for others it isn't appropriate. “With big companies it's often not the pricing, it's meeting their objectives – you need a clear support and sales direction,” Best explains. “For example, Deutsche Bank doesn't talk to some local reseller, it talks direct to Fujitsu.”

He adds, “It is always tricky for a hardware-agnostic software company to make friends with hardware vendors! The key with the Fujitsu appliances is to cover the administrative aspects and provide big companies with one person to talk to.”

Looking forward, what more can we expect to see in scale-out storage? More virtualisation and more scalability, for a start – more nodes and bigger file systems. However, while scale-out NAS can provide high performance, it is limited to perhaps a few PB, and that performance comes at a price – in particular, networking complexity and cost, with some sites already implementing 40Gbit Ethernet or InfiniBand for storage traffic.

Object storage can certainly provide many of the benefits rather more simply and can scale higher, towards the multi-exabyte level. It can also be more resilient – self-healing erasure coding is faster and more efficient than legacy technologies such as RAID. So for many users a shift to cloud-oriented object storage – perhaps with file and block overlays – will pay dividends for much of their unstructured data.
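
A back-of-the-envelope comparison shows where the efficiency argument comes from. Assuming, purely for illustration, triple replication on one side and a 10+4 erasure-code profile on the other (neither figure is a particular vendor's default), the raw capacity needed to hold a petabyte of data works out as follows:

```python
# Raw capacity needed for 1 PB of user data: replication vs erasure coding.
# The 10+4 profile is an assumption for illustration, not a vendor default.

usable_pb = 1.0

# Triple replication: every byte stored three times, tolerates two lost copies.
replication_factor = 3
replication_raw = usable_pb * replication_factor     # 3.0 PB raw

# Erasure coding: k data chunks + m coding chunks, tolerates m lost chunks.
k, m = 10, 4
erasure_raw = usable_pb * (k + m) / k                 # 1.4 PB raw

print(f"replication: {replication_raw:.1f} PB raw, tolerates 2 failures")
print(f"erasure {k}+{m}: {erasure_raw:.1f} PB raw, tolerates {m} failures")
```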

For outright performance and low latency though, and of course for compatibility with today's applications, scale-out block and file storage is likely to remain king. File storage is also likely to be the better option where you have frequently changing data, because object storage is built with relatively static data in mind. ®
