Red Hat Storage Server NAS takes on Lustre, NetApp
A veritable Gluster, fsck
Red Hat Summit Red Hat has a server operating system, middleware, virtualization, and a cloud fabric – and now that it is shipping its Storage Server 2.0 software, it has production-grade, scale-out clustered network-attached storage as well.
The software is a gussied-up version of the GlusterFS file system that was spun out of a project at Lawrence Livermore National Laboratory in 2005 and that Red Hat acquired in October last year for $136m. The GlusterFS file system was prototyped in 2007 and its first release came out in May 2009, and at the time of the Red Hat acquisition it was used in production by more than 100 companies, including Pandora, Box.net, Deutsche Bank, Samsung, BAE Systems, Barnes & Noble, and Autodesk.
Red Hat ripped out all of the company logos, put GlusterFS through the Red Hat QA mill, and cranked out a beta of Red Hat Storage Server 2.0 back in April, and this time all of the code was opened up instead of the open-core approach that Gluster had used with its code when it was an independent company.
Red Hat Storage Server – what was wrong with the Gluster name, anyway? – runs atop Linux on x86-based server nodes, and has been tested and certified on over 50 different machines from 9 different vendors, according to Ranga Rangachari, general manager of storage, who unveiled the production release of GlusterFS at a press conference at the Red Hat Summit in Boston.
The software was designed to work with SAS or SATA disks, RAID-protected or not, and to ride on top of ext3, ext4, XFS, and other file systems on each server node. GlusterFS aggregates and exposes the file systems as a global namespace that spans the storage server nodes, and you can mount it as an NFS or CIFS mount point or use GlusterFS's own FUSE-based native access client, which has about twice the oomph serving up files.
The open source implementation of GlusterFS might run on and support any Linux operating system, but Storage Server 2.0 uses the XFS file system (which Red Hat has already commercialized) on the server nodes, with RHEL 6 as the underlying operating system on every node in the cluster.
What's in Red Hat Storage Server 2.0
The bundle also includes a technology preview of the Storage Console Management Station, a graphical management tool for GlusterFS that is based on the oVirt virtualization management console. The Hadoop connector, which was available in the beta, is also in tech preview and not yet ready for prime time. With the connector, you can link the Hadoop Distributed File System to GlusterFS, or if you want, you can replace HDFS with GlusterFS. No word on when these tech previews will be available for production use.
Back when Red Hat did the deal to buy Gluster, CTO Brian Stevens said that the interesting bit about the company was that it used a no-metadata server model, based on an elastic hashing algorithm, which lets GlusterFS scale across more than 500 x86 servers to create a distributed NAS that can hold petabytes of data. This is in stark contrast to clustered file systems that have a single metadata server at their heart, which is both a performance and scalability bottleneck.
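To see why a no-metadata design can sidestep that bottleneck, here is a minimal Python sketch of hash-based file placement. It is not GlusterFS's actual elastic hashing algorithm: the function name `pick_node`, the choice of SHA-256, and the equal-sized hash ranges are all assumptions made for illustration. The point is only that any client can compute where a file lives from its path alone, without asking a central server.

```python
import hashlib

def pick_node(path, nodes):
    """Map a file path to a storage node by hashing alone (illustrative only)."""
    # Hypothetical hash choice: take the first 32 bits of a SHA-256 digest.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    # Split the 32-bit hash space into equal contiguous ranges, one per node.
    range_size = (2**32 + len(nodes) - 1) // len(nodes)
    return nodes[h // range_size]

# Hypothetical node names; every client running the same function on the
# same node list lands on the same node for a given path.
nodes = ["node1", "node2", "node3", "node4"]
print(pick_node("/music/track-0001.mp3", nodes))
```

Because placement is a pure function of the path and the node list, there is no lookup table to keep consistent and no single server for every request to queue behind. A real system also has to handle nodes being added or removed, which this sketch ignores.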
That wasn't the only reason that Red Hat went for Gluster, however. Stevens said at the press conference today that Red Hat did due diligence on a bunch of different scale-out storage options, including looking at proprietary file systems that it might license or buy and then open source as well as "tin wrapped software" that is buried in NAS hardware appliances that it might set free.
What Red Hat discovered was that all of this software was generally too complex to easily build an open source development community around – and there was Gluster, with a community of over 2,000 developers already fired up and 100 production customers.
"It is a very difficult model to take proprietary code and open source it," said Stevens. "We have done that. Our choice is to start open and stay open."
The company also wanted to move fast, with the NAS storage market expected to grow from $4bn this year to $7bn in 2015. Time is quite literally a lot of money here.
Red Hat also needs something that scales and that is compatible with the Amazon Web Services EC2 compute cloud and its related S3 object and EBS block storage services, and Gluster already had a virtualized instance of its clustered file system that could ride atop S3 and EBS to make it a big fat NAS. This code was commercialized as the Virtual Storage Appliance, which launched back in February for AWS ahead of the beta for Red Hat Storage Server 2.0.
After having another think about product naming, these two different ways of deploying GlusterFS are called Storage Server for On Premise and Storage Server for Public Cloud.
For the kinds of customers deploying scale-out storage, performance and scalability matter as much as the price. A whitepaper (PDF) describing the Storage Server architecture lays out the performance GlusterFS can deliver.
You start with a baseline GlusterFS cluster with two servers and a dozen 1TB disk drives in each server; link them up with Gigabit Ethernet networking between the nodes and you have 24TB of capacity and 200MB/sec of write performance. If you want to boost capacity by a third, to 32TB, you can tuck four more drives into each server and not really affect performance. If you want to double performance, you can add two more nodes and distribute the same 24TB across those four nodes, delivering 400MB/sec of write throughput across those nodes using Gigabit Ethernet links.
Now let's get a little crazy and squeeze as much performance as possible out of that Ethernet network. Red Hat says you can put eight server nodes together, each with a dozen 1TB drives, and that gives you 96TB of capacity and 800MB/sec of write bandwidth, close to saturation for the network linking the nodes. So, now you upgrade the network.
Keep all of the nodes the same, but link them with a 10GE switch and adapters. Now, you can sustain around 5,000MB/sec of write throughput on the GlusterFS. That's 25 times the performance of the baseline two-node cluster. Red Hat says that GlusterFS has been deployed in scenarios with multiple petabytes of data and with throughput in excess of 22GB/sec. If you switch to an InfiniBand network, you can get even more bandwidth and therefore more scalability.
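The scaling figures above follow a simple pattern: capacity grows with the number of drives, and write throughput grows with the number of nodes. The Python sketch below reproduces that arithmetic. It assumes perfectly linear scaling and per-node write rates backed out of the quoted numbers (200MB/sec across two nodes on Gigabit Ethernet, 5,000MB/sec across eight nodes on 10GE); the function name `cluster` and the rate table are illustrative, not anything from Red Hat's whitepaper.

```python
# Per-node write throughput in MB/sec, derived from the quoted figures
# (an assumption of this sketch): 200 / 2 nodes and 5,000 / 8 nodes.
PER_NODE_MBPS = {"1GbE": 100, "10GbE": 625}

def cluster(nodes, drives_per_node, network, drive_tb=1):
    """Return (capacity in TB, aggregate write MB/sec), assuming linear scaling."""
    capacity_tb = nodes * drives_per_node * drive_tb
    write_mbps = nodes * PER_NODE_MBPS[network]
    return capacity_tb, write_mbps

print(cluster(2, 12, "1GbE"))    # baseline: (24, 200)
print(cluster(2, 16, "1GbE"))    # four more drives per node: (32, 200)
print(cluster(4, 6, "1GbE"))     # same 24TB over four nodes: (24, 400)
print(cluster(8, 12, "1GbE"))    # eight nodes: (96, 800)
print(cluster(8, 12, "10GbE"))   # eight nodes on 10GE: (96, 5000)
```

Real clusters will not scale this cleanly, since disk speed, replication, and client behavior all get a vote, but the model shows why adding drives buys capacity while adding nodes or a faster network buys throughput.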
Red Hat has not yet divulged pricing on the Storage Server 2.0 stack, but Rangachari said that a typical high-end, scale-out NAS vendor could put boxes in the field for something on the order of $1 per GB, while Red Hat could do it for around 25 cents to 30 cents per GB.
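To put those per-gigabyte figures in context, here is a back-of-the-envelope Python calculation applied to the 96TB eight-node configuration described earlier. Pairing that configuration with these prices is an assumption made for illustration, as is counting 1TB as 1,000GB; Rangachari did not quote a price for any specific cluster.

```python
def cost_usd(capacity_tb, cents_per_gb):
    """Total cost in dollars, counting 1TB as 1,000GB (an assumption)."""
    # Work in whole cents to avoid floating-point rounding surprises.
    return capacity_tb * 1000 * cents_per_gb // 100

print(cost_usd(96, 100))  # typical scale-out NAS at $1 per GB: 96000
print(cost_usd(96, 25))   # Red Hat, low end of the range: 24000
print(cost_usd(96, 30))   # Red Hat, high end of the range: 28800
```

On those assumptions, the gap is roughly $96,000 versus $24,000 to $28,800 for the same raw capacity.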
Presuming that Red Hat is counting the x86 iron in there as well as support costs, that's a pretty dramatic reduction in cost. It would be interesting to see how Storage Server 2.0 stacks up against Lustre clusters and a mix of raw S3 and EBS storage. ®