Techies at Microsoft Research, the big brain arm of the software goliath, have taken the crown in the sorting benchmark world. The researchers are thinking about how to implement new sorting algorithms in the Bing search engine to give Microsoft a leg up on the MapReduce algorithms that underpin Google's search engine and other big data-munching applications.
In the data world, the benchmarks against which you measure your performance are collectively known as the Sort Benchmarks. The two original benchmarks, proposed in 1994 by techies at Digital Equipment led by Jim Gray, who administered the Sort Benchmarks until he was lost at sea in 2007, are called MinuteSort and PennySort, and they are still used today. (Gray originally worked on the VAX and AlphaServer lines at DEC, but eventually moved to Microsoft.)
The MinuteSort test counts how many bytes of data you can sort using the benchmark code in 60 seconds, and the PennySort test measures the amount of data you can sort for a penny's worth of system time on a machine or cluster set up to run the test. There used to be other tests, such as the TeraByte Sort, which measured the amount of time it took to sort through 1TB of data, but as servers, storage, and networks have progressed, the MinuteSort has become the new TeraByte Sort. Or at least now it has, with the Microsoft team breaking through the terabyte barrier with their Flat Datacenter Storage scheme.
Microsoft has just stomped the living daylights out of a Hadoop cluster that was the previous record-holder on the MinuteSort test, and did so by substantially beefing up the network capacity between server nodes and storage and essentially chucking the whole MapReduce approach to data munching out the window.
With Hadoop and its MapReduce approach, you have data scattered and replicated around the cluster for both availability and algorithmic reasons, and you dispatch the computing to the server nodes where you need to process data – instead of trying to move data from a particular piece of storage to a server. This approach is what allows search engines like those developed by Google, Yahoo!, and Microsoft (which used to be distinct) to mine massive amounts of clickstream data to serve you the most appropriate web pages and advertisements. But Hadoop only scales to about 4,000 nodes, and it is a batch-oriented program, not something that looks and feels like real time.
Microsoft is not releasing the details of its Flat Datacenter Storage approach yet; it may never do so because it gives the company a competitive advantage. But the company did provide some clues as to how it was able to beat a Hadoop cluster configured by Yahoo!, which cloned Google's 2004 MapReduce methodology and file system to create Hadoop and the Hadoop Distributed File System. Microsoft beat that cluster by nearly a factor of three in terms of performance and, according to a blog post, did so using one-sixth the number of servers.
The Flat Datacenter Storage effort is headed up by Jeremy Elson, who works in the Distributed Systems and Networking Group at Microsoft Research. In its MinuteSort run, Elson's team used 250 machines configured with 1,033 disk drives to rip through and sort 1,401 gigabytes of data in 60 seconds, handily beating a Yahoo! Hadoop configuration from 2009 that had 1,406 nodes and 5,624 disks and could process 500GB in a minute.
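For those who like to check the sums, here is a rough back-of-the-envelope sketch in Python that works out what those figures imply. It uses only the numbers quoted above; the dictionary layout and the helper name `gb_per_server` are made up for illustration and have nothing to do with Microsoft's or Yahoo!'s code.

```python
# Figures as quoted in the article for the two MinuteSort runs.
# The structure and names here are illustrative assumptions only.
RUNS = {
    "microsoft_fds_2012": {"gb_sorted": 1401, "servers": 250, "disks": 1033},
    "yahoo_hadoop_2009": {"gb_sorted": 500, "servers": 1406, "disks": 5624},
}


def gb_per_server(run):
    """Gigabytes sorted per server in the 60-second window."""
    return run["gb_sorted"] / run["servers"]


fds = RUNS["microsoft_fds_2012"]
hadoop = RUNS["yahoo_hadoop_2009"]

# How much more data was sorted in the same minute.
data_ratio = fds["gb_sorted"] / hadoop["gb_sorted"]

# How many times more servers the Hadoop cluster needed.
server_ratio = hadoop["servers"] / fds["servers"]

# Per-server gain: the product of the two ratios above.
per_server_ratio = gb_per_server(fds) / gb_per_server(hadoop)

print(f"Data sorted per minute: {data_ratio:.1f}x")          # 2.8x
print(f"Servers used by Hadoop: {server_ratio:.1f}x as many")  # 5.6x
print(f"Data sorted per server: {per_server_ratio:.1f}x")    # 15.8x
```

So the raw numbers work out to a bit under three times the data on a bit under one-sixth of the servers, which multiplies out to a little under 16 times as much data sorted per server.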
It is not clear that Yahoo! could have fielded a better result using a more modern version of the Apache Hadoop software and shiny new x86 iron. (By the way, this sort was on the "Daytona" version of the benchmark, which is based on using stock code, not the "Indy" version of the test, which allows for more exotic algorithms. A team at the University of California San Diego won the Indy MinuteSort race last year with a 52 node cluster of HP ProLiant DL360 G6 servers and a Cisco Systems Nexus 5096 switch. This TritonSort machine at UCSD (PDF) was able to sort 1,353GB of data in 60 seconds.)
To get its speed, the Flat Datacenter Storage team grabbed another technology from Microsoft Research, called full bisection bandwidth networks. Specifically, each node in the cluster could transmit data at 2GB/sec and receive data at 2GB/sec without interruption. "That’s 20 times as much bandwidth as most computers in data centers have today," Elson explained in the blog, "and harnessing it required novel techniques".
And the Daytona car beat the Indy car this time around.
Microsoft used an unnamed remote file system, linked to that full bisection bandwidth network, to feed data to all of the nodes in the cluster running the MinuteSort test, which is the way such sorting benchmarks were done before the MapReduce method came along. MapReduce is great for certain kinds of data-munching, like when a set of data can fit inside of a single server node. But, as Elson points out, what happens when you have two very large data sets that you want to merge and then chew on? How do you do that on MapReduce? The data has to move, and it has to move to somewhere that the systems doing the sort can access it at very high speeds.
The Microsoft Research team that developed the Flat Datacenter Storage scheme is presenting its results at the 2012 SIGMOD/PODS Conference in Scottsdale, Arizona this week.
The research behind the new sorting method was sponsored in part by Microsoft's Bing team because it can be applied to search engine results as well as to gene sequencing and stitching together aerial photographs. The company is pretty keyed up that it can get a factor of 16 improvement in the efficiency of sorting per server using a remote (and unnamed) file system compared to Hadoop and its HDFS. ®