Big Blue is going to Red Hat for a Linux environment for its largest supercomputers, and it is mothballing its own LoadLeveler workload manager for x86 clusters in favor of the Platform LSF control freak that it acquired a little more than a year ago.
It is no surprise that IBM has chosen Red Hat Enterprise Linux 6 as the Linux of choice for its massively parallel BlueGene/Q supercomputers and the Power 775 behemoth that was to be the "Blue Waters" machine at the University of Illinois and that is now being positioned as a big data muncher. (Cray eventually got the Blue Waters contract.)
Both RHEL 6 and SUSE Linux Enterprise Server 11 are supported on Power-based machines, so in theory both can run on these specialized boxes. But IBM and Red Hat got together and tuned up RHEL 6 for the 18-core PowerPC A2 in the BlueGene/Q and the four-chip, 32-core Power7 multichip module used in the Power 775 server nodes. The tuning exploits not only their processors and memory but also the proprietary interconnects that both machines employ to scale out, to 100 petaflops in the case of BlueGene/Q and to several tens of petaflops with the Power 775s.
IBM and Red Hat are not just getting together to tune up RHEL 6 for these non-standard machines, but have also come up with special per-rack pricing for support contracts for Shadowman's Linux, which are only available through Big Blue.
This special packaging is called RHEL 6 High Performance Computing, appropriately enough. It is available for half or full racks of BlueGene/Q machines and for each quad-chip module in the Power 775 nodes. Contract terms run from two to five years, which is the practical useful life for a capacity-class supercomputer. It is also considerably cheaper than buying RHEL 6 for plain vanilla Power7-based servers.
A rack of BlueGene/Q boxes has 32 processor nodes in a node card, 16 node cards in a midplane, and two midplanes in a rack. That is 1,024 nodes in a rack.
A standard subscription to RHEL 6 for a two-socket Power7-based server is $2,700 and for a four-socket server is $5,400, so a single-socket node like that used in the BlueGene/Q system would cost $1,350. At 1,024 nodes, a rack would run you about $1.38m per year at list price.
That is nuts, obviously, and no cheapskate HPC shop with people who understand Linux as well as either IBM or Red Hat is going to pay that. But a two-year contract costs only $90,000, or $45,000 per year, which is roughly 30.7 times less than list price for the Power-based machines. (It is $43.95 per node per year.)
If you go all the way to a five-year contract, you can get it for a rack of BlueGene/Q for $225,000, which works out to the same $45,000 per year per rack. So customers are not getting a discount for a longer term contract, as is common given the time value of money. But look at the discount rate off list for RHEL support.
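The rack arithmetic can be checked with a short sketch. It uses only the figures quoted above, and it assumes that the $1,350 single-socket price (half the two-socket list price) would apply to every node in a rack. The variable and function names are illustrative only.

```python
# Rack layout as described in the article.
NODES_PER_CARD = 32
CARDS_PER_MIDPLANE = 16
MIDPLANES_PER_RACK = 2

# Assumption: a single-socket node is priced at half the two-socket list price.
LIST_PER_SOCKET_YEAR = 2700 / 2

nodes_per_rack = NODES_PER_CARD * CARDS_PER_MIDPLANE * MIDPLANES_PER_RACK
list_per_rack_year = nodes_per_rack * LIST_PER_SOCKET_YEAR

# RHEL 6 HPC contract prices per rack: term in years -> total price in dollars.
hpc_contracts = {2: 90_000, 5: 225_000}


def per_year(term_years, price):
    """Spread a contract price evenly over its term."""
    return price / term_years


print(f"{nodes_per_rack} nodes per rack, list ${list_per_rack_year:,.0f} per rack per year")
for term, price in hpc_contracts.items():
    yearly = per_year(term, price)
    print(
        f"{term}-year contract: ${yearly:,.0f} per rack per year, "
        f"${yearly / nodes_per_rack:.2f} per node per year, "
        f"{list_per_rack_year / yearly:.1f}x below list"
    )
```

Both contract terms come out at the same annual cost per rack, which is why there is no discount for the longer commitment.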
On the Power 775 servers, the special RHEL 6 edition from IBM costs $1,066 for each quad-chip module on a two-year contract, or $533 per socket per year. That works out to 2.5 times less money per socket than a regular RHEL for Power support contract, and if you normalize it for core counts, it is more like ten times cheaper.
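The Power 775 comparison can be sketched the same way. The module price, the 32-core module, and the $1,350 per-socket list price come from the article. The eight cores per standard Power7 socket is an assumption, made here only to do the per-core normalization.

```python
# Figures quoted in the article.
MODULE_PRICE_2YR = 1066          # dollars per Power 775 module, two-year term
TERM_YEARS = 2
LIST_PER_SOCKET_YEAR = 2700 / 2  # regular RHEL for Power, per socket per year
CORES_PER_MODULE = 32            # four-chip, 32-core Power7 module

# Assumption: a standard Power7 socket holds one eight-core chip.
CORES_PER_STANDARD_SOCKET = 8

hpc_per_module_year = MODULE_PRICE_2YR / TERM_YEARS
socket_ratio = LIST_PER_SOCKET_YEAR / hpc_per_module_year

hpc_per_core_year = hpc_per_module_year / CORES_PER_MODULE
list_per_core_year = LIST_PER_SOCKET_YEAR / CORES_PER_STANDARD_SOCKET
core_ratio = list_per_core_year / hpc_per_core_year

print(f"HPC edition: ${hpc_per_module_year:,.0f} per module per year")
print(f"Per socket: {socket_ratio:.1f}x below list")
print(f"Per core:   {core_ratio:.1f}x below list")
```

Under that assumption the per-core gap is four times the per-socket gap, because the module carries four times as many cores as a single standard socket.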
In other HPC software news, IBM has announced that it is going to put all of its weight behind the Platform LSF workload scheduler on x86-based clusters and withdraw its own Tivoli-branded LoadLeveler program for x86-based machines. IBM will sell LoadLeveler for x86-based machines until March 15 of next year and support the software until April 30, 2015.
LoadLeveler V5 for both AIX and Linux will continue to be sold and supported on Power Systems servers, and the variant for the BlueGene/Q will still be available as well. That said, IBM is telling customers that Platform LSF is the workload scheduler of choice for its System x, PureFlex, and Power Systems clusters and grids, so take that into consideration when you are planning. ®