Supercomputer maker Cray doesn't talk much about the systems software that runs on its massively parallel, midrange, or entry HPC gear, but it probably will start doing so more because of the work it has done to make its non-standard XT boxes look a little less proprietary as far as Linux applications are concerned.
The company has just started shipping the third generation of its Cray Linux Environment, and this one has a new feature called Cluster Compatibility Mode that is sure to get the interest of HPC shops that might not even give Cray's gear more than a passing thought because they perceive it as exotic and expensive. While the largest supercomputer labs have enough money and manpower to create a parallel super using schools of fish with OS/2 grafted onto their gills if they decided this was a good idea, entry and midrange HPC shops have all the same budget and skills constraints that SMBs have in the "real world" of commercial computing.
So exotic and expensive will not sell down there, and that's why Cray rolled out the CX1 baby super running Windows in conjunction with Intel and Microsoft a year and a half ago, and just last month debuted a new midrange lineup called the CX1000 based on Intel's Xeon 5600 and 7500 processors and graphics co-processors from Nvidia.
What Cray customers want is something that is as simple to use as these baby and midrange clusters, which run kosher Red Hat Enterprise Linux or Windows HPC Server 2008 and which have MPI software extensions that run over Ethernet or InfiniBand, but which work on massively parallel Opteron-Linux XT machines and their proprietary SeaStar interconnect.
The SeaStar interconnect at the heart of the XT line started out as the "Red Storm" project at Sandia National Laboratories and was only gradually commercialized by Cray. This interconnect doesn't look or smell anything like either Ethernet or InfiniBand, so applications have to be tweaked to run atop it, which turns a lot of customers off and which, quite frankly, has forced Cray to invent whole new product lines like the CX1 and CX1000 to chase revenue opportunities.
According to Barry Bolding, vice president of scalable systems at Cray, when the Red Storm system was first designed, the system's creators were sure that if they wanted to create a system that scaled to hundreds of thousands of processors in parallel, they would have to create a skinnied-down microkernel based on Linux to be able to squeeze more performance out of the box than would be possible based on a full-blown Linux distribution. This home-grown Linux distro was known as the Cray Linux Environment 1.0, and it ran on the Red Storm super and the XT3 commercialized versions of the boxes.
About four years ago, Cray looked at processor and Linux roadmaps and decided that it would take another approach to putting Linux on its Opteron-based parallel machines, one that would make it more compatible with plain vanilla Linux distros and one that would allow it to support more processors and peripherals besides those baked into its homegrown Linux. And so, the company took Novell's SUSE Linux Enterprise Server 10 distro and did a little semi-homemade cooking of the kind that Sandra Lee, New York State attorney general Andrew Cuomo's girlfriend, is famous for: quickly turning processed foods into something that looks homemade, as if someone had slaved over it for hours in the kitchen.
Cray took SLES 10 and locked it down and hardened it in various ways, and then added some tweaks for HPC shops in general and specifically to support the SeaStar interconnect at the heart of the XT machines, which is not supported on SLES 10. Cray also disabled a whole bunch of Linux features that are not useful on XT machines and that just end up causing the supers to get less work done. The resulting modified SLES 10 was called the Cray Linux Environment 2.0, and it was supported on the XT3 and XT4 supers that were shipping at the time. The current XT5 machines also can run CLE 2.0.
With the Cray Linux Environment 3.0 operating system, Cray is moving up the Novell Linux stack to support SLES 11, which has been out for a little more than a year and which is due to get its first service pack soon. Bolding says that most of the current installed base has upgraded its iron and therefore its software to CLE 2.0, so they are ready and eager for the enhancements that come with CLE 3.0.
The big change with CLE 3.0, and something that is not part of the standard SLES 11, is the Cluster Compatibility Mode. In the past, because the SeaStar interconnect is not standard, Cray required HPC shops that wanted to run parallel Linux applications and MPI stacks on their XT machines to compile them on CLE, which had tweaks so the compiler could see the SeaStar interconnect and squeeze the absolute most performance out of it. With Cluster Compatibility Mode, Cray is adding an emulation layer to the SeaStar drivers so that, as far as Linux is concerned, the interconnect looks like a normal TCP/IP device. As a result, a regular set of x64-based applications that are tuned to run on Ethernet-based clusters, using the MPI protocol for linking nodes, can run unchanged atop CLE 3.0 on the XT iron. No more recompiling applications to run on XT iron.
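To make that idea a little more concrete, here is a minimal sketch in Python of the kind of code that only ever talks to the standard TCP/IP socket interface. It is not Cray's code and does not touch SeaStar or CLE; the `ring_sum` function and its helpers are hypothetical names invented for illustration, and the "nodes" are just threads on the loopback address. Real HPC applications would normally sit on an MPI library rather than raw sockets. The point is only that a program written this way has no knowledge of the wire underneath, which is presumably why presenting the interconnect to Linux as an ordinary TCP/IP device lets such programs run without changes.

```python
import socket
import threading


def ring_sum(values, host="127.0.0.1"):
    """Pass a running total around a ring of workers using plain TCP sockets.

    Hypothetical illustration: each worker stands in for a cluster node.
    """
    n = len(values)

    # Every worker opens a listening socket on an OS-assigned port up front,
    # so no worker can try to connect before its neighbour is ready.
    listeners = []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))
        s.listen(1)
        listeners.append(s)
    ports = [s.getsockname()[1] for s in listeners]
    result = []

    def send(port, number):
        # Connect to the neighbour, send the number as text, then close.
        with socket.create_connection((host, port)) as conn:
            conn.sendall(str(number).encode())

    def recv(listener):
        # Accept one connection and read until the sender closes it.
        conn, _ = listener.accept()
        with conn:
            data = b""
            while True:
                chunk = conn.recv(1024)
                if not chunk:
                    break
                data += chunk
        return int(data)

    def worker(rank):
        nxt = (rank + 1) % n
        if rank == 0:
            # Rank 0 starts the token and collects the final total.
            send(ports[nxt], values[0])
            result.append(recv(listeners[0]))
        else:
            # Other ranks add their own value and forward the total.
            total = recv(listeners[rank])
            send(ports[nxt], total + values[rank])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for s in listeners:
        s.close()
    return result[0]
```

Calling `ring_sum([1, 2, 3, 4])` returns 10. Nothing in the function names a network card or driver; it only asks the operating system for TCP connections, so whatever sits below the socket layer is invisible to it.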