There are many ways to gang up machinery to scale applications on groups of servers or provide a measure of disaster recovery or fault tolerance for those applications. Supercomputer customers are known for spending big bucks on exotic technology, but they're also notorious cheapskates. That's why Linux and the clustering of commodity x86 servers took off a decade ago, essentially wiping out the market for vector supercomputers and nearly knocking out RISC architectures.
So, it may come as some surprise to many of you that Stratus Technologies - one of the venerable vendors of fault tolerant servers for commercial applications - is now trying to get its x64-based ftServer machinery into supercomputer sites, thanks to the charge of Microsoft into the high performance computing arena with its Windows HPC Server 2008 edition.
Here's the deal. Fault tolerant servers are really aimed at commercial transaction processing - and they're meant for the kinds of workloads where a system crash is a big problem. (Think banking transactions - if you can bear to think of such things right now.) Tandem and Stratus were two of the early sellers of fault tolerant servers, the latter being a big partner of IBM and the former having disappeared into Compaq more than a decade ago; it is now part of Hewlett-Packard.
Stratus doesn't make its own processors and systems any more, like it did in the old days, but it does take Xeon servers from Japanese partner NEC and then equip them with chipsets and firmware that allow for the absolute lockstepping of applications running on two distinct physical machines. These machines are identically configured, including identical processors, memory, disks, and such, and they can provide 99.999 per cent uptime for Windows or Linux operating systems.
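To put that 99.999 per cent figure in perspective, here is a rough back-of-envelope sketch in Python. The function name and the comparison levels of 99.9 and 99.99 per cent are illustrative assumptions, not anything Stratus publishes, and the sum ignores leap years.

```python
# Back-of-envelope: how much downtime a year an availability target allows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# 99.9 and 99.99 are included only for comparison (assumed levels).
for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime -> {minutes:.2f} minutes down per year")
```

At five nines, the budget works out to a little over five minutes of downtime a year.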
Supercomputer centers haven't generally worried about fault tolerance, basically because they have much larger issues to worry about concerning their Linux and, Microsoft hopes, Windows clusters. There are a lot of ways to lash together server nodes to create parallel supercomputers, which are used to model everything from car crashes to the weather to nuclear explosions to the interaction of subatomic particles on an absurdly small scale and the Universe on the large scale.
The typical parallel supercomputer today uses a standard called the Message Passing Interface, or MPI, to let server nodes pass information to one another. This information passing is necessary since calculations that define a simulated state in a region within that simulation - say a 3D chunk of air in the atmosphere - depend on the state of the regions that surround it. So, in simulating stuff, parallel supercomputers chop the job up into pieces, model what's going on in those pieces and their interactions, and show how the whole system changes over time based on initial conditions.
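The chop-it-up-and-swap-edges idea can be sketched in a few lines of Python. This is a toy under loud assumptions: a one-dimensional diffusion model stands in for the atmosphere, the "nodes" are just list slices in one process, and the message passing is faked with plain list copies rather than real MPI calls. All function names here are made up for illustration.

```python
# Toy 1D diffusion split across pretend "nodes". No real MPI is used:
# the exchange of boundary (halo) cells is done with plain list reads,
# standing in for messages sent between neighbouring nodes.

def step(cells, left, right, alpha=0.25):
    """Advance one chunk by one time step, given its neighbours' edge values."""
    padded = [left] + cells + [right]
    return [
        padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def simulate_split(state, n_nodes, steps):
    """Chop the domain into n_nodes chunks and step them with halo exchange."""
    size = len(state) // n_nodes  # assumes len(state) divides evenly
    chunks = [state[i * size:(i + 1) * size] for i in range(n_nodes)]
    for _ in range(steps):
        # "Message passing": each chunk gets its neighbours' edge cells
        # from the previous time step. Outer boundaries are held at 0.0.
        lefts = [0.0] + [c[-1] for c in chunks[:-1]]
        rights = [c[0] for c in chunks[1:]] + [0.0]
        chunks = [step(c, l, r) for c, l, r in zip(chunks, lefts, rights)]
    return [x for c in chunks for x in c]


def simulate_whole(state, steps):
    """Reference: the same model run as one piece on one machine."""
    for _ in range(steps):
        state = step(state, 0.0, 0.0)
    return state


initial = [0.0] * 12
initial[5] = 100.0  # a single hot spot as the initial condition
split = simulate_split(initial, n_nodes=4, steps=10)
whole = simulate_whole(initial, steps=10)
print("largest difference:", max(abs(a - b) for a, b in zip(split, whole)))
```

The split run should agree with the single-machine run, which is the whole point: each piece only needs its neighbours' edge values at every step, and that is the traffic MPI carries on a real cluster.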
While parallel supercomputer clusters have lots of clustering for scalability, they are not generally clustered for high availability. This could be done, and in fact, Microsoft and Stratus will be making the argument that for key nodes in an HPC cluster, it should be done.
This is something of a surprise position for Stratus to be taking, and the company is up front about it. "Generally, Stratus has had this aversion to clusters, and clusters have been the enemy when it comes to availability," explains Denny Lane, director of product and marketing management at Stratus.
And for commercial data processing, high availability clusters, in which one server's applications switch over to a backup set of servers when the primary machine fails, are a headache to set up and maintain. Even so, their adoption out there in the data center outstrips that of fault tolerant machines by an order of magnitude (or two). Lane said that Stratus had sold 10,000 fault tolerant servers worldwide. The world consumes more than 8 million servers a year.
According to Lane, Microsoft approached Stratus with the idea that marrying the key nodes in a supercomputing cluster running Windows HPC Server (the ones that manage the workloads running on the cluster and access to the nodes) with fault tolerance like that provided by ftServers would make Windows supercomputers more resilient. (And since ftServers support Linux, the idea applies equally well there.)
In terms of Windows-based clusters, Microsoft and Stratus are suggesting that ftServers should be used in what is called the head node, as well as in the broker nodes that run the Windows Communication Foundation (WCF) stack. And perhaps file systems in baby Windows clusters could use the ftServers too, Stratus believes, now that it is coming around to Microsoft's thinking. That leaves the workstation, an Active Directory server, a System Center Server, and maybe a mail server within the company network where the Windows HPC cluster sits running regular Windows on regular x64 servers.
So why bother using fault tolerance with Windows HPC? "We do a lot to harden the operating system," explains Lane. "We do a lot of work with the I/O vendors to allow Windows and Linux to ride out transient errors." Errors like those are the kind of thing you don't want happening to the head and broker nodes in a supercomputer cluster. No one wants to restart a job that takes days, weeks, or months to finish.
Microsoft and Stratus are targeting Windows clusters with around 50 compute nodes, which they reckon is the sweet spot for what they are offering together. That would include one head node and maybe three broker nodes running Windows HPC Server on an ftServer setup (which is actually two servers working in lockstep). A two-socket ftServer running Windows costs somewhere between $20,000 and $25,000 in a reasonable configuration. This is not cheap, of course, but neither is HA clustering and neither is losing work.
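For a rough idea of the bill, here is the arithmetic as a short Python sketch. It assumes the quoted price covers one complete ftServer (the lockstepped pair), takes the "maybe three" broker nodes at face value, and leaves out the 50 compute nodes and everything else in the cluster. The variable names are illustrative only.

```python
# Rough cost of the fault tolerant nodes alone, under stated assumptions.
HEAD_NODES = 1
BROKER_NODES = 3                     # "maybe three", per the figures above
PRICE_RANGE_USD = (20_000, 25_000)   # assumed price per complete ftServer

ft_nodes = HEAD_NODES + BROKER_NODES
low, high = (ft_nodes * price for price in PRICE_RANGE_USD)
print(f"{ft_nodes} ftServers: ${low:,} to ${high:,}")
```

On those assumptions, the fault tolerant part of a 50-node cluster comes to somewhere between $80,000 and $100,000.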
Stratus has no plans to take the idea of fault tolerant nodes in HPC clusters to the Linux market yet, but this is something that the company can - and probably will - do in the long run. Right now, Microsoft's marketing muscle is important to get the idea out there. ®