Appro, one of the surviving boutique HPC vendors, has been jazzed about GPU coprocessors for years, and John Lee, vice president of advanced technology solutions at the company, has no qualms about saying that the advent of GPU coprocessing for HPC clusters is as significant a technology change as the introduction of low-latency, high-bandwidth InfiniBand networking a decade ago, and perhaps more so. It was InfiniBand that killed off proprietary interconnects, and it may be GPUs that kill off the idea that CPUs designed to run general-purpose workloads as well as calculations are the right kind of machine for doing massive amounts of math in parallel.
Appro has cooked up two different hybrid machines using the M2050 GPU coprocessor, one a rack design and the other a hybrid blade design.
The rack machine is the Tetra 1U server, which comes in flavors using x64 processors from Intel or Advanced Micro Devices. The Tetra 1426G4 server is a two-socket machine sporting Intel's latest six-core Xeon 5600 processors, and it also crams four (yes, I said four, and I triple-checked it because it was hard to believe) M2050 GPU coprocessors into the chassis.
Appro's Tetra 1U CPU-GPU box: Looks like a normal rack server, but packs a floppy punch
According to the spec sheet, this machine has eight DDR3 memory slots for the Xeon chips, supporting up to 96GB of capacity. That can't be right, since there are no 12GB memory sticks as far as I know. It has to be either 96GB with a dozen memory slots or 128GB with eight slots using 16GB sticks.
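The memory arithmetic is easy to check. The sketch below assumes that every slot is filled with the same size stick and that DDR3 server sticks come in the power-of-two sizes listed (both are assumptions, not figures from Appro's spec sheet), and lists which slot-count and stick-size combinations land exactly on 96GB or 128GB.

```python
# Assumption: DDR3 server sticks come only in these power-of-two sizes (GB).
DIMM_SIZES_GB = [2, 4, 8, 16]

def configs_for(target_gb, slot_counts=(8, 12)):
    """Return (slots, stick_gb) pairs that hit target_gb exactly.

    Assumes every slot is populated with an identical stick; mixed
    populations (say, some 16GB and some 8GB sticks) are not considered.
    """
    return [
        (slots, size)
        for slots in slot_counts
        for size in DIMM_SIZES_GB
        if slots * size == target_gb
    ]

for target in (96, 128):
    print(f"{target}GB:", configs_for(target))
```

Under that uniform-population assumption, 96GB needs a dozen 8GB sticks and 128GB needs eight 16GB sticks; no single stick size gets eight slots to 96GB.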
The Tetra chassis has room for six 2.5-inch SATA disks. The GPUs plug into two PCI Express x16 slots via a riser card, and one x4 slot with a riser is left over if you need to add something else. The Tetra 1326G4 is the AMD version of this server, a two-socket machine that takes the eight-core and twelve-core Opteron 6100 processors. This AMD box has the same PCI Express 2.0 slots as the Intel machine, with two of the three hosting the M2050 GPUs. The spec sheet for this machine says it supports 128GB in eight slots. A rack of 40 Tetra machines will deliver 80 teraflops, so it will take only about a dozen racks to break a petaflops.
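Working only from the round numbers quoted above, a quick back-of-the-envelope check shows what each box contributes and where the rack count comes from. This is a sketch of the arithmetic, not a figure from Appro.

```python
import math

# Figures quoted in the article: 80 teraflops per rack of 40 Tetra machines.
RACK_TFLOPS = 80.0
MACHINES_PER_RACK = 40
PETAFLOPS_IN_TFLOPS = 1000.0

per_machine_tflops = RACK_TFLOPS / MACHINES_PER_RACK  # teraflops per 1U box
racks_exact = PETAFLOPS_IN_TFLOPS / RACK_TFLOPS       # fractional rack count
racks_whole = math.ceil(racks_exact)                  # whole racks to buy

print(per_machine_tflops, racks_exact, racks_whole)
```

That works out to about 2 teraflops per 1U box and 12.5 racks per petaflops, so "a dozen racks" is a round figure rather than an exact count.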
The Tetra hybrid machines offer about twice the compute density of earlier rack-based CPU-GPU machines, according to Lee. The Tetra will ship at the end of May, and a base configuration with all four Fermi GPUs will sell for under $13,000. A beefier configuration with faster processors, more memory, and other options could cost as much as $20,000. Even at that price, it works out to about $10 per gigaflop, which is pretty good, provided your applications know how to speak GPU.
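The price-per-gigaflop figures follow from dividing list price by peak performance. The sketch below assumes about 2,000 gigaflops per Tetra box (derived from 80 teraflops per 40-machine rack) and treats the "under $13,000" base price as exactly $13,000.

```python
# Assumption: ~2,000 gigaflops per Tetra box, derived from the quoted
# 80 teraflops (80,000 gigaflops) per 40-machine rack.
GFLOPS_PER_TETRA = 80_000 / 40

def dollars_per_gflop(price_usd, gflops):
    """List price divided by peak gigaflops, rounded to cents."""
    return round(price_usd / gflops, 2)

# Assumption: the "under $13,000" base box is priced at $13,000 here.
base = dollars_per_gflop(13_000, GFLOPS_PER_TETRA)
beefy = dollars_per_gflop(20_000, GFLOPS_PER_TETRA)
print(f"base: ${base}/GF, loaded: ${beefy}/GF")
```

The loaded $20,000 box comes out at $10 per gigaflop, and the base box at about $6.50, in line with the figure quoted for skinny-memory Tetras in the GreenBlade comparison.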
Among the blades
If blades float your boat, then Appro has a GreenBlade hybrid setup for you. Or rather, two. One pairs a two-socket Xeon 5600 blade with a GPU blade carrying two M2050s, while the other pairs a two-socket Opteron 6100 blade with a GPU blade. The GreenBlade system was announced in February 2009 and was tapped by the San Diego Supercomputer Center, Appro's flagship customer, as the basis of "Gordon", a flash-heavy super that SDSC said last fall it would build.
The Intel/Nvidia combo blade marries the gB222X blade, which supports up to 96GB of memory and two six-core Xeon 5600 processors, to the GXB100 GPU blade, which carries two M2050 GPUs. The AMD blade is the gB322H, a two-socket machine with up to 96GB of memory that supports either the eight-core or twelve-core Opteron 6100 processors. It also links directly to the GXB100 GPU blade over PCI Express.
The GreenBlade is based on a 5U chassis with ten slots, and each hybrid takes two of them (one CPU blade plus one GPU blade), so you can put five of these hybrids in the box for about 5 teraflops of aggregate number-crunching performance. With all five hybrids in the box and a relatively light amount of memory, such a GreenBlade will run you about $30,000, or about $6 per gigaflop. On skinny memory configurations, the Tetras run about $6.50 per gigaflop. That's not much of a premium for twice the density. The GreenBlade CPU-GPU hybrid boxes will be available in the middle of May.
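The density and premium claims can be checked the same way. This sketch uses the quoted prices and flops, assumes 2,000 gigaflops and $13,000 for a skinny Tetra, and counts GPUs per rack unit: ten M2050s in the 5U GreenBlade chassis versus four in a 1U Tetra.

```python
# Figures quoted in the article. Assumption: a skinny Tetra is ~$13,000 and
# ~2,000 gigaflops (80 teraflops per 40-machine rack).
greenblade = {"price": 30_000, "gflops": 5_000, "gpus": 10, "rack_units": 5}
tetra = {"price": 13_000, "gflops": 2_000, "gpus": 4, "rack_units": 1}

def per_gflop(box):
    """Dollars per peak gigaflop."""
    return box["price"] / box["gflops"]

def gpus_per_u(box):
    """M2050 GPUs per rack unit of height."""
    return box["gpus"] / box["rack_units"]

premium = per_gflop(tetra) / per_gflop(greenblade) - 1
density = gpus_per_u(tetra) / gpus_per_u(greenblade)
print(f"GreenBlade ${per_gflop(greenblade):.2f}/GF, Tetra ${per_gflop(tetra):.2f}/GF")
print(f"Tetra premium {premium:.0%}, GPU density {density:.1f}x")
```

On these numbers the Tetra costs roughly 8 per cent more per gigaflop while packing twice as many GPUs per rack unit.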
Appro's GreenBlade: not as dense on the flops as the Tetra racks.
Lee says that Appro is not just thinking about Nvidia GPUs, but is also planning a similar set of products using AMD's FireStream GPUs. But the FireStream GPUs have some issues, such as lacking the multi-level caches and the error correction on cache and main memory that the Nvidia GPUs have. Moreover, a lot of people have lined up behind the CUDA environment that Nvidia has created for the Tesla GPUs, and as far as Lee is concerned, OpenCL is not quite there yet. But he is convinced it will gain traction over time, and it's hard to ignore the single-precision flops advantage that the latest FireStreams (the chips used in the Radeon 5870 graphics cards) have over the Fermis: almost three times as high at 2.72 teraflops per GPU, with 544 gigaflops at double precision.
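To put the FireStream numbers side by side with Fermi, the sketch below takes the FireStream figures from the text and assumes Nvidia's published M2050 peaks of about 1,030 gigaflops single precision and 515 gigaflops double precision; the M2050 figures are not from Lee.

```python
# FireStream figures come from the article. The M2050 peaks are an
# assumption based on Nvidia's published specs (~1,030 GF SP, 515 GF DP).
firestream = {"sp_gflops": 2_720, "dp_gflops": 544}
m2050 = {"sp_gflops": 1_030, "dp_gflops": 515}

# Ratio of FireStream peak to M2050 peak at each precision.
ratios = {p: firestream[p] / m2050[p] for p in ("sp_gflops", "dp_gflops")}

for precision, ratio in ratios.items():
    print(f"{precision}: FireStream/M2050 = {ratio:.2f}x")
```

Under those assumptions the single-precision edge is about 2.6x, while at double precision the two chips are within about 6 per cent of each other.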
Whitebox server maker Super Micro is also among the first companies to ship the M2050 embedded GPU coprocessors inside x64 systems aimed at HPC shops.
Super Micro's 1U Tesla 20-based hybrid box has the tongue-twisting name 6016GT-TF-FM205. It's a two-socket box based on Intel's Xeon 5600s and has two PCI Express slots for linking up two M2050 GPUs. Super Micro also has a personal supercomputer, the 7046GT-TRF-FC405, that can be converted into a 4U rack server supporting four C2050 cards, which, unlike the passively cooled M2050s, have their own fans to keep them cool.
You will no doubt note that Appro is getting twice as many M2050s into its 1U server as is Super Micro. Engineering still matters. And perhaps so does halon gas. ®