GTC Server makers are swarming the GPU Technology Conference put on by graphics card and GPU co-processor maker Nvidia in San Jose this week. They smell money – HPC money in its many flop-happy variants, including traditional HPC simulation as well as electronic design automation, data analytics, financial modeling, machine vision, and digital media rendering. Many applications are only now getting GPU extensions, and the appropriate iron has to be brought into the field to run them.
The fanless M2050 and M2070 GPUs announced in June were an important piece of the hybrid CPU-GPU puzzle, since server makers need a smaller GPU and want to use the server's own cooling fans to keep the GPU from frying. (The GPU cards with fans on them are too fat and can't be packed into pizza box servers easily.) It is no wonder, now that the M series GPUs are out from Nvidia - as well as the FireStream 9350 and 9370 fanless GPU co-processor counterparts from Advanced Micro Devices, also announced in June - that server makers are finally weaving them into their hard wares.
Supercomputer maker Cray said at the GPU Tech Conference that it would be creating a variant of its blade servers in the new XE6 massively parallel supers that would allow Tesla GPU co-processors from Nvidia to be mixed in with the x64 processors and to offload floating point work.
Thanks to its OctigaBay acquisition back in February 2004, Cray has engineering expertise in weaving field programmable gate array (FPGA) and other types of accelerators into massively parallel x64 machines. Speaking to El Reg, Barry Bolding, vice president of products at Cray, said that Cray was "very picky" about the components that it puts into its supers, but that the GPU co-processors had matured and that supercomputer customers were saying they were interested in GPU acceleration. No doubt about that, when a petaflops of massively parallel x64 server capacity runs on the order of $45m, based on Cray's own sales of the XE6 systems.
Bolding didn't say much more about exactly how the Tesla 20 GPU co-processors would be put into the XE6 supers, which are comprised of eight-socket Opteron blades and the "Gemini" XE interconnect and which made their debut as a complete system in May. He did confirm that the Tesla GPUs would be put on blades, would link to the Opteron blades using PCI-Express links, and would be able to use the Gemini interconnect to share data and work.
Cray is planning to base the blades on the next generation of Tesla GPUs, which are code-named "Kepler" and which are due in 2011. Bolding said that Cray, being a partner of Advanced Micro Devices for CPUs, was looking at the GPUs coming out of AMD and is in discussions to see how they might be used in Cray massively parallel supers as well. Just like Cray has learned to have two sources of x64 chips, it will no doubt want to have two sources of GPU accelerators.
Over at Silicon Graphics, the company announced today that its high-end Altix UV 1000 parallel supers, which use SGI's NUMAlink 5 interconnect to scale to 256 of Intel's Xeon 7500s in a shared memory parallel super, will also be equipped with Tesla 20 GPUs. According to Bill Mannel, vice president of product marketing at SGI, the company will plug in the 1U S2050 GPU chassis, which sports two GPUs. The Altix UV blade server has a PCI-Express riser card, and the S2050 links to the blades through it.
You can't do a one-for-one pairing of GPUs and CPU sockets on the Altix UVs, however. You can only hook four of these S2050s into each 256-socket Altix box. What matters is that the single memory space of the Altix UV design and the high-speed NUMAlink 5 interconnect mean that an application using GPU co-processors can gather up its data into main memory and feed it directly at very high speed to the GPUs, making them run at a kind of efficiency that Mannel says is not possible in a normal CPU-GPU cluster.
SGI is also supporting Tesla 20 GPU co-processors in its Octane III personal supercomputer, the Altix XE workgroup servers, and the Altix ICE x64-based clusters. In a tip of the hat to its Rackable Systems heritage, SGI is also doing some bespoke server designs for customers in the HPC space, as it has always done for hyperscale Web customers. In this case, the designs include compact chassis, low thermals, and GPUs, says Mannel.