Supercomputer maker Cray has finally jumped on the GPU coprocessor bandwagon, and it looks like someone is going to have to hitch Belgian draft horses to that wagon and reinforce its axles once the XK6 hybrid super starts shipping in the fall.
Cray made a name for itself as a provider of vector processors back in the 1970s and morphed into a maker of massively parallel x64 machines with proprietary interconnects in the 2000s. With GPUs – whether they are made by Nvidia or AMD – being more like vector engines than not, the adoption of a GPU as a coprocessor is a return to Cray's past. Or more precisely, considering that Cray is really Tera Computer plus Cray Research plus OctigaBay, one of its pasts.
Cray didn't need GPUs to break through the petaflops barrier, but it is going to need GPUs or some kind of coprocessor to break through the exascale barrier. Barry Bolding, vice president of products at Cray, tells El Reg that "customers are a little dissatisfied that scalar performance has flattened out" in recent years, referring to the clock speeds of the x64 processors used inside of generic supercomputer clusters (usually linked by Ethernet or InfiniBand networks) or the monster machines created by Cray and Silicon Graphics using their respective "Gemini" XE and "UltraViolet" NUMAlink 5 interconnects.
"Cray has not been the first to the GPU party, but we have a very good understanding of petascale applications," Bolding boasted, adding that "putting together a box that has both CPUs and GPUs is the easy part."
In fact, says Bolding, Cray has spent more money on integrating its software stack – a custom Linux environment, its Ethernet emulation layer for the Gemini interconnect, and various development tools for parallel environments – with GPUs than it has spent redesigning the blade servers at the heart of its "Baker" family of XE6 and XE6m machines so they can adopt GPU coprocessors.
"We think that our vector experience helps," Bolding says. "The codes that were good for vectors will generally perform well on GPUs. And we really do view this as a stepping stone to exascale. GPUs are today the most effective accelerator that is available." Bolding added that Cray's future designs will not be locked into either AMD's HyperTransport or Intel's QuickPath interconnects, but rather will hang accelerators off PCI Express links. (Very soon, PCI Express 3.0 links, but not this time around.)
Cray's XK6 ceepie-geepie hybrid supercomputer
The Cray XE6 machines are based on eight-socket blade servers, complete with main memory and two Gemini interconnect ASICs. The prior generation of machines, the XT6 supers, was based on the same "Magny-Cours" Opteron 6100 processors that are used in the XE6 blades, but used the much slower and less scalable SeaStar2+ interconnect. The SeaStar2+ interconnect is the great-grandson of the "Red Storm" interconnect that Cray developed for Sandia National Laboratories, delivered in 2003, and later commercialized as the XT3.