SC10 If the June edition of the bi-annual ranking of the Top 500 supercomputers in the world represented the dawning of the GPU co-processor as a key component in high performance computing, then the November list is breakfast time. The super centers of the world are smacking their lips for some flop-jacks with OpenCL syrup and some x64 bacon on the side.
China has the most voracious appetite for GPU co-processors, and as expected two weeks ago when the Tianhe-1A super was booted up for the first time, this hybrid CPU-GPU machine installed at the National Supercomputer Center in Tianjin has taken the top spot on the Top 500 list with a comfortable margin. Tianhe-1A's final rating on the Linpack Fortran matrix math benchmark test is 4.7 petaflops of peak theoretical performance spread across its CPUs and GPUs (with about 70 per cent of that coming from the GPUs) and 2.56 petaflops of sustained performance on the Linpack test.
The Tianhe-1A machine comprises 7,168 servers, each equipped with two of Intel's Xeon X5670 processors running at 2.93 GHz and one Nvidia Tesla M2050 fanless GPU co-processor. The resulting machine spans 112 racks, and it would make a hell of a box on which to play Crysis.
While about 45 per cent of the floating-point oomph in Tianhe-1A disappears into the void where all missed clock cycles go (it's also where missing socks from the dryer cavort), the GPU's flops are relatively inexpensive and the overall machine should offer excellent bang for the buck - provided workloads can scale across the ceepie-geepie, of course. The Tianhe-1A super uses a proprietary interconnect called Arch, which was developed by the Chinese government. The Arch switch links the server nodes together using optical-electric cables in a hybrid fat tree configuration and has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.
The Tianhe-1A CPU-GPU hybrid super
This is not the first ceepie-geepie machine that the National Supercomputer Center has put together. A year ago, the Tianhe-1 machine broke onto the Top 500 list using Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs (no Tesla GPUs, but actual graphics cards). This initial "Milky Way" box (that's what "Tianhe" translates to in English) had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. The efficiency of this cluster was 47 per cent, sustained over peak performance.
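For anyone who wants to check the efficiency arithmetic, here is a minimal sketch in Python. It assumes only the rounded peak and sustained figures quoted above for the two machines; the function name `linpack_efficiency` and the `machines` table are illustrative and not part of any Top 500 tooling.

```python
def linpack_efficiency(sustained_tflops, peak_tflops):
    """Return sustained Linpack performance as a fraction of theoretical peak."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tflops / peak_tflops


# Rounded figures as quoted in the article, converted to teraflops.
# Each entry is (sustained, peak); these are assumptions taken from the text,
# not the exact Top 500 list entries.
machines = {
    "Tianhe-1A": (2560.0, 4700.0),
    "Tianhe-1": (563.1, 1200.0),
}

for name, (sustained, peak) in machines.items():
    eff = linpack_efficiency(sustained, peak)
    # The share that "disappears" is whatever is left over from peak.
    print(f"{name}: {eff:.1%} of peak sustained, {1 - eff:.1%} lost")
```

Efficiency here is simply sustained over peak, so the share lost is one minus that ratio. Because the inputs are rounded, the results can differ by a fraction of a percentage point from the numbers computed from the official list.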