Top 500 supers: China rides GPUs to world domination
The People's Republic of Petaflops
GPU is my co-pilot
On the November 2010 list, there are 28 HPC systems that use GPU accelerators, and the researchers who put together the Top 500 for the 36th time - Erich Strohmaier and Horst Simon, computer scientists at Lawrence Berkeley National Laboratory, Jack Dongarra of the University of Tennessee, and Hans Meuer of the University of Manheim - consider IBM's Cell chip a GPU co-processor. On this list, there are sixteen machines that use Cell chips to goose their floating point oomph, with ten using Nvidia GPUs and two using AMD Radeon graphics cards.
The Linpack Fortran matrix benchmark was created by Dongarra and colleagues Jim Bunch, Cleve Moler, and Pete Stewart back in the 1970s to gauge the relative number-crunching performance of computers and is the touchstone for ranking supercomputers.
There are three questions that will be on the minds of people at the SC10 supercomputing conference in New Orleans this week. The first is: Can the efficiency of ceepie-geepie supers be improved? The second will be: Does it matter if it can't? And the third will be: At what point in our future will GPUs be standard components in parallel supers, just like parallel architectures now dominate supercomputing and have largely displaced vector and federated RISC machines?
To get onto the Top 500 list this time around, a machine had to come in at 31.1 teraflops, up from 24.7 teraflops only six months ago. This used to sound like a lot of math power. But these days, it really doesn't. A cluster with 120 of the current Nvidia Tesla GPUs with only half of the flops coming through where the CUDA meets the Fortran compiler will get you on the list. The growth is linear, then on the June list next year, you will need something like 40 teraflops or about 150 of the current generation of GPUs. And with GPU performance on the upswing, maybe the number of GPUs in a ceepie-geepie to get onto the Top 500 list might not require so many GPUs.
As has been the case for many years, processors from Intel absolutely dominate the current Top 500 list, with 398 machines (79.6 per cent of the boxes on the list). Of these, 56 machines are using the Xeon 5600 processors, one is still based on 32-bit Xeons, one is based on Core desktop chips, five are based on Itanium processors, and three are based on the new high-end Xeon 7500s.
In the November 2010 rankings, there are 57 machines using AMD's Opteron processors, while there are 40 machines using one or another variant of IBM's Power processors. While the machine counts are low for these two families of chips, the core counts sure are not because of the monster systems that are based on Power and Opteron chips.
There are 1.41 million Power cores on the Top 500 list this time around, which was 21.5 per cent of the total 6.53 million cores inside of the 500 boxes and which represented 7.35 aggregate petaflops or 11.2 per cent of the total 65.8 petaflops on the list. There are 1.54 million Opteron cores (23.5 per cent of cores) on the aggregate list for 14.2 peak petaflops (21.6 per cent of total flops)
None of these core counts include the GPU core counts, which is something that the Top 500 people should reconsider, even though in all cases the flops are counted.
Across all processor architectures, there are 365 machines using quad-core processors and 19 already are using CPUs with six or more processors per socket. It is safe to say that the HPC market will eat whatever number of cores the chip makers can bake.
There are two Sparc-based supers on the current Top 500 list and the Earth Simulator super built by NEC for the Japanese government is still barely on the list (and will probably be knocked off on the next list in June 2011).