We passed through the 10 petaflops barrier in the supercomputer racket last year, and the next station on the train to exaflops is 100 petaflops.
China already admitted at last year's International Super Computing '12 shindig that it was working on a kicker to the Tianhe-1A hybrid CPU-GPU supercomputer, with the goal of having the Tianhe-2 machine reach 100 petaflops of peak performance by 2015. And now, the Chinese government, which is flush with trillions of dollars in cash, could be moving the schedule forward by as much as a year – and perhaps with a totally different machine.
A report in Singapore-based VR-Zone by Theo Valich claims inside information on a 100 petaflopper that is being commissioned by the Chinese Ministry of Science for use in space exploration and healthcare research. It will consist of around 100,000 of Intel's "Ivy Bridge-EP" Xeon E5-2600 v2 processors and the same number of the next-generation "Knights Landing" Xeon Phi multicore x86 coprocessors.
Nvidia helped build the Tianhe-1A ceepie-geepie hybrid machine back in 2010, based on a rack server design created by the National University of Defense Technology (NUDT). The machine had 86,016 Xeon cores on the CPU side and 100,352 Tesla M2050 cores on the GPU side for a total of 7,168 CPUs and GPUs. This machine delivered a peak theoretical performance of 4.7 petaflops and sustained 2.57 petaflops on the Linpack Fortran benchmark.
Tianhe-1A uses a proprietary interconnect, and is the machine that put Chinese HPC on the petaflops map, even though many thought at the time that this initial box was just a publicity stunt to get a CPU-GPU box on the top of the list and not from an American or European institution. China has also created its own homegrown variants of Sparc (which are in the Tianhe-1A cluster) and MIPS processors called Godson 3B that are aimed at everything from mobiles to supers.
Nvidia refused to talk about this prospective 100 petaflops machine or the Tianhe-2 box, if they are indeed different machines at all. Intel would not talk about this Chinese 100 petaflops box, either. "Intel does not comment on rumors and speculations," was what an Intel spokesperson told El Reg when we brought up the rumors about this Chinese machine and the further rumor that the processor and coprocessor technology would only cost $100m combined.
This is ridiculously cheap, and is almost certainly something that has gotten lost in translation somewhere. Either that, or Intel is making the chips in a Chinese fab and is getting all kinds of breaks from the Beijing government. Or it is using a supercomputer as a loss leader for some other effort in China, like convincing cell phone and microserver makers to use Atoms instead of ARMs.
Assuming that the Ivy Bridge-EP processors shift to ten cores, up from eight with the "Sandy Bridge-EP" Xeon E5-2600s, you would need a mere 5,000 two-socket server nodes to get to 100,000 cores. These nodes would not contribute much in the way of performance of the system. If the clock speeds of the two Xeon E5 families are the same, then these 5,000 nodes would be on the order of 2.1 petaflops, tops. Now, the story in VR-Zone says "approximately 100,000 Ivy Bridge-EP based Xeon E5s," which is a staggering 50,000 server nodes and a mind-blowing 1 million Xeon cores for a total of 21 petaflops of aggregate performance. That leaves roughly another 80 petaflops that you need to get from the Xeon Phi coprocessors.
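For anyone who wants to check the sums on the Xeon side, here is a back-of-envelope sketch in Python. The 2.6GHz clock and the eight double-precision flops per clock per core are our assumptions, not numbers from the VR-Zone report, picked because they land close to the petaflops figures above. The function name is made up for illustration.

```python
# Back-of-envelope peak flops for the Xeon side of the rumored machine.
# ASSUMPTIONS (not from the report): 2.6GHz clock, 8 DP flops per clock per core.

def xeon_peak_petaflops(sockets, cores_per_socket=10, ghz=2.6, flops_per_clock=8):
    """Return (total cores, peak petaflops) for a given number of Xeon sockets."""
    cores = sockets * cores_per_socket
    gigaflops = cores * ghz * flops_per_clock
    return cores, gigaflops / 1e6  # 1 petaflops = 1e6 gigaflops

small = xeon_peak_petaflops(sockets=10_000)   # 5,000 two-socket nodes
large = xeon_peak_petaflops(sockets=100_000)  # 50,000 two-socket nodes
print(small)
print(large)
```

Under those assumptions the small configuration comes in at about 2.1 petaflops across 100,000 cores and the large one at about 21 petaflops across 1 million cores. Change the clock speed and the totals scale linearly.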
Let's do some math. The current "Knights Corner" Xeon Phi coprocessors have 60 active cores (on a die with 64 cores) running at 1.05GHz delivering just over 1 teraflops of oomph. To get 80 petaflops of peak number-crunching oomph, you would need 75,973 Xeon Phi cards, which would work out to around 1.5 Xeon Phi cards per node. Call it two Xeon Phis in the Knights Corner generation per node just for fun, and that alone gives you 105.3 petaflops with the current generation.
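The card count can be reproduced with a few lines of Python. This is a sketch that assumes 1.053 teraflops per Knights Corner card, which is our reading of "just over 1 teraflops" and happens to reproduce the 75,973 and 105.3 figures. The constant and function names are hypothetical.

```python
# ASSUMPTION: 1.053 teraflops peak per Knights Corner card (our reading of
# "just over 1 teraflops"); 50,000 nodes as in the VR-Zone scenario.
TFLOPS_PER_CARD = 1.053
NODES = 50_000

def cards_needed(target_petaflops, tflops_per_card=TFLOPS_PER_CARD):
    """Cards for a peak target, truncating the fractional card.

    Truncation matches the figure in the text; rounding up would add one card.
    """
    return int(target_petaflops * 1000 / tflops_per_card)

cards = cards_needed(80)
per_node = cards / NODES
two_per_node_pf = 2 * NODES * TFLOPS_PER_CARD / 1000

print(cards, round(per_node, 2), round(two_per_node_pf, 1))
```

The per-node ratio is the interesting bit: at about 1.5 cards per node, you cannot fit the machine evenly, which is why rounding up to two cards per node is the natural configuration.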
Now, you know Intel won't sit still, and it will likely add more cores to the Knights Landing Xeon Phi. The Knights Corner chips are already etched in 22 nanometer processes, so Knights Landing has to either use the same process and have architectural improvements or move to a new process and do a shrink and a core count boost.
We think it will be the former rather than the latter, and so let's be optimistic and say that Intel can goose the performance of the Xeon Phis by as much as 25 to 30 per cent without moving to 14 nanometers. So with 100,000 Xeon Phi v2 coprocessors, you'd be at somewhere around 135 petaflops on the Xeon Phis and another 21 petaflops on the server nodes. Now you are pushing up to 156 petaflops peak.
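To see how sensitive that total is to the guess about Knights Landing, here is a small Python sketch. The 105.3 and 21 petaflops baselines are the estimates from the text, the uplift range is our guess as stated above, and the function name is invented for illustration.

```python
# ASSUMPTIONS: 105.3 petaflops for 100,000 Knights Corner cards and
# 21 petaflops from the Xeon nodes, both estimates from the text.
KNC_BASELINE_PF = 105.3
XEON_PF = 21.0

def projected_total(uplift):
    """Return (Phi petaflops, total petaflops) for a fractional per-card uplift."""
    phi = KNC_BASELINE_PF * (1 + uplift)
    return phi, phi + XEON_PF

for uplift in (0.25, 0.30):
    phi, total = projected_total(uplift)
    print(f"{uplift:.0%} uplift: {phi:.1f} PF on Phis, {total:.1f} PF total")
```

The two ends of the range bracket the figures above: roughly 132 to 137 petaflops on the Xeon Phis, and roughly 153 to 158 petaflops all in.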
It is possible that China is working on such a machine, but it is hard to imagine that it will cost as little as $100m for the processing elements. If you bought 100,000 Xeon Phi v1 coprocessors at list price based on 1,000-unit trays, you'd pay $265m, and 50,000 server nodes would run you maybe another $300m depending on the memory and networking if you bought them as onesies online.
Assuming China is using its own proprietary interconnect, you might be able to get the base servers at $250m list. Call it a cool $515m at list for both the servers and the Xeon Phis, and maybe with a 45 per cent discount and some rounding, you could get it down to $285m.
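The pricing guesswork can be sketched the same way in Python. All inputs are the rough list-price estimates from the text, in millions of dollars; the names are hypothetical, and the last line works out how deep a discount the rumored $100m figure would imply.

```python
# All figures in millions of US dollars, taken from the estimates in the text.
PHI_LIST = 265      # 100,000 Xeon Phi v1 cards at list
SERVERS_LIST = 250  # 50,000 nodes, assuming a homegrown interconnect

def discounted(list_price_m, discount):
    """Price after a fractional discount, e.g. 0.45 for 45 per cent."""
    return list_price_m * (1 - discount)

total_list = PHI_LIST + SERVERS_LIST
street = discounted(total_list, 0.45)

# Discount off list needed to hit the rumored $100m for the processing elements.
implied_discount_for_100m = 1 - 100 / total_list

print(total_list, street, round(implied_discount_for_100m, 3))
```

A 45 per cent discount lands at about $283m, which rounds to the $285m above. Getting to $100m would take a discount of roughly 81 per cent off list, which is why that rumor looks so unlikely.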
Whatever China is doing, it will make it known in its own good time. ®