Intel has scored its first big win for its Many Integrated Core (MIC) x86 coprocessor, code-named "Knights Corner," in a hybrid supercomputer that will be installed at the Texas Advanced Computing Center at the University of Texas in January 2013.
The 10-petaflops supercomputer will be nicknamed "Stampede", and will be right down the street from Dell, which was founded by Michael Dell in his University of Texas dorm room in 1984. Stampede follows TACC's Opteron-based "Ranger" system built by Sun Microsystems, which has 62,976 cores and weighs in at 579.4 teraflops. A third machine, built by Dell and called "Lonestar", has 22,656 Xeon cores delivering 302 teraflops.
The Stampede cluster is not being built by Intel (which does build systems for customers, particularly in Asia), nor is it being built by HP, which hails from Texas – well, its Compaq and Tandem parts, in any case. According to sources at Intel, Dell got the TACC gig.
Both Dell and HP have ways of packaging servers and GPU coprocessors densely, and there's no reason why either could not be the contractor that packs those 10 petaflops into the smallest possible space.
For example, Dell could take PowerEdge C2100 servers based on the Xeon E5, which can put four half-width two-socket servers into a 2U box, and lash one or two PowerEdge C410x PCIe expansion chassis to it. This enclosure can house 16 GPU coprocessors (and, some day in the second half of 2012, MIC coprocessors) in a 3U chassis without having them melt. A ratio of four coprocessors per server (two per socket) might be best, particularly if Dell can offer a direct link from each PCIe slot in the chassis to each PCIe 3.0 slot on the Xeon E5 server. That would give you four servers and 16 MICs in 5U of space.
If you wanted to go with the HP line (which TACC apparently did not), the SL390s G7 "cookie sheet" server comes in a variety of heights, and the 4U chassis has room for two nodes and supports eight (yes, eight) GPUs per node. That would give you two servers and 16 MICs in 4U of space: less Xeon computing than the Dell alternative above but the same MIC count. But it also means twice as many MICs per socket, which might not be the right balance for the applications TACC wants to run.
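The two packaging options boil down to a little arithmetic. The Python sketch below takes the enclosure figures from the text as given (four two-socket nodes in 2U plus a 16-slot 3U expansion chassis for Dell, two nodes with eight coprocessors each in 4U for HP) and compares coprocessors per socket and per rack unit. The `Config` type and its field names are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Config:
    """One building block of a hypothetical dense CPU-plus-coprocessor layout."""
    name: str
    servers: int              # two-socket server nodes in the block
    coprocessors: int         # GPU or MIC cards attached to those nodes
    rack_units: int           # total height of the block, in U
    sockets_per_server: int = 2

    def coprocessors_per_socket(self) -> Fraction:
        return Fraction(self.coprocessors, self.servers * self.sockets_per_server)

    def coprocessors_per_u(self) -> Fraction:
        return Fraction(self.coprocessors, self.rack_units)


# Dell (as described above): 2U box with four half-width nodes + one 3U, 16-slot chassis.
dell = Config("Dell PowerEdge C", servers=4, coprocessors=16, rack_units=2 + 3)
# HP (as described above): 4U SL390s G7 chassis, two nodes, eight coprocessors per node.
hp = Config("HP SL390s G7", servers=2, coprocessors=2 * 8, rack_units=4)

for cfg in (dell, hp):
    print(f"{cfg.name}: {cfg.coprocessors_per_socket()} per socket, "
          f"{float(cfg.coprocessors_per_u()):.1f} per U")
# Dell PowerEdge C: 2 per socket, 3.2 per U
# HP SL390s G7: 4 per socket, 4.0 per U
```

Under these assumptions the HP block packs slightly more coprocessors per rack unit (4.0 versus 3.2), but only by doubling the coprocessor-to-socket ratio, which is exactly the balance question raised above.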
Dell is not divulging which server the Stampede cluster is built from, since the Xeon E5 processors have not launched yet. All we know is that the Dell machinery used in Stampede will be Intel Inside.
Intel was talking up the MIC coprocessor at its Developer Forum last week, saying that it would be delivered in the second half of 2012 with more than 50 Pentium cores linked together by a ring interconnect like the one Intel is deploying in its Xeon and Itanium processors. MIC is the resurrection of Intel's failed "Larrabee" GPU effort, from back when Intel thought it could take on Nvidia and Advanced Micro Devices in the discrete graphics market by making GPUs that spoke fluent (if somewhat Middle) x86 instead of whatever foreign bit-twiddling GPUs speak.
The Stampede system will get about 2 petaflops of its aggregate floating-point power from the Xeon E5 chips, and Intel has confirmed that the machine will use the eight-core variant. The other 8 petaflops of oomph will come from the MIC coprocessors.
Assuming we are talking about double-precision math, a MIC coprocessor with somewhere north of 50 cores (but not all 64, because yields on Intel's 22 nanometer process might not be great on so large a chip) should deliver on the order of 1 teraflops a pop. So you are talking about maybe 8,000 of these MIC puppies. Presumably Intel's flops ratings are based on double-precision math; if they are single-precision figures, each card would be rated at roughly twice the flops, and it is only 4,000 of the MICs.
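The coprocessor count can be written out explicitly. This Python sketch estimates peak throughput per card as cores × clock × flops per cycle, then divides the 8 petaflops MIC share by a per-card rating. The 60 active cores, the roughly 1 GHz clock, the 16 double-precision flops per core per cycle (a 512-bit vector unit doing fused multiply-adds on eight doubles), and the 2x single-precision rate are all assumptions for illustration, not Intel specifications; the function names are hypothetical.

```python
import math

PETA = 10**15
TERA = 10**12


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires flops_per_cycle operations each clock."""
    return cores * clock_hz * flops_per_cycle


def cards_needed(target_flops: float, flops_per_card: float) -> int:
    """Round up, since you cannot install a fraction of a card."""
    return math.ceil(target_flops / flops_per_card)


# Assumed, not published: 60 active cores (more than 50, fewer than 64),
# a ~1 GHz clock, and 512-bit FMA units: 8 doubles x 2 ops = 16 DP flops/cycle.
dp_per_card = peak_flops(cores=60, clock_hz=1.0e9, flops_per_cycle=16)
print(f"DP peak per card: {dp_per_card / TERA:.2f} teraflops")  # 0.96

mic_share = 8 * PETA  # the MIC portion of Stampede's 10 petaflops

# Round the per-card figure to "on the order of 1 teraflops a pop".
print(cards_needed(mic_share, 1 * TERA))  # 8000 if the rating is double precision
# Assuming single precision runs at twice the DP rate, each card counts double.
print(cards_needed(mic_share, 2 * TERA))  # 4000 if the rating is single precision
```

Plugging the unrounded 0.96 teraflops into `cards_needed` instead gives 8,334 cards, which shows how sensitive the "maybe 8,000" figure is to the clock and core count.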
If you put a gun to my head, I would say that the machine will have around 2,000 server nodes and around 8,000 MIC coprocessors. This would fit perfectly into the existing combination of the Dell PowerEdge C2100 tray servers and the C410x expansion chassis. The thing to keep in mind is this: if TACC uses a tweaked version of the PowerEdge C iron, those 2 petaflops of Xeon E5 power will take up about 4,000 aggregate rack units of space, while the MIC coprocessors will offer 8 petaflops in 1,500 rack units. That works out to roughly 11 times the performance per rack unit for the MICs.
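The rack-space comparison reduces to performance per rack unit. The Python sketch below assumes 16 coprocessors per 3U expansion chassis, as in the Dell example earlier, and takes the 4,000-unit figure for the Xeon side as given rather than deriving it; the helper name is hypothetical.

```python
import math
from fractions import Fraction

# Assumption carried over from the text: one 3U chassis holds 16 coprocessors.
CARDS_PER_CHASSIS = 16
CHASSIS_U = 3


def coprocessor_rack_units(cards: int) -> int:
    """Rack units consumed by expansion chassis for a given number of cards."""
    chassis = math.ceil(cards / CARDS_PER_CHASSIS)
    return chassis * CHASSIS_U


mic_u = coprocessor_rack_units(8_000)   # 500 chassis x 3U
xeon_u = 4_000                          # the estimate above, taken as given

# Teraflops per rack unit, kept as exact fractions.
mic_density = Fraction(8_000, mic_u)    # 8 PF = 8,000 TF
xeon_density = Fraction(2_000, xeon_u)  # 2 PF = 2,000 TF

print(mic_u)                                        # 1500
print(f"{float(mic_density / xeon_density):.1f}x")  # 10.7x
```

The exact ratio is 32/3, about 10.7, which is where the "roughly 11 times" comes from; if the Xeon side turned out to need fewer rack units, the advantage would shrink proportionally.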
This is all hypothetical, of course, since Intel, Dell, and TACC have not revealed the configuration of the Stampede machine. ®