As we had been anticipating, IBM is indeed dropping the clock speeds of the Power7 chips, and now confirms that for this Power7 IH MCM, clock speeds will range from 3.5GHz to 4GHz. The current Power6+ chips top out at 5GHz, but have only two cores per chip and fewer execution units. IBM says that this single package will deliver just north of one teraflops of number-crunching power using just the floating point units.
At 800 watts, the package is not cool by any means, but the Power7 IH MCM delivers 1.28 gigaflops per watt at the package level. A Xeon 5500 chip from Intel can do four floating point operations per clock per core, or 16 per clock across the four cores on a single die, and that means the top-speed X5570 running at 2.93GHz and rated at 95 watts can deliver 46.9 gigaflops, or only 493 megaflops per watt at the chip level.
Dropping down to the 80-watt E5540 helps a bit, delivering 506 megaflops per watt, and stepping down to the 60-watt L5530 running at 2.4GHz gives you 640 megaflops per watt. By this measure, the Power7 module is precisely twice as good as that L5530, but you can damned sure bet it will cost a lot more than twice as much.
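For anyone who wants to check the math, here is a rough Python sketch of the flops-per-watt comparison. It is a back-of-the-envelope calculation, not a benchmark: the function and variable names are ours, the E5540's 2.53GHz clock comes from Intel's published specs, and the Power7 peak is simply what the quoted 1.28 gigaflops per watt implies at 800 watts.

```python
# Back-of-the-envelope peak floating point efficiency.
# All names here are illustrative; the inputs are theoretical peaks, not measured results.

def peak_gigaflops(cores, flops_per_clock, ghz):
    """Theoretical peak: cores x flops per clock per core x clock speed in GHz."""
    return cores * flops_per_clock * ghz

def megaflops_per_watt(gigaflops, watts):
    """Convert peak gigaflops and rated watts into megaflops per watt."""
    return gigaflops * 1000.0 / watts

# (clock in GHz, rated watts); each Xeon 5500 has four cores
# doing four floating point operations per clock.
XEONS = {
    "X5570": (2.93, 95),
    "E5540": (2.53, 80),  # clock taken from Intel's published specs
    "L5530": (2.40, 60),
}

# Assumption: package peak implied by 1.28 gigaflops per watt at 800 watts.
POWER7_MCM_WATTS = 800
POWER7_MCM_GIGAFLOPS = 1.28 * POWER7_MCM_WATTS

def xeon_efficiency(name):
    """Megaflops per watt for one of the Xeon parts listed above."""
    ghz, watts = XEONS[name]
    return megaflops_per_watt(peak_gigaflops(4, 4, ghz), watts)

if __name__ == "__main__":
    for name in XEONS:
        print(f"{name}: {xeon_efficiency(name):.0f} megaflops per watt")
    p7 = megaflops_per_watt(POWER7_MCM_GIGAFLOPS, POWER7_MCM_WATTS)
    print(f"Power7 IH MCM: {p7:.0f} megaflops per watt")
    print(f"Power7 vs L5530: {p7 / xeon_efficiency('L5530'):.2f}x")
```

These are peak numbers at rated wattage, so they say nothing about sustained performance or about the memory, interconnect, and cooling power that a real system burns.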
The Power7 IH node, as you can see in this picture, is not small. It is 39 inches wide by 6 feet deep, including space for cables. The IH node drawer is 2U high and it has room for eight of these Power7 IH MCMs, for a total of 256 cores. There are two monster motherboards underpinning the processors and their memory and the hub/switch and its interconnects. These mobos are manufactured by Japanese server maker Hitachi, and Benner said that they are among the largest motherboards ever made.
The Power7 IH HPC server node
The IH nodes are completely water-cooled, with water blocks on the Power7 MCM packages, on the 8GB DDR3 memory modules IBM had specially designed for the box, and on the Power7 IH hub/switches, which have not been given a proper name yet.
The memory modules include buffers on the DIMMs, which IBM also designed, to help accelerate their performance. There are 16 DIMM slots per socket in the Power7 IH node, and IBM is using 8GB DIMMs, yielding 128GB per socket, or 4GB per core.
Each drawer holds a total of 1TB of main memory, and the fully loaded Blue Waters box will have 2PB of main memory. IBM is being a bit cagey about the memory architecture, but the Power7 chips have some features to implement a kind of global address space (not cache-coherent shared memory, as in SMP and NUMA servers). It will be interesting to see how this global address space memory is architected and how it performs.
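The memory figures for the drawer hang together, as this quick Python sketch shows. The constant names are ours, the 32 cores per socket is derived from 256 cores spread over eight MCMs, and we are assuming binary units (1TB = 1,024GB).

```python
# Memory arithmetic for one Power7 IH drawer, using the figures quoted in the text.
# Constant names are illustrative.

DIMM_SLOTS_PER_SOCKET = 16
GB_PER_DIMM = 8
SOCKETS_PER_DRAWER = 8
CORES_PER_DRAWER = 256

# Derived: 256 cores across eight MCM sockets.
cores_per_socket = CORES_PER_DRAWER // SOCKETS_PER_DRAWER

gb_per_socket = DIMM_SLOTS_PER_SOCKET * GB_PER_DIMM
gb_per_core = gb_per_socket / cores_per_socket
gb_per_drawer = gb_per_socket * SOCKETS_PER_DRAWER

# Assumption: binary units, so 1TB = 1,024GB.
tb_per_drawer = gb_per_drawer / 1024

if __name__ == "__main__":
    print(f"Cores per socket: {cores_per_socket}")
    print(f"Memory per socket: {gb_per_socket}GB")
    print(f"Memory per core: {gb_per_core:.0f}GB")
    print(f"Memory per drawer: {tb_per_drawer:.0f}TB")
```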
In this picture, the power supplies are on the right. Moving leftward are banks of DDR3 memory, the eight Power7 IH MCM sockets, another bank of memory, and the IH hub/switch modules, whose optical links (the orange cables) go out to the left and right, route out to the back, and connect to other server nodes in a cluster. The left side of the chassis (which is the back of the rack in the Blue Waters machine) is also where there are 16 PCI-Express 2.0 x16 slots and an extra x8 slot just for the heck of it.