Hot Chips Big iron sales are still generating $6bn to $7bn a year for IBM - which is enough to justify designing its own Power processors and building its own wafer baker.
At the Hot Chips conference at Stanford University on Monday, some of the chief architects behind the Power8 electronics were on hand to show off the feeds and speeds of the next-generation motor for the company's Power Systems lineup.
Significantly, the Power8 chip is also the foundation for Big Blue's OpenPower consortium - an effort to make it easier to hook networking, accelerators and other features into Power processors by allowing third parties to license chunks of intellectual property in the style of ARM Holdings and its RISC cores.
IBM announced the OpenPower effort earlier this month, with GPU maker Nvidia, network chip maker Mellanox Technologies, motherboard maker Tyan, and advertising moneymaker Google all lending their support to the cause.
Whether or not the OpenPower effort gains traction remains to be seen; the Power8 is so clearly engineered for midrange and enterprise systems for running applications on a giant shared memory space, backed by lots of cores and threads. Power8 does not belong in a smartphone unless you want one the size of a shoebox that weighs 20 pounds. But it most certainly does belong in a badass server, and Power8 is by far one of the most elegant chips that Big Blue has ever created, based on the initial specs.
IBM's Power8 processor
Jeff Stuecheli, who has the title of chief nest architect for the Power8 processor, gave the presentation at Hot Chips going over the feeds and speeds. If the cores on a Power chip are the eggs, then the chief nest architect worries about all of the other things that surround the cores - what Intel calls the uncore regions when it talks about chips.
The Power8 nest is lined with L3 caches, PCI-Express and DDR memory controllers, various other accelerators to speed up functions that might otherwise run on the cores, and the NUMA interconnects for implementing shared memory across multiple sockets.
The evolution of the Power chips for the last decade
With the Power8 chip, IBM has a few goals. First, the company is shifting from the 32-nanometer processes used for the relatively recent Power7+ chips to a 22-nanometer process. The shrinking of the transistor gates allows IBM to add more features to a die, cranks the clocks, or do a little of both.
Judging from the Power8, it looks like IBM is content to keep in the same clock speed range as the Power7+ chips - around 4GHz, give or take a little. It'll also move PCI-Express 3 controllers into the chip package to keep those hungry little Power8 cores fed; these controllers will offer a coherent memory protocol to external accelerators as well as a new cache hierarchy that goes all the way out to the L4 cache.
As expected, IBM is also goosing the number of processor threads per core with Power8, doubling it up to eight per core. IBM has been vague about how many cores it might squeeze onto a die with the 22-nanometer shrink, and it could have probably done as many as sixteen cores if it had not added so much eDRAM L3 cache memory with the Power7+ and then boosted it even further with the Power8.
On the workloads that Big Blue is targeting with its Power Systems iron, having more cache and cores running at near peak utilisation is more important than having lots of cores on a die. Just as is the case for mainframes, at the prices that IBM has to charge for Power Systems servers, the chip has to be architected to run at close to full-tilt-boogie in a sustained manner. If IBM can do that, then it can garner the prices it commands and the profits we all presume it gets from Power Systems.
The Power8 core looks a lot like the Power7+ core, with some tweaks
The Power8 chip is implemented in IBM's familiar high-k metal gate processes, which include copper and silicon-on-insulator technologies in a 22-nanometer process. The precise transistor count was not given during the presentation, but the Power8 chip weighs in at 650 square millimetres; this is a bit bigger than Power7+, which used a 32-nanometer process, had 2.1 billion transistors, and a surface area of 567 square millimetres.
The Power8 core has a total of sixteen execution pipes. These include two load store units (LSUs) and a condition register unit (CRU), a branch register unit (BRU), and two instruction fetch units (IFUs). There are two fixed-point units (FXUs), two vector math units (VMXs), a decimal floating unit (DFU), and one cryptographic unit (not labeled in the core diagram above).
Each core now has eight threads implemented using simultaneous multithreading (what IBM calls SMT8), instead of four threads per core with the Power7 and Power7+ chips. And like earlier Power chips, this SMT is dynamically tuneable so a core can have one, two, four, or eight threads fired up.