Would you like to share my socket?
The Poulson chip has a combined 54 MB of on-die memory, including L1 and L2 caches, tags and registers, and directory caches. 50 MB of this is in static RAM caches. There is 256 KB of "mid-level" data cache and 512 KB of "mid-level" instruction cache (what you and I would call L2 but for some reason Intel did not) on each core, plus 32 MB of shared L3 cache. That L3 cache looks like it is broken into two 16 MB segments, and in fact, Poulson looks like two four-core chips that have been interconnected (as you would expect). It is not clear how much L1 cache is on each Poulson core and how much is used for tags, registers, and directories. (We'll try to find out at ISSCC.)
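As a rough back-of-the-envelope check on those cache figures, the sketch below totals the caches Intel has disclosed and shows how much of the 50 MB of SRAM cache is left unexplained. It assumes eight cores (two four-core halves, as the die appears to show) and 1 MB = 1,024 KB. The function and parameter names are purely illustrative.

```python
# Back-of-the-envelope tally of the Poulson cache figures quoted above.
# Assumptions (not confirmed by Intel): eight cores, 1 MB = 1024 KB.

KB_PER_MB = 1024


def disclosed_cache_mb(cores=8, mid_data_kb=256, mid_instr_kb=512, l3_mb=32):
    """Total of the per-core mid-level caches plus the shared L3, in MB."""
    mid_level_mb = cores * (mid_data_kb + mid_instr_kb) / KB_PER_MB
    return mid_level_mb + l3_mb


def unaccounted_mb(sram_cache_mb=50, **kwargs):
    """SRAM cache capacity not explained by the mid-level and L3 caches."""
    return sram_cache_mb - disclosed_cache_mb(**kwargs)


if __name__ == "__main__":
    print(f"Disclosed mid-level + L3: {disclosed_cache_mb():.1f} MB")
    print(f"Left for L1 and other caches: {unaccounted_mb():.1f} MB")
```

On those assumptions, the mid-level and L3 caches add up to 38 MB, which leaves roughly 12 MB of the 50 MB for L1 and whatever other caches Intel has not yet broken out.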
One cause of the delays in getting the modified Tukwila Itanium 9300s into the field in 2008 and 2009 was that server makers wanted Tukwila, Poulson, and Kittson to share the same socket. And as promised, Poulson chips will plug into the LGA 1248 sockets used by Tukwila, and so will Kittson. So upgrades will be easy. Hopefully, Intel has built some bandwidth headroom into the Itanium platform.
McInerney said that Intel did, in fact, have some headroom in the "Boxboro" chipsets and memory boards that are shared by Itanium 9300 and Xeon 7500 systems when Tukwila chips came out last year. That is why Intel has been able to crank up the QPI speeds from the 4.8 GT/sec of the Tukwilas to the 6.4 GT/sec of the Poulsons. Assuming that the future Xeons and Itaniums will need more bandwidth, the kicker to the Boxboro chipset will go even higher. Base 2 math would suggest that 9.6 GT/sec is the next stop on the QPI bus. For all we know, this is already cooked into the Boxboro chipsets, but just not activated.
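To put the QPI bump in perspective, here is a minimal sketch of the arithmetic. The figure of 2 bytes of data per transfer per direction is an assumption based on QPI's commonly cited 16-bit data payload per link, not something Intel said on the call, and the function names are made up for illustration.

```python
# Rough QPI arithmetic for the transfer rates mentioned above.
# Assumption: each transfer carries 2 bytes of data per direction.


def qpi_speedup(old_gt=4.8, new_gt=6.4):
    """Ratio of the new transfer rate to the old one."""
    return new_gt / old_gt


def link_bandwidth_gb_s(gt_per_sec, bytes_per_transfer=2):
    """Peak data bandwidth of one link in one direction, in GB/sec."""
    return gt_per_sec * bytes_per_transfer


if __name__ == "__main__":
    print(f"Tukwila to Poulson: {(qpi_speedup() - 1) * 100:.0f} per cent faster")
    for rate in (4.8, 6.4, 9.6):
        print(f"{rate} GT/sec -> {link_bandwidth_gb_s(rate):.1f} GB/sec per direction")
```

Under that assumption, the move from 4.8 GT/sec to 6.4 GT/sec is a one-third increase per link, and a 9.6 GT/sec link would double the Tukwila figure.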
Here's what the new Poulson core looks like:
The layout of the Poulson Itanium core
The big architectural change with the Poulson Itaniums is that the EPIC very long instruction word parallelism packaging mechanism has been made into a double-wide, moving from six-wide instruction processing to twelve-wide. In theory, and provided the application's mix of instructions works out right, this should come close to doubling the performance of Poulson cores compared to Tukwila cores, clock for clock and core for core. Which is why I don't think Intel is going to boost clock speeds on the Poulson Itaniums compared to the 1.33 GHz to 1.73 GHz of the Tukwilas. The Turbo Boost speed could go up, and well beyond the 1.46 GHz to 1.86 GHz range of the Tukwilas.
With twice as many cores, processing twice as many instructions, and possibly with twice as many HyperThreads, the Poulson chips should yield anywhere from three to five times the performance of the Tukwilas at the socket level. It depends on the threads and the efficiency of the twelve-wide EPIC instruction packaging. The eight other Itanium chips to date have all been six-wide chips, and it is unclear how software will take to twelve-wide pipes.
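The socket-level range can be reasoned about with a simple multiplicative model. The sketch below is purely illustrative and is not Intel's method: the gain factors for the wider issue and for threading are hypothetical knobs, chosen only to show how the theoretical 4X from doubling both cores and issue width can shrink or grow.

```python
# Toy model of socket-level speedup, Poulson versus Tukwila.
# All gain factors are hypothetical inputs, not measured or disclosed data.


def socket_speedup(core_ratio=2.0, issue_width_gain=2.0, thread_gain=1.0):
    """Multiply the per-socket gains from cores, issue width, and threads."""
    return core_ratio * issue_width_gain * thread_gain


if __name__ == "__main__":
    # Ideal case: twice the cores, and the twelve-wide core doubles throughput.
    print(socket_speedup())
    # Software only gets 1.5X out of the wider core.
    print(socket_speedup(issue_width_gain=1.5))
    # Ideal issue width plus a 25 per cent boost from extra threads.
    print(socket_speedup(thread_gain=1.25))
```

With these made-up inputs the model prints 4.0, 3.0, and 5.0, which brackets the three-to-five-times range. Where real code lands depends on how well it fills the twelve-wide pipes.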
What I can tell you is that customers will not have to recompile their applications when they move to Poulson chips. "We are not anticipating that people will need to do a recompile," explains McInerney. He did add that, just as is the case with any new processor, recompiling is often necessary to squeeze every drop of performance out of a system. But the performance comparisons that Intel will be making when Poulson gets closer to launch will be for code that was compiled on prior generations of Itaniums and plunked onto the Poulson systems unchanged.
The Poulson cores also have new data and instruction pipelines, a new floating point pipeline, and a new instruction buffer. The chip also has a number of dynamic power management features that gate power usage on elements of the Itanium chip and now the memory controllers and memory subsystems. Leakage current, power draw when idle, and power draw under load have all been reduced on the Poulson chip. Take a look:
Tukwila and Poulson power management (lower is better)
In this chart, Intel shows the ratio of Tukwila to Poulson on several power scaling metrics. The blue bars show Tukwila and the red bars show what would happen if the Tukwila chip were unchanged and just implemented in a 32 nanometer process. The green bars show the effect of the design changes inside Poulson on these same metrics. While Poulson's power leakage is only 30 per cent better than that of a 32 nanometer Tukwila, its idle power usage is 70 per cent better and its power draw under load (that's the TDP Activity data) is 60 per cent better. In general, the power lost or consumed by the Poulsons on these metrics is about a fifth of what it is on the real 65 nanometer Tukwilas.
Finally, Poulson will include a slew of new error detection, correction, and prevention technologies not in the current Tukwila Itanium chips. Intel has added error detection for floating point instructions, expanded soft error correction, and boosted cache error coverage. The chip also allows for the logging of more information about errors in the chips to improve recovery, sometimes automagically.
Intel and its main Itanium partner, HP, are no doubt hoping that the Poulson specs will put to rest any talk about the impending death of Itanium.
"Intel's commitment, as evidenced by this development effort, is strong and it is unwavering," McInerney said on the call.
Don't expect some in the IT market to believe it. They never will.
Intel is not talking about when Poulson chips will be delivered, but it seems likely that they will show up in early 2012, with Kittson in early 2014. ®