Analysis A few weeks ago, El Reg told you that IBM was getting ready to start talking about its future Power7+ and System zNext processors at the Hot Chips conference at the end of August. Like you, I am an impatient sort when it comes to getting some insight into future processors from any vendor, and I like to poke around and see what I can find out about these chips early, because it is interesting and useful to know as much as possible as soon as possible.
I like rummaging around the Internet for processor roadmaps and such as well, and occasionally I find stuff that at least assures us there is a future for any particular technology - in this case the Power processors and the IBM i, AIX, and Linux systems that depend upon them as their compute engines.
Here's a tidbit I found about investment in Power iron. In a presentation from February 2011 (PDF) by Peter Nimz, product manager for Power Systems at IBM Deutschland, Big Blue said that it had invested $3.2bn in Power7 systems over the past 3.5 years. That's an average of about $914m a year for a product line that brings in between $3.5bn and $4.5bn a year in total sales, and that is a pretty substantial investment. It no doubt includes chip design, the Power Systems division's allocation for overhead for chip fabrication development, and the actual server engineering, too.
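For anyone who wants to check the sums, here is a minimal sketch of that arithmetic in Python. The dollar figures are the ones quoted above; the variable and function names are made up for illustration, and the share-of-sales ratios are my own back-of-the-envelope addition rather than anything IBM has stated.

```python
# Figures quoted in the article, in millions of US dollars.
INVESTMENT_M = 3200.0   # $3.2bn invested in Power7 systems
YEARS = 3.5             # period the investment covers
SALES_LOW_M = 3500.0    # low end of annual Power Systems sales
SALES_HIGH_M = 4500.0   # high end of annual Power Systems sales


def average_per_year(total, years):
    """Return the simple average spend per year."""
    return total / years


avg_m = average_per_year(INVESTMENT_M, YEARS)

# Investment as a fraction of sales: dividing by the larger sales
# figure gives the smaller share, and vice versa.
share_low = avg_m / SALES_HIGH_M
share_high = avg_m / SALES_LOW_M

print(f"Average investment: ${avg_m:.0f}m a year")
print(f"Share of annual sales: {share_low:.1%} to {share_high:.1%}")
```

The average works out to a little over $914m a year, or somewhere between a fifth and a quarter of annual sales, depending on which end of the revenue range you pick.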
IBM could do a better job putting out a long-term public roadmap for its Power and mainframe processors, but as the dominant player in the Unix racket now, Big Blue doesn't feel it needs to do that, whereas Oracle, which bought former Unix leader Sun Microsystems, has to put out a roadmap to demonstrate its commitment to the Sparc architecture. Hewlett-Packard has left its Unix and proprietary systems future largely in the hands of Intel - which it has apparently been paying for years to continue Itanium development as well as manufacturing - and Oracle, which has put a serious damper on Itanium by not supporting its current or future software on future Itanium processors.
We know that Power7+ is coming sometime toward the end of this year, and that it will be implemented using 32 nanometer processes at IBM's East Fishkill, New York, chip fab.
As you can see from the roadmap, the move from Power7 to Power7+ involves a process shrink from 45 nanometers, which means IBM can cram a lot more transistors onto the same area or shrink the chip a bit and also boost the clock speed on the processor. IBM is promising faster clocks, a very large cache, and accelerators to boost the performance of certain workloads, but it is not promising more than the four, six, and eight core variants it already peddles with the Power7 chips. And with the move to Power8, sometime around the end of 2013 or early 2014, IBM will shift to 22 nanometer processes and add more cores, reliability enhancements (including, at a guess, spare cores), boosted accelerators, and its fourth generation of simultaneous multithreading. It is hard to imagine IBM would go from four to eight threads per core with the Power8 chips, but Sun and Oracle did it with the Sparc T series chips and got some benefits from the high thread count for parallel workloads.
As I have said before, I think IBM will probably boost the clock speed on the Power7+ chips by between 25 and 30 per cent, with the top bin parts spinning at above 5GHz and in the same range as the current z11 engines used in the System zEnterprise 114 and 196 machines - a quad-core chip that spins at 5.2GHz. That's just a guess on my part, but there are plenty of workloads where single-threaded performance is important and IBM cannot forget these customers if it wants to maintain an edge over its X86 and Sparc rivals.
I wasn't sure how much IBM would boost the on-chip embedded DRAM cache size, but as you can see in this performance document published on IBM's developerWorks site (PDF), the L3 cache size will increase from 4MB for each local core segment on the Power7 chip (for a total of 32MB) to 10MB per core on the Power7+ chip. If the core count on Power7+ remains the same at a maximum of eight per chip, then that will be 80MB of L3 cache, a truly huge amount and four times what Intel can put on its eight-core Xeon E5-2600 chip. (I know that the core count for Power7+ stays at eight, which I will show you in a second.)
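The cache totals quoted above are straightforward multiplication, and a short Python sketch makes the comparison explicit. The per-core sizes and the eight-core count come from the text; the 20MB figure for the top Xeon E5-2600 part is not stated outright in this article, only implied by the "four times" comparison, so treat it as an assumption here. The names are hypothetical.

```python
CORES_PER_CHIP = 8               # maximum core count on Power7 and Power7+
POWER7_L3_PER_CORE_MB = 4        # local L3 segment per core on Power7
POWER7_PLUS_L3_PER_CORE_MB = 10  # per-core L3 on Power7+
XEON_E5_2600_L3_MB = 20          # assumption: implied by "four times"


def total_l3_mb(per_core_mb, cores):
    """Total on-chip L3 when every core has its own segment."""
    return per_core_mb * cores


power7_total = total_l3_mb(POWER7_L3_PER_CORE_MB, CORES_PER_CHIP)
power7_plus_total = total_l3_mb(POWER7_PLUS_L3_PER_CORE_MB, CORES_PER_CHIP)

# How the Power7+ total compares with its predecessor and with the Xeon.
growth_vs_power7 = power7_plus_total / power7_total
ratio_vs_xeon = power7_plus_total / XEON_E5_2600_L3_MB

print(f"Power7 L3:  {power7_total}MB")
print(f"Power7+ L3: {power7_plus_total}MB")
print(f"Power7+ vs Power7: {growth_vs_power7}x")
print(f"Power7+ vs Xeon E5-2600: {ratio_vs_xeon}x")
```

Note that the 80MB total only holds if all eight cores are present; the four-core and six-core variants would carry proportionally less if the per-core allocation stays at 10MB, which is itself an assumption.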
The point is that the combination of cache and clocks could significantly increase the single-thread and multithread performance of the Power7+ chip compared to Power7. How much remains to be seen, but the performance boost moving from Power6 to Power7 was much larger than you would think, largely thanks to the eDRAM cache moving on chip and being almost as fat as the external L3 cache on the Power6 and Power6+ chips (it was 36MB). Of course, with a larger cache there will be fewer cache misses and consequently a potentially lower benefit from SMT, because multithreading takes advantage of the time the CPU spends stalled on a cache miss. Boosting on-chip L3 cache is a good tradeoff all the same, or chip makers would not be making it all the time.