It may be arriving a little later than expected, but the next generation of Intel's server processors, code-named "Poulson" and sold under the Itanium 9500 brand, is out. Intel has also finally disclosed its plans to more fully converge the Itanium and Xeon server platforms, giving Itanium a more secure footing in the data center.
The installed bases of HP-UX, NonStop, and OpenVMS users can breathe a sigh of relief if they were hitting a performance ceiling, and so can other server makers such as Bull, NEC, and Fujitsu that have proprietary operating systems that also run on Itanium iron.
The bigger sigh of relief is that Intel is converging the Xeon and Itanium processor and system designs such that future Xeon E7 chips and "Kittson" Itanium processors will share common elements – and, more importantly, share common sockets.
This is something that Intel has been promising for years, and something that HP – the dominant seller of Itanium-based systems – has been craving, as evidenced by its Project Kinetic. Convergence was the plan of record for HP in June 2010 – nine months ahead of the Oracle claim that Itanium was going the way of all flesh – and HP wanted to converge its ProLiant and Integrity server platforms, which used x86 and Itanium processors, respectively. A common socket helps that effort in a big way.
The new Itanium 9500 chips arrive in the wake of Oracle being told in August by Judge James Kleinberg of the Santa Clara County Superior Court that it was under contractual obligation to continue to write code and support it on Itanium platforms, and the following month Oracle did indeed hit reboot on software development for the Itanium platform.
Oracle has said it will appeal Judge Kleinberg's ruling, and a jury trial will determine what damages HP might be due if and when the lawsuits between the two companies ever get that far.
As the Itanium 9500s come out on Thursday, the Oracle 11g database is certified to run on HP-UX 11i v3, which runs on systems using either "Tukwila" Itanium 9300s or Poulson Itanium 9500s. The future Oracle 12c database, previewed at the OpenWorld event in October, will be available on HP-UX running on either Itanium processor concurrent with Oracle's support on IBM's AIX running on its Power processors.
Poulson not exactly a surprise
Over the past year and a half, Intel has let out a lot of data about the Poulson Itaniums, as much to show off the core-out processor design, with a high-speed ring interconnect similar to that used in the Xeon E5 family of chips, as to demonstrate to companies still using Itanium-based processors that it is still doing real engineering work on the line, not just shrinking the chip as its fab processes progress down the Moore's Law curve.
Die shot of the Itanium 9500
The big core dump on the eight-core Poulson Itaniums came ahead of the IEEE's International Solid-State Circuits Conference in February 2011, when Intel's chip techs discussed the basic design of the chip. At the ISSCC event, Intel actually divulged a whole lot more about the Poulson design. Intel bragged at the Hot Chips 23 conference in August 2011 about the much-better core design in the Poulson Itaniums compared to their Tukwila predecessors.
This summer, Intel accidentally published the Itanium 9500 Series Reference manual, outing the name of the Poulson family and the four SKUs it would have. This document was quickly pulled off the web, and only had a few more details about the chip, but it is back live on the intertubes now (PDF) if you really want to read it.
Here's the cut and dry of it. The Poulson chips have a new core and microarchitecture that only takes 89 million transistors per core, compared to 109 million for the Tukwilas. The Poulson chip has twice the cores and a twelve-wide instruction issue that has twice the instruction throughput as well, and thanks to a shrink to 32 nanometer processes, Poulson can crank the clocks up as high as 2.53GHz, 46 per cent higher than the fastest Tukwila.
Add it all up, and Intel says socket-for-socket you can get something on the order of 2 to 2.4 times the performance based on its lab tests, with an 8 per cent lower thermal design point (TDP). And thanks to microarchitecture changes that gate power to the cores, caches, and other elements of the chip, the idle power of the Poulson chip is 80 per cent lower than that of the Tukwilas.
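For the arithmetically inclined, here is a minimal sketch that checks the clock and TDP percentages against figures quoted in this article. It assumes the 46 per cent compares the 2.53GHz Poulson to the 1.73GHz top-end Tukwila, and that the 8 per cent compares a 170 watt Poulson to a 185 watt Tukwila. The helper name `pct_change` is made up for illustration.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change going from old to new (positive means an increase)."""
    return (new - old) / old * 100.0


# Figures quoted in the article; the pairing of parts is an assumption.
poulson_clock_ghz = 2.53
tukwila_clock_ghz = 1.73
poulson_tdp_w = 170
tukwila_tdp_w = 185

clock_uplift = pct_change(poulson_clock_ghz, tukwila_clock_ghz)
tdp_change = pct_change(poulson_tdp_w, tukwila_tdp_w)

print(f"Clock uplift: {clock_uplift:.0f} per cent")
print(f"TDP change:   {tdp_change:.0f} per cent")
```

Under those assumptions, both of the article's percentages fall out of the quoted clock speeds and wattages.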
Performance of Poulson versus Tukwila on various workloads
Incidentally, those performance figures above are on code that was compiled for the six-wide issue instruction pipeline used in the Tukwilas. If you recompile those applications to take advantage of the twelve-wide pipeline, you would see even greater performance leaps.
That's what Rory McInerney, vice president of Intel's Architecture Group and director of its Server Development Group, said during a launch event in San Francisco on Thursday. McInerney also walked through some of the more important features in the new Itanium 9500s, and talked about the converged platform that Intel was cooking up.
The Instruction Replay Technology, detailed in August 2011, is a big one, moving a function that was formerly done in firmware into the chip itself. IRT allows the Itanium processor to automatically retry instructions instead of just failing and therefore crashing an application, or possibly the whole system.
Cache Safe Technology, which allows for a Xeon chip to block out bad areas in the memory hierarchy so they are not used, has been ported into the Poulson chips – and is an example of the kind of convergence that Intel will accelerate in the future between Xeons and Itaniums.
The Poulson chips also have QuickPath Interconnect (QPI) buses, which are used to link multiple processors to each other and to other parts of the system, that run at 6.4GT/sec, 33 per cent faster than what was on the Tukwilas. The sockets for the Tukwila chips, which support the new Poulsons, already had this speed bump designed into them.
There are four different Itanium 9500s: two have eight cores, and two have four cores. They have varying amounts of L3 cache memory, different clock speeds, and are aimed at different customers.
Itanium 9500s versus Itanium 9300s
The top-end Itanium 9560 is the full-on chip, with all eight cores a-blazin' at 2.53GHz, and 32MB of L3 cache feeding those cores. This is the chip that is labeled the "performance" model in the Intel specs.
The Itanium 9550 has only four cores and drops down to 2.4GHz, but is characterized as giving the best performance per core in the Intel docs, presumably because the kinds of workloads that run on Itanium chips are cache sensitive, and double the L3 cache available per core (8MB instead of 4MB with the Itanium 9560) more than makes up for the lower clock speed. (It could be that the Intel docs have it backwards, but this is what they say.) The peculiar thing is that both the 9560 and 9550 processors have a TDP of 170 watts.
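The per-core numbers are simple division, sketched below. The 9550's 32MB total is inferred from the 8MB-per-core figure above rather than quoted directly, and the `Chip` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    l3_mb: int
    tdp_w: int

    @property
    def l3_per_core_mb(self) -> float:
        # L3 cache divided evenly across the active cores
        return self.l3_mb / self.cores

    @property
    def tdp_per_core_w(self) -> float:
        # Socket TDP divided by cores; a rough figure that ignores uncore power
        return self.tdp_w / self.cores


chips = [
    Chip("Itanium 9560", cores=8, l3_mb=32, tdp_w=170),
    # 32MB total is an inference from 8MB per core times four cores
    Chip("Itanium 9550", cores=4, l3_mb=32, tdp_w=170),
]

for chip in chips:
    print(f"{chip.name}: {chip.l3_per_core_mb:.0f}MB L3 per core, "
          f"{chip.tdp_per_core_w:.2f} watts of TDP per core")
```

Seen this way, the oddity is plain: the four-core part gets twice the thermal budget per core of the eight-core part while running at a lower clock speed.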
The Itanium 9540, with eight cores, yields the best price/performance, and the Itanium 9520 is the value chip, with only four cores running at only 1.73GHz. That is the top-end speed of the Tukwila chip, by the way, and it fits into a 130 watt TDP instead of the 185 watts of the equivalent Tukwila part. And at $1,350, this chip is about a third the price of the Itanium 9350 in the Tukwila line.
That said, Intel is not giving all that performance away for free, with the per-socket prices going up, 9500 SKU for equivalent 9300 SKU. But on a raw aggregate clock basis (the number of cores times the clocks), the Poulsons are roughly 45 to 65 per cent cheaper. And remember, those Poulson clocks can do more work, too.
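The aggregate clock metric is easy enough to reproduce. The sketch below works it out for the one SKU whose price, core count, and clock speed all appear in this article, the Itanium 9520; the function names are hypothetical, and the other SKUs are left out because their prices are not given here.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Raw aggregate clocks: the number of cores times the clock speed."""
    return cores * clock_ghz


def price_per_aggregate_ghz(price_usd: float, cores: int, clock_ghz: float) -> float:
    """Dollars per aggregate GHz for a single socket."""
    return price_usd / aggregate_ghz(cores, clock_ghz)


# Itanium 9520: four cores at 1.73GHz for $1,350, as quoted in the article
agg_9520 = aggregate_ghz(4, 1.73)
cost_9520 = price_per_aggregate_ghz(1350, 4, 1.73)

print(f"Itanium 9520: {agg_9520:.2f} aggregate GHz, ${cost_9520:.2f} per GHz")
```

Run the same function over a 9300 SKU and its 9500 equivalent, with list prices plugged in, and the ratio of the two results gives the kind of saving quoted above.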
While the new Itanium 9500s support Turbo Boost, it's a slightly different variant than we are used to with prior Itanium generations. Here is how an Intel spokesperson described it to El Reg: "Itanium 9500 continues to support Intel Turbo Boost Technology, but in a different derivation. Itanium 9500 features a concept called sustained boost, which dynamically allocates power to the area where power is needed most, to deliver the maximum overall performance under a given power envelope. Itanium 9500 can therefore always run at the maximum Intel Turbo Boost Technology frequency for all cores at all times."
We'll figure out how that works some other time.
The Itanium-Xeon mashup
Intel has been chatting and sometimes whispering about the convergence of the high-end Xeon and Itanium processors since AMD launched the original "SledgeHammer" Opterons back in April 2003. Chipzilla has already been moving in the direction of convergence with the Xeon 7500/E7 processors on the x86 side, and with the Itanium 9300s on the Itanium side. But now Intel is going a bit further.
Xeon and Itanium will share sockets and common chip elements
McInerney did not release any precise details, but the concept is simple enough: the Itanium and Xeon E7 processors already share the same "Boxboro" chipset and the same main memory architecture and buffer chips.
In a future generation of Xeon E7 processors timed to coincide with the future Kittson Itaniums, Intel will move to a common processor socket and common packaging for the chips, and will go even further and use common elements on the chip. McInerney said it could involve sharing memory and I/O controllers or other reliability features, for instance, and reminded everyone that the precise plan has not yet been set.
"It is a sustainable path forward," explained McInerney, and it will not in any way involve converging the Xeon and Itanium instruction sets. "We want to invest where we need to to drive the instruction performance of Itanium and leverage as much of the volume economics of the Xeon as we can."
IBM has faced the same issue with its Power and System z mainframe engines, and several generations ago started using common transistor blocks on both types of processors while retaining their very different instruction sets and architectures. The gap between Xeon and Itanium is not as large as between a Power machine and a System z mainframe, so Intel's job should be quite a bit easier.
HP's job of converging its ProLiant and Integrity systems over the long haul, which is all wrapped up in its "Project Odyssey" effort announced a year ago, should get easier too.
Intel is making no commitments, but if El Reg had to guess, the socket convergence will appear with the "Haswell-EX" processor due several years hence. There is an outside possibility that the "Ivy Bridge-EX" Xeon E7 processor due next year will sport a new socket, and if it does the convergence could happen a little earlier. We'll have to wait and see.
Beyond Kittson, Intel is making no commitments except to say that the common-platform approach allows Intel and its partners to continue to invest in Itanium technology. The Itanium is "on a two to three year timeline" now, McInerney explained, and that puts whatever might be after Kittson out in the second half of the decade.
Just like Intel is not talking about Xeons that far out, it's not going to talk about Itaniums that far out. ®