Poulson Itaniums hit 'Replay' for reliability

New instructions, better HyperThreading

Hot Chips The future eight-core "Poulson" Itanium is not just a process shrink of the current four-core "Tukwila" Itanium 9300. Intel has been working to add new features to Poulson to make it more useful for running enterprise workloads – and to run them more reliably.

Intel already released a lot of Poulson details back in February at the IEEE International Solid-State Circuits Conference in San Francisco. At the Hot Chips conference at Stanford University late last week, the company lifted the veil a little more, and continued to keep its head down in the legal spat between Oracle and HP over Itanium's long-term fate.

Steve Undy, technical lead design engineer for Poulson at Intel, gave a presentation that walked through some of the chip's new features. Perhaps more important than any feature, Undy confirmed that Poulson is in post-silicon validation and has been booted and tested on multiple operating systems and in different system topologies.

HP's HP-UX, OpenVMS, and NonStop operating systems are expected to be available on the Poulson chips, as are SUSE Linux Enterprise Server and a number of proprietary operating systems from Fujitsu, NEC, and Bull. Poulson is on track for shipment in 2012.

A statement from Intel that was released late last week in conjunction with Undy's Hot Chips presentation said that the new Poulson instructions are intended "to help take future Itanium performance to the next level and to lay the foundation for the future of Itanium computing." The statement ended by saying that the follow-on "Kittson" Itanium processor is under development.

Intel's Poulson Itanium processor, scheduled for 2012

Like the Xeon processors, the Poulson Itaniums have a "core out" design that puts the cores on the outside edges of the chip with a shared L3 cache in the center, all linked together by a fast ring interconnect. On Poulson, the L3 cache weighs in at 32MB, and the chip has two integrated DDR3 main-memory controllers with a total of four Scalable Memory Interface (SMI) links out to memory boards.

Poulson has four full-width and two half-width QuickPath Interconnect (QPI) links, which run at 6.4GT/sec. The chips are baked in a 32-nanometer process, have an area of 544 square millimeters, have 3.1 billion transistors, and have a maximum thermal design point of 170 watts with all cores humming along.

Intel has not yet talked about clock speeds, but the speculation is that they won't change much from the current Itanium 9300s, which were launched in February 2010 and which run at between 1.33GHz and 1.73GHz. These Tukwila Itaniums are made in Intel's 65-nanometer process, have just over 2 billion transistors, and peak out at 185 watts across their four cores.

Schematic of Intel's Poulson Itanium chip

The Poulsons will offer twice the cores of the Tukwilas, QPI and SMI links that run 33 per cent faster, plus 33 per cent more L3 cache on-chip. The Poulsons will not scale beyond eight sockets in symmetric multiprocessing configurations, which is the same ceiling as the Tukwilas. Presumably the faster QPI and SMI links will help SMP performance, however.
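To put those generational numbers side by side, here is a back-of-the-envelope sketch that uses only figures quoted in this article. The dictionary layout and helper names are invented for illustration, and treating "33 per cent" as exactly one third is an assumption, so the implied Tukwila cache size is an estimate rather than a figure from Intel.

```python
from fractions import Fraction

# Figures quoted in the article; the dictionary layout is illustrative only.
POULSON = {"cores": 8, "peak_watts": 170, "l3_mb": 32}
TUKWILA = {"cores": 4, "peak_watts": 185}


def watts_per_core(chip):
    """Peak thermal budget divided evenly across the cores."""
    return chip["peak_watts"] / chip["cores"]


def implied_baseline(new_value, fractional_increase):
    """Work backwards from 'X per cent more' to the older value."""
    return new_value / (1 + fractional_increase)


core_ratio = POULSON["cores"] / TUKWILA["cores"]
poulson_wpc = watts_per_core(POULSON)
tukwila_wpc = watts_per_core(TUKWILA)

# Assumption: "33 per cent more L3 cache" is read as exactly one third more.
tukwila_l3_mb = implied_baseline(POULSON["l3_mb"], Fraction(1, 3))

print(f"core ratio: {core_ratio}x")
print(f"peak watts per core: Poulson {poulson_wpc}, Tukwila {tukwila_wpc}")
print(f"implied Tukwila L3: {tukwila_l3_mb}MB")
```

On these numbers, the peak power budget per core drops by more than half from one generation to the next, which is a crude measure because the uncore and cache share that budget too.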

The Poulsons will plug into the same sockets used by Itanium 9300 servers, and that might mean customers running HP's Integrity servers will do processor upgrades before they do system upgrades. This may or may not be good news for HP, but at this point, HP has admitted that Oracle's decision back in March to stop development of its database, middleware, and application software for Itanium has adversely impacted Integrity server sales. In some cases, customers are putting off buying machines, and in others they've canceled orders.

There is more to the Poulson chips than just adding cores to the die and hooking them up with a ring interconnect. The Poulson cores themselves are different. Here's what they look like, schematically:

Block diagram of the Poulson Itanium core

The first interesting thing to note is that the Poulson core has fewer transistors than the Tukwila core (89 million versus 109 million) and occupies less than a third of the area, while at the same time maintaining application compatibility and doubling the instruction pipeline width to 12 instructions.

One of the new features in that updated Itanium pipeline is called Instruction Replay Technology, which is designed to improve system uptime. With the IRT feature, Intel has put an instruction buffer in the pipeline. If an instruction goes haywire as it moves down the Poulson pipeline, the errant instruction is re-executed from the instruction buffer rather than crashing the system or corrupting data.
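The general idea of replaying from a buffer can be sketched in a few lines of software. This is a toy model and not Intel's hardware design: `TransientFault`, `run_with_replay`, `make_flaky_executor`, and the `max_replays` limit are all hypothetical names and parameters made up for the illustration.

```python
from collections import deque


class TransientFault(Exception):
    """Hypothetical signal that an instruction failed an integrity check."""


def run_with_replay(program, execute, max_replays=3):
    """Execute instructions in order, replaying any that fault transiently.

    An instruction stays at the head of the buffer until it completes,
    so a faulting instruction can be re-issued instead of being lost.
    """
    buffer = deque(program)
    results = []
    replays = 0
    while buffer:
        instr = buffer[0]  # peek; only remove once it completes cleanly
        attempts = 0
        while True:
            try:
                results.append(execute(instr))
                break
            except TransientFault:
                attempts += 1
                if attempts > max_replays:
                    raise  # persistent fault: give up and report it
                replays += 1
        buffer.popleft()
    return results, replays


def make_flaky_executor(fail_once_on):
    """Build an executor that faults once on each listed instruction."""
    pending = set(fail_once_on)

    def execute(instr):
        if instr in pending:
            pending.discard(instr)
            raise TransientFault(instr)
        op, a, b = instr
        return a + b if op == "add" else a * b

    return execute


program = [("add", 1, 2), ("mul", 3, 4), ("add", 5, 6)]
results, replays = run_with_replay(program, make_flaky_executor({("mul", 3, 4)}))
print(results, replays)
```

In this sketch the one-off fault on the multiply costs a single replay and the program still produces the same results as a fault-free run, which is the behavior the feature is meant to deliver.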

This instruction buffer in the Poulson pipeline has another important role to play in an improved HyperThreading scheme that will debut with these future Itanium chips. The buffer breaks the pipeline into a front-end and a back-end, creating a dual-domain multithreading that allows for the front-end and back-end parts of the pipeline to be independently threaded.

Intel's chip engineers have also added pipeline-specific thread switch mechanisms to deal with this more complex and wider Poulson pipeline, as well as dual-threaded register files, dual-threaded data side translation buffers (TLBs), and a new fairness mechanism.
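A toy model may help show what it means for the two halves of a pipeline to be threaded independently. In the sketch below, which is an illustration under invented assumptions and not a description of Poulson's actual logic, a front end fetches into per-thread buffer slices while a back end drains them, and each side picks its thread with its own policy. The `simulate` function, the `depth` limit, and both pick policies are hypothetical.

```python
from collections import deque


def simulate(streams, cycles, frontend_pick, backend_pick, depth=4):
    """Run a two-domain toy pipeline and return the retired (thread, instr) list.

    streams       -- one instruction list per hardware thread
    frontend_pick -- maps a cycle number to the thread the front end fetches for
    backend_pick  -- maps a cycle number to the thread the back end executes
    depth         -- assumed per-thread capacity of the decoupling buffer
    """
    pending = [deque(s) for s in streams]   # not yet fetched
    buffers = [deque() for _ in streams]    # fetched, waiting for the back end
    retired = []
    for cycle in range(cycles):
        # Back-end domain: retire one buffered instruction from its chosen thread.
        t = backend_pick(cycle)
        if buffers[t]:
            retired.append((t, buffers[t].popleft()))
        # Front-end domain: fetch for a thread chosen independently.
        t = frontend_pick(cycle)
        if pending[t] and len(buffers[t]) < depth:
            buffers[t].append(pending[t].popleft())
    return retired


streams = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
retired = simulate(
    streams,
    cycles=12,
    frontend_pick=lambda c: c % 2,          # front end alternates every cycle
    backend_pick=lambda c: (c // 2) % 2,    # back end switches every two cycles
)
print(retired)
```

Because the buffer sits between the two domains, the front end can keep fetching for one thread while the back end is busy with the other, and each thread's instructions still retire in program order.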

Intel is also adding a number of new instructions with the Poulson Itaniums to improve thread control, expand prefetching of data and instructions for the pipeline, and add data-access hints for the L1 caches. The Poulson also has three new integer operations to boost the performance of legacy Itanium code without requiring applications to be recompiled. ®

