Intel accidentally outs 'Poulson' Itanium specs

The Intertubes giveth and taketh away

The webmeisters of the world have given once again, with Intel accidentally outing some of the feeds and speeds of the impending "Poulson" Itanium processors for midrange and high-end servers. Some of the data has already been taken down, and all of it will probably follow shortly.

An intrepid reader of El Reg pointed out to the systems desk that the Itanium Processor 9500 Series: Reference Manual was available on the Intel web site, which gives details on the microarchitecture, performance, and other aspects of the chips.

The manual is mostly useful for coders working on HP NonStop, OpenVMS, and HP-UX machines at this point, plus a smattering of Bull, Fujitsu, and NEC mainframes.

A lot of information about the Poulson chips has already been divulged. Intel briefed press and analysts before the IEEE International Solid-State Circuits Conference in February 2011, confirming that Poulson would have eight cores and that it would be implemented in the 32 nanometer process. That process is being phased out this year for desktop chips and will be replaced by the 22 nanometer Tri-Gate process. Intel then gave even more details about the Poulson chips during its two big ISSCC presentations.

Here's a die shot of the Poulson chip and a quick refresher, and then we'll dig into the new data from the reference manual.

Intel Poulson Itanium core schematic

Poulson has eight cores, two directory caches, five QuickPath Interconnect (QPI) links, two Scalable Memory Interconnect (SMI) DDR3 memory controllers that support up to 512GB per socket, two shared L2 caches, and a bunch of system logic all on the same piece of silicon, which has a combined 3.1 billion transistors packed into a 588 square millimeter die size. The top bin part is expected to throw off 170 watts, which is a bit better than the 185 watts of the top-bin four-core Itanium 9300 processor it will replace.

As you can see, the Poulson uses a "cores out" design, just like the current "Sandy Bridge" Xeon E5s, and similarly has a ring interconnect that lashes together the cores, caches, and other elements of the processor.

A Poulson chip has a total of 54MB of SRAM memory: 32MB of L3 cache (broken up into eight blocks, with 4MB for each core), 2MB of total L2 data cache, 4MB of total L2 instruction cache, 2.2MB of directory cache (two chunks), 3.6MB of last level tags, and 169KB of L2 instruction tags. (Intel sometimes calls L2 cache "mid-level cache.")

Each Poulson core has 16KB of L1 data cache and 16KB of L1 instruction cache. All of the caches have ECC scrubbing. The core itself has six arithmetic logic units, two integer units, two floating point units, two memory units, and three branch units that are distributed across twelve ports.

At the very center of the chip is a ten-port crossbar router, which manages memory and I/O across the chip, and significantly links the two half-QPI links at the bottom of the chip to the four QPI links at the bottom.

The links at the top are used to create servers with two or four sockets, and the two half-links at the bottom are used to create glueless eight-socket machines. They could also be extended to support SMP or NUMA processing in conjunction with another chipset (such as a modified HP sx3000 chipset) to make even larger shared memory configurations.

A half-QPI link on Poulson will deliver 3.2GT/sec of bandwidth, compared to a full port on the Itanium 9300 that comes in at 4.8GT/sec; the full Poulson links at the top of the chip run at 6.4GT/sec. You get an aggregate of 45GB/sec of bandwidth out of those SMI memory ports and 128GB/sec out of those combined QPI ports; the L3 cache ring has 700GB/sec of bandwidth.

El Reg updated you on the Poulson chips' much-improved threading model last August, as well as their much better use of the transistor budget. The Poulson core has 89 million transistors – fewer than the 109 million in the "Tukwila" core used in the Itanium 9300s – and occupies less than a third of the area. At the same time, it maintains application compatibility while doubling the instruction pipeline width to 12 instructions.

That's what we already knew about what we now know will be called the Itanium 9500 series, and it is a lot. We now know there will be four Poulson Itaniums: The 9520, 9540, 9550, and 9560. The manual does not supply the model numbers, clock speeds, activated L3 cache sizes, or prices, though it does explain lots about the inner workings of the chip.

Still, the roadmap watchers over at CPU World think they know more. They spotted four chips with just those names in an Intel Product Change Notification document, which has since been scrubbed of the Itanium 9500 data.

The Itanium 9300 processors, launched in February 2010 came in a two-core version that ran at 1.6GHz and a four-core version that ran at between 1.33GHz and 1.73GHz. (Turbo Boost can push those cores higher, even there is enough thermal room.) If the data that CPU World stumbled upon is right, then Poulson clock speeds will range from 1.73GHz to 2.53GHz.

That is a a very big bump in processing speed, ranging from 30.1 to 46.2 per cent over the Tukwilas and a lot more than the 20 per cent or so we were expecting. If these are the real SKUs and processing speeds, that is certainly something that HP and Intel will eventually be crowing about.

The Poulson chips are expected to be launched sometime this year, and they are probably in production right now. It would be a fair guess that Intel plans to launch them at the Intel Developer Forum in San Francisco in early September. ®

Similar topics

Broader topics

Other stories you might like

  • Microsoft fixes under-attack Windows zero-day Follina
    Plus: Intel, AMD react to Hertzbleed data-leaking holes in CPUs

    Patch Tuesday Microsoft claims to have finally fixed the Follina zero-day flaw in Windows as part of its June Patch Tuesday batch, which included security updates to address 55 vulnerabilities.

    Follina, eventually acknowledged by Redmond in a security advisory last month, is the most significant of the bunch as it has already been exploited in the wild.

    Criminals and snoops can abuse the remote code execution (RCE) bug, tracked as CVE-2022-30190, by crafting a file, such as a Word document, so that when opened it calls out to the Microsoft Windows Support Diagnostic Tool, which is then exploited to run malicious code, such spyware and ransomware. Disabling macros in, say, Word won't stop this from happening.

    Continue reading
  • Linux Foundation thinks it can get you interested in smartNICs
    Step one: Make them easier to program

    The Linux Foundation wants to make data processing units (DPUs) easier to deploy, with the launch of the Open Programmable Infrastructure (OPI) project this week.

    The program has already garnered support from several leading chipmakers, systems builders, and software vendors – Nvidia, Intel, Marvell, F5, Keysight, Dell Tech, and Red Hat to name a few – and promises to build an open ecosystem of common software frameworks that can run on any DPU or smartNIC.

    SmartNICs, DPUs, IPUs – whatever you prefer to call them – have been used in cloud and hyperscale datacenters for years now. The devices typically feature onboard networking in a PCIe card form factor and are designed to offload and accelerate I/O-intensive processes and virtualization functions that would otherwise consume valuable host CPU resources.

    Continue reading
  • AMD bests Intel in cloud CPU performance study
    Overall price-performance in Big 3 hyperscalers a dead heat, says CockroachDB

    AMD's processors have come out on top in terms of cloud CPU performance across AWS, Microsoft Azure, and Google Cloud Platform, according to a recently published study.

    The multi-core x86-64 microprocessors Milan and Rome and beat Intel Cascade Lake and Ice Lake instances in tests of performance in the three most popular cloud providers, research from database company CockroachDB found.

    Using the CoreMark version 1.0 benchmark – which can be limited to run on a single vCPU or execute workloads on multiple vCPUs – the researchers showed AMD's Milan processors outperformed those of Intel in many cases, and at worst statistically tied with Intel's latest-gen Ice Lake processors across both the OLTP and CPU benchmarks.

    Continue reading
  • Apple’s M2 chip isn’t a slam dunk, but it does point to the future
    The chip’s GPU and neural engine could overshadow Apple’s concession on CPU performance

    Analysis For all the pomp and circumstance surrounding Apple's move to homegrown silicon for Macs, the tech giant has admitted that the new M2 chip isn't quite the slam dunk that its predecessor was when compared to the latest from Apple's former CPU supplier, Intel.

    During its WWDC 2022 keynote Monday, Apple focused its high-level sales pitch for the M2 on claims that the chip is much more power efficient than Intel's latest laptop CPUs. But while doing so, the iPhone maker admitted that Intel has it beat, at least for now, when it comes to CPU performance.

    Apple laid this out clearly during the presentation when Johny Srouji, Apple's senior vice president of hardware technologies, said the M2's eight-core CPU will provide 87 percent of the peak performance of Intel's 12-core Core i7-1260P while using just a quarter of the rival chip's power.

    Continue reading

Biting the hand that feeds IT © 1998–2022