Intel releases eight-headed* beast

* - misbehaving heads may be severed


ISSCC Intel offered only a few new details about its latest processors at the International Solid-State Circuits Conference (ISSCC) today, even though it essentially had the stage to itself.

AMD, IBM, and Sun chose to sit out the Microprocessor Technologies session, forcing the hundreds of engineers hungry for CPU info to dine only on the few crumbs doled out by Intel.

That said, the major announcement was a big one - but no surprise to chip watchers: the eight-core, enterprise-level Nehalem-EX Xeon processor. With its 2.3 billion transistors, this new 45nm chip has the largest number of functional units of any commercial processor ever released, according to Intel's Stefan Rusu.

Although Rusu provided no product numbers, clock rates, or ship dates, the new Xeon is almost certain to join the company's 74xx line of enterprise-server chips - unless Intel positions it as the start of a new line, which is unlikely.

The processor's eight cores will each employ the company's implementation of simultaneous multithreading (SMT) technology to support two threads per core. Managing SMT exacts a performance penalty, but it should be less than 10 per cent, according to Intel's Rajesh Kumar.
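
To software, the upshot of two-way SMT is simply a doubled logical-processor count: an eight-core Nehalem-EX would appear to the operating system as 16 logical CPUs. The snippet below is a generic, hypothetical illustration of that bookkeeping on a Linux box - not Intel-provided code - and it assumes the two-threads-per-core arrangement described above.

```c
/* Hypothetical illustration: with two-way SMT enabled, an eight-core
 * part shows up to the OS as 16 logical processors. Linux/glibc only. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN); /* logical CPUs the OS sees */
    long threads_per_core = 2;                    /* two-way SMT (assumed)    */

    printf("Logical CPUs reported:  %ld\n", logical);
    printf("Implied physical cores: %ld\n", logical / threads_per_core);
    return 0;
}
```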

In addition to its eight cores, the Nehalem-EX also has eight cache slices combined into one L3 cache of up to 24MB, shared by all eight cores as allocated by a central hub router.
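
The easiest way to picture the sliced cache is as an address hash: each cache line is steered to exactly one of the eight slices, so the cores collectively see a single shared pool rather than eight private 3MB chunks. Intel didn't disclose the hash it uses, so the sketch below is a deliberately simplified stand-in - a plain modulo on the cache-line index - and not the Nehalem-EX's actual slice-selection logic.

```c
/* Simplified stand-in for L3 slice selection - NOT Intel's actual hash.
 * Real designs hash upper address bits to spread traffic more evenly. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 8
#define LINE_SIZE  64   /* bytes per cache line */

static unsigned slice_for(uint64_t phys_addr)
{
    uint64_t line = phys_addr / LINE_SIZE;  /* cache-line index       */
    return (unsigned)(line % NUM_SLICES);   /* naive round-robin hash */
}

int main(void)
{
    for (uint64_t addr = 0; addr < 8 * LINE_SIZE; addr += LINE_SIZE)
        printf("address 0x%03llx -> slice %u\n",
               (unsigned long long)addr, slice_for(addr));
    return 0;
}
```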

This large number of cores and caches led Intel to introduce a new technology called Core and Cache Recovery. This post-manufacturing, pre-sale technique permanently shuts down misbehaving cores and caches before a Nehalem-EX leaves the factory.

According to Rusu, the reason for Core and Cache Recovery is simple: if one part of the complex chip turns out to be defective, the entire chip doesn't have to be thrown out. Instead, the bad block is switched off and the part is sold as a core-disabled and/or cache-disabled SKU at a lower price.

Importantly, if a core or cache is disabled, it will be locked down so that it won't contribute to current leakage. A core shutdown will result in an 83 per cent leakage-power savings, while a cache shutdown's savings would be in the 35 per cent range.

Rusu took pains to assure attendees, however, that a cache shutdown saves less on a percentage basis only because a fully operational cache has so little leakage to begin with - "1000 times" less than the company's previous cache-transistor design, he explained.

Other news about the Nehalem-EX was, well, not really news. Memory controllers, for example, will be on-chip - an inclusion that Kumar referred to as "relatively trivial," possibly because AMD did it first. The non-trivial part of placing the memory controllers on-chip, according to Kumar, was "how to do it cheaply." He assured his audience that Intel had found a way to do just that.

Like other Nehalem-class processors, this eight-core beast will use Intel's unfortunately named Turbo Mode to borrow power from idle cores and use it to boost the clock speed of active ones. It will also use the 20-lane-each-way, point-to-point QuickPath Interconnect (QPI) found in other Nehalem chips to communicate among CPUs and I/O controllers at 6.4GT/s (gigatransfers per second), achieving a throughput of up to 25.6GB/s for each of the Nehalem-EX's four QPI ports.
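
That 25.6GB/s figure is straightforward arithmetic, assuming the usual QPI arrangement in which 16 of the 20 lanes in each direction carry data - two bytes per transfer - and counting both directions: 6.4GT/s x 2 bytes x 2 directions = 25.6GB/s per link. A back-of-the-envelope check:

```c
/* Back-of-the-envelope QPI link bandwidth: 6.4 GT/s, 2 data bytes per
 * transfer per direction (16 of 20 lanes carry data), both directions. */
#include <stdio.h>

int main(void)
{
    double transfers_per_sec  = 6.4e9; /* 6.4 gigatransfers per second */
    double bytes_per_transfer = 2.0;   /* 16 data lanes = 2 bytes      */
    double directions         = 2.0;   /* QPI links are bidirectional  */

    double gb_per_sec = transfers_per_sec * bytes_per_transfer * directions / 1e9;
    printf("Per-link QPI bandwidth: %.1f GB/s\n", gb_per_sec); /* 25.6 */
    return 0;
}
```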

QPI is also smart enough to shut down a link when the CPU at the far end is idle - or when the socket is empty altogether - a useful, power-saving trait, seeing as how Rusu showed socket set-ups with up to eight CPUs.

Speaking of sockets, the Nehalem-EX's package is a 14-layer organic substrate with a hefty 1,567 lands at a 40mil pitch. And although Rusu refused to give final die-size information, he did say that one advantage of the Nehalem-EX's wide-open spaces will be that "a large die with spread-out cores is relatively easy to cool."

And speaking of cooling, the Nehalem-EX will have nine thermal sensors, one for each of the cores and one in the uncore, which is chipspeak for elements that are not part of an actual processing core. In the Nehalem architecture, the uncore includes that hefty L3 cache, the memory controllers, QPI ports, and one-per-core phase-locked loop (PLL) circuitry, which keeps everything in step.

In keeping with the "leaner and greener" theme found in sessions throughout the conference, Rusu proudly pointed out that the Nehalem-EX is "100 per cent lead-free and 100 per cent halogen-free."

Prices for the Nehalem-EX were not mentioned, but they'll certainly be nowhere near "100 per cent free." ®
