Uncore power gating
With the Nehalem family of chips, Intel was able to power gate the transistors in each core to shut that core down when it wasn't in use. The core state was saved in the on-chip cache and the uncore region kept running at full power. With the Westmere family, there is still power gating for each core, but now the uncore is also gated.
The two-core Westmere mobile chips also have a dedicated and power-sipping static RAM on the chip that saves the state of the cores, so the on-chip caches can be powered down when not in use. (Why the server variants of Westmere do not also have this SRAM state cache is unclear, but apparently they do not.)
The Westmere-EP chips implement Intel's HyperThreading variant of simultaneous multithreading, which gives each core two virtual threads to present to the operating system or hypervisor running atop the chip. The Westmere chips also have new cryptographic instructions that implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data.
Another new twist with the Westmere-EPs is that the memory controllers embedded on the chips can support low-voltage DDR3 main memory, which runs at 1.35 volts, as well as standard DDR3 memory, which runs at 1.5 volts. The net effect of this change is that memory DIMMs run about 20 percent cooler when using the low-voltage parts, without sacrificing performance.
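As a rough sanity check on that 20 percent figure, here is a minimal sketch that assumes DIMM power scales with the square of the supply voltage at a fixed clock speed. That is a common first-order approximation for CMOS dynamic power, not something Intel has stated, and the function name is made up for illustration.

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new to that at v_old.

    Assumption (not from Intel): power is proportional to V^2 at a
    fixed frequency and capacitance, ignoring static leakage.
    """
    return (v_new / v_old) ** 2


# Voltages quoted in the article: low-voltage DDR3 vs standard DDR3.
ratio = relative_dynamic_power(1.35, 1.5)
saving_percent = (1.0 - ratio) * 100.0

print(f"Relative power: {ratio:.2f}")
print(f"Estimated saving: {saving_percent:.0f} percent")
```

Under that assumption the drop from 1.5 to 1.35 volts works out to a saving of roughly 19 percent, which is in line with the "about 20 percent" claim.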
The Westmere-EP chips used in servers will very likely be called the Xeon 5600s when they start shipping.
Another system-related paper that Intel will be presenting next week at ISSCC, and one that looks like it might have immediate and practical benefits for high-throughput systems, describes a new kind of chip-to-chip interconnect that looks like it beats the pants off of QuickPath Interconnect, the processor and memory linkage scheme that Intel debuted with the Nehalem chips last year. This experimental interconnect, which was not given a name, is about ten times as power-efficient at moving data from chip to chip as the current scheme.
According to Randy Mooney, an Intel Fellow and director of I/O research at Intel Labs, a signal on a traditional interconnect (like QPI) has to go from a chip, down through the package, out over the motherboard and back up through the socket and package to reach the cores on the other side of the mobo.
Using QPI, moving a terabyte of data between chips in different sockets might take 150 watts of juice, but the direct link - which is bolted on top of the chip package and links the chips more or less directly to each other - was able to move a terabyte of data between the chips while burning only 11 watts.
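Taking those two quoted figures at face value, the arithmetic behind the "about ten times" claim can be sketched as below. The variable names are illustrative, and the calculation assumes both wattages refer to the same amount of data moved in the same time.

```python
# Power figures quoted in the article for moving the same data chip to chip.
QPI_WATTS = 150.0          # traditional path through package and motherboard
DIRECT_LINK_WATTS = 11.0   # experimental top-of-package link

# How many times less power the direct link needs for the same transfer.
efficiency_gain = QPI_WATTS / DIRECT_LINK_WATTS

# The same comparison expressed as a percentage reduction in power.
reduction_percent = (1.0 - DIRECT_LINK_WATTS / QPI_WATTS) * 100.0

print(f"Efficiency gain: {efficiency_gain:.1f}x")
print(f"Power reduction: {reduction_percent:.0f} percent")
```

On these numbers the gain comes out nearer 13.6 times than ten, a power reduction of about 93 percent, so the "ten times" figure looks like a conservative rounding.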
Perhaps more significantly, when this interconnect drops into sleep mode, it only burns 7 per cent of the juice it needs when it is running, and it can wake up from the sleep state 1,000 times faster than QPI does. ®