IDF13 Companies with workloads that like to ride on lots of threads and cores are going to be able to get a lot more bang for a two-socket box thanks to the launch of the "Ivy Bridge-EP" Xeon E5-2600 v2 processors by Intel.
Those with pesky applications that like faster clocks to get more work done, well, the process shrink is giving the Ivy Bridge Xeons more cache and microarchitecture tweaks as well as modestly faster cores. But as we all know, clock scaling is a lot more difficult than core scaling and that is ultimately making this a software problem for companies to solve.
The transition to 22 nanometer TriGate manufacturing processes for the workhorse two-socket server platform from Chipzilla will start a whole new refresh cycle out there in the data centers and data closets of the world. Or, that is the hope at least.
The most eager buyers will be those shops that have much older Xeon 5500 and 5600 systems out there, which have largely burned up their economic life and just do not offer the compute density and memory and peripheral expansion of a current "Romley" server platform sporting an Ivy Bridge-EP processor.
Server shipments and revenues have been on the wane in recent quarters. But x86 systems have fared better than other platforms in fighting the lowering tide caused by the shift to cloud computing (cloud operators tend to buy vanity-free and cheaper machines than service providers of years gone by did), the increasing use of server virtualization, the still awesome pace of Moore's Law (which allows ever-increasing computing capacity per chip), and the skittishness in certain parts of the global economy.
Now, with the Xeon E5-2600 v2 processors shipping, we get to find out in the coming quarters if there is pent up demand for x86 server capacity. If anything, the fact that x86 server shipments were more or less flat in the second quarter, by IDC's reckoning, would seem to indicate that demand is holding up pretty well. Some companies just can't wait to buy servers, even if new and presumably better stuff is coming soon.
Die shot of the ten-core Ivy Bridge-EP processor
The top-end twelve-core Xeon E5-2600 v2 chip has around 4.3 billion transistors and an area of 541 square millimeters. The Ivy Bridge-EP processors are going to pack a pretty big punch compared to the Sandy Bridge-EP processors they replace in the Intel lineup.
Based on early benchmark test results from server makers that will be divulged in the coming days, Intel executives tell El Reg that customers can expect the new Xeon E5-2600 v2 socket to deliver up to 50 per cent more performance and up to 45 per cent more performance per watt than the Xeon E5-2600 v1 line that was announced in March 2012. Those chips were also known as "Jaketown" by the server techies inside of Intel, and they call the new chip "Ivytown" sometimes just to be consistent. Somewhat.
Those performance figures are based on SPECvirt_sc2013 tests, and the bang for the watt numbers come from SPECpower_ssj2008 tests. The performance that customers will see with the Xeon E5-2600 v2 processors will vary, of course.
The basic features of the Xeon E5 v1 and v2 processors
The performance, made possible by the shrink from the 32 nanometer processes used in the Xeon E5-2600 v1 processors, comes from a balance of more cores and more L3 cache memory on the chips, as you can see in the table above comparing the two chip families. The top-bin Ivy Bridge-EP parts have 50 per cent more cores, at a dozen per die, and 50 per cent more L3 cache, at 30MB, compared to the Sandy Bridge-EP chips.
The top base frequency and Turbo Boost maximum frequencies on the new Xeon E5-2600 chips go up by only 200MHz, which is a mere 6.1 per cent jump in clock speed. That increased clock speed is basically added to the chip to make up for the extra latencies in taking the processor design up to a dozen cores from eight cores.
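For anyone who wants to check the percentages, the arithmetic is a one-liner. The C sketch below is only an illustration: the eight-core starting point comes from the text, but the 20MB of L3 cache and the 3.3GHz top base clock for the older chips are not stated here and are back-calculated assumptions from the 50 per cent and 6.1 per cent figures. The helper name `pct_increase` is made up for this example.

```c
#include <assert.h>

/* Percentage gain when moving from old_value to new_value. */
double pct_increase(double old_value, double new_value)
{
    assert(old_value > 0.0);  /* a zero baseline has no meaningful percentage */
    return (new_value - old_value) / old_value * 100.0;
}
```

Under those assumptions, `pct_increase(8, 12)` and `pct_increase(20, 30)` both come out at 50 per cent, while `pct_increase(3300, 3500)` lands at roughly 6.1 per cent.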
The increased performance is also enabled by some other tweaks. Main memory now runs at 1.6GHz for 1.35 volt memory (up from 1.33GHz) and at 1.87GHz for 1.5 volt sticks (up from 1.6GHz). The PCI-Express 3.0 controllers run at the same speed (8GT/sec) and there are the same 40 lanes of bandwidth coming into the on-die controllers as with the Sandy Bridge-EP parts. Maximum main memory is also doubled, to 1.5TB (through the use of 64GB sticks in the 24 slots in a two-socket system).
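The capacity and I/O numbers above can be sanity-checked with a little arithmetic. What follows is a rough C sketch, not anything from Intel: the function names are invented for illustration, and the 128b/130b line encoding used for the PCI-Express estimate comes from the PCI-Express 3.0 specification rather than from this article. Packet and protocol overhead are ignored, so real-world throughput will be lower.

```c
#include <assert.h>

/* Maximum memory, in GB, for a box with `slots` DIMM slots. */
unsigned max_memory_gb(unsigned slots, unsigned gb_per_stick)
{
    assert(slots > 0 && gb_per_stick > 0);
    return slots * gb_per_stick;
}

/* Approximate raw PCI-Express 3.0 bandwidth in GB/sec, one direction.
 * Assumption: 128b/130b encoding, so 128 of every 130 bits carry data. */
double pcie3_gb_per_sec(unsigned lanes, double gt_per_sec)
{
    assert(lanes > 0 && gt_per_sec > 0.0);
    double data_bits_per_sec = gt_per_sec * 1e9 * 128.0 / 130.0;
    return lanes * data_bits_per_sec / 8.0 / 1e9;
}
```

With 24 slots and 64GB sticks, `max_memory_gb(24, 64)` gives 1,536GB, which is the 1.5TB quoted above. And `pcie3_gb_per_sec(40, 8.0)` works out to a bit more than 39GB/sec in each direction for the 40 lanes on a socket, under the stated assumptions.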
Different SKUs in the Ivy Bridge-EP line support different QuickPath Interconnect point-to-point interconnect speeds, as did their predecessors, and the QPI port count stands at two as it did with the Sandy Bridge-EP chips. Both families of chips support HyperThreading, Intel's implementation of simultaneous multithreading.
SMT virtualizes the instruction pipeline in the processor so, in this case, it can juggle two instruction streams at the same time and therefore get slightly more work done than would otherwise be possible. (Provided your workloads are HT-friendly, of course.) Not every memory speed is supported on every chip, again just as with the Sandy Bridge-EP predecessor.
Here's another new and interesting thing. There is not one Ivy Bridge-EP processor, but rather there are three different variants of the chip, each one tuned for specific workloads and each sporting different numbers of cores, memory controllers, cache sizes, frequencies, and thermal envelopes.
Block diagrams of the three Ivy Bridge Xeon E5 processors
The first variant has four or six cores active and the PCI-Express and QPI links as well as a single memory controller with four channels. It is, explains Ian Steiner, a processor architect at Intel's Beaverton, Oregon facility, aimed both at low-power uses and at workloads that need higher frequencies.
This one has 15MB of L3 cache and has a thermal envelope of between 40 and 80 watts. The cores, cache segments, QPI links, and PCI controllers are hooked to each other by double rings, just as was the case with the Sandy Bridge-EPs.
The second variant, which addresses the belly of the two-socket server market, offers six, eight, or 10 cores and has 25MB of L3 cache on the die. The same double rings link the core components together. In this case, the thermals range from 70 to 130 watts and again, there is a mix of low-power and higher frequency variants to target different kinds of workloads.
The third type of Ivy Bridge-EP processor is the full-on twelve-core beast, which comes in 115 watt and 130 watt options. Intel has killed off the 135 watt SKU for servers, but there is a 150 watt part for workstations, as in the past.
This chip has three rings linking the cores and cache segments to other components on the die, and as you can see, the memory controller is also broken into two but has half as many channels hanging off each controller to yield the same four channels per socket as the other Xeon E5-2600 v2 variants.
In the past, these might have been two or even three different processors, possibly with different sockets. But they are one processor family all sharing the same socket, and one that is identical to the earlier Xeon E5-2600 v1 processors from March 2012.
The Romley server platform, designed to take Ivy Bridge-EP parts
"The general goal is to do everything well," explains Steiner, and that cannot be accomplished with a single variant of the Ivy Bridge-EP processor. "We are interested in having some high frequency, low core parts." And the middle variant from six to ten cores was designed explicitly so it would have 25MB cache against six cores – again, precisely to match the needs of particular (and unnamed) customers.
"This is sort of right in the middle. You get good power efficiency, you get peak performance and you can push it all the way up to 130 watts if you want. The twelve-core is mostly targeted at peak performance, there are just 115 watt and 130 watt offerings, and there are no low power options. But I don't want to pretend that it is not a power-efficient SKU. It can actually be very power-efficient in a full rack deployment," he said.
The Ivy Bridge-EP core is identical to that used in the desktop Ivy Bridge Core parts from last year, and it sports a number of microarchitecture improvements. Steiner says that Intel is not just focused on single-thread performance improvement with each generation, but also boosting the instructions per clock and the power efficiency of the core.
"Long story short, we have added a bunch of stuff to make performance work better," says Steiner. The new Ivy Bridge core has a 16-bit floating point to single-precision converter, which won't be a "huge performance thing but it is nice for certain workloads," according to Steiner.
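To give a feel for what such a converter does, here is a portable C sketch that widens a 16-bit floating point value into an ordinary 32-bit float in software. It is not Intel's implementation and it does not use the new instructions; it assumes the 16-bit format is the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 mantissa bits), and the function name `half_to_float` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a raw IEEE 754 half-precision bit pattern to a 32-bit float. */
float half_to_float(uint16_t h)
{
    assert(sizeof(float) == sizeof(uint32_t));  /* sketch assumes 32-bit float */

    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, mantissa payload carried over. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (add 112). */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {
        /* Subnormal half: shift until the implicit leading 1 appears. */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu;  /* drop the now-implicit leading bit */
        bits = sign | ((113u - shift) << 23) | (man << 13);
    } else {
        bits = sign;    /* positive or negative zero */
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bit pattern as a float */
    return f;
}
```

Feeding it the bit pattern `0x3C00` yields 1.0 and `0xC000` yields -2.0. The appeal of doing this in hardware is that data can sit in memory at half the size and be expanded on the way into the floating point units.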
Programmers using earlier generations of Xeon have had to write routines to do copy/fill operations, and now there is an enhanced version of a pair of instructions with the unwieldy name of REP MOVSB/STOSB, which means coders don't have to monkey around in assembler and can just invoke these instructions to do copy/fill. The core also now has fast access to sets of registers by user threads, which is an optimization aimed precisely at server workloads running on machines with higher thread counts.
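As a rough illustration of the copy/fill instructions mentioned above, the following C sketch invokes REP MOVSB and REP STOSB directly through GCC/Clang-style inline assembly on x86, and falls back to `memcpy` and `memset` on anything else. The wrapper names `copy_bytes` and `fill_bytes` are hypothetical, and nothing here measures whether the instructions are actually faster on a given chip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst (regions must not overlap). */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    void *ret = dst;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* REP MOVSB: copy (R|E)CX bytes from [(R|E)SI] to [(R|E)DI]. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);  /* portable fallback on non-x86 targets */
#endif
    return ret;
}

/* Fill n bytes at dst with value. */
void *fill_bytes(void *dst, unsigned char value, size_t n)
{
    assert(dst != NULL);
    void *ret = dst;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* REP STOSB: store AL into (R|E)CX bytes starting at [(R|E)DI]. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
#else
    memset(dst, value, n);  /* portable fallback on non-x86 targets */
#endif
    return ret;
}
```

In practice most programmers would simply call `memcpy` and `memset` and leave the choice of instructions to the C library; the wrappers are here only to show what invoking the instructions looks like.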
The Ivy Bridge core also includes the "Bull Mountain" random number generator, known as SecureKey. Other server-class chips already had random number generators, and now Intel has caught up.
The "Avoton" Atom C2000 chip also has the random number generator, and it also sports the OS Guard supervisor mode execution protection circuits that are embedded in the Ivy Bridge core.
You have to change the operating system code to make use of OS Guard, which protects against hacks that hijack kernel execution, such as the method used by Stuxnet, by preventing execution of user mode pages while in supervisor mode.
It is supported in the Linux kernel already, and presumably support for OS Guard will come to Windows at some point. (It was not yet ready when Intel gave its briefings on the new Ivy Bridge chips for servers.)
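For the curious, software can ask the processor whether several of the features described above are present by way of the CPUID instruction. The C sketch below is a minimal illustration under stated assumptions: the feature names (F16C, RDRAND, SMEP, ERMS) and their bit positions come from Intel's CPUID documentation rather than from this article, the struct and function names are invented, and the hardware query only does anything on x86 with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#endif

struct cpu_features {
    int f16c;    /* 16-bit float conversion:      CPUID.1:ECX bit 29    */
    int rdrand;  /* hardware random numbers:      CPUID.1:ECX bit 30    */
    int smep;    /* supervisor mode exec protect: CPUID.7.0:EBX bit 7   */
    int erms;    /* enhanced REP MOVSB/STOSB:     CPUID.7.0:EBX bit 9   */
};

/* Pure decoding step: pick the feature bits out of raw register values. */
struct cpu_features decode_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    struct cpu_features f;
    f.f16c   = (int)((leaf1_ecx >> 29) & 1u);
    f.rdrand = (int)((leaf1_ecx >> 30) & 1u);
    f.smep   = (int)((leaf7_ebx >> 7) & 1u);
    f.erms   = (int)((leaf7_ebx >> 9) & 1u);
    return f;
}

/* Query the running CPU. Returns 1 on success, 0 if CPUID leaf 7 is
 * unavailable or the target is not x86. */
int read_features(struct cpu_features *out)
{
    assert(out != NULL);
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    unsigned int a, b, c, d;
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid(1, a, b, c, d);
    uint32_t leaf1_ecx = c;
    __cpuid_count(7, 0, a, b, c, d);
    *out = decode_features(leaf1_ecx, b);
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

Note that a set SMEP bit only says the silicon can do it; as described above, the operating system still has to switch the protection on.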