SC11 Advanced Micro Devices was expected to launch its "Interlagos" Opteron 6200 server processors about now in conjunction with the SC11 supercomputing conference in Seattle.
But what wasn't known was that AMD was going to kick out the entry-level eight-core "Valencia" Opteron 4200 processors at the same time, rather than do a two-step launch.
AMD took a two-step approach with its prior generation of server chips, rolling the twelve-core "Magny-Cours" Opteron 6100s, the big guns, onto the field of the ongoing x86 server chip war in March 2010 for two-socket and four-socket servers, and following up with the six-core "Lisbon" Opteron 4100s for machines with one or two sockets in June 2010. The Opteron 6100s got a deep bin sort and a speed boost in February this year, and otherwise it has been all quiet on the Opteron front.
AMD has been giving the Opteron 4200s and 6200s air support before they entered the field, talking about the new design of the "Bulldozer" core and how it will make for better server chips that can meet a widening array of workload, performance, and thermal requirements.
The Bulldozer core: share some things and reduce power draw
The Opteron server chips using the Bulldozer cores are implemented in GlobalFoundries' 32nm, 11-metal-layer, high-k metal gate, silicon-on-insulator wafer-baking processes. The former AMD foundry, which was spun out three years ago, has had some trouble ramping up this 32 nanometer process, giving AMD headaches and leaving it unable to meet demand for the PC and server chips based on the Bulldozer cores.
As El Reg explained in detail earlier this year, when AMD's techies divulged some secrets about the core design at the IEEE's International Solid-State Circuits Conference, the Bulldozer module shares some components between its two cores but gives each core its own thread (there is no simultaneous multithreading). AMD refers to this as having "two strong cores," in contrast to the HyperThreading virtual threads Intel puts in its Core and Xeon processors. Each core in the Bulldozer module, which in this design means an integer unit, has its own integer scheduler and L1 data cache, but the two cores share the fetch and decode units as well as the floating point scheduler and L2 cache memory.
Each integer unit in a Bulldozer core has four pipelines, each capable of executing one instruction per cycle. A Bulldozer module has two 128-bit floating point units, each of which can do two 64-bit double-precision operations or four 32-bit single-precision operations per clock. If one core is not using floating point during a cycle, the other core can take all 256 bits and do four double-precision or eight single-precision ops in a single clock cycle.
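As a rough illustration of the Flex FP sharing just described, here is a minimal Python sketch that computes how many floating point operations one core can issue per cycle. It counts one operation per 64-bit or 32-bit lane, takes the unit widths from the figures above, and ignores the extra arithmetic a fused multiply-accumulate performs; the function and constant names are hypothetical.

```python
# Hypothetical model of Flex FP: two 128-bit FP units per Bulldozer module,
# shared by the module's two cores. Widths and counts come from the article.

FP_UNIT_WIDTH_BITS = 128   # width of each floating point unit
FP_UNITS_PER_MODULE = 2    # two such units per module

def ops_per_cycle(precision_bits: int, sibling_using_fp: bool) -> int:
    """FP ops one core can issue in a cycle under this simplified model.

    precision_bits: 64 for double precision, 32 for single precision.
    sibling_using_fp: True if the module's other core also needs the FPU.
    """
    if precision_bits not in (32, 64):
        raise ValueError("precision_bits must be 32 or 64")
    # With the sibling busy, a core gets one 128-bit unit; otherwise it
    # can use all 256 bits of the module's floating point hardware.
    units = 1 if sibling_using_fp else FP_UNITS_PER_MODULE
    return units * FP_UNIT_WIDTH_BITS // precision_bits

for precision, label in ((64, "double"), (32, "single")):
    shared = ops_per_cycle(precision, sibling_using_fp=True)
    alone = ops_per_cycle(precision, sibling_using_fp=False)
    print(f"{label}: {shared} ops/cycle shared, {alone} ops/cycle alone")
```

The loop prints two double-precision and four single-precision ops per cycle when both cores are busy, doubling to four and eight when one core has the floating point hardware to itself.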
AMD was originally calling this feature AVX mode, but on announcement day it is calling it Flex FP. Flex FP does support AVX operations. The floating point unit has new multiply-accumulate functions and also supports a bunch of new instructions, including the SSSE3, SSE4.1, and SSE4.2 SIMD extensions, on-chip AES encryption/decryption, and PCLMULQDQ, which performs a carry-less multiplication of two 64-bit integers. AMD has also added new instructions called XOP and FMA4, which rework its earlier 128-bit SSE5 SIMD instructions to be more compatible with Intel's AVX implementation.
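To make the PCLMULQDQ description concrete, here is a minimal Python sketch of carry-less multiplication of two 64-bit integers: shifted copies of one operand are combined with XOR rather than addition, so no carries propagate. It models only the arithmetic, not the instruction's immediate operand, which selects which 64-bit halves of the 128-bit source registers get multiplied; the function name `clmul64` is hypothetical. This kind of multiplication is used in GCM-mode authentication and CRC calculations.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) product of two unsigned 64-bit ints.

    Partial products are combined with XOR instead of addition, so no
    carries propagate. The result fits in at most 127 bits.
    """
    mask64 = (1 << 64) - 1
    if not (0 <= a <= mask64 and 0 <= b <= mask64):
        raise ValueError("operands must be unsigned 64-bit integers")
    result = 0
    for bit in range(64):
        if (b >> bit) & 1:
            result ^= a << bit  # XOR in a copy of a shifted by this bit position
    return result

print(clmul64(0b11, 0b11))  # prints 5
```

The two-bit example shows how this differs from ordinary multiplication: 3 × 3 is 9 in integer arithmetic, but the carry-less product is 5 (0b101), because the two middle partial products cancel under XOR.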
The Bulldozer module has 2MB of L2 cache memory and a total of 213 million transistors; it has an area of 30.9 square millimeters and is designed to run at between 0.8 and 1.3 volts. Each core in the Bulldozer module has 16KB of L1 data cache, and there is 64KB of shared L1 instruction cache per module. The module's L2 works out to 1MB per core (twice that of the prior Opteron 4100 and 6100 chips), and the four-module die has a third more L3 cache than its predecessor, at 8MB.
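The cache figures above hang together with some simple arithmetic. This Python sketch works them out per core and per die; the prior-generation numbers it prints are back-calculated from the article's "twice" and "a third more" ratios rather than quoted specifications, and the constant names are illustrative.

```python
# Cache arithmetic using the figures quoted for the Bulldozer module and die.
L2_PER_MODULE_KB = 2048   # 2MB of L2 per Bulldozer module
CORES_PER_MODULE = 2      # two cores share each module's L2
L3_PER_DIE_MB = 8         # 8MB of L3 on a four-module die

l2_per_core_kb = L2_PER_MODULE_KB // CORES_PER_MODULE   # 1MB per core
prior_l2_per_core_kb = l2_per_core_kb // 2              # new figure is "twice" the old
prior_l3_per_die_mb = L3_PER_DIE_MB * 3 // 4            # 8MB is "a third more" than this

print(l2_per_core_kb, prior_l2_per_core_kb, prior_l3_per_die_mb)  # prints: 1024 512 6
```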
The Bulldozer-based Opterons have a new memory controller that can support up to 384GB of memory per socket (up from a too-skinny 128GB with the prior controller) as well as DDR3 memory running at 1.6GHz. AMD says that the new controller can support load-reduced (LR-DIMM) main memory, which allows more memory chips to be packed onto a memory stick, and that 1.25 volt (ultra-low-volt) memory will be supported in addition to 1.5 volt (regular) and 1.35 volt (low-volt) memory. The new memory controller has "aggressive power down" and "partial power down" settings as well as memory power capping to keep systems within the thermal envelopes set by administrators.
Here's what the Bulldozer module looks like:
To make an eight-core Valencia Opteron 4200, you put four of these Bulldozer modules on a single piece of silicon and wrap them up with a shared DDR3 main memory controller and 8MB of L3 cache, like this:
To make a 16-core Opteron 6200 processor, you put two of these in a single package, like this:
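The way modules, dies, and packages stack up can be sketched as a small Python data model. The class names `Die` and `Package` are hypothetical, and the defaults come from the figures in this article: four two-core modules per die, 2MB of L2 per module, and 8MB of L3 per die. One die gives the eight-core Opteron 4200; two dies in one package give the 16-core Opteron 6200.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Die:
    """One Valencia-style die: four Bulldozer modules plus a shared L3."""
    modules: int = 4
    cores_per_module: int = 2
    l2_per_module_mb: int = 2
    l3_mb: int = 8

    @property
    def cores(self) -> int:
        return self.modules * self.cores_per_module

    @property
    def l2_mb(self) -> int:
        return self.modules * self.l2_per_module_mb

@dataclass(frozen=True)
class Package:
    """A processor package holding one or more identical dies."""
    dies: int
    die: Die = field(default_factory=Die)

    @property
    def cores(self) -> int:
        return self.dies * self.die.cores

    @property
    def l2_mb(self) -> int:
        return self.dies * self.die.l2_mb

    @property
    def l3_mb(self) -> int:
        return self.dies * self.die.l3_mb

for name, pkg in (("Opteron 4200", Package(dies=1)), ("Opteron 6200", Package(dies=2))):
    print(f"{name}: {pkg.cores} cores, {pkg.l2_mb}MB L2, {pkg.l3_mb}MB L3")
```

Keeping the die as its own type mirrors the article's description: the 16-core part is simply two copies of the eight-core die in one package, so every per-package total is the per-die total times two.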
The one thing that the new Opteron processors do not have is support for PCI-Express 3.0 peripherals, either on the chip itself or in the chipset. The forthcoming "Sandy Bridge-EP" Xeon E5 will have on-chip PCI-Express 3.0 controllers, as El Reg revealed back in May.
"If you ask our competitor, PCI-Express 3.0 is a big deal," says John Fruehe, director of product marketing for servers and workstations at AMD. "If you ask anyone else, it doesn't make a stinking difference. The important thing is that PCI-Express 3.0 forces a platform change that only benefits a few select applications. We'll be there when it is relevant. For us, it is more important to time it right than to be first to market."
That is precisely why AMD didn't rush to support DDR3 main memory with the Opterons, or goose the memory controllers with more capacity.
The Interlagos package has a total of 2.4 billion transistors, and because it is two Valencia dies, each Valencia chip has 1.2 billion.