AMD's 96-core Epyc CPUs leapfrog Intel to put DDR5, PCIe 5.0 in the datacenter
It's also twice as fast as Milan, AMD execs claim
AMD's status as scrappy underdog trailing in Intel's wake has been upended. The chipmaker has managed to pull out ahead of rival Intel with the launch of its fourth-generation Epyc "Genoa" processors this week.
The latest evolution of AMD's server platform not only boosts core counts to 96 and clock speeds as high as 4.4GHz, it also beats Intel's Sapphire Rapids to market as the first x86 CPU in the datacenter with support for the DDR5, PCIe 5.0, and Compute Express Link (CXL) standards.
The CPUs are based on the same Zen 4 microarchitecture and TSMC 5nm process node we saw in AMD's Ryzen 7000-series desktop chips earlier this year. This, according to AMD Fellow and the creator of Zen, Mike Clark, contributes to a 14 percent instructions-per-clock (IPC) improvement over Zen 3.
However, the IPC gains are only part of the story. Combined with higher core counts and clock speeds, Ram Peddibhotla, AMD VP of Epyc product management, claims the flagship 96-core Epyc 4 CPUs are twice as fast as last year's 64-core Milan parts across a variety of cloud, high-performance computing (HPC), and enterprise benchmarks. As usual, we recommend taking these claims with a healthy grain of salt.
More cores, higher power consumption
Taking a peek under Genoa's now even larger heat spreader — yep, there's a new socket too — reveals just how AMD has managed to cram so many cores into a single package. You guessed it, more chiplets.
The larger package makes room for four additional Core Complex Dies (CCDs), bringing the total to 12. The actual core layout of these chiplets, however, remains largely unchanged from Milan, with eight cores sharing 32MB of L3 cache between them. What is new is the move from TSMC's 7nm to its more advanced 5nm process and the use of AMD's Zen 4 cores, which doubles the L2 cache to 1MB per core.
AMD also juiced the chips' clock speeds by several hundred megahertz across the board, albeit apparently at the cost of higher thermals. Default thermal design power (TDP) hasn't changed much core-for-core from last generation — hovering around 280W for AMD's 64-core parts — but customers that want to extract the highest possible frequencies from these chips can now configure them up to 400W. That's a 120W increase in power consumption compared to Milan.
Genoa's higher configurable TDPs are hardly surprising given the industry-wide trend toward hotter, more power-dense CPUs and GPUs over the past few years. Just this week we learned that Intel's 56-core HBM-stacked Sapphire Rapids CPUs — now called "Intel Xeon CPU Max" — will consume around 350W, putting them just a hair below AMD's 96-core Epyc 9654, which has a default TDP of 360W. Meanwhile, in the GPU space, vendors like Nvidia are already pushing 700W of power consumption on a single SXM module.
While AMD will tell you Epyc 4 squeezes more work out of each watt, that doesn't change the fact that the higher power envelope poses a challenge for server builders tasked with finding a way to dissipate all of that heat, and the datacenter operators that have to power those systems.
Memory however you want it
Looking beyond raw performance, Epyc 4 also delivers a number of memory and I/O improvements over Milan. The CPUs are AMD's first datacenter chips to support DDR5 memory.
Genoa's I/O die — now built on a TSMC 6nm process as opposed to GlobalFoundries' 14nm tech — supports 12 channels of 4,800 MT/sec DDR5 and up to 6TB of memory per socket. According to AMD, this works out to a maximum theoretical memory bandwidth of 460GB/sec when all 12 channels are populated with 4,800 MT/sec DDR5 memory.
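That 460GB/sec figure follows directly from the channel count and transfer rate. A quick back-of-the-envelope check, assuming the standard 64-bit (8-byte) data bus per DDR5 channel:

```python
# Sanity check on Genoa's quoted peak memory bandwidth.
# Assumes a 64-bit (8-byte) data path per DDR5 channel, ignoring ECC bits.
CHANNELS = 12
TRANSFER_RATE_MT_S = 4800   # mega-transfers per second per channel
BYTES_PER_TRANSFER = 8      # 64-bit channel width

per_channel_gb_s = TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER / 1000  # 38.4 GB/s
total_gb_s = per_channel_gb_s * CHANNELS                           # 460.8 GB/s

print(f"{per_channel_gb_s:.1f} GB/s per channel, {total_gb_s:.1f} GB/s per socket")
```

The result, 460.8GB/sec, lines up with AMD's (presumably rounded) 460GB/sec claim.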
Of course, even at one DIMM per channel, populating all those channels could prove tricky, especially in traditional dual-socket systems.
Genoa also increases I/O connectivity to 160 lanes of PCIe 5.0 and adds 64 lanes dedicated to CXL. Of the PCIe lanes, 32 can be dedicated to SATA connectivity, while dual-socket systems gain an additional 12 "bonus" lanes of PCIe 3.0.
- Intel takes on AMD and Nvidia with mad 'Max' chips for HPC
- AMD refreshes desktop CPUs with 5nm Ryzen 7000s that can reach 5.7GHz with 16 cores
- After spate of delays, Intel promises Sapphire Rapids Xeons for early 2023
- AMD's Epyc 4 will likely beat Intel Sapphire Rapids to market
Speaking of CXL, Genoa is the first x86 platform to support the cache-coherent interface. While future iterations of CXL will enable fully composable infrastructure, early implementations of the tech are focused squarely on memory expansion.
This is where AMD is focusing its attention for its first foray into CXL. Genoa supports a modified version of CXL 1.1 that backports support for tiered-memory configurations. And AMD clearly expects CXL to be a hit in the datacenter: it has already extended its confidential-computing memory encryption tech — called SEV-SNP — to cover these memory expansion modules out of the box.
Even though AMD supports CXL, that doesn't mean the ecosystem is ready to take advantage of the new tech. While some vendors, like Samsung and Astera Labs, have announced CXL memory modules, the standard is still in its infancy.
And those hoping to take advantage of more advanced CXL accelerators will have to wait until AMD ships a CPU that fully supports the CXL 2.0 spec required for technologies like memory pooling.
Eating Intel's lunch
While AMD may have a head start on CXL, PCIe 5.0, and DDR5 in the datacenter, it won't be long before Intel brings its Xeon processors back to feature parity.
Intel had hoped to beat AMD to market with its 4th-Gen Xeon Scalable processor, codenamed Sapphire Rapids, by more than a year. Unfortunately, repeated delays have put the CPU, and Argonne National Laboratory's Aurora supercomputer, woefully behind schedule.
As of the latest delay earlier this month, Intel expects the first volume shipments of the chip to hit the market in Q1 2023.
And of course, AMD didn't miss the opportunity to capitalize on Intel's struggles bringing the chip to market. Pointing to Intel's 40-core Xeon Platinum 8380 — for the moment the chipmaker's fastest available part — AMD claims Genoa is 2.5x and 3x faster in the popular SPECrate 2017 floating-point and integer benchmarks, respectively.
Of course, that's with more than twice the cores per socket. In a core-for-core comparison, Peddibhotla estimates Genoa outperforms Intel's Ice Lake generation by closer to 50 percent in SPECrate 2017's integer benchmark and 78 to 96 percent in the floating-point benchmark.
Intel has also long held an advantage over AMD in workloads that rely on the AVX-512 instruction set, such as deep learning and AI inferencing. With the move to Zen 4, however, AMD has closed that gap with native support for large-vector workloads.
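Whether a given host actually exposes AVX-512 is visible in the CPU flags the kernel reports. The sketch below parses Linux's /proc/cpuinfo format; the helper function and sample text are illustrative, not from AMD or Intel documentation:

```python
# Check whether a CPU flag (e.g. avx512f) appears in /proc/cpuinfo-style text.
# The sample string below is illustrative; on a real Linux host you would read
# the contents of /proc/cpuinfo instead.

def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears on any 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if flag in flags.split():
                return True
    return False

sample = "processor : 0\nflags : fpu sse sse2 avx avx2 avx512f avx512bw\n"
print(has_cpu_flag(sample, "avx512f"))  # True for this sample
```

Matching whole tokens from the flags line, rather than substring-searching the file, avoids false positives such as finding "avx" inside "avx512f".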
As such, we'll have to wait until Intel's Sapphire Rapids launches early next quarter to get a better sense of how the chipmaker's long-delayed chips stack up against Genoa.
Go deeper... For a full analysis, details of configurations, connectivity, and more, check out our friends at The Next Platform.
At launch Genoa can be had in 18 flavors ranging from 16 cores on the low end to 96 cores at the top of the stack.
As with previous generation Epyc processors, AMD will offer many of these parts in configurations that prioritize per-core performance, core density, or a combination of the two.
The first Epyc systems from AMD's OEM partners are available to order starting today, with systems making their way into customers' hands as early as December.
It's also worth noting that Genoa is only the first of four Zen 4-based datacenter CPUs slated to launch over the next year. AMD's cloud-focused Bergamo CPUs will bump the core count up again to 128, though we're told at the cost of smaller caches. These chips appear to be aimed at combating Ampere's 128-core Altra Max processors, which have seen widespread adoption among public cloud providers, including Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure.
AMD also has another cache-stacked Epyc in the works, codenamed Genoa-X, that should compete directly with Intel's HBM-stacked Xeon Max processors, as well as a telecom- and edge-focused chip called Siena that goes after Intel's stronghold at the edge.®