Rome wasn't built in a day, wasn't teased in a day, either: AMD's 7nm second-gen 64-core Epyc server chips finally land
After what feels like months of drip-fed info, here comes some much needed competition in the data center world
Updated Chip biz AMD today, after months of teasing, officially debuted the second generation of its Epyc server processor family in San Francisco, promising performance, efficiency, throughput, and security improvements.
At a keynote presentation on Wednesday afternoon, Lisa Su, president and CEO of AMD, claimed the second generation Epyc, code-named Rome, is "the highest performance x86 processor in the world." Pointing to Intel's Cascade Lake line as a baseline, Su said, "We are almost double their performance." That translates to somewhere between 40 per cent and 50 per cent lower operating expenses, according to Su.
"We are the smaller player in the industry but we are passionate about what we do," said Mark Papermaster, CTO and EVP of technology and engineering at AMD, during a technical session for press and analysts. "You will see no let up from AMD in this race."
To compete against Intel's Xeon line, the second-gen Epyc 7xx2 series processors will be built as nine-die packages, rather than the four-die approach used in the first generation. Eight chiplets fabricated using a TSMC 7nm process, with up to eight x86-64 CPU cores each (running one or two hardware threads), surround a 14nm-process central IO controller die. This is what AMD calls hybrid multi-die architecture, which stands in contrast to the traditional monolithic design approach favored by Intel (though Chipzilla is now following AMD's lead and moving to multi-die).
Eight chiplets with eight 7nm CPU cores each add up to 64 CPU cores, and 128 hardware threads, per Rome socket, versus the maximum of 56 cores, at 14nm, that Intel offers right now. TSMC's 7nm is comparable to Intel's super-delayed 10nm process node, and Intel isn't shipping 10nm server-class processors until next year, giving AMD, and its 7nm chips available now, a healthy lead in this respect.
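For the arithmetically inclined, the core and thread counts above can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the function name socket_totals is our own invention, and the two-threads-per-core default assumes simultaneous multithreading is switched on.

```python
def socket_totals(chiplets=8, cores_per_chiplet=8, threads_per_core=2):
    """Return (cores, hardware threads) for one socket.

    Defaults mirror the top-end Rome part described in the article;
    threads_per_core=2 assumes SMT is enabled.
    """
    cores = chiplets * cores_per_chiplet
    return cores, cores * threads_per_core


cores, threads = socket_totals()
print(f"Per socket: {cores} cores, {threads} threads")
print(f"Dual socket: {2 * cores} cores, {2 * threads} threads")
```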
"Multiple dies have an inherent yield advantage," explained Kevin Lepak, chief architect for AMD Epyc processors, "because if you get a defect in one you don't have to throw the whole thing away."
AMD is using this yield advantage and other technical innovations to compete more effectively against Intel. The rise of its stock price since early this year suggests investors believe the company has the goods to make headway against the world's number one chipmaker. It was generally accepted that IT buyers would try out the first-generation Epyc family before committing to the second generation. That second generation is now here, and ready to eat into Intel's 95 per cent stranglehold of the data center compute market.
According to AMD benchmarks, the latest Epyc series performs well compared to the competition. Where Intel's second-gen Cascade Lake SP gets 282GB/s using 12 DDR4-2933 memory modules, Epyc 7xx2 can manage 410GB/s using 16 DDR4-3200 chips – a 45 per cent advantage in memory bandwidth, or so AMD claims.
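Those two bandwidth figures line up neatly with theoretical peaks rather than anything measured under load, so they can be roughly reconstructed from the module counts and speeds alone. The sketch below assumes each DDR4 module sits on its own 64-bit (eight-byte) channel and that GB means decimal gigabytes; the function name is ours, not AMD's.

```python
# Assumption: one DDR4 module per channel, 64-bit (8-byte) data bus each.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gb_s(modules, mega_transfers_per_s):
    """Theoretical peak bandwidth in decimal GB/s across all modules."""
    return modules * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


intel = peak_bandwidth_gb_s(12, 2933)  # 12 x DDR4-2933
amd = peak_bandwidth_gb_s(16, 3200)    # 16 x DDR4-3200
advantage_pct = (amd / intel - 1) * 100

print(f"{intel:.1f} GB/s vs {amd:.1f} GB/s: {advantage_pct:.0f}% more")
# prints "281.6 GB/s vs 409.6 GB/s: 45% more"
```

Real-world throughput will land below these ceilings on both sides, so treat the 45 per cent as a best-case gap.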
Epyc 7xx2 provides up-to-18GT/s socket-to-socket connectivity via Infinity Fabric, AMD's interconnect system. Its predecessor topped out at 10.7GT/s. And it is apparently the first PCIe-4.0-ready x86 server system-on-chip, with 128 PCIe 4.0 lanes (plus an extra lane for the motherboard BMC) in a single socket and peak PCIe bandwidth of 512GB/s. Its eight chiplets each have x16 links supporting 64GB/s bi-directional bandwidth.
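The PCIe numbers can be sanity-checked the same way. PCIe 4.0 signals at 16GT/s per lane with 128b/130b encoding, which works out to just under 2GB/s per lane in each direction. The quoted 64GB/s and 512GB/s appear to use the rounded 2GB/s figure and count both directions; that reading is our assumption, as are the helper names in this sketch.

```python
GT_PER_S = 16  # PCIe 4.0 raw signalling rate per lane


def lane_gb_s(encoding=128 / 130):
    """GB/s per lane, per direction; pass encoding=1.0 for the nominal figure."""
    return GT_PER_S * encoding / 8


def link_gb_s(lanes, encoding=128 / 130):
    """Bandwidth of a link counting both directions."""
    return lanes * lane_gb_s(encoding) * 2


for lanes in (16, 128):
    nominal = link_gb_s(lanes, encoding=1.0)
    usable = link_gb_s(lanes)
    print(f"x{lanes}: {nominal:.0f} GB/s nominal, {usable:.0f} GB/s after encoding")
# prints "x16: 64 GB/s nominal, 63 GB/s after encoding"
# then   "x128: 512 GB/s nominal, 504 GB/s after encoding"
```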
Epyc Rome includes a secure processor within the system-on-chip – a 32-bit ARM Cortex-A5 – with off-chip non-volatile storage for firmware and data, to handle cryptographic functions, secure key generation, and key management. Its memory encryption has been expanded to support 509 memory keys (the number of guest VMs supported), up from 15 in gen-one.
Overall, Epyc's Zen 2 architecture promises 15 per cent more instructions-per-clock cycle than the previous generation, twice as much AVX2 floating-point throughput per clock cycle, three times as much load/store bandwidth per cycle, twice as much L3 cache per core, and four times as much L3 cache per socket.
Epyc v2 also brings new architectural features, including an APIC extension for high-core-count systems, quality of service monitoring and enforcement of memory bandwidth, user mode instruction prevention, nonvolatile memory enhancement (cache-line writeback), an instruction to write back the cache without invalidating it (WBNOINVD), and an instruction to read the processor register at the user level (RDPRU).
The Epyc series will be offered in eight, 12, 16, 24, 32, 48, and 64 CPU core configurations, as single socket or dual socket. The TDP ranges from 120W to 225W, total cache from 128MB to 256MB, and clock speeds from 3.4GHz max down to 2.25GHz, depending on how many cores are functionally present. Full specifications can be found on AMD's website.
To make its performance claims a bit less abstract, AMD turned to the vendors it works with – not quite neutral third parties. According to Su, the new chip family has set 80 performance records. Hewlett Packard Enterprise CTO Mark Potter said HPE has recorded 37 world-record results running benchmarks with its Epyc-powered ProLiant DL325 and ProLiant DL385 servers.
In the TPC Express Benchmark V involving single processor server performance running virtualized databases, HPE claims a 321 per cent improvement. In terms of power efficiency, the new Epyc moved the needle 28 per cent.
Jen Fraser, senior director of engineering at Twitter, took a turn on stage to tout the power and cost savings of second generation Epyc chips, which have been undergoing testing at the social network.
"The performance we're seeing on Rome processors is actually reducing the power consumed per core," said Fraser, noting that Twitter is able to run 40 per cent more cores per rack (from 1240 to 1792) while maintaining the same power and cooling. The result, she said, is a 25 per cent reduction in total cost of ownership.
Next, Bart Sano, veep of engineering at Google, showed up to announce that the web giant has already deployed Epyc second-generation processors in its data centers. "We're already seeing some great performance for a variety of workloads," he said, adding that Rome will be available to Google's cloud customers via Google Compute Engine.
"AMD took a big step forward today in the datacenter with its launch of the second generation Epyc processor and platform," said Patrick Moorhead, founder of consultancy Moor Insights & Strategy, in an email to The Register. "It is a bigger leap forward than I had expected."
Moorhead said AMD had addressed the shortcomings of its first generation Epyc, with 15 per cent better single-thread performance and core scaling, as well as the addition of new RAS (uncorrectable DRAM error entry) and security (Secure Memory Encryption, Secure Encrypted Virtualization, 509 keys) capabilities, not to mention multi-core performance gains.
AMD gained low, single-digit market share with its initial Epyc offering, said Moorhead, and the second generation should continue that trend. "Enterprises don't mass deploy any first gen product, they didn't deploy first generation Epyc, but they will deploy the second generation Epyc," he said.
Moorhead expects AMD to excel in some but not all applications, pointing to Hadoop RT analytics, Java throughput, fluid dynamics, and virtualization. Intel, he said, should have advantages on low latency machine learning inference workloads, because customers can utilize Intel's DLBoost instructions, and in-memory database workloads that use Optane DC.
The IT industry is eager for more hardware competition, said Moorhead, but AMD, lacking the investment Intel has made in the enterprise value chain, still needs to lean on vendors like HPE, Dell and Lenovo, none of which have much of a recent track record creating demand for AMD kit.
"AMD has already proven itself to public cloud providers and need to shift that momentum to enterprises," said Moorhead. "As part of that future roadmap, AMD needs to disclose how it will optimize latency-sensitive, ML inference workloads as well as traditional big data and how it stays ahead of Intel."
We're told systems featuring Epyc 7xx2 processors are available now. ®
Updated to add
You can grab the latest analysis of Rome over on our HPC sister site, The Next Platform, right here.