AMD's 128-core Epycs could spell trouble for Ampere Computing
We still have more cores, exec sniffs
Analysis With the unveiling of its 128-core Epycs, codenamed Bergamo, AMD has put forward a challenge to Ampere Computing's tentative footing in the cloud and hyperscale arena.
Despite the impending threat of another cloud-native chip, Ampere Computing chief product officer Jeff Wittich isn't too concerned. "I remain confident that we have a leadership position in this space," he told The Register.
Since the launch of its Altra processor family back in 2020, Ampere has found success building power-efficient, core-dense parts aimed at cloud-native, scale-out workloads. The strategy won the chipmaker a place in nearly every major public cloud and drove even higher core count components like the 128-core Altra Max and the 192-core AmpereOne.
But where Ampere once filled a hole in the market for workloads that prioritized core density over all else, the fledgling chipmaker now has Intel, AMD, and the full momentum of the x86 architecture to contend with.
Even still, Wittich says this competition is a sign that the company is on the right track. "We fully expected, as we were successful, others would follow us down that path," he said. "We're the one with several years of experience with a pretty robust group of customers that have been using our processors."
How do the chips stack up?
At first glance, AMD's Bergamo looks well positioned to compete with Ampere's Altra Max. Both chips feature 128 cores that have been stripped down and optimized for a variety of popular cloud workloads, including Nginx, Memcached, Redis, and FFmpeg, to name a handful.
However, this is where the comparison starts to fall apart. Beyond core count, the two chips couldn't be more different. For one, Ampere's Altra Max is more than two years old.
Launched in 2021, Ampere's 128-core Altra Max was essentially a scaled-up version of the 80-core Altra processor introduced a year earlier. As such, it used the same off-the-shelf Arm Neoverse N1 core, which itself was already two years old at the time, and was fabbed using TSMC's already mature 7nm process tech.
Bergamo by comparison is using TSMC's 5nm and 6nm nodes across its compute and I/O dies, as well as a shrunken version of its Zen core architecture called Zen 4c. The latter allowed AMD to pack 16 cores into each of its eight compute dies. So not only does Bergamo have the benefit of more efficient process tech, it is also sporting a brand new core design that's backed by faster memory and I/O – PCIe 5.0 and 12 channels of DDR5 vs PCIe 4.0 and eight channels of DDR4 on Altra.
So it shouldn't come as a surprise that AMD is claiming a pretty substantial lead over Ampere's Altra Max. In AMD's internal benchmarks, Bergamo claimed 2.9x higher performance on average in a variety of cloud-native workloads compared to Ampere's 128-core chip.
Short of running our own benchmarks in a controlled environment, it's hard to say how they actually compare, so we recommend taking AMD's claims with a grain of salt. With that said, we can get an idea of how the two chips stack up core-for-core by looking at their SPECrate Integer Base scores.
Single-socket submissions show that AMD's 128-core, 256-thread Epyc 9754 scores right around 922 in the benchmark, about 2.6x higher than Ampere's top-specced Altra Max, the M128-30, which comes in at 356.
While a clear win for AMD's Bergamo, it doesn't take into consideration other elements like power consumption. AMD's part is rated for 360W and can be configured up to 400W, while Ampere's has a TDP of just 182W. So yes, it may be roughly 2.6x faster, but it potentially uses 2-2.2x more power.
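For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the ratios from the figures quoted above. It assumes rated TDP is a fair stand-in for actual power draw, which it may well not be, and the variable and function names are ours for illustration only.

```python
# Figures as quoted in this article; nothing here is measured by us.
EPYC_9754_SCORE = 922        # SPECrate Integer Base, single socket
ALTRA_MAX_SCORE = 356        # SPECrate Integer Base, single socket (M128-30)
EPYC_9754_TDP_W = 360        # default rating
EPYC_9754_CTDP_MAX_W = 400   # configurable maximum
ALTRA_MAX_TDP_W = 182


def score_per_watt(score: float, watts: float) -> float:
    """Benchmark score per rated watt. ASSUMPTION: TDP approximates real draw."""
    return score / watts


# How much faster, and how much more rated power.
perf_ratio = EPYC_9754_SCORE / ALTRA_MAX_SCORE
power_ratio_low = EPYC_9754_TDP_W / ALTRA_MAX_TDP_W
power_ratio_high = EPYC_9754_CTDP_MAX_W / ALTRA_MAX_TDP_W

print(f"performance ratio: {perf_ratio:.2f}x")
print(f"rated power ratio: {power_ratio_low:.2f}x to {power_ratio_high:.2f}x")
print(f"Epyc 9754 @ 360W:  {score_per_watt(EPYC_9754_SCORE, EPYC_9754_TDP_W):.2f} points/W")
print(f"Epyc 9754 @ 400W:  {score_per_watt(EPYC_9754_SCORE, EPYC_9754_CTDP_MAX_W):.2f} points/W")
print(f"Altra Max @ 182W:  {score_per_watt(ALTRA_MAX_SCORE, ALTRA_MAX_TDP_W):.2f} points/W")
```

Bear in mind this divides a benchmark score by a spec-sheet number. Wall power under load, which is what actually lands on the bill, could look quite different.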
This is why Ampere has long preferred to look at performance for a given rack-power budget. In fact, Wittich claimed its Altra Max processors can beat Bergamo in the SPECrate Integer benchmark in rack-level performance.
The idea here is that for a given power budget, Ampere can fit more systems, and therefore more cores, into the rack than AMD, resulting in higher rack-level performance. However, when we asked Ampere to back those claims up, they informed us this was an estimate based on the available information. Another dash of salt please.
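The shape of that rack-level argument can be sketched in a few lines of Python. To be clear, this is not Ampere's methodology, which we haven't seen. The 15kW rack budget and the per-socket overhead parameter are hypothetical placeholders, and the model assumes one CPU per system drawing exactly its rated TDP.

```python
def sockets_in_budget(rack_budget_w: float, cpu_power_w: float,
                      overhead_w: float = 0.0) -> int:
    """Whole sockets that fit in a rack power budget.

    overhead_w is a HYPOTHETICAL per-socket allowance for memory, storage,
    NICs, and fans; it defaults to zero (CPU power only).
    """
    per_socket_w = cpu_power_w + overhead_w
    if per_socket_w <= 0:
        raise ValueError("per-socket power must be positive")
    return int(rack_budget_w // per_socket_w)


def rack_score(rack_budget_w: float, cpu_power_w: float,
               score_per_socket: float, overhead_w: float = 0.0) -> float:
    """Aggregate benchmark score for the rack, assuming linear scaling."""
    sockets = sockets_in_budget(rack_budget_w, cpu_power_w, overhead_w)
    return sockets * score_per_socket


RACK_BUDGET_W = 15_000  # HYPOTHETICAL budget, chosen only for illustration

# Scores and TDPs are the ones quoted in this article.
altra_max_rack = rack_score(RACK_BUDGET_W, cpu_power_w=182, score_per_socket=356)
bergamo_rack = rack_score(RACK_BUDGET_W, cpu_power_w=360, score_per_socket=922)

print(f"Altra Max rack estimate: {altra_max_rack:.0f}")
print(f"Bergamo rack estimate:   {bergamo_rack:.0f}")
```

The result swings on the inputs: measured power in place of TDP, the overhead per system, and whichever scores are plugged in. That is why an estimate like Ampere's is hard to evaluate without the underlying numbers.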
Bergamo's competition isn't Altra
While we can't blame AMD for making performance comparisons to Ampere's Altra family, since they're what you can buy and benchmark today, it's not exactly a great comparison.
Realistically, cloud providers and hyperscalers aren't going to be cross-shopping Bergamo Epycs against Ampere's two- or three-year-old parts. Instead, the more interesting comparison is going to be against the chipmaker's second-gen lineup, AmpereOne.
Announced in late May, AmpereOne picks up where Altra left off, offering SKUs ranging from 136 to 192 Arm cores of the chipmaker's own design. While Ampere hasn't said much about performance – well, apart from a handful of heavily cherry-picked and questionable benchmarks – they have promised instructions per clock (IPC) improvements over Altra, alongside improvements in virtualization, mesh congestion management, branch prediction, security, and power management.
Some of this performance is likely down to the chip's new cache configuration, which boosts per-core L2 cache to 2MB – that's twice that of Altra or Bergamo.
The cores themselves are housed in a single 5nm compute die, while I/O and memory functionality are broken out into multiple 7nm chiplets. Basically, the opposite of what AMD did with Epyc.
Despite the move to a more efficient node, Ampere's top-specced chips are now rated for 350W, putting them in the same ballpark as Bergamo. Unfortunately, we'll have to wait and see just how well AmpereOne holds up against AMD's cloudy Epycs.
Cloud contention
It doesn't matter how great your chip is if nobody wants it. And when it comes to the highly specific category of core-dense, cloud-centric chips, Ampere certainly has had the market cornered for the past three years.
With the notable exception of Amazon Web Services, nearly every public cloud provider – including Oracle, Microsoft, Google, Tencent, Alibaba, and Baidu – has deployed Ampere's Altra or Altra Max parts.
By comparison, AMD was rather quiet about which cloud providers planned to deploy Bergamo. However, among the hyperscalers, the chipmaker has notched at least one victory with Facebook parent Meta planning to deploy both its Genoa and Bergamo parts, but it remains to be seen in what quantities.
With that said, it's not uncommon for cloud providers to take their time with these things. While Oracle was among the first cloud providers to throw their weight behind Ampere, it wasn't until last summer that Google joined the party.
It's also not like AMD doesn't already have deep relationships with cloud providers and hyperscalers. Before Ampere showed up with Altra and Altra Max, AMD was the go-to chipmaker if you wanted to maximize core density. Remember, back in 2019, Intel's highest core count parts topped out at 28 cores, while AMD had just launched its 64-core Epyc 2 CPUs.
Bergamo may not be able to compete against AmpereOne on sheer core count, but for cloud providers, native x86-64 support could very well be worth a 30 percent deficit in cores. We'll note that AMD isn't the only one promising ultra-core-dense x86 parts either. Intel's Sierra Forest Xeons, due out early next year if Intel's notoriously unreliable roadmap is to be believed, will split the difference with 144 cores.
While Arm has made considerable progress certifying popular workloads for use on its cores, under its SystemReady Certification program, the fact remains that the ISA is a relative newcomer to the datacenter space.
While there's plenty of software out there that runs just as well on an Arm core as on an x86 one, there's also plenty of software that doesn't. For reference, VMware's ESXi hypervisor for Arm remains an unsupported project, what they call a "Fling," after five years of development.
Because of this, AMD and Intel may see victories simply due to the lower barrier to entry and architectural familiarity compared to Arm alternatives. ®