2024 sure looks like an exciting year for datacenter silicon
Loads of chips from Nvidia, AMD, and Intel on the way – and very probably a few surprises as well
Comment The new year is already shaping up to be one of the most significant for datacenter silicon we've seen in a while. Every major chip house is slated to refresh its CPU and/or GPU product lines over the coming twelve months.
Nvidia has a slew of new accelerators, GPU architectures, and networking kit planned for 2024. Intel will launch arguably its most compelling Xeons in years alongside new Habana Gaudi AI chips. Meanwhile AMD, riding high on its MI300-series launch, is slated to bring its 5th-gen Epyc processors to market.
In no particular order, let's dig into some of the bigger datacenter chip launches on our radar in 2024. Oh, and if you think we missed one, let us know in the comments or by email.
Nvidia's HBM3e-toting H200 AI chips arrive
Among the first new chips to hit the market in 2024 will be Nvidia's H200 accelerators. The GPU is essentially a refresh of the venerable H100.
You might expect the latest chip to offer a performance uplift over its older sibling, yet it won't in the conventional sense. Dig through the spec sheet and you'll see the floating point performance is identical to that of the H100. Instead, the part's performance uplift — Nvidia claims as much as double the perf for LLMs such as Llama 2 70B — is down to the chip's HBM3e memory stacks.
We're promised the H200 will be available with up to 141GB of HBM3e memory that's good for a whopping 4.8TB/s of bandwidth. With the rise in popularity of LLMs and other generative models – such as Meta's Llama 2, Falcon 40B, and Stable Diffusion – memory capacity and bandwidth have an outsized impact on inference performance — namely, how big a model you can fit into a single accelerator or server, and how many requests it can handle simultaneously.
As we recently explored in our analysis of AMD and Nvidia's benchmarking debacle, FLOPS aren't nearly as important as memory capacity and bandwidth when it comes to these kinds of AI workloads.
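To put rough numbers on that, here's a back-of-the-envelope sketch. The rule of thumb — batch-1 token generation streams every weight from memory once per token — is a simplification, and the helper functions are ours, not Nvidia's; real throughput also hinges on batch size, KV-cache growth, quantization, and kernel efficiency:

```python
# Back-of-envelope: why memory, not FLOPS, tends to bound LLM inference.
# Figures are illustrative, not vendor benchmarks.

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def batch1_tokens_per_sec(bandwidth_gbs: float, weights: float) -> float:
    """Decode ceiling: each generated token streams all weights once."""
    return bandwidth_gbs / weights

llama2_70b = weights_gb(70, 2)  # FP16: two bytes per parameter
print(f"Llama 2 70B @ FP16: {llama2_70b:.0f}GB of weights")          # ~140GB
print(f"Fits in one 141GB H200; decode tops out near "
      f"{batch1_tokens_per_sec(4800, llama2_70b):.0f} tokens/sec")   # ~34
```

By the same math, an 80GB H100 can't even hold the model at FP16 in a single GPU, which is why the extra capacity matters as much as the extra bandwidth.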
Hopper's replacement arrives with 'Blackwell' architecture
According to an investor presentation in 2023, the H200 won't be the only GPU we see from Nvidia in 2024. In order to cement its leadership, Nvidia is shifting to a yearly release cadence for new chips, and the first new part from Team Green will be the B100.
As we understand it, the "B" here is short for the microarchitecture's name, Blackwell — presumably a nod to American statistician David Blackwell. We still don't know much about the part other than it's coming in 2024. Our colleagues at The Next Platform have a few thoughts on the B100 here.
As it stands, AMD's newly launched MI300X GPUs not only push more FLOPS than the H200, they have more and faster memory to boot. We can't imagine Nvidia is happy about this, especially considering how defensive the US giant has been of late. Because of this, we fully expect the B100 to deliver more FLOPS and more stacks of HBM3e, pushing the accelerator's memory capacity and bandwidth to new heights.
Alongside the GPU itself, Nvidia's roadmap includes more CPU-GPU superchips called the GB200 and GB200NVL. Whether these processors will continue to use the Arm Neoverse V2-based CPU cores found in the current crop of Grace and Grace-Hopper Superchips, or whether they'll feature some next-gen core, remains to be seen.
Then there's the B40. Historically these sorts of cards have targeted smaller enterprise workloads that can run within a single GPU. The part will replace the L40 and L40S and consolidate Nvidia's enterprise GPU lineup under a single overarching architecture.
Arguably the most interesting component of Nvidia's accelerated roadmap has to do with networking. Nvidia is looking to move to 800Gb/s connectivity with Blackwell, though as we've previously explored, this presents some unique challenges on account of PCIe 5.0 being nowhere near fast enough and PCIe 6.0 still being a little ways off.
When we might see these Blackwell cards is still up in the air, but, if history is anything to go by, we may not have to wait that long. Nvidia has a long history of pre-announcing accelerators months (and admittedly sometimes years) before they're actually available for purchase.
Nvidia was teasing its Grace-Hopper Superchip in early 2022, but as we understand it, those parts are only now making their way into customers' hands. So, we could have more details on the Blackwell-based parts as early as GTC.
Intel rings in new year with an all-new accelerator of its own
Keeping with the topic of accelerators, Intel is slated to reveal its third-gen Gaudi AI chips sometime in 2024.
The part is significant because, with the cancellation of Rialto Bridge, the successor to Ponte Vecchio, Habana Labs' Gaudi3 represents the best Intel has to offer in the AI training and inference arena — at least until Falcon Shores arrives in 2025.
While Nvidia and AMD have a habit of teasing and hyping their product releases for months, Intel has been exceptionally tight-lipped about the part. Most of what we've seen so far comes from a single presentation slide, which it has been showing off since at least its Innovation event in September.
The slide claims Gaudi3, a 5nm chip, will have 4x the brain float 16 (BF16) performance of the 7nm Gaudi2, plus twice the network bandwidth and 1.5x the HBM bandwidth.
Usually, figures like these would give us a starting point from which to extrapolate relative performance. Unfortunately, to do that, Intel would have to tell us what Gaudi2's BF16 performance actually is. We've asked, and it doesn't want to talk about it, despite touting that 4x improvement for Gaudi3. Intel instead wants to focus on real-world performance rather than benchmark comparisons.
It's a frankly baffling marketing decision, as the claim is essentially meaningless without a frame of reference. Also, by the looks of it, the x86 giant is using eight HBM stacks this time around rather than six.
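To illustrate why, here's a tongue-in-cheek sketch of the extrapolation we'd like to do. The baseline values are deliberate placeholders — Intel hasn't disclosed Gaudi2's BF16 throughput, and any number we plugged in would be a guess:

```python
# Intel claims Gaudi3 delivers 4x Gaudi2's BF16 performance, but won't say
# what Gaudi2's BF16 throughput is. The baselines below are placeholders,
# not disclosed figures -- which is exactly the problem.
def implied_gaudi3_bf16(gaudi2_bf16_tflops, gain=4.0):
    if gaudi2_bf16_tflops is None:
        return "4x of an undisclosed number is still an undisclosed number"
    return f"{gaudi2_bf16_tflops * gain:.0f} TFLOPS"

print(implied_gaudi3_bf16(None))   # what Intel has actually given us
print(implied_gaudi3_bf16(200))    # hypothetical baseline -> 800 TFLOPS
print(implied_gaudi3_bf16(400))    # hypothetical baseline -> 1,600 TFLOPS
```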
In addition to Gaudi3, we're told there'll be a version of the Gaudi2 tuned down — again — for the Chinese market (to stay on the right side of US export restrictions), and Intel claims it'll ship before Nvidia's rumored H20 chips make it to the mainland.
Intel joins the cloud CPU cadre with Sierra Forest
Meanwhile, on the CPU front, Intel has a double feature planned for 2024 that'll use its long-delayed Intel 3 process tech. To be clear, Intel didn't suddenly make the jump to 3nm. It has been working on this node, which it used to call 7nm, for years; it was eventually rebranded Intel 4 and Intel 3 to bring it closer in line, marketing-wise, with competing nodes' transistor densities.
We'll get the first of these Intel-3-based Xeon processors in the first half of 2024. Codenamed Sierra Forest, the chip can be equipped with a pair of 144-core dies for a total of 288 CPU cores per socket. Naturally, these are not the same class of cores we've seen in past Xeons. They're an evolution of Intel's efficiency-core architecture that started showing up in PC and notebook processors back in 2021 with the launch of Alder Lake.
But while those chips are usually accompanied by a set of performance cores, Sierra Forest is all e-cores and is designed to compete with Ampere, AMD, and the slew of custom Arm CPUs being deployed by cloud providers like AWS and Microsoft.
The e-cores used in Intel's Sierra Forest Xeons will feature a streamlined core architecture optimised for efficiency and throughput
Intel's claimed advantage is that it can pack more cores into a single socket or chassis than anyone else while maintaining compatibility with the majority of x86 binaries. We say the majority because the e-cores don't have the same feature set as past Xeons.
Two of the biggest differences are the outright lack of AVX-512 and Advanced Matrix Extensions (AMX) support. The argument here is that many of the workloads we see broadly deployed in the cloud — stuff like Nginx — don't necessarily benefit from these features, so, rather than dedicating a large amount of die space to big vector and matrix calculations, that space can instead be used to pack more cores onto each die.
Not every chip house agrees, however. AMD took a very different tack with its Bergamo Epycs, launched in spring 2023. Those server processors used a compact version of AMD's Zen 4 core, called Zen 4c, which traded clock speed for a smaller footprint. This allowed AMD to pack 128 cores into eight compute dies per processor package without sacrificing functionality.
Both approaches have merit. Depending on the hypervisor, the lack of certain CPU features can make migrating workloads from one box to another problematic. Intel hopes to overcome this with AVX10, which we took a deep dive into over the summer. In a nutshell, it's designed to backport many of AVX-512's more attractive features, such as FP16 and BF16 support, to 256-bit AVX2-class vectors. The result is you're less likely to run into this kind of migration trouble unless you really do need 512-bit wide vector registers.
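To make the migration concern concrete, here's a minimal sketch — Linux-only, with flag names as they appear in /proc/cpuinfo, and a required-flags set that is purely illustrative — of the kind of pre-migration check a scheduler has to make:

```python
# Minimal sketch of a pre-migration sanity check: does the destination host
# advertise the ISA extensions this workload was built against? Linux-only;
# the required set below is an example, not a real workload's manifest.
REQUIRED_FLAGS = {"avx512f", "amx_tile"}

def host_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU feature flags the kernel reports for this host."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

missing = REQUIRED_FLAGS - host_cpu_flags()
if missing:
    print(f"Unsafe to migrate here: host lacks {sorted(missing)}")
else:
    print("Host advertises all required ISA extensions")
```

Run on a Sierra Forest box, a check like this would fail for both flags; on a Granite Rapids box it should pass — which is exactly the fleet-homogeneity headache Intel is trying to paper over with AVX10.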
Intel gets down to earth with Granite Rapids
Moving into lesser-known territory, there are Intel's Granite Rapids Xeons, due out sometime later in 2024. While Sierra Forest prioritizes loads of tiny cores, Granite Rapids is a more traditional Xeon server processor built around the x86 giant's performance cores.
We still don't know how many cores it'll have or how fast the top-binned parts will be clocked, but we're told the core count will be higher than Emerald Rapids'. We do know the chip will feature a more modular chiplet architecture than either Sapphire or Emerald Rapids, with up to five dies — three compute and two I/O — per package.
Depending on the SKU, the chip will feature more or fewer compute dies, allowing Intel to take advantage of the modularity AMD has enjoyed for years. By contrast, 2023's Xeons came with either one large medium core count (MCC) die, or two large (Emerald Rapids) or four smaller (Sapphire Rapids) compute dies in the so-called extreme core count (XCC) configuration.
Intel's 6th-Gen Xeon Scalable processors – Sierra Forest and Granite Rapids – will come in e-core and p-core versions and support up to 12 channels of DDR5, or so we're promised
Intel's next-gen Xeons disaggregate the I/O functionality into a pair of dies that sandwich the compute. These I/O dies are important as they help to close the gap with AMD, which has not only held a core count advantage for the past five years, but has usually offered more, faster PCIe lanes and memory channels.
As we learned during the Hot Chips conference in 2023, Granite Rapids will feature 12 memory channels — the same as AMD's 4th-gen Epycs — and will support 8,800MT/s MCR DIMMs. MCR is rather cool as it'll allow the chip to achieve 845GB/s of memory bandwidth. That's not quite the 1TB/s Intel's 4th-Gen Xeon Max parts are capable of with their onboard HBM, but MCR DIMMs will get close and allow for substantially higher capacities.
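That 845GB/s figure is straightforward napkin math, assuming the standard 64-bit data path per DDR5 channel:

```python
# Napkin math behind the 845GB/s figure:
# bandwidth = channels x transfer rate x bytes per transfer.
channels = 12
transfers_per_sec = 8_800e6      # 8,800MT/s MCR DIMMs
bytes_per_transfer = 8           # standard 64-bit channel width

bandwidth_gbs = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"{bandwidth_gbs:.1f}GB/s")  # 844.8 -- the ~845GB/s quoted above
```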
The chip family will also sport up to 136 PCIe/CXL lanes, though only at PCIe 5.0 speeds. PCIe 6.0 may be coming in 2024, but not in time for Intel's "next-gen" Xeons.
AMD's Zen 5 arrives
Then, of course, AMD is due to launch Turin, its fifth generation of Epyc server processors, powered by new Zen 5 cores. At this point there's not much we can say about the part other than it's coming sometime in 2024.
Given the timing, we can make a few assumptions. We'd wager the chip will use either TSMC's 4nm or 3nm process tech in its compute tiles, but it's hard to say whether the I/O die will get a process shrink just yet.
Beyond this we can only point to recent leaks shared via Xitter that suggest AMD could once again boost core counts across its lineup. If the leaks hold true, we may be looking at Epyc processors with up to 128 Zen 5 cores or 192 Zen 5c cores.
The core complex dies (CCDs) themselves don't appear to have changed much from Genoa and Bergamo, with eight or 16 cores per chiplet, respectively. Instead, AMD will reportedly use 16 compute dies on its general-purpose platform and 12 on its cloud-centric platform to achieve the claimed core counts. Having said that, we'll have to wait and see whether the leaks prove accurate.
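If those leaks are right, the arithmetic at least checks out — a quick sanity check using the reported (and unconfirmed) die counts:

```python
# Sanity-checking the leaked Turin core counts against the reported
# (unconfirmed) chiplet configurations.
zen5_ccds, zen5_cores_per_ccd = 16, 8      # general-purpose parts
zen5c_ccds, zen5c_cores_per_ccd = 12, 16   # cloud-centric parts

print(zen5_ccds * zen5_cores_per_ccd)      # 128 Zen 5 cores
print(zen5c_ccds * zen5c_cores_per_ccd)    # 192 Zen 5c cores
```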
AMD's Epyc product line has grown more complex in recent years, now spanning general purpose, high performance compute, cloud, and edge applications. AMD has traditionally rolled out these chips over the course of about a year. Epyc 4 launched in November 2022, Bergamo and Genoa-X arrived in June 2023, and its edge-focused Siena parts didn't show up until September.
Surprises await
To be absolutely clear, this is by no means an exhaustive list of datacenter processors coming in 2024. We fully expect there to be more than a few surprises over the next twelve months, especially as the AI hype train gains speed and cloud providers continue to embrace custom silicon.
Microsoft recently ventured into the custom AI and CPU space, while Google already has several generations of tensor processing units and is rumored to be working on a CPU of its own.
We'll also be watching Arm's efforts to push its Neoverse core architecture and Compute Subsystems (CSS) IP stacks. The latter is the closest we've seen Arm come to designing an entire processor itself in modern times.
There's also the slew of semiconductor startups, like Ampere, Graphcore, Cerebras, SambaNova, Groq, and others looking to carve out a niche in the AI new world order. We'd hardly be surprised to see new silicon, products, and systems from any one of these suppliers in 2024. ®