Using its own sums, AMD claims it's helping save Earth with Epyc server chiplets
Smaller dies, less wafer loss equals lower emissions, exec claims
Comment AMD says its decision to ditch monolithic datacenter chips seven years ago in favor of a chiplet architecture has helped cut global greenhouse gas (GHG) emissions by tens of thousands of metric tons a year.
"Producing 4th-gen Epyc CPUs with eight separate compute chiplets instead of one monolithic die saved [about] 50,000 metric tons of CO2 emissions in 2023, through avoidance of wafers manufactured, approximately the same as the annual operational CO2 emissions footprint of 2022," Justin Murrill, AMD's director of corporate responsibility, said Monday.
To be clear, those GHG reductions are hypothetical: an estimate calculated on the back of an envelope using AMD's own methods rather than directly measured. And anyway, AMD hasn't produced a monolithic datacenter processor since Opteron was discontinued in early 2017. Instead, this whole thought experiment appears to have been contrived by AMD to show just how inefficient and wasteful monolithic designs, like those used by Intel and Nvidia, are compared to AMD's modular ones.
The core claim here isn't that AMD's chiplet architecture is more efficient than Intel's or Nvidia's. Instead, the House of Zen's argument is based entirely on yield rates for big chips versus little ones.
"The smaller the area of the chip, the more chips we can get per wafer and the lower the probability that a defect will land on any one chip. As a result, the number and yield percentage of good chips per wafer goes up and the wasted cost, raw materials, energy, emissions, and water goes down," AMD's Murrill explained here.
In other words, big chips are prone to defects resulting in lots of wasted silicon.
Defect-related waste is particularly acute on leading-edge process nodes, which tend to suffer higher defect rates at launch and improve as the process matures. This is why smaller systems-on-chip, like those found in mobile phones, tablets, and notebooks, usually embrace leading-edge nodes before larger datacenter silicon does.
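The relationship Murrill is describing can be sketched with the textbook Poisson yield model, in which the chance of a die escaping defects falls exponentially with its area. The die sizes and defect densities below are illustrative assumptions, not AMD or foundry figures.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability a die of the given area is defect-free."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

# Compare a hypothetical ~700 mm^2 monolithic die against a ~75 mm^2 chiplet
# across defect densities, from a mature node (0.05/cm^2) to a fresh one (0.20/cm^2).
for d0 in (0.05, 0.10, 0.20):
    mono, chiplet = poisson_yield(700, d0), poisson_yield(75, d0)
    print(f"D0={d0:.2f}/cm^2: monolithic yield {mono:.0%}, chiplet yield {chiplet:.0%}")
```

Even at the same defect density, the small die comes out far ahead under this model, and the gap widens the less mature the node is.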
As we've previously discussed, AMD was among the first to adopt a chiplet architecture with the launch of its first-gen Epyc processors in 2017. Rather than building one big, expensive die, AMD stitched together multiple smaller ones, now called core complex dies (CCDs), using a high-speed interconnect fabric that allowed them to function as a single chip.
In the Rome and Milan generations, these designs employed up to eight CCDs and a single central I/O die responsible for memory, peripherals, and other functions. In addition to allowing AMD to modularly scale the compute density of the chips, adding more CCDs as core counts demanded, the approach also allows the biz to use the optimal process tech for each die.
Beginning with its 4th-gen Epyc processors, codenamed Genoa, AMD moved to an N-1 approach with a 6nm I/O die and up to a dozen 5nm CCDs. If you're wondering why Murrill's example specifically highlighted eight compute chiplets, we're told that's because, when combined with the I/O die, that's roughly the same silicon surface area as a large monolithic die.
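That comparison can be roughed out in the same way. The sketch below assumes eight ~72mm² compute chiplets plus one ~400mm² I/O die versus a single monolithic die of the same total area, an assumed defect density, and a simplified 300mm wafer with no edge loss; none of these figures reflect AMD's actual methodology.

```python
import math

WAFER_AREA_MM2 = math.pi * 150 ** 2   # 300 mm wafer, ignoring edge loss and scribe lines
DEFECT_DENSITY = 0.10                 # defects per cm^2 -- an assumed, illustrative figure

def poisson_yield(area_mm2: float, d0: float = DEFECT_DENSITY) -> float:
    """Fraction of dies expected to come out defect-free: Y = exp(-D0 * A)."""
    return math.exp(-d0 * area_mm2 / 100.0)

def wafer_area_per_good_cpu(dies: dict[str, tuple[float, int]]) -> float:
    """Effective wafer area consumed per working CPU, summing area/yield for each die."""
    return sum(count * area / poisson_yield(area) for area, count in dies.values())

# Hypothetical layouts: eight small compute chiplets plus one I/O die,
# versus a single monolithic die with the same total silicon area.
layouts = {
    "chiplet":    {"CCD": (72.0, 8), "IOD": (400.0, 1)},
    "monolithic": {"die": (72.0 * 8 + 400.0, 1)},
}

for name, dies in layouts.items():
    per_cpu = wafer_area_per_good_cpu(dies)
    print(f"{name:>10}: ~{per_cpu:,.0f} mm^2 of wafer per good CPU, "
          f"~{WAFER_AREA_MM2 / per_cpu:.0f} good CPUs per wafer")
```

Under those assumptions the chiplet layout burns roughly half as much wafer area per working CPU as the monolithic one. From there, arriving at a tonnage figure is presumably a matter of multiplying the wafers avoided by AMD's own per-wafer manufacturing emissions figure.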
- AI cloud startup TensorWave bets AMD can beat Nvidia
- Standardization could open door to third-party chiplets in AMD designs
- TSMC boss says one-trillion transistor GPU is possible by early 2030s
- Nvidia turns up the AI heat with 1,200W Blackwell GPUs
Today, chiplets and multi-die chip assembly aren't unique to AMD. Intel and others have grown wise to the advantages and are increasingly employing the approach in their own designs.
Intel has been particularly bullish on the technique after initially teasing AMD's approach as a bunch of desktop chips "glued together" when first-gen Epyc launched. However, it wasn't until 2023, with the arrival of Intel's long-delayed Sapphire Rapids Xeons, that its first chiplet-based server processor came to market.
Intel's chiplet architecture was later extended to the x86 giant's mobile line with the launch of its Core Ultra parts, code-named Meteor Lake, in December 2023. Meanwhile, Ampere, Amazon Web Services, Apple, and Nvidia have all employed some form of multi-die architecture in their products.
But while multi-die architectures have become more common, the chiplets themselves aren't always smaller, and thus don't necessarily benefit from improved yield rates. Intel's 5th-gen Emerald Rapids Xeons and Gaudi3 accelerators, as well as Nvidia's Blackwell GPUs, still use large, often reticle-limited dies.
There are a couple of reasons for this. For one, the advanced packaging tech and interconnect fabrics necessary to stitch multiple chiplets together add complexity. Fewer chiplets means fewer interconnect bridges, and therefore less complexity and potentially better performance. You see, every time data moves from one chiplet to another, it incurs a latency and/or bandwidth penalty compared to a monolithic die.
AMD, for its part, seems to have figured the latter out with its Instinct MI300-series chips, which use a combination of 2.5D and 3D packaging to stack GPU and CPU chiplets on top of I/O dies capable of communicating with each of the compute dies at up to 2.1TB/s.
Whether the additional complexity and cost associated with stitching so many chips together outweighs the improved yield rates from using smaller chiplets, we'll have to wait and see. ®