A mix-and-match chiplet marketplace for processor makers is still a long way off
Universal Chiplet Interconnect Express is on the rails
Analysis As Moore's so-called Law continues to slow, many chipmakers are turning to advanced packaging and chiplet techniques to drive greater efficiencies and performance than what's possible with process shrinks alone.
AMD's Instinct MI300 family of accelerators, which it showed at its datacenter and AI event in June, is just the latest example of the shift. The GPU version of the chip uses 3D packaging to stack eight 5nm CDNA 3 GPU dies on top of four 6nm I/O chiplets, which are attached via TSMC's 2.5D packaging tech to a series of HBM3 modules around the periphery. The APU variant swaps two of those GPU dies for a trio of eight-core chips straight off AMD's Epyc Genoa platform.
Intel has employed similar packaging techniques throughout its CPU and GPU lineup. Its Ponte Vecchio GPUs use a combination of its EMIB and Foveros packaging tech to stitch together 47 chiplets — or tiles because Intel insists on being different — into a part that behaves like a single accelerator.
"It's the biggest inflection point in semiconductors since the dawn of RTL and synthesis way back in the late '80s," semi-con biz Synopsys VP John Koeter said of multi-die architectures in a recent interview with The Register.
Multi-die architectures offer a lot of advantages with scalability and modularity, but there are other benefits like being able to place memory closer to compute cores and matching process tech to optimize for various components. For example, it's not uncommon for the I/O dies to use older process tech because the analog components don't benefit from smaller nodes the same way that CPU cores might.
In many respects, the chip package is becoming a complete system unto itself. Rather than spreading discrete components like memory controllers, CPUs, and GPUs across a motherboard, they can all be packaged together and communicate over a low-power, high-bandwidth fabric.
This is one of the goals behind the Universal Chiplet Interconnect Express (UCIe) consortium, which is developing a standard interface to do just that. But before you get too excited about the prospect of stitching an AMD GPU to an Intel CPU or vice versa, there are still a lot of problems that need to be solved first, according to Koeter.
The reality is multi-die parts like AMD's MI300 or Intel's GPU Max series represent a pretty small fraction of semiconductor designs today: only about 10 percent, by Koeter's estimate. "The people doing multi-die designs today are system companies or semiconductor companies that are owning all the different components," he said.
In other words, these companies can tune each component to ensure everything works together and one component doesn't introduce unexpected results. While Intel, AMD, and other large chipmakers can get away with this, some hope to sidestep this headache by establishing a chiplet marketplace where companies can pick and choose components from various vendors.
Today, most multi-die applications are centered around high-performance computing, AI, and networking, but Koeter expects that to change rapidly over the next couple of years as new use cases arise. The adoption of multi-die architectures by the automotive industry is one area he is particularly optimistic about.
Before that can happen, the industry as a whole has to overcome some hurdles, Koeter said. "One challenge is simply having a common language that describes all the different components."
And even if you can get everyone to agree on what to call these components and what they should do, you can still run into situations where a chiplet might not function as intended.
"So you get a known good die…, you integrate it into this package, and all of a sudden it doesn't work correctly," Koeter said. "Who's responsible for that?"
It could be a problem with the chiplet, how it was manufactured, or how it was integrated into the package, he explained. Making matters worse, flaws in any one of these processes may not be immediately apparent.
"When you're talking about having multiple dies crammed into a package, you're really going to want to look at what the yield and the reliability is during tests, but also during the field deployment," he said. "You really want to have the module raising its hand and saying, 'Hey, one of the chips is going bad right now,' and you want it to do that before it becomes a failure in the field."
This means developing new testing and debug capabilities for multi-die designs — something Synopsys is unsurprisingly already working on, as are others.
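To get a feel for why yield looms so large once many dies share one package, here is a back-of-the-envelope sketch. It assumes each die and the assembly step fail independently, so the package only works if every one of them does. The function name and all of the numbers are hypothetical, chosen for illustration rather than taken from Koeter or any vendor.

```python
from math import prod


def package_yield(die_yields, assembly_yield=1.0):
    """Estimate the fraction of assembled packages that work.

    Assumes (for illustration only) that failures are independent:
    a package is good only if every die is good AND the assembly
    step itself succeeds, so the probabilities multiply.
    """
    return prod(die_yields) * assembly_yield


# Hypothetical example: twelve dies, each 99 percent likely to be good
# once integrated, and a packaging step that succeeds 98 percent of the time.
dies = [0.99] * 12
estimate = package_yield(dies, assembly_yield=0.98)
print(f"Estimated package yield: {estimate:.1%}")
```

With these made-up figures the estimate works out to roughly 87 percent, even though every individual part looks nearly perfect on its own. That compounding is part of why catching a marginal die before and after integration matters.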
There is also no shortage of technical challenges to overcome. According to Koeter, one of the more pressing ones is memory coherency. To take full advantage of a chiplet architecture, the chiplets need to be able to address the same cache and memory, which avoids unnecessary copying of data from one chip to another.
This is one of the problems that UCIe, which we mentioned earlier, looks to address. The industry group is developing an open interface standard based on Compute Express Link for heterogeneous chiplet architectures. In other words, it's a common language that the chiplets can use to talk to each other.
The Open Compute Project is also exploring a similar interconnect standard called Bunch of Wires. However, it doesn't appear to be anywhere near as popular as UCIe, which has already gained support from major chipmakers including AMD, Arm, Qualcomm, Intel, and Synopsys to name a handful.
While progress is being made toward making heterogeneous chiplet designs not only possible but commonplace, Koeter says it's going to be a while before they're ready for prime time.
"A true chiplet marketplace is probably a few years out from being a reality," he said. "Some people think it'll be much faster than that, but I think there are real-world problems that need to be worked out that are industry-wide problems." ®