The House of Zen joins the co-packaged optics race with Enosemi buy
Light is faster than copper, which matters for rack-scale architecture
Analysis AMD officially entered the co-packaged optics race with the acquisition of photonic chip startup Enosemi, announced this week.
The House of Zen aims to integrate the technology into its next-gen rack-scale systems to better compete with rival Nvidia in the AI arena.
Co-packaged optics offer a number of benefits over copper interconnects or traces, including higher bandwidth, lower latency, and lower power consumption.
And as the name suggests, these improvements are typically achieved by packaging a photonics chiplet or interposer, which carries signals over fiber strands rather than copper traces, alongside the compute die.
Interest in this technology has surged amid the AI boom as chip designers and systems builders have grappled with the limited reach and bandwidth of conventional copper cabling and the growing power demand of high-performance pluggable optics.
AMD is a little late to the co-packaged optics party. Intel and Broadcom have been playing with the tech for years, while at GTC this spring, Nvidia unveiled a pair of network switches that'll leverage the tech starting later this year.
Lighting the way to rack scale
AMD likely plans to use Enosemi's IP in future rack-scale designs. However, we don't yet know how or where the photonics tech will be integrated.
But executives at the House of Zen have previously discussed integrating photonic chiplets into chips like its MI300-series parts to boost bandwidth.
Modern GPUs often feature extremely high-performance interconnects, like Nvidia's NVLink or AMD's Infinity Fabric, which enable a rack full of chips to behave like one great big one. However, for this to work, these interconnects need to shuttle data around at hundreds or even thousands of gigabytes per second.
Because these scale-up interconnects rely on copper traces or cabling, their reach is limited to a couple of feet at most. If you've ever wondered why the NVLink switches in Nvidia's NVL72 systems sit between the compute blades rather than all up top, this is why.
Optical interconnects don't suffer from this limitation. Instead of your scale-up network being limited to a rack, you could have an entire row of GPUs acting as one.
The tricky bit is making the photonics fast enough to justify their higher power consumption.
"You bolt optical on because you want massive bandwidth. So, you need low-energy per bit for that to make sense and in-package chiplets are the way to get the lowest energy interfaces," AMD SVP and Fellow Sam Naffziger explained in a video last year in which he argued that the move to co-packaged optics was "coming."
So unless you really need the bandwidth and the reach, copper may still be preferable.
A CPO power play
This is why Nvidia is sticking with copper interconnects inside its rack-scale systems. Opting for optics would have added another 20 kilowatts to the power budget.
Instead, Nvidia aims to use CPO in the scale-out networks used to stitch multiple HGX GPU nodes or NVL72 racks together into a large-scale cluster for training.
At its GTC conference, the GPU giant teased its next-gen Spectrum Ethernet and Quantum InfiniBand switches, which will ditch pluggable optics for integrated photonics. But instead of longer reach or higher bandwidth, these designs aim to curb the amount of power consumed by the optical pluggables used to convert electrical signals into optical ones and vice versa.
Each of these pluggables can pull between 20W and 40W of power, which adds up pretty quickly when you've got anywhere from 64 to 512 of the things per switch.
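To put rough numbers on that, here is a back-of-the-envelope sketch of the two extremes. It assumes every pluggable in a switch draws the same power, and the function name is ours for illustration; real transceivers will vary with speed and reach.

```python
# Back-of-the-envelope range for pluggable optics power per switch.
# The wattage and count bounds come from the figures quoted above.
# Assumption: every pluggable in a given switch draws the same power.

def pluggable_power_watts(count: int, watts_each: float) -> float:
    """Total power drawn by `count` pluggables at `watts_each` watts apiece."""
    return count * watts_each

low = pluggable_power_watts(64, 20)    # fewest pluggables, lowest draw
high = pluggable_power_watts(512, 40)  # most pluggables, highest draw

print(f"{low / 1000:.2f} kW to {high / 1000:.2f} kW per switch")
```

Under those assumptions, the optics alone land somewhere between roughly 1.3 kW and 20.5 kW per switch, before the switch silicon itself is counted.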
Nvidia's designs eliminate the need for these pluggables — at least on the switch side — enabling fiber optic cables to be plugged directly into the front of the switch. This, Nvidia argues, reduces power consumption and eliminates a source of failure.
"With integrated optics, we are reducing the power consumption by almost 3.5x," Gilad Shainer, SVP of Networking at Nvidia, said ahead of GTC this spring.
Plenty of competition
While Nvidia's first co-packaged optical switches won't hit the market until later this year, Broadcom has had CPO switches in production for years now. The first generation of these was employed by Tencent, but now companies like Micas Networks are offering switches based on Broadcom's 51.2 Tbps Bailly CPO switch platform.
Broadcom is also experimenting with using the tech in scale-up networks. At Hot Chips last year, Broadcom claimed to have co-packaged a GPU with an optical chiplet capable of 1.6TB/s of error-free bidirectional bandwidth.
Intel is also exploring the use of CPO in rack-scale systems. During Intel's first-quarter earnings call last month, CEO of Products Michelle Johnston Holthaus said she saw "optics as a critical element" for rack-scale architectures.
Meanwhile, startups like Celestial AI, Lightmatter, and Ayar Labs continue to push ahead with their own CPO chiplets and optical interposer designs.
But while CPO continues to gain traction among chipmakers, it's still in its infancy, and concerns over its reliability, serviceability, and the overall blast radius associated with such a tightly integrated technology still persist. ®