How CXL may change the datacenter as we know it
Bye-bye bottlenecks. Hello composable infrastructure?
Interview Compute Express Link (CXL) has the potential to radically change the way systems and datacenters are built and operated. And after years of joint development spanning more than 190 companies, the open standard is nearly ready for prime time.
For those who aren’t familiar, CXL defines a common, cache-coherent interface for connecting CPUs, memory, accelerators, and other peripherals. And its implications for the datacenter are wide-ranging, Jim Pappas, CXL chairman and Intel director of technology initiatives, tells The Register.
So with the first CXL-compatible systems expected to launch later this year alongside Intel’s Sapphire Rapids Xeon Scalables and AMD’s Genoa fourth-gen Epycs, we ask Pappas how he expects CXL will change the industry in the near term.
Composable memory infrastructure
According to Pappas, one of the first implementations for CXL will likely involve system memory. Until now, there’ve only been two ways to attach more memory to an accelerator, he explains. Either you added more DDR memory channels to support more modules, or the memory had to be integrated directly onto the accelerator or CPU package.
“You can’t put memory on the PCIe bus,” but with CXL you can, Pappas says. “CXL was designed for accelerators, but it was also designed to have a memory interface. We all knew from the very beginning that this could be used as a different port for memory.”
Instead of populating a system with more or larger memory modules, additional memory could be installed via a card using a common interface for PCIe and CXL. And thanks to the simple-switching systems introduced with the CXL 2.0 spec, it became possible for resources, including memory, to be pooled and accessed by multiple systems simultaneously.
It’s important to note that in this configuration, only the resources themselves and not the contents of the memory are shared among the hosts, Pappas emphasizes. “Each region of memory belongs to, at most, one coherency domain. We're not trying to share memory; that becomes much more complex.”
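The ownership rule Pappas describes can be pictured with a toy model. The sketch below is illustrative only: `MemoryPool`, its methods, and the host names are hypothetical and do not correspond to any CXL software interface. It simply enforces that each region of a pooled device is assigned to at most one host at a time.

```python
from typing import Dict, List, Optional


class MemoryPool:
    """Toy model of a pooled memory device split into equal regions.

    Hypothetical names throughout; this is not a real CXL API.
    """

    def __init__(self, num_regions: int) -> None:
        # Map region index -> owning host, or None while the region is free.
        self.owner: Dict[int, Optional[str]] = {i: None for i in range(num_regions)}

    def assign(self, host: str) -> int:
        """Hand the lowest-numbered free region to `host`."""
        for region in sorted(self.owner):
            if self.owner[region] is None:
                self.owner[region] = host
                return region
        raise MemoryError("no free regions left in the pool")

    def release(self, region: int, host: str) -> None:
        """Return a region to the pool; only its current owner may do so."""
        if self.owner[region] != host:
            raise PermissionError(f"region {region} is not owned by {host}")
        self.owner[region] = None

    def regions_of(self, host: str) -> List[int]:
        """List the regions currently owned by `host`."""
        return sorted(r for r, h in self.owner.items() if h == host)


pool = MemoryPool(num_regions=4)
first = pool.assign("host-a")    # host-a takes a region
second = pool.assign("host-b")   # host-b gets a different one
pool.release(first, "host-a")    # host-a hands its region back
third = pool.assign("host-b")    # the freed region can now go to host-b
```

At no point do two hosts hold the same region: capacity moves between hosts, but the contents are never shared.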
Another use case involves tiered memory architectures in which a system utilizes high-bandwidth memory on the package, a sizable pool of fast DDR5 memory directly attached to the CPU, and a larger pool of slower memory attached via a CXL module.
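As a rough illustration of the tiering idea, the sketch below places each allocation in the fastest tier that still has room. The tier names mirror the three tiers described above, but the capacities, the `Tier` class, and the `place` function are assumptions made up for this example; real systems typically migrate pages between tiers based on how hot they are rather than placing them once.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    name: str
    capacity_gb: int
    used_gb: int = 0

    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def place(tiers: List[Tier], size_gb: int) -> str:
    """Place an allocation in the first (fastest) tier with enough free space.

    `tiers` must be ordered fastest to slowest.
    """
    for tier in tiers:
        if tier.free_gb() >= size_gb:
            tier.used_gb += size_gb
            return tier.name
    raise MemoryError("allocation does not fit in any tier")


# Hypothetical capacities, ordered fastest to slowest.
tiers = [
    Tier("on-package HBM", 64),
    Tier("direct-attached DDR5", 512),
    Tier("CXL-attached memory", 2048),
]

small = place(tiers, 48)    # fits in HBM
medium = place(tiers, 32)   # HBM has only 16 GB left, so this spills to DDR5
large = place(tiers, 600)   # too big for DDR5, lands on the CXL module
```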
According to Pappas, memory pooling and tiered memory have implications for datacenter and cloud operators. “The biggest problems that the cloud customers have is their number one expense is memory. Roughly 50 cents of their equipment spend is on memory,” he says.
By pooling that memory, Pappas argues that operators can realize huge cost savings by reducing the amount of memory left sitting idle. And since pooled or tiered memory doesn’t behave any differently than system memory attached to the CPU, applications don’t need to be modified to take advantage of these technologies, Pappas says. If the application “asks for more memory, now there is essentially an infinite supply.”
This technology isn't theoretical either. Memory pooling and tiered memory were among several technologies CXL startup Tanzanite Silicon Solutions was working on prior to its acquisition by Marvell Technology earlier this month.
Marvell believes the technology will prove pivotal to achieving truly composable infrastructure, which, until now, has largely been limited to compute and storage.
Goodbye AI/ML bottlenecks
Pappas also expects CXL to benefit AI/ML workloads by enabling a much more intimate relationship between the CPU, AI accelerator, and/or GPU than is currently possible over PCIe.
At a basic level, the way a CPU interacts with a peripheral, like a GPU, is by sending load/store instructions back and forth in batches over the PCIe bus. CXL eliminates this bottleneck, enabling instructions to be essentially streamed between the accelerator and the host.
“It’s very similar to what happens in a dual-processor system where the caches remain coherent across processors. We’re extending that down to accelerators,” Pappas says.
Extending this kind of cache coherency to devices other than CPUs is neither easy nor a new idea.
Intel and others have tried and failed in the past to develop a standardized interconnect for accelerators, he tells us. Part of the problem is that the complexity associated with these interconnects is shared between the components, making it incredibly difficult to extend them to third parties.
“When we at Intel tried to do this, it was so complex that almost nobody, essentially nobody, was ever able to really get it working,” Pappas reveals. With CXL, essentially all of the complexity is contained within the host CPU, he argues.
This asymmetric complexity isn’t without trade-offs, but Pappas reckons they're more than worth it. These come in the form of application affinity, specifically which accelerator gets priority access to the cache or memory and which has to play second fiddle.
This is mitigated somewhat, Pappas claims, by the fact that customers will generally know which regions of memory the accelerator is going to access versus those accessed by the host. Users will be able to accommodate this by setting a bias in the BIOS.
The CXL standard is by no means finished. The CXL Consortium is expected to publish the 3.0 spec later this year.
The update includes a bump from 32 gigatransfers per second to 64, in line with the planned move to PCIe 6.0, as well as support for a number of new memory usage models, Pappas teases.
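To put the transfer-rate bump in perspective, here is a back-of-the-envelope calculation of raw link bandwidth. It assumes a 16-lane link (the width is our assumption, not something Pappas specified) and one bit per transfer per lane, and it ignores encoding and protocol overhead, so real-world throughput will be somewhat lower.

```python
def raw_link_bandwidth_gb_per_s(gigatransfers_per_s: float, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s.

    Each transfer carries one bit per lane, so divide by 8 to get bytes.
    Encoding and protocol overhead are deliberately ignored.
    """
    return gigatransfers_per_s * lanes / 8


current = raw_link_bandwidth_gb_per_s(32)   # 64.0 GB/s per direction on x16
upcoming = raw_link_bandwidth_gb_per_s(64)  # 128.0 GB/s per direction on x16
```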
The spec also introduces an avenue for implementing CXL’s interconnect technology in a non-asymmetric fashion. This functionality would allow devices, like GPUs or NICs, to interact directly with other CXL devices, eliminating the CPU as a bottleneck entirely.
“This will be really important as you get multiple accelerators that need to operate consistently,” he says.
Finally, the spec hints at a CXL fabric with the introduction of multi-level switching.
A CXL network fabric will be key to extending the technology beyond the rack level. And there’s reason to believe this could appear in version 3.0 after Gen-Z (not to be confused with the generation of adults born after the turn of the century) donated its coherent-memory fabric assets to the CXL Consortium late last year.
Temper your expectations
As exciting as CXL may be for the future of the datacenter, don’t expect it to be an overnight success. The technology is very much in its infancy with the first generation of compatible systems expected to arrive later this year.
Pappas expects CXL-equipped systems will come in phases, with tiered memory and memory pooling likely being the first mainstream use cases.
“Over this next year, the first round of systems are going to be used primarily for proof of concepts,” he says. “Let's be honest, nobody's going to take a new technology that's never been tried.”
After proofs of concept, Pappas expects at least another year of experimental deployments before the technology eventually starts showing up in production environments. ®