How CXL may change the datacenter as we know it

Bye-bye bottlenecks. Hello composable infrastructure?


Interview Compute Express Link (CXL) has the potential to radically change the way systems and datacenters are built and operated. And after years of joint development spanning more than 190 companies, the open standard is nearly ready for prime time.

For those who aren’t familiar, CXL defines a common, cache-coherent interface for connecting CPUs, memory, accelerators, and other peripherals. And its implications for the datacenter are wide-ranging, Jim Pappas, CXL chairman and Intel director of technology initiatives, tells The Register.

So with the first CXL-compatible systems expected to launch later this year alongside Intel’s Sapphire Rapids Xeon Scalables and AMD’s Genoa fourth-gen Epycs, we ask Pappas how he expects CXL to change the industry in the near term.

Composable memory infrastructure

According to Pappas, one of the first applications of CXL will likely involve system memory. Until now, there have only been two ways to attach more memory to a system, he explains: either add more DDR memory channels to support additional modules, or integrate the memory directly onto the accelerator or CPU package.

“You can’t put memory on the PCIe bus,” but with CXL you can, Pappas says. “CXL was designed for accelerators, but it was also designed to have a memory interface. We all knew from the very beginning that this could be used as a different port for memory.”

Instead of populating a system with more or larger memory modules, additional memory can be installed via an add-in card using a common interface for PCIe and CXL. And thanks to the switching introduced with the CXL 2.0 spec, resources, including memory, can be pooled and accessed by multiple systems simultaneously.

It’s important to note that in this configuration, only the resources themselves and not the contents of the memory are shared among the hosts, Pappas emphasizes. “Each region of memory belongs to, at most, one coherency domain. We're not trying to share memory; that becomes much more complex.”
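
To make that distinction concrete, here is a minimal sketch, written in Python rather than anything resembling real CXL fabric-manager code, of a pool that parcels out memory regions to hosts while enforcing that each region belongs to at most one coherency domain at a time. Every name and size in it is an illustrative assumption.

```python
# Hypothetical illustration of CXL 2.0-style memory pooling: capacity is
# shared across hosts, but each region is owned by at most one host
# (one coherency domain) at any given time.
class MemoryPool:
    def __init__(self, region_size_gb: int, num_regions: int):
        self.region_size_gb = region_size_gb
        # None means the region is unassigned and available to any host.
        self.owners = [None] * num_regions

    def allocate(self, host: str, regions_needed: int) -> list[int]:
        """Assign free regions exclusively to `host`; regions are never shared."""
        free = [i for i, owner in enumerate(self.owners) if owner is None]
        if len(free) < regions_needed:
            raise MemoryError("pool exhausted")
        granted = free[:regions_needed]
        for i in granted:
            self.owners[i] = host
        return granted

    def release(self, host: str, regions: list[int]) -> None:
        """Return regions to the pool so another host can claim them later."""
        for i in regions:
            if self.owners[i] != host:
                raise ValueError(f"region {i} is not owned by {host}")
            self.owners[i] = None


# Two hosts draw from the same 1 TB pool without ever sharing contents.
pool = MemoryPool(region_size_gb=16, num_regions=64)
a = pool.allocate("host-a", regions_needed=8)  # 128 GB for host A
b = pool.allocate("host-b", regions_needed=4)  # 64 GB for host B
pool.release("host-a", a)                      # capacity returns to the pool
```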

Another use case involves tiered memory architectures in which a system utilizes high-bandwidth memory on the package, a sizable pool of fast DDR5 memory directly attached to the CPU, and a larger pool of slower memory attached via a CXL module.
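
For a rough feel of what such tiering implies for placement, the sketch below simply picks the fastest tier that still has room for a given allocation. The tier names, capacities, and latencies are made-up assumptions, not figures from the CXL spec or any vendor.

```python
# Hypothetical tiered-memory placement: prefer the fastest tier with free space.
# Capacities and latencies are illustrative numbers only.
TIERS = [
    {"name": "on-package HBM",    "free_gb": 64,   "latency_ns": 50},
    {"name": "CPU-attached DDR5", "free_gb": 512,  "latency_ns": 90},
    {"name": "CXL-attached DRAM", "free_gb": 2048, "latency_ns": 170},
]

def place(size_gb: int) -> str:
    """Return the name of the fastest tier that can hold the allocation."""
    for tier in sorted(TIERS, key=lambda t: t["latency_ns"]):
        if tier["free_gb"] >= size_gb:
            tier["free_gb"] -= size_gb
            return tier["name"]
    raise MemoryError("no tier has enough free capacity")

print(place(32))    # lands in HBM
print(place(200))   # spills to DDR5
print(place(1500))  # spills to CXL-attached memory
```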

According to Pappas, memory pooling and tiered memory have big implications for datacenter and cloud operators. “The biggest problem that cloud customers have is that their number one expense is memory. Roughly 50 cents of every dollar of their equipment spend is on memory,” he says.

By pooling that memory, operators can realize huge cost savings by reducing the amount of memory left sitting idle, Pappas argues. And since pooled or tiered memory doesn’t behave any differently than system memory attached to the CPU, applications don’t need to be modified to take advantage of these technologies, he says. If the application “asks for more memory, now there is essentially an infinite supply.”
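
As a back-of-the-envelope illustration of why pooling matters, consider a rack where every server is sized for its own peak memory demand versus one where spare capacity lives in a shared pool. The figures below are assumptions chosen purely for the arithmetic, not numbers from Pappas or any operator.

```python
# Rough arithmetic on stranded memory. All figures are illustrative assumptions.
servers = 32
peak_gb_per_server = 1024   # each server must be able to handle this worst case
avg_gb_per_server = 512     # but typically uses only this much

# Without pooling: every server is provisioned for its own peak.
dedicated_total = servers * peak_gb_per_server

# With pooling: provision each server for its average, and keep a shared pool
# big enough to cover a few simultaneous peaks (assume eight at once).
simultaneous_peaks = 8
pooled_total = servers * avg_gb_per_server + simultaneous_peaks * (
    peak_gb_per_server - avg_gb_per_server
)

saving = 1 - pooled_total / dedicated_total
print(f"dedicated: {dedicated_total} GB, pooled: {pooled_total} GB, "
      f"saving: {saving:.0%}")  # roughly 38 percent less memory provisioned
```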

This technology isn't theoretical, either. Memory pooling and tiered memory were among several technologies CXL startup Tanzanite Silicon Solutions was working on prior to its acquisition by Marvell Technology earlier this month.

Marvell believes the technology will prove pivotal to achieving truly composable infrastructure, which, until now, has largely been limited to compute and storage.

Goodbye AI/ML bottlenecks

Pappas also expects CXL to benefit AI/ML workloads by enabling a much more intimate relationship between the CPU, AI accelerator, and/or GPU than is currently possible over PCIe.

At a basic level, the way a CPU interacts with a peripheral, like a GPU, is by sending load/store instructions back and forth in batches over the PCIe bus. CXL eliminates this bottleneck, enabling instructions to be essentially streamed between the accelerator and the host.

“It’s very similar to what happens in a dual-processor system where the caches remain coherent across processors. We’re extending that down to accelerators,” Pappas says.
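
One way to picture the difference, purely as an analogy and not real driver code, is to contrast an explicit copy-in, copy-out offload model with two parties working on the same coherent memory.

```python
# Conceptual contrast only: a PCIe-style explicit-copy model versus a
# cache-coherent shared-memory model. An analogy, not actual driver code.

# Copy model: host and device each operate on their own copy of the data.
def offload_with_copies(host_buffer):
    device_buffer = list(host_buffer)           # ship the batch to the device
    results = [x * 2 for x in device_buffer]    # device crunches its copy
    return list(results)                        # ship the results back

# Coherent model: host and device touch the very same memory.
shared = [1, 2, 3, 4]

def device_kernel(buf):
    for i, x in enumerate(buf):
        buf[i] = x * 2          # updates are immediately visible to the host

print(offload_with_copies([1, 2, 3, 4]))  # [2, 4, 6, 8], after two copies
device_kernel(shared)
print(shared)                             # [2, 4, 6, 8], with no copies at all
```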

Extending this kind of cache coherency to accelerators other than CPUs is by no means easy, nor is it a new idea.

Intel and others have tried and failed in the past to develop a standardized interconnect for accelerators, he tells us. Part of the problem is that the complexity associated with these interconnects is shared between the components, making it incredibly difficult to extend them to third parties.

“When we at Intel tried to do this, it was so complex that almost nobody, essentially nobody, was ever able to really get it working,” Pappas reveals. With CXL, essentially all of the complexity is contained within the host CPU, he argues.

This asymmetric complexity isn’t without trade-offs, but Pappas reckons they’re more than worth it. The trade-offs come in the form of affinity: which side, host or accelerator, gets priority access to a given region of cache or memory, and which has to play second fiddle.

This is mitigated somewhat, Pappas claims, by the fact that customers will generally know which regions of memory the accelerator is going to access versus those accessed by the host. Users will be able to account for this by setting a bias in the BIOS.
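
CXL does define per-region bias states for accelerator-attached memory, host bias versus device bias, though exactly how a platform exposes that knob is up to the vendor. The sketch below is a hypothetical configuration table meant only to show the kind of hint an operator might set; it is not a real firmware or BIOS interface.

```python
# Hypothetical per-region bias table, loosely modeled on CXL's host-bias /
# device-bias idea for accelerator-attached memory. Not a real firmware API.
from enum import Enum

class Bias(Enum):
    HOST = "host"       # the CPU is expected to touch this region most often
    DEVICE = "device"   # the accelerator gets the low-overhead path

# The operator knows which memory the accelerator will hammer and which the
# host will, so each region gets a static hint (set, say, via firmware/BIOS).
region_bias = {
    "model_weights":      Bias.DEVICE,  # read constantly by the accelerator
    "activation_buffers": Bias.DEVICE,
    "input_staging":      Bias.HOST,    # filled by the CPU before each batch
    "result_queue":       Bias.HOST,
}

for region, bias in region_bias.items():
    print(f"{region:>18}: {bias.value} bias")
```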

What’s next?

The CXL standard is by no means finished. The CXL Consortium is expected to publish the 3.0 spec later this year.

The update includes a bump from 32 gigatransfers per second to 64, in line with the planned move to PCIe 6.0, as well as support for a number of new memory usage models, Pappas teases.

The spec also introduces an avenue for implementing CXL’s interconnect technology in a symmetric fashion. This functionality would allow devices, such as GPUs or NICs, to interact directly with other CXL devices, eliminating the CPU as a bottleneck entirely.

“This will be really important as you get multiple accelerators that need to operate consistently,” he says.

Finally, the spec hints at a CXL fabric with the introduction of multi-level switching.

A CXL network fabric will be key to extending the technology beyond the rack level. And there’s reason to believe this could appear in version 3.0 after Gen-Z — not to be confused with the generation of adults born after the turn of the century — donated its coherent-memory fabric assets to the CXL Consortium late last year.

Temper your expectations

As exciting as CXL may be for the future of the datacenter, don’t expect it to be an overnight success. The technology is very much in its infancy, with the first generation of compatible systems expected to arrive later this year.

Pappas expects CXL-equipped systems will come in phases, with tiered memory and memory pooling likely being the first mainstream use cases.

“Over this next year, the first round of systems are going to be used primarily for proofs of concept,” he says. “Let's be honest, nobody's going to take a new technology that's never been tried.”

After those proofs of concept, Pappas expects at least another year of experimental deployments before the technology eventually starts showing up in production environments. ®

