PCIe 7.0 first official draft lands, doubling bandwidth yet again

The downside? You probably won't see kit to use it until 2027

Analysis The PCIe 7.0 spec is on track for release next year and, for many AI chip peddlers trying to push the limits of network fabrics and accelerator meshes, it can't come soon enough.

On Tuesday the PCI SIG consortium that steers the interface's development emitted version 0.5 of PCIe 7.0, and hailed it as the official first draft of the specification. The blueprint calls for 128GT/s per lane of raw throughput, continuing the generational doubling that we've come to expect from the peripheral component interconnect standard.

This higher performance will enable up to 512GB/s of bidirectional bandwidth from an x16 slot. That's compared to the 256GB/s that PCIe 6.0 devices will be capable of pushing when they start hitting the market later this year.
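
For the back-of-the-envelope inclined, those headline figures fall straight out of the raw per-lane rate. Here's a minimal Python sketch that ignores encoding and FLIT overhead (which trims real-world throughput slightly), so the numbers are ceilings rather than guarantees:

```python
# Approximate x16 bandwidth per PCIe generation from the raw per-lane transfer rate.
# Encoding/FLIT overhead is ignored, so real-world figures land a touch lower.
GEN_GTS = {"5.0": 32, "6.0": 64, "7.0": 128}  # GT/s per lane

def x16_bandwidth_gbps(gts_per_lane: float, lanes: int = 16) -> float:
    """One-direction bandwidth in GB/s: each transfer carries roughly one bit per lane."""
    return gts_per_lane * lanes / 8  # bits -> bytes

for gen, gts in GEN_GTS.items():
    one_way = x16_bandwidth_gbps(gts)
    print(f"PCIe {gen} x16: ~{one_way:.0f} GB/s per direction, ~{2 * one_way:.0f} GB/s bidirectional")
```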

Other improvements coming with PCIe 7.0 include optimizations for power efficiency, latency, and reach. The last point matters because, as signaling rates climb, the distance a signal can travel cleanly gets shorter. Retimers can be used to clean up and extend the signal, but they add latency, which is why we tend to see at least one retimer per accelerator in modern GPU systems.

With that said, the real advantage of the PCIe 7.0 spec is still bandwidth. While host processors supporting PCIe 6.0 haven't even hit the market, AI equipment vendors are already pushing the limits of the current spec: a PCIe 6.0 x16 slot provides just enough bandwidth to support a single 800Gb/s NIC.
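
A quick sanity check of that claim, again ignoring protocol overhead, shows just how little headroom an 800Gb/s port leaves in a PCIe 6.0 x16 slot:

```python
# Can a PCIe 6.0 x16 slot feed an 800Gb/s NIC? (protocol overhead ignored)
slot_gbit = 64 * 16   # 64 GT/s per lane x 16 lanes ~= 1,024 Gbit/s in each direction
nic_gbit = 800        # one 800Gb/s Ethernet port, per direction

print(f"Headroom: ~{slot_gbit - nic_gbit} Gbit/s ({slot_gbit / nic_gbit:.2f}x the port rate)")
# ~224 Gbit/s to spare -- fine for one 800G port, nowhere near enough for 1.6T
```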

This is a problem for AI hardware slingers trying to scale their systems faster. Intel, for example, bypassed this whole issue by baking Ethernet networking directly into its Gaudi accelerators. These connections are used for both chip-to-chip and node-to-node communications.

Nvidia, meanwhile, has taken to packing PCIe switches into its NICs to overcome bottlenecks and lane limitations on modern CPU chipsets. We're told its latest ConnectX-8 cards, introduced at GTC last month, will feature more than 32 lanes of PCIe 6.0. This was done to prevent system processors, which have a limited number of PCIe lanes and don't yet support PCIe 6.0, from bottlenecking communications between the GPU and the rest of the network.

However, Nvidia isn't stopping at 800G. The introduction of 200G serializer/deserializers (SerDes) in late 2023 opened the door to 102.4Tb/s switches supporting 1.6Tb/s ports. Nvidia's roadmap calls for the release of networking gear capable of these terabit-plus speeds, built on 200G SerDes, beginning in 2025. Taking advantage of them, however, will require faster NICs with more PCIe bandwidth.
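
The switch arithmetic follows from the SerDes count. A rough sketch, assuming 512 lanes of 200G per ASIC (a figure consistent with the 102.4Tb/s total quoted above) and eight lanes bundled per port:

```python
# Back-of-the-envelope math for the 200G SerDes switch generation
serdes_gbit = 200        # per-lane rate
lanes_per_asic = 512     # assumed; consistent with a 102.4Tb/s aggregate
lanes_per_port = 8       # eight 200G lanes bundled into one port

asic_tbit = serdes_gbit * lanes_per_asic / 1000   # 102.4 Tbit/s of switching capacity
port_gbit = serdes_gbit * lanes_per_port          # 1,600 Gbit/s (1.6T) per port
ports = lanes_per_asic // lanes_per_port          # 64 x 1.6T ports per switch

print(f"{asic_tbit} Tbit/s ASIC, {port_gbit} Gbit/s ports, {ports} ports per switch")
```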

PCIe 7.0 would do the trick, but if the PCIe 6.0 ramp tells us anything, it might not arrive in time. It's been two years since the PCIe 6.0 spec was finalized, and we're only now starting to see products take advantage of it. This suggests it could be 2027 before the first PCIe 7.0 kit hits the market in volume, assuming the specification is officially issued in 2025 as anticipated.

While it appears PCIe 7.0 won't arrive in time for Nvidia's purposes, it will open the door to some of Compute Express Link's (CXL) more interesting applications.

The cache-coherent interconnect tech arrived with AMD's 4th-Gen Epyc and Intel's Sapphire Rapids platforms in late 2022 and early 2023. So far it's largely been limited to memory expansion modules from Samsung, Astera Labs, and Micron.

These modules allow additional DDR memory to be added via a PCIe slot, over which the CXL protocol piggybacks. Accessing that memory incurs roughly the equivalent of a NUMA hop, but the bigger limitation is bandwidth: a PCIe 5.0 x16 slot only offers enough throughput for about two channels of 5,600MT/s DDR5 memory.
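
A rough comparison, assuming a standard 64-bit DDR5 channel and once more ignoring PCIe protocol overhead, shows why bandwidth rather than latency is the harder ceiling:

```python
# How much DDR5-5600 bandwidth can a PCIe 5.0 x16 link realistically carry?
ddr5_channel_gbs = 5600 * 8 / 1000   # 5,600 MT/s x 8 bytes per transfer ~= 44.8 GB/s
pcie5_x16_gbs = 32 * 16 / 8          # ~64 GB/s in each direction, before overhead

print(f"One direction: ~{pcie5_x16_gbs / ddr5_channel_gbs:.1f} DDR5-5600 channels")
print(f"Both directions: ~{2 * pcie5_x16_gbs / ddr5_channel_gbs:.1f} channels")
# Roughly 1.4 channels each way, call it two-ish overall -- a fraction of what
# a modern server socket gets from its locally attached memory
```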

However, that's not CXL's only party trick. CXL 2.0 adds support for switching. One application of this would be a memory appliance serving multiple hosts, kind of like a network-attached storage server for DDR. CXL 3.0-compatible systems, meanwhile, add support for switch fabrics, which should allow peripherals to communicate with one another without involving the host processor.

All of these features will benefit heavily from PCIe 7.0's higher bandwidth. Having said that, CXL 3.0 and PCIe 7.0 won't be enough to replace interconnect fabrics such as Nvidia's NVLink (1.8TB/s) or AMD's Infinity Fabric (896GB/s) anytime soon.
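
Putting those vendor figures next to a PCIe 7.0 x16 slot's 512GB/s makes the gap concrete:

```python
# How far does a PCIe 7.0 x16 slot fall short of proprietary GPU fabrics?
pcie7_x16_gbs = 512                                       # bidirectional, per the draft spec
fabrics_gbs = {"NVLink": 1800, "Infinity Fabric": 896}    # vendor-quoted aggregate figures

for name, bw in fabrics_gbs.items():
    print(f"{name}: {bw} GB/s, about {bw / pcie7_x16_gbs:.1f}x a PCIe 7.0 x16 slot")
```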

For that, PCI SIG is going to have to do more than just double the spec's gen-on-gen bandwidth every three years. In the meantime, silicon photonics startups such as Lightmatter, Celestial AI, and Ayar Labs are pushing alternative means of interconnecting peripherals and chiplets using light in a quest for ever more speed. ®
