Does AI give InfiniBand a moment to shine? Or will Ethernet hold the line?

You could go all-in on Nvidia for the lower latency. Or tough it out with less exotic kit and tolerate slower training

Growing demand for AI will see the datacenter switching market grow by 50 percent, according to Dell'Oro analyst Sameh Boujelbene, who has also predicted considerable innovation in the switching arena.

Boujelbene estimates that AI systems currently account for "significantly less than 10 percent" of the total addressable market for network switching, and, of that, about 90 percent of deployments are using Nvidia/Mellanox's InfiniBand — not Ethernet. Those deployments propelled Nvidia's networking revenue to $10 billion a year and made it the second-biggest player in the field, ahead of Juniper and Arista.

There's a good reason for this: when it comes to AI workloads, bandwidth and latency are king, and InfiniBand delivers very low latency thanks to a lossless, credit-based architecture that all but eliminates packet loss. In Ethernet networks, by comparison, packet loss is a given.

Many applications can cope with packet loss. But the issue slows AI training workloads, and they’re already expensive and time-consuming. This is probably why Microsoft opted to run InfiniBand when building out its datacenters to support machine learning workloads.

InfiniBand, however, tends to lag Ethernet in terms of raw bandwidth ceilings. Nvidia's very latest Quantum InfiniBand switches top out at 51.2 Tb/s with 400 Gb/s ports. By comparison, Ethernet switching hit 51.2 Tb/s nearly two years ago and can support 800 Gb/s port speeds.

In a traditional datacenter, you'd only expect to see kit this fast at the aggregation layer. Your typical server node just isn't going to saturate 100 Gb/s of bandwidth, let alone 400 Gb/s.

AI clusters, meanwhile, are a different beast entirely. The average AI node comes equipped with one 400 Gb/s NIC per GPU. Such nodes can pack four or eight GPUs – so do the math for NICs – and they're all needed to handle the immense data flows AI workloads generate.
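Doing that math is straightforward. Here's a quick back-of-the-envelope sketch in Python, using only the figures above – one 400 Gb/s NIC per GPU, four or eight GPUs per node:

```python
# Rough aggregate NIC bandwidth per AI node, using the figures from the
# article (one 400 Gb/s NIC per GPU); nothing here is vendor-measured.
NIC_SPEED_GBPS = 400  # per-GPU NIC speed in Gb/s

for gpus_per_node in (4, 8):
    total_gbps = gpus_per_node * NIC_SPEED_GBPS
    print(f"{gpus_per_node} GPUs -> {total_gbps:,} Gb/s "
          f"({total_gbps // 8} GB/s) of network bandwidth per node")
```

That works out to 1,600 Gb/s for a four-GPU box and 3,200 Gb/s for an eight-GPU one – an order of magnitude more network traffic than a conventional server would ever push.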

Boujelbene likens the two competing standards to an interstate highway (Ethernet) with higher speed limits but the potential for collisions that can hold up traffic, and local roads (InfiniBand) which are a little slower, but steer clear of congestion.

While Ethernet technically has a bandwidth advantage, you might be thinking that other bottlenecks, like the PCIe bandwidth available to the NIC, would render this a moot point.

PCIe 5.0 is the best we've got in early 2024. With around 64 GB/s of bandwidth in each direction, an x16 interface can support a single 400 Gb/s port – but not much more.
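A quick sanity check bears that out. The sketch below uses the standard PCIe 5.0 figures – 32 GT/s per lane with 128b/130b encoding – and ignores protocol overhead, so real-world throughput would be somewhat lower:

```python
# Rough check that a PCIe 5.0 x16 slot can feed one 400 Gb/s NIC.
# Raw rate and encoding are standard PCIe 5.0 numbers; protocol overhead
# (headers, flow control) would shave a bit more off in practice.
GT_PER_LANE = 32       # PCIe 5.0 raw signaling rate, GT/s per lane
ENCODING = 128 / 130   # 128b/130b line encoding efficiency
LANES = 16

per_direction_gbps = GT_PER_LANE * ENCODING * LANES  # ~504 Gb/s
nic_gbps = 400

print(f"x16 link: ~{per_direction_gbps:.0f} Gb/s "
      f"(~{per_direction_gbps / 8:.0f} GB/s) per direction")
print(f"Headroom over one {nic_gbps} Gb/s port: "
      f"~{per_direction_gbps - nic_gbps:.0f} Gb/s")
```

Roughly 500 Gb/s per direction against a 400 Gb/s port leaves only about 100 Gb/s of headroom – which is why an 800 Gb/s NIC on PCIe 5.0 doesn't add up without some extra engineering.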

Some chipmakers, including Nvidia, have used clever integration of PCIe switching into their NICs to improve performance. Rather than hanging both the GPU and NIC off the CPU, the accelerator is daisy chained off the network interface via the PCIe switch. This is how we think Nvidia is going to achieve 800 Gb/s and 1,600 Gb/s networking before PCIe 6.0 or 7.0 hit the market.

Dell'Oro projects that by 2025 the vast majority of switch ports deployed in AI networks will be operating at 800 Gb/s, with port speeds doubling to 1,600 Gb/s by 2027.

Ethernet evolves in the AI age

In addition to higher bandwidth, recent innovations around Ethernet switching have helped to address many of the standard's concessions compared to InfiniBand.

This isn't a surprise to Nvidia, which somewhat ironically has been the loudest proponent of what it's taken to calling lossless Ethernet with the launch of its Spectrum-X platform.

InfiniBand is great for those running a handful of very large workloads – like GPT-3 or digital twins. But in more dynamic hyperscale and cloud environments, Ethernet is often preferred, Gilad Shainer, VP of marketing for Nvidia's networking division, previously told The Register.

Ethernet’s strengths include its openness and its ability to do a more than decent job for most workloads, a factor appreciated by cloud providers and hyperscalers who either don't want to manage a dual-stack network or become dependent on the small pool of InfiniBand vendors.

Nvidia's Spectrum-X portfolio uses a combination of Nvidia's 51.2 Tb/s Spectrum-4 Ethernet switches and BlueField-3 SuperNICs to provide InfiniBand-like network performance, reliability, and latencies using 400 Gb/s RDMA over Converged Ethernet (RoCE).

Broadcom has made similar claims for its Tomahawk and Jericho switch lines, which either rely on data processing units to manage congestion or handle it in the top-of-rack switch, as with its Jericho3-AI platform, announced last year.

To Broadcom's point, hyperscalers and cloud providers such as AWS have done just that, Boujelbene said. The analyst noted that what Nvidia has done with Spectrum-X is compress this work into a platform that makes it easier to achieve low-loss Ethernet.

And while Microsoft has favored InfiniBand for its AI cloud infrastructure, AWS is taking advantage of improving congestion management techniques in its own Elastic Fabric Adapter 2 (EFA2) network to interconnect the 16,384-chip GH200 compute cluster it announced at its Re:Invent conference in late 2023.

While Dell'Oro anticipates that InfiniBand will maintain its lead in AI switching for the foreseeable future, the group predicts that Ethernet will make substantial gains, capturing 20 points of revenue share by 2027, driven in large part by the cloud and hyperscale datacenter operators. ®
