Omni-Path is back on the AI and HPC menu in a new challenge to Nvidia's InfiniBand

After a five-year hiatus, Cornelis' interconnect returns at 400Gbps, with Ethernet support next

Five years after Intel spun off its Omni-Path interconnect tech into Cornelis Networks, the company's 400Gbps CN5000 line of switches and NICs is finally ready to do battle with Omni-Path's long-time rival, Nvidia's InfiniBand.

This time around, Cornelis isn't just going after supercomputers and HPC clusters. It's looking to get in on the AI boom as well by undercutting Nvidia on price performance.

For those who thought Omni-Path was dead and purged all memory of it, here's a quick refresher. Initially developed by Intel in 2015, Omni-Path is a lossless interconnect technology, similar in many respects to Nvidia's InfiniBand networking, aimed at high-performance computing applications.

The first Omni-Path switches offered 4.8Tbps of bandwidth across 48 100Gbps ports, and saw deployment in a number of supercomputing platforms, like the Los Alamos National Lab's Trinity system and the Department of Energy's Cori machine.

However, by 2019, Intel had abandoned the project, and it spun off the division as Cornelis Networks in September 2020.

Omni-Path has been around this whole time; it's just been stuck at 100Gbps. Now, Cornelis Networks is emerging from hibernation with a full complement of 400Gbps Omni-Path switches, NICs, and cabling that the company says can support clusters of more than 500,000 endpoints with near-linear performance scaling.

Digging into the CN5000 lineup

Rather than dawdle, let's dive into what we all really care about: speeds and feeds.

Here's a look at Cornelis Networks' single-port 400Gbps Omni-Path superNIC

First up is Cornelis' CN5000 superNIC. As with InfiniBand, you can't today pair just any NIC with Cornelis' switches and still get the benefits of the Omni-Path architecture.

The card will be offered with either one or two 400Gbps ports — presumably for redundancy, rather than additional bandwidth, as its PCIe 5.0 interface can't actually support more than one port at those speeds — and will have a 15-to-19-watt power draw, depending on whether you opt for air or liquid cooling. (The entire CN5000 line will support both.)
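If you want to sanity-check that claim, here's some quick napkin math of our own, using nominal PCIe line rates rather than anything from Cornelis' spec sheet:

```python
# Back-of-the-envelope: why a PCIe 5.0 x16 NIC can't feed two 400Gbps ports.
# Nominal line rates only; real-world throughput is lower still once
# encoding and protocol overheads are taken into account.
lanes = 16              # assuming a standard x16 slot
gt_per_s_per_lane = 32  # PCIe 5.0 signalling rate, GT/s per lane

raw_gbps = lanes * gt_per_s_per_lane     # 512 Gbps before overhead
print(f"PCIe 5.0 x16 raw bandwidth: {raw_gbps} Gbps")

for ports in (1, 2):
    needed = ports * 400
    verdict = "fits" if needed <= raw_gbps else "doesn't fit"
    print(f"{ports} x 400Gbps port(s) needs {needed} Gbps: {verdict}")
```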

Cornelis' CN5000 switch features 48 ports of 400Gbps connectivity and 19.2Tbps of overall bandwidth

The NICs are designed to be paired with one of two CN5000 switches. The first is a 48-port appliance that takes up a single rack unit and offers 19.2Tbps (400Gbps per port) of switching capacity.

The CN5000 is predominantly aimed at enterprise AI and HPC deployments. For larger-scale deployments, Cornelis' 576-port CN5000 Director-class switch offers up to 230.4Tbps of aggregate bandwidth.

Weighing more than 600 pounds and standing between 17 and 19 rack units tall, depending on whether you opt for liquid or air cooling, the Director switch is rated for roughly 22 kilowatts of power when fully populated with pluggable optics.

Cornelis' CN5000 Director Switch is a chassis design that features 18 CN5000 switches in a fully integrated 576-port appliance.

In fact, calling this a switch is a bit of a misnomer. It's really more of a switch chassis housing 18 CN5000s, arranged in a two-level topology with 12 leaves and 6 spines.

This has the benefit of simplifying cabling and potentially reducing the number of optical transceivers required to support large-scale deployments.

Alongside its switches and NICs, Cornelis also offers a number of active optical and copper cables.

Omni-Path vs InfiniBand

Compared to Nvidia's 400Gbps Quantum-2 InfiniBand and ConnectX-7 NICs, Cornelis promises up to 2x higher messaging rates, 35 percent lower latency, and 30 percent faster simulation times. As with any vendor-supplied benchmarks, take these claims with a grain of salt.

More importantly, Cornelis Networks CEO Lisa Spelman, who you may remember from her time leading Intel's Xeon division, claims the products will undercut Nvidia on price by a significant margin.

While Cornelis claims a performance edge over InfiniBand, its CN5000 switches fall a bit behind on aggregate bandwidth, offering about three-quarters as many 400Gbps ports: 48 versus 64.

And that's compared to Nvidia's nearly three-year-old Quantum-2 switches. Nvidia is set to boost port counts to 144 and speeds to 800Gbps with the launch of its Quantum-X800 and Quantum-X Photonics platforms later this year.

However, that higher port bandwidth probably isn't as big a deal as it might seem, especially if you're not using Nvidia GPUs. That's because 400Gbps is the fastest you can go on a PCIe 5.0 NIC anyway. The only way around that is to strap a PCIe 6.0 switch to your NIC and hang your GPUs off it. This is exactly what Nvidia has done with its ConnectX-8 NICs.

Having said that, Cornelis expects to make the jump to 800Gbps next year, timed with the launch of the first PCIe 6.0-compatible CPUs from Intel and AMD.

Radix realities

Port count, on the other hand, may end up being a problem for Cornelis' kit depending on the scale of your network.

With just 48 ports, Cornelis' CN5000 isn't a particularly high-radix switch - which is to say, you're going to need a lot of them to support a large-scale HPC or AI training cluster.

While the CN5000 switch was designed for the enterprise, where smaller deployments are likely to be the norm, it can scale to larger environments: the company claims its equipment can support hundreds of thousands of endpoints.

But to network 128,000 GPUs at 400Gbps, we estimate you'd need somewhere in the neighborhood of 13,334 CN5000s arranged in a three-level, non-blocking topology.

This topology, often referred to as a fat tree, is commonly employed in AI networks, as it offers a nice balance between bandwidth, latency, and congestion management.

But if you wanted to do the same thing using Nvidia's Quantum-2 InfiniBand switches, you'd need only 10,000 of them.

Moreover, if networking scale is your main priority, Ethernet has a clear advantage here. Even though Spelman insists Omni-Path isn't trying to compete with Ethernet, Ethernet is certainly evolving to compete with it.

With a 51.2Tbps Ethernet switch, like Broadcom's Tomahawk 5 or Nvidia's Spectrum-4, you'd only need 5,000 appliances to get the job of networking 128,000 GPUs at 400Gbps done. Broadcom's new Tomahawk 6, which we looked at last week, would be able to accomplish it with half that many. (Although, just like Nvidia's Spectrum-X800, it's going to be a little while before you can get your hands on switches based on Broadcom's latest ASICs, and even when you can, they're likely to cost substantially more than Cornelis' enterprise-focused kit.)
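For the curious, here's the napkin math that appears to underpin those switch counts. It's our own rough, proportional estimate, not figures from Cornelis, Nvidia, or Broadcom, and it assumes an idealized non-blocking fat tree while glossing over real-world wrinkles like oversubscription, rail-optimized designs, and the hard scaling limits of a strict three-tier build:

```python
import math

def fat_tree_switches(endpoints: int, radix: int) -> int:
    """Rule-of-thumb switch count for a non-blocking, three-level fat tree:
    roughly 2N/k leaf switches, 2N/k aggregation switches, and N/k core
    switches, or about 5N/k in total for N endpoints and k-port switches."""
    return math.ceil(5 * endpoints / radix)

gpus = 128_000
for label, ports in [("48-port CN5000", 48),
                     ("64-port Quantum-2", 64),
                     ("128x400G Ethernet (51.2Tbps)", 128),
                     ("256x400G Ethernet (102.4Tbps)", 256)]:
    print(f"{label}: ~{fat_tree_switches(gpus, ports):,} switches")
# 48-port CN5000: ~13,334 switches
# 64-port Quantum-2: ~10,000 switches
# 128x400G Ethernet (51.2Tbps): ~5,000 switches
# 256x400G Ethernet (102.4Tbps): ~2,500 switches
```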

Networking such a colossal cluster isn't exactly easy either, which is no doubt why Cornelis opted to build the CN5000 Director in the first place.

With 576 ports apiece, only 733 of these Director switches would be required for a 128,000 GPU cluster, which would also eliminate about a third of the cable runs.
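The 733 figure falls out of similar back-of-the-envelope math, assuming (our assumption, not a Cornelis reference design) a non-blocking, two-level arrangement of Director chassis with half of each leaf's ports facing the GPUs:

```python
import math

gpus = 128_000
radix = 576                               # ports per Director chassis

leaves = math.ceil(gpus / (radix // 2))   # 288 GPU-facing ports per leaf -> 445 leaves
spines = radix // 2                       # one spine per leaf uplink -> 288 spines
print(f"{leaves} leaves + {spines} spines = {leaves + spines} Directors")
# 445 leaves + 288 spines = 733 Directors
```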

It should be noted that while a fat-tree topology offers a useful point of comparison, it's only one of many employed in AI and HPC clusters today. Which of these will deliver the best ratio of price to performance depends heavily on the application, Spelman notes.

“You have to measure the effectiveness of your network based on the impact it has on your total cluster performance and ultimately your application performance,” she said. If you base your decision on a micro-benchmark or the number of switches required, she argues, you may end up with a network that looks good on paper, but isn’t well optimized for application performance.

"The goal of the network is to accelerate your applications, and that's what we're trying to do. It's not networking for networking's sake," she said.

Smaller, flatter networks require fewer network hops, which reduces latency, and for AI training workloads that can make a big difference. However, as Cornelis Networks co-founder Phil Murphy points out, because the company's switches offer so much lower per-hop latency than Ethernet or InfiniBand, they can get away with more hops without compromising on overall latency.
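To make that trade-off concrete: total fabric latency scales roughly with the number of hops multiplied by the per-hop delay. The figures in the sketch below are hypothetical and purely illustrative, not numbers published by Cornelis, Nvidia, or anyone else.

```python
# Hypothetical per-hop latencies, purely illustrative - not vendor figures.
# The point: a lower per-hop delay buys headroom for a deeper topology.
def fabric_latency_ns(hops: int, per_hop_ns: float) -> float:
    return hops * per_hop_ns

print(fabric_latency_ns(hops=3, per_hop_ns=600))  # flatter fabric: 1800 ns
print(fabric_latency_ns(hops=5, per_hop_ns=300))  # deeper but lower latency per hop: 1500 ns
```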

If Cornelis actually manages to undercut InfiniBand to a meaningful degree and the CN5000 can deliver on the company's performance claims, the switches' lower radix may not be as big a deal.

Maybe bigger, fatter networks aren't always a bad thing

There's not much point in having a 128,000 GPU cluster if your network prevents you from achieving more than 30-50 percent utilization.

This is the challenge facing Ethernet scale-out fabrics, Spelman said. "Even in the best, most highly tuned environment, you're getting maybe 50-55 percent utilization. So there's so much room there to improve."

For data-heavy AI training workloads, Cornelis claims Omni-Path delivers collective communication times 6x shorter than RDMA over Converged Ethernet (RoCE).

When the Ethernet spec was first drafted, high-performance computing and AI training clusters weren't exactly a high priority. One of the big challenges with Ethernet fabrics is packet loss: any time a packet fails to reach its destination, it has to be retransmitted, which drives up tail latencies and leaves accelerators stuck waiting for the rest of the network to catch up.

AMD has previously estimated that, on average, 30 percent of training time is wasted waiting for the network to catch up.

But things are starting to change. Over the past few years, Ethernet platforms like Broadcom's Tomahawk 5 and 6, Nvidia's Spectrum-X product lines, and AMD's Pensando NICs have evolved to make use of complex packet routing, congestion management, and packet spraying techniques to achieve what they claim are InfiniBand-like levels of performance, loss, and latency.

"GPU utilization on Ethernet networks built with Broadcom silicon is as good as, if not better than, networks built with InfiniBand or OmniPath," Pete Del Vecchio, product line manager for Broadcom's Tomahawk line, told El Reg. "All of the largest GPU clusters being deployed this year - by every major hyperscaler - are using Ethernet."

"It simply isn't credible to suggest that they'd knowingly deploy a network fabric delivering only one-half or one-third the utilization of an alternative," he added.

The road to Ultra Ethernet

In their current form, Cornelis' Omni-Path switches and NICs aren't designed to replace Ethernet, but that won't always be the case.

Starting next year, Cornelis' 800Gbps-capable CN6000-series products will introduce cross-compatibility with Ethernet. That means you'll be able to use the company's superNICs with, say, a Broadcom switch, or its switches with something like a Pensando NIC.

At that point, we reckon Cornelis' CN6000 will be somewhat similar to Nvidia's Spectrum-X switches and BlueField superNICs. They'll work with any other Ethernet kit, but they'll perform best when paired together.

"Instead of starting with an Ethernet base and trying to add in all these features or capabilities, we're starting with the Omni-Path architecture and we're adding in Ethernet," Spelman said. "What we've done and created is this adaptation layer that allows our Ethernet to have access to some of those Omni-Path features."

This approach underscores Cornelis' transition to Ultra Ethernet as well. Formed in 2023, the Ultra Ethernet Consortium (UEC) was founded by industry leaders including AMD, HPE, Arista, and Broadcom to modernize the Ethernet protocol for use in HPC and AI applications. Cornelis has been a major supporter of Ultra Ethernet nearly from the beginning.

Two years later, the first Ultra Ethernet-compatible chips are now making their way to market, even as the spec itself remains in its infancy.

"We're going to continue down the journey of integration with Ultra Ethernet, but what we started with first is our baseline architecture that already meets the feature requirements of what UEC has outlined," Spelman explained. "We're not holding up the roadmap to wait for the consortium."

In other words, Omni-Path already works, so they'll add support for Ultra Ethernet when it's ready.

Spelman expects that to happen in 2027, when Cornelis brings its 1.6Tbps CN7000-series switches and NICs to market. ®
