Nvidia turns to optical trickery to boost long-haul InfiniBand performance
The platform launches alongside new frameworks for AI preprocessing at the edge
SC22 Nvidia says users can now extend their 400Gbps InfiniBand networks even farther with the launch of its MetroX-3 long-haul system, which boosts the range of its Quantum-2 switches to 25 miles (40 kilometers).
Nvidia sees two major use cases for the tech: the first is high-speed workload migration between geographically separate computing centers, while the other involves pooling those resources to tackle larger problems.
While it's been possible to interconnect two datacenters over InfiniBand in the past using the MetroX-2 platform acquired from Mellanox, the appliance was limited to a pair of 100Gbps uplinks. By comparison, the third iteration of Nvidia's MetroX platform adds support for a pair of 100Gbps dense wavelength division multiplexing (DWDM) modules. The technology allows for substantially higher bandwidths by multiplexing multiple 100Gbps signals, each on its own wavelength, onto a single fiber.
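To make the multiplexing idea concrete, here is a minimal sketch of a DWDM channel plan: each signal is given its own slot on a fixed frequency grid, and the fiber's aggregate capacity is simply the sum of the per-channel rates. It assumes the ITU-T G.694.1 fixed grid with 100GHz spacing anchored at 193.1THz; the function names and the four-channel example are hypothetical and are not a description of how MetroX-3 is actually configured.

```python
# Sketch: how DWDM stacks several signals onto one fiber.
# Assumption: ITU-T G.694.1 fixed grid, 100 GHz spacing, anchored at 193.1 THz.
# Channel counts used here are illustrative, not MetroX-3's real configuration.

SPEED_OF_LIGHT_M_S = 299_792_458
GRID_ANCHOR_THZ = 193.1
GRID_SPACING_THZ = 0.1  # 100 GHz between adjacent channels


def channel_plan(signal_rates_gbps: list[int]) -> list[tuple[float, float, int]]:
    """Assign each signal its own grid slot.

    Returns (frequency in THz, wavelength in nm, rate in Gbps) per channel.
    """
    plan = []
    for n, rate in enumerate(signal_rates_gbps):
        freq_thz = GRID_ANCHOR_THZ + n * GRID_SPACING_THZ
        # lambda = c / f, converted from metres to nanometres
        wavelength_nm = SPEED_OF_LIGHT_M_S / (freq_thz * 1e12) * 1e9
        plan.append((round(freq_thz, 2), round(wavelength_nm, 2), rate))
    return plan


def aggregate_gbps(plan: list[tuple[float, float, int]]) -> int:
    """Total capacity carried by a single fiber for this channel plan."""
    return sum(rate for _, _, rate in plan)


if __name__ == "__main__":
    plan = channel_plan([100, 100, 100, 100])  # four hypothetical 100Gbps signals
    for freq, wl, rate in plan:
        print(f"{freq:.2f} THz  {wl:.2f} nm  {rate} Gbps")
    print("aggregate:", aggregate_gbps(plan), "Gbps on one fiber")
```

Because every signal rides on a distinct wavelength, adding capacity means lighting another grid slot rather than pulling another strand.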
"This really allows your extended campus clusters and your core datacenter to behave as a single unit, a single datacenter," Dion Harris, head of datacenter product marketing at Nvidia, said during a press briefing.
The approach isn't without compromise. MetroX-3 is clearly an ecosystem play. It's designed to work with Nvidia's InfiniBand ecosystem of Quantum-2 switches, ConnectX-7 NICs, and/or BlueField data processing units (DPUs). That means if you're already using something like HPE's Ethernet-based Slingshot interconnects or even Nvidia's own Spectrum switches, MetroX-3 isn't for you.
Assuming you do live within Nvidia's walled garden, or want to expand an existing InfiniBand environment to a new location, there are also performance concessions that need to be taken into account. While DWDM allows for massive aggregate bandwidths over a single fiber, the technology is limited to relatively short runs in the neighborhood of 40 to 80 kilometers. As is typical with optics, what you gain in distance you give up in effective bandwidth, and vice versa.
The move to DWDM does present a cost advantage. According to Dell'Oro analyst Jimmy Yu, the more you can pack onto a single fiber strand, the less you need to spend on fiber leases to achieve a given amount of bandwidth.
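Yu's point is simple arithmetic: the number of strands you lease is the target bandwidth divided by what each strand carries, rounded up. The sketch below assumes 100Gbps per wavelength and a flat, hypothetical per-strand lease price; `strands_needed` and `lease_cost` are illustrative names, and real lease pricing is rarely this linear.

```python
import math

# Sketch of the fiber-lease arithmetic behind packing more onto each strand.
# Assumptions: every wavelength carries per_channel_gbps, and each leased
# strand costs a flat, hypothetical amount regardless of how it is used.


def strands_needed(target_gbps: float, channels_per_strand: int,
                   per_channel_gbps: float = 100) -> int:
    """Smallest number of strands that can carry target_gbps."""
    per_strand_gbps = channels_per_strand * per_channel_gbps
    return math.ceil(target_gbps / per_strand_gbps)


def lease_cost(target_gbps: float, channels_per_strand: int,
               cost_per_strand: float, per_channel_gbps: float = 100) -> float:
    """Total lease bill for the strands required (hypothetical flat pricing)."""
    return strands_needed(target_gbps, channels_per_strand,
                          per_channel_gbps) * cost_per_strand


if __name__ == "__main__":
    # 800Gbps between two sites, one signal per strand vs. eight wavelengths per strand
    for channels in (1, 8):
        print(channels, "channel(s)/strand ->",
              strands_needed(800, channels), "strand(s), cost",
              lease_cost(800, channels, cost_per_strand=1000))
```

With one signal per strand, 800Gbps needs eight leased strands; with eight wavelengths per strand it needs one, so under these assumptions the lease bill falls eightfold.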
And because DWDM is already widely employed by major telcos such as AT&T and Lumen Technologies, Nvidia says customers can now tap into existing fiber infrastructure, rather than needing dedicated fiber connectivity.
Cutting through the noise
MetroX-3 is part of a broader suite of hardware and software announced by Nvidia at the Supercomputing event this week aimed at addressing the growing volume of streaming data at the edge.
"By creating more high-fidelity research and instrumentation, that means that you're going to have to have a much more efficient way of capturing, analyzing and processing that data," Harris said. When "you're producing 50 to 1,000 times more data, how much do you keep? How much do you move back to the core? How much do you analyze?"
In addition to connecting datacenters over InfiniBand, Nvidia is positioning MetroX-3, along with its Quantum-series switches and BlueField DPUs, as a means to extend InfiniBand networks to lab environments where the bulk of data is being generated. By doing so, the company says customers can use its Holoscan HPC framework running on IGX, DGX or HGX platforms at the edge to sift out meaningful data from the noise before funneling that refined dataset back to the core datacenter.
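The "sift at the edge, ship to the core" pattern can be illustrated in a few lines. This is a minimal sketch that deliberately does not use Holoscan's APIs: it assumes a stream of sensor readings and a simple, hypothetical noise threshold as the keep/discard rule, then reports how much the dataset shrank before it would be sent back to the core datacenter. `Reading`, `sift`, and `reduction_factor` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Sketch of edge-side data reduction, NOT Holoscan code.
# Assumption: a reading is "meaningful" if its magnitude clears a fixed
# noise threshold; real pipelines would use far richer criteria or models.


@dataclass
class Reading:
    sensor_id: str
    value: float


def sift(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Lazily yield only readings whose magnitude reaches the threshold."""
    for reading in readings:
        if abs(reading.value) >= threshold:
            yield reading


def reduction_factor(total: int, kept: int) -> float:
    """How many times smaller the forwarded dataset is than the raw stream."""
    return total / kept if kept else float("inf")


if __name__ == "__main__":
    raw = [Reading("sensor-0", float(i)) for i in range(1000)]
    to_core = list(sift(raw, threshold=990.0))
    print(len(to_core), "of", len(raw), "readings forwarded;",
          f"{reduction_factor(len(raw), len(to_core)):.0f}x reduction")
```

Using a generator keeps the filter streaming-friendly: readings are inspected one at a time, so the edge node never has to buffer the full raw stream before deciding what to forward.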
Holoscan, which launched alongside Nvidia's IGX robotics and edge compute platform this fall, was initially aimed at AI inference for medical imaging. However, the platform has since been repurposed for a variety of streaming data formats, including non-image data. Holoscan has also been reworked to support C++ and Python APIs, which Nvidia says researchers can use to build custom data pipelines around their workflows. ®