UALink debuts its first AI interconnect spec – usable in just 18 short months
No-Nvidia networking club is banking on you running different GPUs on one network
The Ultra Accelerator Link Consortium has delivered its first GPU interconnect specification: UALink 200G 1.0.
The Consortium (UAC) was formed in May 2024 by a group of vendors – among them AMD, AWS, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft, and Astera Labs – who think the world needs an open alternative to Nvidia's NVLink tech that allows creation of the networked GPU clusters needed to run AI workloads at scale.
Members aren't just advancing the cause of open standards. Nvidia's networking business won over $13 billion in revenue in its last financial year, and the GPU giant has signaled its intention to grow that business. UALink's members fancy creating a cheaper alternative they can control and deploy themselves at hyperscale, or profit from by creating hardware the rest of us buy.
They also think the world is ready for a networking standard that can be applied to GPUs from multiple vendors rather than requiring users to create network silos dedicated to each accelerator vendor.
To satisfy those goals, the UAC also wants its spec to work over the Ethernet networks most orgs already operate.
UALink 200G 1.0, as the name implies, enables a 200 Gbps (gigabits per second) connection to an accelerator. It can also quadruple that speed, to 800 Gbps, by allowing four connections to each GPU.
The spec allows creation of compute pods packing 1,024 accelerators and achieving what the consortium describes as "the same raw speed as Ethernet with the latency of PCIe switches."
All while consuming somewhere between a third and a half of the power of a typical Ethernet network.
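For readers who like to check the sums, here is a minimal back-of-the-envelope sketch of those headline figures. It assumes only the numbers quoted above (200 Gbps per connection, up to four connections per accelerator, up to 1,024 accelerators per pod). The function and constant names are made up for illustration and are not part of the spec, and the results are raw link rates, not measured throughput.

```python
# Headline figures quoted in the article. The names are illustrative only.
LINK_GBPS = 200                   # one UALink 200G connection
MAX_LINKS_PER_ACCELERATOR = 4     # "four connections to each GPU"
MAX_ACCELERATORS_PER_POD = 1024   # largest pod the spec allows


def accelerator_bandwidth_gbps(links: int = MAX_LINKS_PER_ACCELERATOR) -> int:
    """Raw link rate into one accelerator, in Gbps."""
    if not 1 <= links <= MAX_LINKS_PER_ACCELERATOR:
        raise ValueError("links must be between 1 and 4")
    return links * LINK_GBPS


def pod_bandwidth_tbps(
    accelerators: int = MAX_ACCELERATORS_PER_POD,
    links: int = MAX_LINKS_PER_ACCELERATOR,
) -> float:
    """Naive sum of raw link rates across a pod, in Tbps.

    This ignores protocol overhead, topology, and switch capacity,
    so treat it as an upper bound on paper rather than a prediction.
    """
    if not 1 <= accelerators <= MAX_ACCELERATORS_PER_POD:
        raise ValueError("accelerators must be between 1 and 1024")
    return accelerators * accelerator_bandwidth_gbps(links) / 1000


if __name__ == "__main__":
    print(accelerator_bandwidth_gbps(1), "Gbps with one connection")
    print(accelerator_bandwidth_gbps(), "Gbps with four connections")
    print(pod_bandwidth_tbps(), "Tbps summed across a full pod")
```

The pod-wide total is a simple sum of link rates, so it says nothing about how much traffic the switches in between can actually carry.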
That's a lot to get done in under a year, but UAC didn't start from scratch. Chair and director Kurtis Bowman told The Register the spec draws very heavily on AMD's existing Infinity Fabric product.
"We were able to build on that [Infinity Fabric]," he said, but the consortium also used tech from other UAC members who have cooked up their own networking stacks to address their own needs.
"Intel, Google, and Microsoft said 'we have challenges in our datacenters and we need you to address that,'" Bowman said.
He admitted it will be around 18 months before compliant hardware goes on sale but thinks that's six months less than is typically required to turn a spec into product. Bowman thinks the likes of HPE, Dell, and Lenovo will adopt the spec and deliver AI solutions that employ it, as will the likes of Broadcom and Synopsys as they create custom accelerators for hyperscale customers.
Work on a second spec is already underway to take advantage of 400G Ethernet variants as they go mainstream. ®