Nvidia opens up speedy NVLink interconnect to custom CPUs, ASICs

One of the two just needs to be made by Nvidia


Computex Nvidia has opened up NVLink, the interconnect tech used to stitch its rack-scale compute platforms together, to the broader ecosystem with the introduction of NVLink Fusion this week.

If you're not familiar, Nvidia's NVLink is a high-speed interconnect which enables multiple GPUs in a system or rack to behave like a single accelerator with shared compute and memory resources.

In its current generation, Nvidia's fifth-gen NVLink fabrics can support up to 1.8 TB/s of bandwidth (900 GB/s in each direction) per GPU for up to 72 GPUs per rack. Until now this interconnect fabric has been limited to Nvidia GPUs and CPUs.

NVLink Fusion means the GPU giant will allow semi-custom silicon - including non-Nvidia-designed accelerators - to take advantage of the high-speed interconnect.

According to Dion Harris, senior director of HPC, Cloud, and AI at Nvidia, the technology will be offered in two configurations. The first will be for connecting custom CPUs to Nvidia GPUs.

The advantage of using NVLink for CPU-to-GPU communications is bandwidth: the 1.8 TB/s figure quoted above works out to roughly 14x what a PCIe 5.0 x16 link (128 GB/s) can manage.
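
For a rough sense of where that 14x figure comes from, here's the back-of-the-envelope arithmetic as a small Python sketch. It uses only the headline numbers quoted above and ignores protocol overhead and link efficiency, so treat it as illustration rather than a benchmark:

# NVLink 5 vs PCIe 5.0 x16, using the bidirectional figures quoted above
nvlink_bidir_gb_s = 1800       # 1.8 TB/s per GPU (900 GB/s in each direction)
pcie5_x16_bidir_gb_s = 128     # ~64 GB/s in each direction for a x16 link

speedup = nvlink_bidir_gb_s / pcie5_x16_bidir_gb_s
print(f"NVLink advantage over PCIe 5.0 x16: ~{speedup:.0f}x")   # prints ~14x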

The second, and perhaps more surprising, configuration involves using NVLink to connect its Grace and, in the future, Vera CPUs to non-Nvidia accelerators.

This can be achieved either by integrating NVLink IP into your design or via an interconnect chiplet packaged alongside a supported XPU.

In theory, this should open the door to superchip-style compute assemblies featuring any combination of CPUs and GPUs from the likes of Nvidia, AMD, Intel, and others - but only so long as Nvidia is involved. You couldn't, for example, connect an Intel CPU to an AMD GPU using NVLink Fusion. Nvidia isn't opening the interconnect standard up entirely: if you want to use NVLink with your custom ASIC, you'll be pairing it with an Nvidia CPU, or vice versa.

Of course, all of this depends on chipmakers extending support for NVLink Fusion in the first place. From a design standpoint, MediaTek, Marvell, Alchip, Astera Labs, Synopsys, and Cadence have committed to supporting the interconnect. Fujitsu and Qualcomm, meanwhile, plan to build custom CPUs using the tech.

Neither Intel nor AMD is on the list just yet, and they may never be. Both companies have thrown their weight behind the Ultra Accelerator Link (UALink) standard, an open alternative to NVLink for scale-up networks.

The Ultra Accelerator Link Consortium published its first specification for the interconnect fabric, UALink 200G, last month. The spec currently caps out at 200 Gbps per lane - roughly 50 GB/s of bidirectional bandwidth - and scales to up to 1,024 accelerators.

While that might sound like a major downgrade compared to NVLink, it's important to remember that Nvidia's interconnect achieves its 1.8 TB/s by aggregating multiple slower lanes. The same is possible with UALink, though as our sibling site The Next Platform has previously explored, the spec is more flexible about how lanes and ports are aggregated to achieve the desired bandwidth.
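
To put the lane math in perspective, here's a quick Python sketch of how many UALink 200G lanes it would take to match NVLink's per-GPU figure. It assumes each lane carries 200 Gbps in each direction and that aggregation is lossless - a simplification for illustration, not a claim about any shipping UALink design:

# How many 200 Gbps UALink lanes equal NVLink 5's 1.8 TB/s per GPU?
ualink_lane_gbps = 200                             # per direction, per the UALink 200G spec
ualink_lane_bidir_gb_s = 2 * ualink_lane_gbps / 8  # ~50 GB/s bidirectional per lane

nvlink_per_gpu_gb_s = 1800                         # 1.8 TB/s bidirectional per GPU

lanes_needed = nvlink_per_gpu_gb_s / ualink_lane_bidir_gb_s
print(f"UALink 200G lanes needed to match NVLink 5: {lanes_needed:.0f}")  # prints 36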

Nvidia stitches together GPU bit-barns with DGX Cloud Lepton

Also at this year's Computex mega-event in Taipei, Nvidia lifted the veil on its DGX Cloud Lepton offering.

In a nutshell, the platform is a marketplace for deploying workloads across any number of GPU bit-barns that have agreed to rent out their compute on Lepton.

Alexis Bjorlin, VP of DGX Cloud at Nvidia, likens Lepton to a ridesharing app, but rather than connecting riders to drivers, it connects developers to GPUs.

At launch — the platform is currently in early access — Nvidia says CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, Nscale, SoftBank, and Yotta have agreed to make "tens of thousands of GPUs" available for customers to deploy their workloads on.

Naturally, this being "DGX Cloud," the GPU giant is taking the opportunity to push its suite of Nvidia Inference Microservices (NIMs), blueprints, and cloud functions.

If any of this sounds familiar, it's because Nvidia isn't the first to try something like this. Akash Network, for example, launched its decentralized compute marketplace back in 2020, and by 2023, 90 percent of the company's business was driven by GPU rentals. ®
