Nvidia wants to lure you to the Arm side with fresh server bait

GPU giant promises big advancements with Arm-based Grace CPU, says the software is ready


Interview

2023 is shaping up to become a big year for Arm-based server chips, and a significant part of this drive will come from Nvidia, which appears steadfast in its belief in the future of Arm, even if it can't own the company.

Several system vendors are expected to push out servers next year that will use Nvidia's new Arm-based chips. These are the Grace Superchip, which combines two of Nvidia's Grace CPUs, and the Grace Hopper Superchip, which brings together one Grace CPU with one Hopper GPU.

The vendors lining up servers include American companies Dell Technologies, HPE and Supermicro, as well as Lenovo in Hong Kong, Inspur in China, and ASUS, Foxconn, Gigabyte and Wiwynn in Taiwan. The servers will target application areas where high performance is key: AI training and inference, high-performance computing, digital twins, and cloud gaming and graphics.

While Nvidia has vowed to continue using x86 CPUs from Intel and AMD in the future, the chip designer is hoping to lure datacenter operators and developers to the Arm side with the promise of some major advancements over x86 chips currently in the market.  

For the Grace Superchip, these advancements include 144 cores, up to 1TB of error-correcting LPDDR5x memory, and as much as 1TB/s of memory bandwidth in a single socket. To let the Superchip's two CPUs communicate, Nvidia is using its 900GB/s NVLink-C2C interconnect tech, which also connects the CPU and GPU inside the Grace Hopper Superchip.

"What Grace allows us is to push the boundaries of innovations and address the gaps that are there in the market," Paresh Kharya, Nvidia's director of datacenter computing, told The Register.

He claimed the 900GB/s interconnect speed is seven times faster than the PCIe Gen 5 technology that will debut with Intel's upcoming Sapphire Rapids and AMD's Genoa server chips. "There's nothing else out there that matches close to the speed," he said.
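The "seven times" figure can be roughly reproduced with back-of-envelope arithmetic. The sketch below is our own, not Nvidia's method: it assumes the 900GB/s NVLink-C2C figure counts both directions combined, and compares it with a PCIe Gen 5 x16 link (32 GT/s per lane, 128b/130b encoding), also counted in both directions. The function name `pcie5_bandwidth_gbytes` is purely illustrative.

```python
# Back-of-envelope check of the "seven times faster than PCIe Gen 5" claim.
# Assumption: Nvidia's 900GB/s NVLink-C2C figure is total bandwidth across
# both directions, so we compare it with PCIe's bidirectional total.

PCIE5_TRANSFERS_PER_LANE = 32e9   # PCIe Gen 5 signalling rate: 32 GT/s per lane
PCIE5_LANES = 16                  # assumption: a full x16 link
ENCODING_EFFICIENCY = 128 / 130   # PCIe Gen 3 and later use 128b/130b encoding

NVLINK_C2C_GBYTES = 900.0         # Nvidia's quoted NVLink-C2C bandwidth


def pcie5_bandwidth_gbytes(bidirectional: bool = True, encoded: bool = True) -> float:
    """Hypothetical helper: PCIe Gen 5 x16 bandwidth in GB/s (10**9 bytes/s)."""
    bits_per_s = PCIE5_TRANSFERS_PER_LANE * PCIE5_LANES  # one bit per transfer per lane
    if encoded:
        bits_per_s *= ENCODING_EFFICIENCY                # strip line-encoding overhead
    gbytes = bits_per_s / 8 / 1e9
    return gbytes * 2 if bidirectional else gbytes


raw = pcie5_bandwidth_gbytes(encoded=False)  # 128.0
effective = pcie5_bandwidth_gbytes()         # ~126.0
print(f"PCIe Gen 5 x16 raw:     {raw:.1f} GB/s, ratio {NVLINK_C2C_GBYTES / raw:.2f}x")
print(f"PCIe Gen 5 x16 encoded: {effective:.1f} GB/s, ratio {NVLINK_C2C_GBYTES / effective:.2f}x")
```

Either way the ratio comes out at a little over 7x, which squares with Kharya's claim. It is a peak-bandwidth comparison only and says nothing about latency or what real applications achieve.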

Kharya made some other major claims about the Arm-based Superchips coming from Nvidia, including 2x higher energy efficiency for the memory subsystem thanks to the use of LPDDR5x, and twice the memory bandwidth of systems currently available on the market.

Nvidia has also teased how a system with the Grace Superchip will perform when it comes to CPU-bound tasks: an estimated score of 740 on the SPECrate 2017_int_base benchmark, according, of course, to Nvidia's own figures. If we go with those numbers, the system would be 50 percent faster than the CPU capabilities of Nvidia's DGX A100 system, which uses two 64-core AMD Epyc 7742 processors that came out in 2019.
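Nvidia's claim implies a DGX A100 baseline that it didn't spell out. The sketch below works it out, assuming the 50 percent figure applies directly to the 740 estimate. The per-core split is our own crude normalization, not something Nvidia published: it ignores SMT threads, clock speeds and everything else that shapes a SPECrate result.

```python
# Back-of-envelope arithmetic on Nvidia's estimated SPECrate 2017_int_base score.
GRACE_SUPERCHIP_SCORE = 740.0  # Nvidia's own estimate
CLAIMED_SPEEDUP = 1.5          # "50 percent faster" than DGX A100's CPUs

GRACE_CORES = 144              # two 72-core Grace CPUs in one Superchip
DGX_A100_CORES = 2 * 64        # two 64-core AMD Epyc 7742 processors

# Baseline implied by the claim (not a published DGX A100 result).
implied_dgx_score = GRACE_SUPERCHIP_SCORE / CLAIMED_SPEEDUP

# Crude per-core normalization; assumption: score scales with physical cores.
per_core_grace = GRACE_SUPERCHIP_SCORE / GRACE_CORES
per_core_dgx = implied_dgx_score / DGX_A100_CORES

print(f"Implied DGX A100 score: {implied_dgx_score:.0f}")
print(f"Per core: Grace {per_core_grace:.2f} vs DGX A100 {per_core_dgx:.2f}")
```

On these assumptions the implied baseline is roughly 493, and Grace's per-core advantage comes out smaller than the headline 50 percent, since it brings 16 more cores to the fight.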

Kharya said Nvidia compared the Grace Superchip with x86 processors from three years ago because it considers the DGX A100 the "top of the line server" available today for AI applications.

"So we really love all the innovation that comes to the market from x86 CPUs, and we and our customers are able to take advantage of all of that, but at the same time having now Grace in our portfolio, we are able to push the boundaries of innovation and fill in the gaps," he said.

But to take advantage of these capabilities, datacenter operators and developers will need to make a big leap from the comfortable world of x86 systems to the interesting world of Arm servers.

It may seem like a big leap, but Kharya said Nvidia has done a lot of groundwork in partnership with Arm to prepare the server software ecosystem. This started back in 2019, when the GPU giant announced that it would expand support for the CUDA programming model along with its "full stack of AI and HPC software" to Arm-based server CPUs. Since then, Nvidia has made more of its software compatible with Arm.

"We announced our CUDA on Arm project a while ago, 2019, and we've been on a constant journey towards that. All of our key stacks support Arm, and these include our AI platform, Nvidia AI, our Omniverse platform for digital twins as well as our Nvidia HPC platform. So we're working with the entire ecosystem to ensure readiness," Kharya said.

The company is also making sure Arm-based servers will provide the best possible performance through its Nvidia-Certified Systems program, which already includes GPU servers in the market now that use Ampere Computing's Arm-based Altra chips.

Some organizations have already announced plans to use servers with Nvidia's Grace and Grace Hopper Superchips, including the US Department of Energy's Los Alamos National Laboratory, which will use both chips for its next-generation Venado supercomputer.

But the true test will play out over the next few years as Nvidia tries to convince the datacenter world that Arm offers something x86 doesn't, and as organizations start putting the company's server designs through their paces. ®



Biting the hand that feeds IT © 1998–2022