Nvidia buys Arm ... Neoverse V2 CPU cores for Grace chip
GPU giant still thinks Softbank-owned designs are worth the RISC
Arm says Nvidia’s Grace processor will be among the first chips to use its upcoming Neoverse V2 CPU cores.
Unlike the Neoverse V1, which was announced more than a year before silicon using the CPU cores actually arrived, we won’t have to wait long for the first V2-powered systems to hit the market, or so we were told during a press conference Wednesday.
Chips using Arm's datacenter-class V2 cores are apparently already in the late stages of development and, in the case of Nvidia’s Grace, are in manufacturing. Nvidia has licensed the V2 from Arm to use in its processors.
As a reminder, Nvidia’s Grace, announced at GTC Spring 2022, features 72 Armv9 processor cores and support for LPDDR5x memory. Two Grace dies are melded into a single Grace CPU SuperChip using Nvidia’s 900GB/s NVLink-C2C interconnect, for a total of 144 cores and 1TB of onboard memory in the package. We now know those cores will be Arm Neoverse V2.
Nvidia, which tried and failed to buy Arm at one point, is also preparing to ship a CPU-GPU SuperChip that melds a Grace CPU die with a GH100 GPU die using the same NVLink interconnect.
“We were very impressed with the V2 SPECfp and SPECint performance,” Ian Buck, VP of accelerated computing at Nvidia, said during the press event. “Also its power performance is excellent. Its performance-per-watt of a core itself, combined with Grace's LPDDR memory is showing upwards of two times the perf-per-watt performance compared to alternatives.”
Arm’s Neoverse V-series evolves
Beyond the HPC and AI-centric applications Nvidia is targeting, Arm is finding broader application of its Neoverse cores among hyperscale, cloud, and wireless industries, where partners like Ampere, Marvell, and Amazon have already made substantial inroads.
Introduced in 2018 as Arm’s answer to Intel’s Xeon and AMD's Epyc processor families, the Neoverse family of CPU cores has evolved over the past four years, expanding to suit three distinct use cases: its E-series cores focus on data processing, N on scale out, and V on performance.
Arm’s Neoverse V2 is the successor to the Softbank-owned outfit's first performance-optimized core architecture launched last spring. Arm is claiming higher per-thread performance at half the power consumption of its x86 contemporaries.
“One thing that we benefit from is that we’re not building standard products for legacy markets,” Dermot O’Driscoll, VP of product solutions at Arm, said. “We engage in close collaboration with key cloud, HPC, and wireless infrastructure players, so we really understand their workloads and challenges.”
Architectural improvements power vague performance claims
According to O’Driscoll, the guiding principles behind V2 were to improve performance for cloud and single-threaded workloads while balancing power consumption, and to ship it as quickly as possible.
“Neoverse V2 will deliver market-leading integer performance,” O’Driscoll added.
When pressed for firm performance data, Arm declined to share details beyond a vague scatter plot projecting that Nvidia’s V2-powered chips will outperform the x86 stalwarts’ next-gen parts slated for release this fall.
The graph also indicates the V2 Grace outperforms the AWS Graviton3 Arm-compatible chip, believed to use modified licensed Neoverse V1 cores, and the Alibaba Yitian 710 that has 128 Armv9 cores.
At GTC in March, Nvidia claimed a single Grace CPU die had achieved an estimated score of 740 on the SPECrate 2017_int_base benchmark used to measure CPU performance.
This would make the chip about 50 percent faster than the Zen 2-based AMD Epyc 7742 used in Nvidia’s DGX A100 systems.
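As a rough sanity check on that comparison, the arithmetic can be sketched in a few lines of Python. The score of 740 and the "about 50 percent" figure are the claims quoted above; the baseline is simply back-calculated from them and is not a published SPEC result, and the function and constant names are our own.

```python
# Back-of-the-envelope check of the claimed Grace vs. Epyc 7742 gap.
# Inputs are the figures quoted in the article; the baseline is derived, not measured.

def implied_baseline(score: float, speedup_percent: float) -> float:
    """Return the baseline score implied by `score` being `speedup_percent` faster."""
    return score / (1.0 + speedup_percent / 100.0)

GRACE_ESTIMATED_SCORE = 740.0   # Nvidia's estimated SPECrate 2017_int_base figure
CLAIMED_SPEEDUP_PERCENT = 50.0  # "about 50 percent faster"

baseline = implied_baseline(GRACE_ESTIMATED_SCORE, CLAIMED_SPEEDUP_PERCENT)
print(f"Implied Epyc 7742 baseline: about {baseline:.0f}")
```

In other words, the claim only holds together if the Epyc part scores somewhere in the high 400s on the same benchmark, which is worth checking against published results before taking the comparison at face value.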
Despite the vague performance-per-watt claims, Arm did disclose some architectural improvements made to the V2 core that may have contributed to these projections.
These include twice the private L2 cache, at 2MB per core, a shift to Arm’s second-gen Scalable Vector Extension (SVE2), and a vector engine with four 128-bit lanes. V2 will also support substantially larger caches, up to 512MB per die.
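To put the vector engine figures in context, here is a minimal sketch of what four 128-bit lanes add up to per core. It assumes all four lanes can be used at once, and it says nothing about clock speeds, fused multiply-add, or sustained throughput, none of which Arm disclosed; the helper name is hypothetical.

```python
# Rough vector-width arithmetic for the V2 figures quoted above.
# Assumption: all four 128-bit lanes are usable in the same cycle.

LANES = 4
LANE_WIDTH_BITS = 128

def elements_per_pass(element_bits: int, lanes: int = LANES,
                      lane_bits: int = LANE_WIDTH_BITS) -> int:
    """Number of elements of the given size that fit across all lanes."""
    return lanes * (lane_bits // element_bits)

total_bits = LANES * LANE_WIDTH_BITS
for name, bits in [("FP64", 64), ("FP32", 32), ("FP16", 16)]:
    print(f"{name}: {elements_per_pass(bits)} elements across {total_bits} bits")
```

That works out to 512 bits of vector width per core, matching the total width of a single 512-bit vector unit but split into narrower lanes.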
Refreshed Neoverse N, E cores on the way
In addition to Arm’s performance-focused V2 cores, the British chip designer also has next-generation N- and E-series cores in the works. While details are slim – and you can find some analysis on The Next Platform – Arm says the third iteration of its N-series core design is in development and will be available to partners later next year.
While Arm’s V-series cores are designed to push the limits of performance, the N-series is designed for applications where thread count outweighs single-threaded performance. How much more efficient Arm’s next N core — presumably to be called N3 — will be remains to be seen. Arm has only committed to the vaguely worded “general increase” over N2.
Arm’s E-series cores are also getting a refresh in the near future. These E cores are designed for data plane processing applications, such as 5G RAN, edge networking, and other forms of acceleration.
Notably, there is no actual E2 core: the E2 is a combination of Arm’s Cortex-A510 CPU cores and the CMN-700 interconnect mesh.
The E2's successor is apparently already in development, though Arm has yet to share details on availability. ®