Arm targets AI performance with latest Neoverse Compute Subsystems
More and more obvious what a key market ML is for the chip designer
Chip designer Arm has unveiled two additional Neoverse Compute Subsystem blueprints in its portfolio, and is working with Samsung on its next high-performance Cortex-X core on the Korean chipmaker's 2nm production process.
Arm introduced its Neoverse Compute Subsystems (CSS) last year, pitching them as a speedier way for customers to produce Arm-based silicon by including more pre-validated components than just the processor cores.
First out of the blocks was the CSS N2, which has since been taken up by Microsoft and incorporated into the Redmond giant's custom Cobalt 100 processor for Azure datacenters.
Now, Arm is adding two new Neoverse Compute Subsystems, the CSS N3 and the CSS V3, which as their names suggest are designed around new N3 and V3 Neoverse cores.
CSS N3 is all about power efficiency, and the first instantiation offers 32 cores with a power envelope of 40W TDP. This configuration offers a performance-per-watt uplift of 20 percent over the CSS N2, Arm claims.
The N3 core supports Armv9.2 features, with 2MB of private cache per core and support for PCIe and CXL I/O, as well as the UCIe (Universal Chiplet Interconnect Express) standard for linking chiplets together.
CSS V3 is the first Compute Subsystem based on Arm’s V-Series performance cores. Arm claims this delivers over 50 percent more performance than the CSS N2 product, plus it can scale up to 128 cores per SoC. Arm claims the V3 core used in this is its highest single-thread performance Neoverse core ever – at least until the next one.
This CSS supports PCIe 5.0 and CXL 3.0 I/O and, in addition to DDR5, also supports High Bandwidth Memory (HBM), which is located inside the CPU package for low latency, as seen in the Fujitsu A64FX processor used in the Fugaku supercomputer.
According to Arm, CSS N3 is to initially target 5G, networking, edge and DPU type applications, while the higher performance CSS V3 is being aimed at cloud and datacenter, AI and HPC applications.
With AI being such a key market for Arm, the chip designer is keen to show how it has optimized performance in the new Compute Subsystems for this workload. It claims that the Neoverse V3 and N3 cores achieve a performance increase of 84 percent and 196 percent over their predecessors, respectively, for AI and data analytics.
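Percentage gains of this size are easy to misread: a 196 percent increase means nearly three times the predecessor's performance, not just under two times. As a minimal sketch of that arithmetic, using only the two figures Arm quotes (the function and variable names here are illustrative, not anything Arm publishes):

```python
def gain_to_multiplier(percent_gain: float) -> float:
    """Convert a percentage increase into a multiple of the baseline."""
    return 1.0 + percent_gain / 100.0


# Arm's claimed AI and data analytics gains over the previous-generation cores.
claimed_gains = {"Neoverse V3": 84.0, "Neoverse N3": 196.0}

for core, gain in claimed_gains.items():
    multiplier = gain_to_multiplier(gain)
    print(f"{core}: +{gain:.0f}% is {multiplier:.2f}x the predecessor")
```

On those numbers, the V3 claim works out to 1.84 times its predecessor and the N3 claim to 2.96 times, on whatever workload Arm measured.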
"Analyzing a specific mission critical algorithm at the heart of key partner workloads, we were able to identify and implement the most effective microarchitecture changes to impact performance," said Dermot O'Driscoll, VP of Arm’s Infrastructure Line of Business.
"In this case, that came down to better branch prediction, better management of the last level cache and associated memory bandwidth, and a big bump in L2 cache size. The result: a whopping 196 percent gain in performance on N3, and this on a workload where we were already outstripping the competition," he added.
Arm says chipmaker Socionext plans to produce a chiplet based on Neoverse CSS V3 that will be manufactured by TSMC on its 2nm node, which is due to enter production in 2025.
Faraday has already announced a chiplet-based server SoC that will feature 64 N-Series cores and be manufactured using Intel Foundry's 18A process node, and ADTechnology is set to deliver a 16-core CSS N-Series edge server platform manufactured by Samsung's chip foundry.
Samsung has also teamed up with the Brit chip designer to deliver the next generation of Arm's high-performance Cortex-X core from its foundry.
The South Korean outfit says it and Arm plan to use its 2nm Gate-All-Around (GAA) production node to provide custom silicon for datacenters as well as a chiplet-based solution targeting generative artificial intelligence for the mobile computing market.
Samsung previously indicated that it also aims to start manufacturing 2nm silicon in 2025, putting it neck and neck with TSMC. Samsung beat TSMC to making 3nm chips back in 2022.
Some details of the upcoming Cortex-X core, likely to be officially launched this year as the Cortex-X5, were disclosed last month by Patrick Moorhead, CEO at Moor Insights & Strategy. Quoting Arm, he said in a blog post that it was expected to deliver the "largest year-over-year IPC performance increase in five years."
Chris Bergey, SVP and GM for Arm's Client Business, says the work was part of the chip designer's longstanding collaboration with Samsung.
"Optimizing Cortex-X and Cortex-A processors on the latest Samsung process node underscores our shared vision to redefine what's possible in mobile computing, and we look forward to continuing to push boundaries to meet the relentless performance and efficiency demands of the AI era," he said in a prepared statement. ®