Arm targets AI performance with latest Neoverse Compute Subsystems

More and more obvious what a key market ML is for the chip designer

Chip designer Arm has added two Neoverse Compute Subsystem blueprints to its portfolio, and is working with Samsung to build its next high-performance Cortex-X core on the Korean chipmaker's 2nm production process.

Arm introduced its Neoverse Compute Subsystems (CSS) last year, pitching them as a speedier way for customers to produce Arm-based silicon by including more pre-validated components than just the processor cores.

First out of the blocks was the CSS N2, which has since been taken up by Microsoft and incorporated into the Redmond giant's custom Cobalt 100 processor for Azure datacenters.

Now, Arm is adding two new Neoverse Compute Subsystems, the CSS N3 and the CSS V3, which, as their names suggest, are designed around the new N3 and V3 Neoverse cores.

CSS N3 is all about power efficiency, and the first instantiation offers 32 cores within a 40W TDP power envelope. The design offers a performance-per-watt uplift of 20 percent over the CSS N2, Arm claims.
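For a rough sense of scale, those figures can be turned into a per-core power budget. The sketch below is back-of-the-envelope arithmetic only: it assumes the 40W TDP is split evenly across the 32 cores, which overstates what each core gets because the envelope also has to cover everything else in the subsystem. The function names are illustrative, not anything Arm publishes.

```python
def watts_per_core(tdp_watts: float, cores: int) -> float:
    """Average power budget per core, assuming an even split (an assumption)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return tdp_watts / cores


def perf_at_same_power(baseline_perf: float, uplift_pct: float) -> float:
    """Performance at an unchanged power budget after a perf-per-watt uplift."""
    return baseline_perf * (1 + uplift_pct / 100)


# Figures quoted for the first CSS N3 instantiation; CSS N2 is normalized to 1.0.
css_n3_budget = watts_per_core(40, 32)
relative_perf = perf_at_same_power(1.0, 20)

print(f"{css_n3_budget:.2f} W per core")
print(f"{relative_perf:.2f}x CSS N2 at equal power")
```

In other words, Arm's claim amounts to roughly a fifth more work from the same power budget, or the same work for about 17 percent less power.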

The N3 core supports Armv9.2 features, with 2MB of private cache per core and support for PCIe and CXL I/O, as well as the UCIe (Universal Chiplet Interconnect Express) standard for linking chiplets together.

CSS V3 is the first Compute Subsystem based on Arm's V-Series performance cores. Arm claims it delivers over 50 percent more performance than the CSS N2 product, and that it can scale up to 128 cores per SoC. The V3 core at its heart is, according to Arm, the Neoverse core with the highest single-thread performance ever – at least until the next one.

This CSS supports PCIe 5.0 and CXL 3.0 I/O. In addition to DDR5, it also supports High Bandwidth Memory (HBM), which is located inside the CPU package for low latency, as seen in the Fujitsu A64FX processor used in the Fugaku supercomputer.

According to Arm, CSS N3 is to initially target 5G, networking, edge and DPU type applications, while the higher performance CSS V3 is being aimed at cloud and datacenter, AI and HPC applications.

With AI being such a key market for Arm, the chip designer is keen to show how it has optimized performance in the new Compute Subsystems for this workload. It claims that the Neoverse V3 and N3 cores achieve a performance increase of 84 percent and 196 percent over their predecessors, respectively, for AI and data analytics.
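Percentage increases of this size are easy to misread, so it is worth converting them into multipliers. The snippet below is plain arithmetic on the two figures Arm quotes, not a benchmark. The helper name is made up for illustration.

```python
def gain_to_multiplier(percent_increase: float) -> float:
    """Convert an 'X percent increase' into a 'times the baseline' factor."""
    return 1 + percent_increase / 100


# Arm's claimed AI and data analytics gains over each core's predecessor.
claimed_gains = {"Neoverse V3": 84, "Neoverse N3": 196}

multipliers = {core: gain_to_multiplier(pct) for core, pct in claimed_gains.items()}

for core, factor in multipliers.items():
    print(f"{core}: {factor:.2f}x its predecessor")
```

A 196 percent increase therefore means the N3 lands at just under three times its predecessor's performance on this workload, not just under twice.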

"Analyzing a specific mission critical algorithm at the heart of key partner workloads, we were able to identify and implement the most effective microarchitecture changes to impact performance," said Dermot O'Driscoll, VP of Arm’s Infrastructure Line of Business.

"In this case, that came down to better branch prediction, better management of the last level cache and associated memory bandwidth, and a big bump in L2 cache size. The result: a whopping 196 percent gain in performance on N3, and this on a workload where we were already outstripping the competition," he added.

Arm reckons that chipmaker Socionext plans to produce a chiplet based on Neoverse CSS V3 that will be manufactured by TSMC on its 2nm node, which is due to enter production in 2025.

Faraday has already announced a chiplet-based server SoC that will feature 64 N-Series cores and be manufactured using Intel Foundry's 18A process node, and ADTechnology is set to deliver a 16-core CSS N-Series edge server platform manufactured by Samsung's chip foundry.

Samsung has also teamed up with the Brit chip designer to deliver the next generation of Arm's high-performance Cortex-X core from its foundry.

The South Korean outfit says it and Arm plan to use its 2nm Gate-All-Around (GAA) production node to provide custom silicon for datacenters as well as a chiplet-based solution targeting generative artificial intelligence for the mobile computing market.

Samsung previously indicated that it also aims to start manufacturing 2nm silicon in 2025, putting it neck and neck with TSMC. Samsung beat TSMC to making 3nm chips back in 2022.

Some details of the upcoming Cortex-X core, likely to be officially launched this year as the Cortex-X5, were disclosed last month by Patrick Moorhead, CEO at Moor Insights & Strategy. Quoting Arm, he said in a blog post that it was expected to deliver the "largest year-over-year IPC performance increase in five years."

Chris Bergey, SVP and GM for Arm's Client Business, says the work was part of the chip designer's longstanding collaboration with Samsung.

"Optimizing Cortex-X and Cortex-A processors on the latest Samsung process node underscores our shared vision to redefine what's possible in mobile computing, and we look forward to continuing to push boundaries to meet the relentless performance and efficiency demands of the AI era," he said in a prepared statement. ®
