Network edge? You get 64-bit Armv9 AI. You too, watches. And you, server remote management. And you...

Arm rolls out the Cortex-A320 for small embedded gear that dreams of big-model inference

Arm predicts AI inferencing will soon be ubiquitous. In order to give devices the oomph they need for all that neural-network processing, it is beefing up its embedded platform with the first 64-bit Armv9 CPU core aimed at edge workloads.

The SoftBank-owned Brit chip design biz says AI development is moving quickly, claiming that network-edge machine-learning workloads were much simpler just a few years ago, focused on basic noise reduction or anomaly detection.

"Take the humble doorbell as an example," says Paul Williamson, senior veep and general manager for Arm's Internet-of-Things line of business. It has evolved from a simple buzzer to a basic camera viewer, and now to a smarter AI-driven device capable of detecting people or even identifying specific individuals, he adds.

To address this, the processor design house is introducing the Cortex-A320 CPU core, which is intended to be paired in edge AI system-on-chip (SoC) designs with the Ethos-U85, Arm's embedded neural processing unit (NPU) accelerator. The core can be configured in clusters of up to four to scale across a range of performance needs.

You got your Ethos-U85 on my Cortex-A320 ... You got your Cortex-A320 on my Ethos-U85. Arm's press slide on the combination of its AI-friendly cores for future SoCs

The A320 is said to be the "smallest Armv9 implementation," provides an AArch64 instruction set, and is a relatively simple single-issue, in-order, eight-stage core with up to 64KB in L1 cache, and up to 512KB L2. Good to see RISC-V keeping Arm on its toes, there.

As an indication of how fast things have moved, it is less than a year since Arm rolled out a reference platform for edge AI that paired the Ethos-U85 with the Cortex-M85, a microcontroller-grade CPU core design.

In contrast, the Cortex-A320 is part of Arm's full-fat application processor family, albeit an "ultra-efficient" one, based on the newer Armv9 architecture with the various enhancements this brings. The new pairing delivers more than eight times the machine-learning performance of last year's platform, Williamson claims, and is capable of handling large AI models of over a billion parameters.

Arm's illustration of support for 1B-parameter models

"The continued demand for hardware to efficiently execute larger networks is pushing memory size requirements, so systems with better memory access performance are becoming really necessary to perform these more complex use cases," Williamson said.

"Cortex-A processors address this challenge as they've got intrinsic support for more addressable memory than Cortex M based platforms and they're more flexible at handling multiple tiers of memory access latency."
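
To put rough numbers on that memory pressure, here is a back-of-envelope sketch in Python. It counts weight storage only (no activations, caches, or runtime overhead), and the precisions listed are illustrative assumptions rather than figures supplied by Arm. The function name `weight_footprint_gib` is ours.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumption: footprint = parameter count x bytes per weight. Activations,
# caches and runtime overhead are ignored, so real requirements are higher.

# Illustrative storage precisions (not taken from Arm's announcement).
BYTES_PER_WEIGHT = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gib(num_params: int, precision: str) -> float:
    """Return the weight storage in GiB for a model of num_params parameters."""
    return num_params * BYTES_PER_WEIGHT[precision] / 2**30


if __name__ == "__main__":
    one_billion = 1_000_000_000
    for precision in BYTES_PER_WEIGHT:
        gib = weight_footprint_gib(one_billion, precision)
        print(f"{precision}: {gib:.2f} GiB")
```

Under these assumptions, a billion-parameter model needs a little under 1 GiB for its weights at 8 bits apiece and close to 4 GiB at 32 bits. The latter nearly fills the entire 4 GiB address space of a 32-bit core, which is one way to read Williamson's point about addressable memory.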

Within the family of Armv9 processors, Cortex-A320 is now said to be the most energy-efficient to date, as it's claimed to use half the power of the Cortex-A520, the high-efficiency core used in some reference designs.

The move to Armv9 brings with it the security features associated with this architecture, such as the Memory Tagging Extension (MTE) for catching memory-safety bugs, while for AI processing the core also features the Scalable Vector Extension 2 (SVE2) and support for the BFloat16 data type.
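
For readers unfamiliar with BFloat16: it keeps the sign bit and all eight exponent bits of an IEEE 754 single-precision float but only the top seven mantissa bits, so it covers the same range as float32 at much lower precision and half the storage. The Python sketch below emulates the conversion in software purely for illustration. It says nothing about how the Cortex-A320 implements the format in hardware, and the helper names are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Convert x to a 16-bit bfloat16 pattern via float32, rounding to nearest even.

    Note: struct raises OverflowError if x is too large to fit in a float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: exponent all ones and a non-zero mantissa. Truncate and force a
    # mantissa bit so the result stays a NaN instead of rounding to infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e-3):
        b = float_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

Round-tripping 3.14159 through this sketch gives 3.140625, which shows the trade: two to three significant decimal digits, in exchange for halving the memory each weight occupies.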

Software development is also vital, and here Arm is offering support for the new edge hardware in its Arm Kleidi libraries. This includes Kleidi AI, a set of compute kernels for building AI frameworks, and Kleidi CV for computer vision applications.

The libraries also support Armv9 optimizations such as Neon and SVE2, and are integrated into popular AI frameworks such as llama.cpp, ExecuTorch, and LiteRT, according to Williamson.

Cortex-A320 also has the ability to run applications using real-time operating systems such as FreeRTOS and Zephyr, plus support for Linux.

As with other Arm offerings, licensees will be responsible for building chips around the new Cortex-A320 and Ethos-U85. The firm said it expects to see the designs in silicon next year, but wouldn't name any specific partners or products that will use them.

Beyond network-edge applications, the core's low-power design makes it suitable for various uses, including smartwatches and wearables. Cortex-A320 is also potentially "the ideal CPU for baseboard management controllers in servers and infrastructure," according to Williamson. ®
