Arm flexes silicon muscles to push generative AI at the edge
Ethos-U85 microNPU boasts 4x performance boost over previous gen
Arm is aiming to boost AI performance at the edge with its latest embedded neural processing unit (NPU) and a reference design platform for it to slot into, and says it expects devices based on the NPU to be running generative AI models next year.
The Ethos line-up is Arm's NPU portfolio, and the Ethos-U series are embedded versions, or so-called microNPUs, designed to be paired with one of the chip designer's Cortex-M processors.
With the Ethos-U85, Arm is claiming a 4x performance boost and 20 percent greater power efficiency over previous generations. One reason is that it can be configured with anywhere from 128 to 2,048 multiply-accumulate (MAC) units, the latter being four times the number in the existing Ethos-U65, delivering performance of up to 4 TOPS (trillion operations per second) at 1 GHz.
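The headline figure checks out as simple arithmetic: a MAC unit performs two operations (one multiply, one add) per clock cycle, so 2,048 MACs at 1 GHz works out to roughly 4 TOPS. A back-of-the-envelope sketch:

```python
# Each multiply-accumulate (MAC) unit performs two operations
# (one multiply, one add) per clock cycle.
OPS_PER_MAC = 2

def peak_tops(mac_units: int, clock_hz: float) -> float:
    """Peak throughput in trillions of operations per second."""
    return mac_units * OPS_PER_MAC * clock_hz / 1e12

# Largest quoted Ethos-U85 configuration: 2,048 MACs at 1 GHz
print(peak_tops(2048, 1e9))  # ~4.1 TOPS, matching Arm's "up to 4 TOPS"
# Smallest quoted configuration: 128 MACs at 1 GHz
print(peak_tops(128, 1e9))
```

Peak figures like this assume every MAC is busy every cycle; real workloads land below that depending on memory bandwidth and how well a model maps onto the array.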
This step up is required because the AI and machine learning (ML) processing demands being placed on embedded systems are growing, according to Arm's IoT Line of Business SVP & GM, Paul Williamson.
"The first wave of edge compute was optimized for limited memory, low power needs of constrained devices," Williamson said, but since then they have become more connected and had to contend with larger and larger volumes of data.
"Machine learning inference has then been deployed to crunch the data that's been generated and find meaningful insights. And then AI has evolved from not only predicting the outcome, but also generating new data and further insights," he added.
Arm claims that Ethos-U85 allows small embedded devices to support transformer networks as well as convolutional neural networks (CNNs) for AI inferencing. This will drive the development of new applications, particularly in vision and generative AI use cases, for tasks such as image classification and object detection.
"We expect Ethos-U85 to be deployed in emerging edge AI use cases and smart home retail or industrial settings, where there is demand for that high performance compute with the support of latest AI frameworks," Williamson said.
Those frameworks include TensorFlow Lite and PyTorch, and the latest NPU is compatible with the existing toolchain, so developers who have already coded for Ethos can continue to use the same tools and code with Ethos-U85.
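That portability rests largely on the fact that models for Ethos-class microNPUs are typically deployed as int8-quantized TensorFlow Lite graphs, which the Ethos-U85 continues to accept. As a rough illustration (this is TFLite's published affine quantization scheme, real_value = scale × (q − zero_point), not Arm toolchain code):

```python
# Minimal sketch of TensorFlow Lite's affine int8 quantization scheme.
# Function names here are illustrative, not part of any Arm or TFLite API.

def quantize(real: float, scale: float, zero_point: int) -> int:
    """Map a float to int8 under the affine scheme, clamped to [-128, 127]."""
    q = round(real / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value from its int8 encoding."""
    return scale * (q - zero_point)

# Example: a tensor whose values lie roughly in [-1, 1]
scale, zero_point = 1 / 127, 0
q = quantize(0.5, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Because the quantized model format is unchanged, the same compiled artifacts and workflow carry over; the new silicon just executes them faster.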
To complement it, Arm has created the Corstone-320 IoT Reference Design Platform, which hardware partners can use to quickly create a chip design.
This combines Ethos-U85 with Cortex-M85, claimed as the company's highest performing design for microcontroller-based products, and the Mali-C55 image signal processor.
But Ethos-U85 will also work with the higher-end Armv9 Cortex-A CPUs, to bring power-efficient edge inference into a broader range of higher-performing devices, Arm said.
Corstone-320 has been developed with applications in mind such as battery-powered camera systems for the smart home, connected cameras used in industrial production lines, and retail systems, according to Williamson.
The platform includes software tools and support such as Arm Virtual Hardware, which allows software development to start before final silicon is available, Arm says, speeding time to market for complex edge AI devices.
Arm also sees an opportunity for small versions of generative AI models to run at the edge on embedded systems, and claims this platform will enable that.
Williamson said that Arm already has partners who are experimenting with running generative AI models.
"We expect to see platforms based on the Ethos-U85 in silicon in devices next year in 2025, so that is the point where we'd be able to see the first of those benefiting from that improved performance," he told The Register.
This might be seen in use cases such as smaller language models providing localized voice detection and voice response, able to draw on a much broader range of words and language rather than being fixed to a limited number of keywords, according to Williamson. ®