
Can't quite cram a working AI onto a $1 2KB microcontroller? Just get a PC to do it

Boffins craft code that compresses machine-learning models for modest microchips

Eggheads have devised software that can automatically produce machine-learning models small enough to run inside tiny microcontrollers.

Microcontroller units (MCUs) are pretty darn common: they can be found wedged inside everything from microwaves and washing machines, to cars and server motherboards, running relatively simple software to control or monitor hardware. They are power efficient and typically don't cost a lot: they're there to do one or two jobs, and do them well. As such, they have very limited resources compared to traditional computer systems and embedded electronics. The Microchip ATmega328P, for instance, as found in the Arduino Uno developer kit, sports just 2KB of RAM and 32KB of flash storage, and costs about a dollar. And it's by no means the smallest or the largest MCU out there.

Now these components are fine for running heuristic-based algorithms, and state machines, and other straightforward code to make decisions from input data. However, if you want to inject a little more intelligence into these systems, you're going to struggle to perform AI inference on an MCU given these tight resources. You need enough storage space to hold a trained neural-network model, and enough RAM to carry out inference operations on the model, ultimately converting input data into output decisions.
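To put rough numbers on that squeeze, here is a back-of-the-envelope sketch in Python. It is not taken from the paper: it assumes 8-bit weights and activations, unpadded stride-1 convolutions, and that working memory is simply the largest input-plus-output activation pair of any one layer. The helper names `conv_layer` and `footprint` are made up for illustration.

```python
# Rough sketch: does a tiny CNN fit on an ATmega328P-class chip?
# Assumptions (not from the article): 8-bit weights and activations,
# unpadded stride-1 convolutions, and working memory equal to the
# largest input-plus-output activation pair of any single layer.

RAM_BYTES = 2 * 1024     # 2KB of RAM, as on the ATmega328P
FLASH_BYTES = 32 * 1024  # 32KB of flash storage


def conv_layer(in_h, in_w, in_ch, out_ch, kernel):
    """Return (weight_bytes, in_bytes, out_bytes, out_shape) for one layer."""
    out_h = in_h - kernel + 1
    out_w = in_w - kernel + 1
    weights = kernel * kernel * in_ch * out_ch + out_ch  # plus one bias each
    in_bytes = in_h * in_w * in_ch
    out_bytes = out_h * out_w * out_ch
    return weights, in_bytes, out_bytes, (out_h, out_w, out_ch)


def footprint(input_shape, layers):
    """layers is a list of (out_channels, kernel_size) tuples.

    Returns (flash_bytes_for_weights, peak_ram_bytes_for_activations).
    """
    h, w, ch = input_shape
    flash = 0
    peak_ram = 0
    for out_ch, kernel in layers:
        weights, in_b, out_b, (h, w, ch) = conv_layer(h, w, ch, out_ch, kernel)
        flash += weights
        peak_ram = max(peak_ram, in_b + out_b)
    return flash, peak_ram


if __name__ == "__main__":
    # One small layer on a full 28x28 greyscale image already blows the RAM.
    flash, ram = footprint((28, 28, 1), [(4, 5)])
    print(flash, ram, ram <= RAM_BYTES)   # 104 3088 False

    # A downsampled 14x14 input with two small layers squeezes in.
    flash, ram = footprint((14, 14, 1), [(2, 3), (4, 3)])
    print(flash, ram, ram <= RAM_BYTES)   # 96 688 True
```

Under these assumptions the weights are a rounding error next to 32KB of flash, while the activations are what eat the 2KB of RAM.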

However, don't give up: deep learning can be deployed on MCUs, according to an arXiv-hosted paper written by a team of researchers at Arm ML Research and Princeton University in the US.

You need to know your SpArSe from your elbow

Rather than spend ages optimizing and compressing a trained model by hand to fit into a modest MCU, the trick is to use software that automatically crafts computer-vision models that will run effectively on such chips. Enter SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers. It uses a combination of neural architecture search (NAS) and network pruning to build AI systems for embedded controllers.
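As a flavor of what searching for an architecture under a memory cap looks like, here is a toy Python sketch. It is a plain exhaustive grid search over single-layer candidates, not SpArSe's actual search procedure. The scoring function is a hypothetical stand-in: a real NAS would train each candidate and score it on validation accuracy, whereas this one just uses parameter count as a crude proxy. All names here are invented for illustration.

```python
import itertools

RAM_BUDGET = 2 * 1024  # bytes of working memory allowed


def peak_activation_bytes(side, channels, kernel):
    """Input plus output bytes of one unpadded 8-bit conv layer.

    Assumes a square, single-channel input image of side x side pixels.
    """
    out_side = side - kernel + 1
    return side * side + out_side * out_side * channels


def parameter_count(channels, kernel):
    """Weights plus biases for a single-input-channel conv layer."""
    return kernel * kernel * channels + channels


def search(side, channel_options, kernel_options, score):
    """Try every candidate and keep the best-scoring one that fits in RAM.

    Returns (score, channels, kernel), or None if nothing fits.
    """
    best = None
    for channels, kernel in itertools.product(channel_options, kernel_options):
        if peak_activation_bytes(side, channels, kernel) > RAM_BUDGET:
            continue  # violates the memory constraint, so discard it
        candidate = (score(channels, kernel), channels, kernel)
        if best is None or candidate > best:
            best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical proxy score: more parameters stands in for "more accurate".
    print(search(16, [2, 4, 8, 16], [3, 5], parameter_count))  # (208, 8, 5)
```

The 16-channel candidates are thrown out for busting the 2KB budget, so the search settles on eight channels with a 5x5 kernel.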

The NAS part shapes the model's architecture to match the requirements, and the network pruning side makes sure the model uses fewer parameters and thus takes up less memory without sacrificing performance and accuracy too much. All in all, the models produced by SpArSe take up less room and require less number crunching.
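The article doesn't spell out how SpArSe prunes, so the Python sketch below swaps in plain magnitude pruning, a generic technique that zeroes the weights closest to zero. Treat it as an illustration of the idea of shedding parameters, not as the paper's method; the function names are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights.

    weights: flat list of floats; sparsity: fraction in [0, 1] to remove.
    Returns a new list and leaves the input untouched.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_drop = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def nonzero_count(weights):
    """Number of weights that would still need storing."""
    return sum(1 for w in weights if w != 0.0)


if __name__ == "__main__":
    layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.1, -0.9]
    pruned = magnitude_prune(layer, 0.5)
    print(pruned)                 # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.0, -0.9]
    print(nonzero_count(pruned))  # 4
```

Only the surviving weights need to be stored, provided the deployed format records where the zeroes are. Whether accuracy holds up after pruning has to be checked by re-evaluating the model, which this sketch does not do.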

The researchers did not specify the exact MCU they used to test the neural-network architectures built by SpArSe, other than to say it was an Arm-based STM32 chip and that each model needed no more than 2KB of RAM to run. We're told four convolutional neural network (CNN) models were developed by SpArSe and tested with an STM32 MCU on four different image datasets: MNIST, CIFAR10, CUReT, and Chars4k. These datasets are commonly used to train and test computer vision models for object recognition.




MNIST tests computers on their ability to tell apart handwritten digits; CIFAR10 contains images of different objects such as cats, dogs, or cars; CUReT focuses on closeups of different textures; and Chars4k, a smaller subset of the Chars74k dataset, has images of various characters and numbers.

First, the researchers trained the four different models using SpArSe and four Nvidia GeForce RTX 2080 GPUs, and then deployed them individually onto an MCU, each requiring less than 2KB of RAM to run. It's assumed the chip chosen had enough flash storage space to hold the models, even one by one, though weirdly this isn't discussed in the paper: the focus is on the RAM required, presumably because that's at a premium, and a major limiting factor, whereas flash or ROM storage can be scaled up to 128, 256, 512KB, and so on.

Anyway, it's claimed the models were accurate to 97 per cent on MNIST, 73 per cent on CIFAR10, 96 per cent on CUReT, and 67 per cent on Chars4k, despite the tight RAM conditions.

"Although MCUs are the most widely deployed computing platform, they have been largely ignored by ML researchers. This paper makes the case for targeting MCUs for deployment of ML, enabling future IoT products and use cases. We demonstrate that, contrary to previous assertions, it is in fact possible to design CNNs for MCUs with as little as 2KB RAM," they concluded in their paper, emitted at the end of May.

"Our Sparse Architecture Search method combines neural architecture search with pruning in a single, unified approach, which learns superior models on four popular IoT datasets. The CNNs we find are more accurate and up to 4.35x smaller than previous approaches, while meeting the strict MCU working memory constraint."

Cramming AI on internet-of-things and non-internet-connected devices may, at this rate, make so-called "smart" appliances actually smart. ®
