ARM has had a look at the fridges, speakers and robots that use its Cortex-M series processor cores and decided they need a few maths lessons.
The Cortex-M7's block diagram
The Brit CPU designer has today revealed its new 32-bit Cortex-M7, which will sit at the top of its microcontroller-grade family of cores in terms of performance. The previous cock-of-the-roost was the Cortex-M4.
The M7, we're told, has twice the DSP power of the M4 by executing twice as many instructions simultaneously, and it also helps that the M7 can operate at a higher clock frequency than the M4.
"The Cortex-M7 has a superscalar pipeline which can execute two instructions simultaneously," an ARM source told us.
"The Cortex-M4 can execute just one instruction at one time. This is where most of the speed-up comes from. The Cortex-M7 can run at a higher clock frequency than Cortex-M4 – together these give on average two-times uplift in DSP performance for Cortex-M7 over Cortex-M4."
DSP (digital signal processing) is particularly useful for efficiently juggling incoming streams of audio and video data, and performing fast motor control – better than a generic CPU core can manage.
By doubling the performance, ARM reckons appliances and gadgets using the M7 can more quickly perform the complex mathematics required to finely control motor movement in robots; analyze microphone, touchscreen, and other sensor data; and encrypt telemetry before it's sent over the air.
That means ovens with better voice-recognition when you speak to them, drones with tighter flight control, tiny sensor networks in walls sensing damp early, and so on. In theory.
All of this depends on the systems-on-chip the M7 cores end up in, and the software running on them. Manufacturers can set the clock speed, and enable and disable various features as they desire; hardware and software engineers may have other ideas for products and bottlenecks in mind to stuff up ARM's dream of pumping intelligence into the Internet of Things.
More intelligent SoCs mean less data flying back to base, since the microcontrollers can make more of their own decisions. That will result in simpler networks (and less information to intercept) but it'll make the code on the cores more complex – and that means more bugs, potentially.
If you want truly simple software and hardware, look towards the Cortex-M0+ and gear made by Electric Imp. But those devices just don't have the oomph of the M7, so there's your trade-off.
At 160MHz ... M7 cores can be plonked into audio gear and use the extra DSP power for crunching and decrunching audio
The M7 has a six-stage superscalar pipeline, with branch prediction, compared to the M4's three-stage, and runs the usual 32-bit ARMv7 instruction set. It's backed by the Keil CMSIS DSP library, and includes a single- and double-precision FPU. At least the DSP functionality is within the instruction set, albeit as an extension, rather than discrete DSP silicon that's a pain in the ASCII to communicate with.
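For the curious, the sort of maths in question is mostly multiply-accumulate loops. Below is a minimal sketch of a FIR filter, the bread and butter of audio processing, written in plain Python purely for illustration. The function name `fir_filter` and the tap values are our own assumptions, not anything from ARM; on a real Cortex-M part you would more likely reach for a library routine, such as the FIR functions in CMSIS-DSP, than roll your own.

```python
def fir_filter(samples, taps):
    """Apply a finite impulse response filter to a list of samples.

    Illustrative only: each output is the sum of the most recent
    len(taps) inputs, each multiplied by its tap coefficient.
    """
    history = [0.0] * len(taps)  # most recent sample first
    out = []
    for s in samples:
        # Shift the new sample into the delay line, dropping the oldest.
        history = [s] + history[:-1]
        # The multiply-accumulate loop that DSP instructions speed up.
        acc = 0.0
        for coeff, x in zip(taps, history):
            acc += coeff * x
        out.append(acc)
    return out


# Hypothetical two-tap moving average over a short burst of samples.
smoothed = fir_filter([1.0, 1.0, 1.0], [0.5, 0.5])
print(smoothed)
```

The inner loop is one multiply and one add per tap per sample, which is why a core that can issue more of those per cycle gets through an audio stream faster.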
Yes, this is a beefy microcontroller, lurching towards the application-grade Cortex-A family found in smartphones and tablets. The Cortex-R family, for completeness, is focused hard on realtime control – hard drive motor controllers, radio transmitters in phones, and so on. That's the go-to ARM architecture for on-the-spot deterministic reaction to interrupts and other events.
Having said that, the M7 aims for realtime determinism with tightly coupled memories and a 12-cycle interrupt latency. You can also use two M7 cores in lock step running the same code – one following two cycles behind the other – so that glitches can be detected by external electronics if the two CPUs suddenly behave slightly differently.
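The lock-step idea can be sketched as a toy simulation. This is our own simplified model, not ARM's implementation: `lockstep_check`, its parameters, and the bit-flip fault injection are all hypothetical. The leading core's outputs are queued, the trailing core produces the same outputs two cycles later, and a comparator flags the first cycle where they disagree.

```python
from collections import deque


def lockstep_check(step, inputs, delay=2, fault_at=None):
    """Toy model of two cores in lock step.

    step     -- function standing in for one cycle of work (returns an int)
    inputs   -- per-cycle inputs fed to both cores
    delay    -- how many cycles the trailing core runs behind
    fault_at -- input index at which to flip a bit in the trailing core
    Returns the cycle at which a mismatch is detected, or None.
    """
    pending = deque()  # leading core's outputs awaiting comparison
    n = len(inputs)
    for cycle in range(n + delay):
        if cycle < n:
            pending.append(step(inputs[cycle]))
        trailing_idx = cycle - delay
        if trailing_idx >= 0:
            expected = pending.popleft()
            actual = step(inputs[trailing_idx])
            if trailing_idx == fault_at:
                actual ^= 1  # inject a single-bit glitch
            if actual != expected:
                return cycle  # comparator raises the alarm
    return None


work = lambda x: x * 3 + 1
print(lockstep_check(work, [1, 2, 3, 4, 5]))              # no fault injected
print(lockstep_check(work, [1, 2, 3, 4, 5], fault_at=3))  # glitch in trailing core
```

The delay matters: a disturbance that hits both cores at the same instant lands on different points in their execution, so it is less likely to corrupt both in the same way.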
We're told there's also more flexibility with interfacing flash memory: from what we understand, that means there's a greater choice of non-volatile NAND configurations supported by the cores. Flip some control bits, change the wait states, and off you go. This, as a result, optimizes memory accesses, we're told.
"It's documented, but how we do it internally involves quite a bit of secret sauce. We have to keep a bit of it secret," Nandan Nayampally, vice-president of product marketing, application processor systems, at ARM, told us.
According to ARM's benchmarking, the M7 achieves five CoreMark per MHz, or a 2,000 CoreMark score at 400MHz in a 40nm process at low power, if you run the code in tightly coupled memory. The M4 can hit 3.4 CoreMark per MHz, according to previous ARM figures, and runs at a lower clock speed. The M7 can scale up to 800MHz at 28nm.
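The arithmetic behind those figures is simple enough to check. The sketch below uses only the numbers quoted above (five CoreMark per MHz for the M7, 3.4 for the M4, and a 400MHz clock); the helper name `coremark_score` is our own, and it assumes the per-MHz figure scales linearly with clock speed, which real silicon may not honour.

```python
def coremark_score(per_mhz, clock_mhz):
    """Total CoreMark score, assuming linear scaling with clock speed."""
    return per_mhz * clock_mhz


M7_PER_MHZ = 5.0  # ARM's quoted figure, code in tightly coupled memory
M4_PER_MHZ = 3.4  # earlier ARM figure for the Cortex-M4

m7_at_400 = coremark_score(M7_PER_MHZ, 400)
per_clock_gain = M7_PER_MHZ / M4_PER_MHZ

print(f"M7 at 400MHz: {m7_at_400:.0f} CoreMark")
print(f"M7 vs M4, clock for clock: {per_clock_gain:.2f}x")
```

Clock for clock, that works out to a gain of a little under one and a half times on CoreMark; any further uplift has to come from the M7's higher clock speed.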
By ramping up the brains in devices, and allowing them to make complex decisions rather than pipe raw data over a network to a larger CPU, it appears ARM is taking the internet out of the internet of things. But Nayampally told us connectivity, from ZigBee and Bluetooth to Wi-Fi, is still crucial – there's just less to put on the wire or in the ether.
"When the intelligence is in the endpoint, you get always on and always aware," said Nayampally.
"You can introduce more reliability in systems by preventing things from going down before they go down. And with good DSP functions, you can do things like voice recognition."
Atmel, Freescale and ST Microelectronics have already snapped up licenses to pump out chips with M7 cores in the 90nm to 40nm process range; each core taking up a 0.1mm square of silicon, before the manufacturer whacks peripherals, control logic, power management, and so on, into a chip package.
These will join the 2.9 billion Cortex-M cores embedded in devices in 2013, ARM is keen to tell us, and 1.7 billion already out the door in the first six months of 2014. As well as comms and embedded tech, 14 per cent of the 2013 figure apparently ended up in payment cards, a world away from drones and the Internet of Things. ®