Kernel tweaks improve Raspberry Pi performance, efficiency
There's a lot of room for improvement in modern computing, from the low end to the very high
Two separate development efforts are improving both Raspberry Pi power management and memory efficiency – one using tools built for massive clusters.
One set of kernel patches adds "Suspend to idle" (s2idle) support to the kernel for older Pi models. The other patch, from Igalia, brings NUMA support to the Pi 5, which rather unexpectedly boosts performance. As The Register explained back in 2013, NUMA is a technology more usually seen in clusters, short for non-uniform memory access.
First, the power management support. Although the Pi series is based on mobile phone tech, the boards are effectively tiny desktop devices, mainly intended to run off the mains. The new code supports Pis based around the Broadcom BCM2835 SoC family. These are the older SoCs used in the Raspberry Pi 1, 2, and 3, so for now, this won't help newer Pi 4 or 5 hardware. Developer Stefan Wahren chose to target the older models first because more documentation is available for them.
S2idle is one of four common ACPI suspend states, as explained by the Arch wiki. For reasons we're sure made sense to someone somewhere, the levels are known as S0, S1, S3, and S4. If that doesn't make you twitch, then you may enjoy knowing that s2idle support puts the machine into the S0 state, and not S2, because there isn't an S2 level.
Broadly, S0 – known as "freeze" – stops the machine from running. In theory, it also puts IO devices to sleep, but that doesn't work on the Pi's USB controller just yet. For completeness, S1 is "shallow" sleep, or standby mode. S3 is "deep" sleep, or suspend-to-RAM, and S4 is suspend-to-disk or hibernation.
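For readers who want to see which of these modes their own kernel offers, here is a minimal sketch. It assumes a Linux system that exposes the standard `/sys/power/mem_sleep` file, where the available modes are listed on one line and the active one is wrapped in square brackets. The function names and the fallback sample string are our own illustration, not part of Wahren's patches.

```python
# Sketch: parse the text the Linux kernel exposes in /sys/power/mem_sleep.
# The labels follow the descriptions used in this article.
from pathlib import Path

MEM_SLEEP_LABELS = {
    "s2idle": "suspend-to-idle (freeze)",
    "shallow": "standby",
    "deep": "suspend-to-RAM",
}


def parse_mem_sleep(text):
    """Return (available_modes, active_mode) from mem_sleep-style text."""
    modes = []
    active = None
    for token in text.split():
        # The kernel marks the currently selected mode with [brackets].
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active


def describe(text):
    """Return one human-readable line per mode, starring the active one."""
    modes, active = parse_mem_sleep(text)
    return [
        f"{'*' if mode == active else ' '} {mode}: "
        f"{MEM_SLEEP_LABELS.get(mode, 'unknown')}"
        for mode in modes
    ]


if __name__ == "__main__":
    try:
        sample = Path("/sys/power/mem_sleep").read_text()
    except OSError:
        # Hypothetical sample, used when the file is missing or unreadable.
        sample = "[s2idle] shallow deep"
    print("\n".join(describe(sample)))
```

On a machine that supports only suspend-to-idle, we would expect a single starred `s2idle` line.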
Some of this is handled by firmware on PCs, especially in laptops, but as we have covered before, firmware and Linux driver support for Arm-based systems is a lot more complicated than in the relatively homogeneous x86 world.
These are early days and the power usage drop is modest, only about a third of a watt. Wahren reports that a Pi 1 doing nothing drew 1.67 W, which dropped to 1.33 W when suspended. Even so, it's a step in the right direction, and if this becomes a standard kernel feature, it could reduce the power usage of millions of Pis out there.
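To put those figures in proportion, here is a quick back-of-the-envelope calculation. The two wattages are the ones Wahren reported. The assumption that a board stays suspended around the clock for a whole year is ours, purely to give an upper bound.

```python
# Reported figures for a Pi 1: idle versus suspended-to-idle.
IDLE_WATTS = 1.67
SUSPENDED_WATTS = 1.33

# Assumption for illustration only: the board is suspended all year.
HOURS_PER_YEAR = 24 * 365


def power_saving(idle_w, suspended_w):
    """Return (watts saved, saving as a percentage of idle draw)."""
    saved = idle_w - suspended_w
    return saved, saved / idle_w * 100


saved_w, saved_pct = power_saving(IDLE_WATTS, SUSPENDED_WATTS)
kwh_per_year = saved_w * HOURS_PER_YEAR / 1000

print(f"{saved_w:.2f} W saved ({saved_pct:.0f}% of idle draw)")
print(f"{kwh_per_year:.1f} kWh per year if suspended around the clock")
```

That works out to roughly a fifth of the idle draw, or about 3 kWh per board per year under that generous assumption.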
The other interesting Pi development in kernel land is that enabling NUMA support lets the Pi 5 run faster. NUMA itself is nothing new to Linux: The Reg was reporting on IBM working on Linux NUMA support a quarter of a century ago. The gist is that in a system comprising lots of machines, each with multiple multi-core processors, memory access speed is not uniform. It will probably differ from one core to another on the same chip, which may still be faster than access from one physical CPU socket to another on the same motherboard, and that in turn will be orders of magnitude faster than accessing RAM on another node.
Surprisingly, emulating this really helps the modest BCM2712 SoC inside the Pi 5, as developer Tvrtko Ursulin explains in the patch notes:
This series adds a very simple NUMA emulation implementation and enables selecting it on arm64 platforms.
Obvious question is why? Short answer – it can bring a significant performance uplift on Raspberry Pi 5.
Longer answer is that splitting the physical RAM into chunks, and utilizing an allocation policy such as interleaving, can enable the BCM2712 memory controller to better utilize parallelism in physical memory chip organisation.
In more concrete numbers, testing with Geekbench 6 shows that splitting into four emulated NUMA nodes can uplift the single core score of the benchmark by around 6 percent, and the multi-core by around 18 percent.
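The interleaving policy Ursulin mentions can be illustrated with a toy model. This is not the kernel's code and it does not model the BCM2712 memory controller. It simply assumes a round-robin policy in which page i lands on node i modulo the node count, and the function names are hypothetical.

```python
from collections import Counter


def interleave_pages(num_pages, num_nodes):
    """Assign page i to node i % num_nodes (round-robin interleaving)."""
    return [page % num_nodes for page in range(num_pages)]


def pages_per_node(assignment):
    """Count how many pages ended up on each node."""
    return dict(sorted(Counter(assignment).items()))


if __name__ == "__main__":
    # Four emulated nodes, matching the configuration quoted above.
    placement = interleave_pages(num_pages=16, num_nodes=4)
    print(placement)
    print(pages_per_node(placement))
```

Consecutive pages land on different nodes, so a run of sequential accesses is spread across all four chunks of RAM rather than hammering one of them. That spread is what gives the memory controller a chance to work in parallel.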
For The Reg FOSS desk, considering these two patches together gives a pleasing example of the benefits of a single OS that can run on anything from a mobile phone to a supercomputer cluster. Adapting laptop-style power management to tiny single-board computers can drop their power usage, which could result in big savings if deployed to large clusters. Conversely, adapting memory management algorithms designed for large clusters can measurably improve the performance of the same family of tiny SBCs.
We really want to see much more aggressive power management, the sort of stuff that makes pocket fondleslabs last longer, brought to server OSes and deployed in datacenters. Corporates have been talking about carbon offsetting for many years, but it's bogus: it doesn't help, and it has repeatedly been shown to be ineffective. Meanwhile, datacenter emissions are rising.
Despite a decade of Kubernetes, most people using it still don't need it (even when the results are amusing). The world would benefit more from focusing on datacenter computing that scales down rather than up and out. ®