Linux kernel patch from Google speeds up server shutdowns
First world problems: Too many NVMe drives, not enough seconds to spare
A new Linux kernel patch from a Google engineer resolves a problem caused by a condition that many of us might quite like to experience – having too many NVMe drives.
The problem is caused by the relatively long time it takes to properly shut down a drive: apparently, as much as four-and-a-half seconds.
Remember Sun's X4500 storage server, originally codenamed Thumper? It was truly radical when it appeared: a 4U dual-processor server, but with a stonking 48 drive bays. These days Google has a bunch of boxes with a still-fairly-impressive 16 NVMe drives attached to each one. And when they have to reboot, they take a long time.
If you have a storage server with 16 drives, that's 72 seconds of wasted time hanging around every time it reboots. Hardly an eon, but still annoying – because it's totally unnecessary.
The problem is that the kernel's drive-shutdown function is synchronous: for each drive, it waits for the shutdown command to complete before carrying on to the next. The new kernel patch does exactly the same work, but issues the calls asynchronously. It sends the command to the first drive, then immediately moves on to the next, and works its way down the list. When they have all returned the desired status, the job is done.
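To make the difference concrete, here is a rough simulation in Python. It is not the kernel code, which is written in C and talks to real hardware; the function names are made up for illustration, and the 4.5-second wait is scaled down a hundredfold so the demo finishes quickly.

```python
import asyncio
import time

# Assumption: stand-in for the ~4.5 s a drive can take, scaled down 100x.
SHUTDOWN_SECONDS = 0.045
DRIVE_COUNT = 16


async def shutdown_drive(name: str) -> str:
    """Pretend to shut down one drive; the sleep models waiting on the hardware."""
    await asyncio.sleep(SHUTDOWN_SECONDS)
    return name


async def shutdown_one_by_one(drives):
    # Old behaviour: wait for each drive to finish before touching the next.
    for drive in drives:
        await shutdown_drive(drive)


async def shutdown_all_at_once(drives):
    # New behaviour: issue every command, then wait for all of them to report back.
    await asyncio.gather(*(shutdown_drive(drive) for drive in drives))


def timed(coro_fn, drives) -> float:
    """Run one of the strategies above and return the elapsed wall-clock time."""
    start = time.perf_counter()
    asyncio.run(coro_fn(drives))
    return time.perf_counter() - start


drives = [f"nvme{i}" for i in range(DRIVE_COUNT)]  # hypothetical device names
sequential = timed(shutdown_one_by_one, drives)
concurrent = timed(shutdown_all_at_once, drives)
print(f"one by one: {sequential:.2f}s, all at once: {concurrent:.2f}s")
```

The first figure grows with the number of drives, while the second stays close to the time taken by the slowest single drive.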
Presto, a minute off your reboot time. If you have more storage than Larry Page's home computer, anyway.
Although this doesn't directly help most of us, sometimes these sorts of changes can have very pleasant side effects. For instance, there's a tool for kernel developers called kexec which allows one kernel to load another kernel into memory and start it. This has a very desirable side effect, though: it allows you to turbocharge Linux restarts. Since your computer has to spend a minute or so in its firmware, performing some self-tests and so on before it loads the operating system, if you can bypass that and just restart directly from one OS into the other, you can reboot in seconds rather than minutes. And if you're thinking that you have an SSD and bootups are super quick anyway, the effect is even more extreme with an SSD. ®
As with many other things, progress has made matters worse: installing the kexec-tools package on Ubuntu, which used to just magically work, now doesn't. Do let us know if you find a working fix.
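Where kexec-tools does work, a kexec restart usually comes down to two steps: load the new kernel, then jump into it. The sketch below only builds and prints the commands rather than running them. The helper name is made up, and the kernel and initrd paths are assumptions based on the symlinks Debian-style systems commonly keep in /boot, so adjust them for your own machine.

```python
import shlex


def kexec_commands(kernel="/boot/vmlinuz", initrd="/boot/initrd.img"):
    """Build the two commands for a kexec restart (hypothetical helper, dry run only).

    The default paths are assumptions; check what your distribution uses.
    """
    # Step 1: load the new kernel and initrd, reusing the running kernel's command line.
    load = ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"]
    # Step 2: ask systemd to stop services cleanly, then jump into the loaded kernel.
    # (Running "kexec -e" instead would jump immediately, without a clean shutdown.)
    jump = ["systemctl", "kexec"]
    return [load, jump]


for command in kexec_commands():
    print(shlex.join(command))
```

Both commands need root, and the second one really does restart the machine, so treat this as something to read rather than paste.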