Linus Torvalds's faulty memory (RAM, not wetware) slows kernel development

Emperor penguin swipes at Intel's attitude to ECC memory and maybe wimpy Mac performance too

If the next version of the Linux kernel emerges a little slower than usual, blame a dodgy DIMM in Linus Torvalds's AMD Threadripper-powered PC and the vagaries of the memory market.

In a post responding to a kernel developer inquiring if he had missed a Git Pull, Torvalds on Sunday revealed the request was still in his queue as "I'm doing merges (very slowly) on my laptop, while waiting for new ECC memory DIMMs to arrive."

Torvalds needs the DIMMs because over the last few days he experienced what he described as "some instability on my main desktop the … with random memory corruption in user space resulting in my allmodconfig builds randomly failing with internal compiler errors etc."

The Linux boss's first thought was that a new kernel bug had caused the problem – which isn't good but sometimes happens.

His instinct was wrong.

"It was literally a DIMM going bad in my machine randomly after 2.5 years of it being perfectly stable," he wrote. "Go figure. Verified first by booting an old kernel, and then with memtest86+ overnight."

Torvalds appears to have been tracking delivery of the new DIMMs as he reported replacement memory was "out for delivery" and predicted it should arrive later on Sunday evening.

"I'll probably leave memtest86+ for another overnight with the new DIMMs just because this wasn't the greatest experience ever. A fair amount of wasted time blaming all the wrong things, because _obviously_ it wasn't my hardware suddenly going bad," he added.

Torvalds's post is interesting for two other reasons. One is that the laptop he mentions could be the recent MacBook – complete with Arm64 Apple silicon – that he used to push the final cut of Linux 5.19. If that's the same laptop he used on Sunday, said silicon may not be quite up to one of the more high-profile workloads in the world – or perhaps Linus just misses the comforts of a big screen.

His post also mentions that his main PC was set up for error correction code memory (ECC memory), but was built "during the early days of COVID when there wasn't any ECC memory available at any sane prices. And then I never got around to fixing it, until I had to detect errors the hard way."

"I absolutely *detest* the crazy industry politics and bad vendors that have made ECC memory so 'special'," he added.

That appears to be a reference to a post from 2021 in which Torvalds offered the following opinion:

The only reason Intel says "ECC is for servers and embedded" is because Intel marketing people have convinced the powers that be that they can sell otherwise inferior chips for a higher price by enabling ECC functionality. Look at the kinds of chips that Intel sells with ECC – those Xeons (and embedded Core i3 Atom class CPUs) sure don't tend to be better in other ways.

Don't fall for the bullshit. ECC is not for servers. ECC is for everybody, and wanting to pay a bit extra for RAM shouldn't mean that you are then limited in other ways.

The above is a reference to Intel supporting ECC on only some of its CPUs rather than across its consumer-grade range, which suppresses demand by making it an option for fewer buyers. And because demand is low, manufacturers don't come to the party, prices stay high ... and many can't afford the extra resilience that ECC affords compared to normal RAM. Which is where this intersects with Torvalds's day job, because Linux (and all other software) can benefit from the error correction that ECC RAM performs.

Torvalds is currently occupied by version 6.1 of the Linux kernel, which among other things adds support for the Rust programming language. ®
