Fed-up Torvalds suggests disabling AMD’s 'stupid' performance-killing fTPM RNG
Some Ryzen Linux machines still stumble along despite efforts to fix it all
Ongoing issues with Linux and AMD's fTPM – the chip designer's firmware-based TPM – appear to be wearing on the nerves of kernel overseer Linus Torvalds, who has suggested switching off the module's random number generator altogether.
"Let's just disable the stupid fTPM hwrnd thing," Torvalds said on the open source kernel's development mailing list. "Maybe use it for the boot-time 'gather entropy from different sources,' but clearly it should not be used at runtime."
TPMs, whether they're firmware or hardware based, are used to securely create and store cryptographic keys, certificates, and passwords. The modules also, among other things, generate random numbers for software to use.
In the case of AMD's fTPM, the module can cause intermittent stuttering, depending on which Ryzen processor you're using. It appeared that when the fTPM was in use, it would access its flash storage via a serial interface and, while doing so, hold up activity by the rest of the system. If the fTPM was used frequently, such as by software generating streams of random numbers, the end result for users on affected systems was spluttering performance.
As AMD put it in a knowledge base entry from last year, "select AMD Ryzen system configurations may intermittently perform extended fTPM-related memory transactions in SPI flash memory ('SPIROM') located on the motherboard, which can lead to temporary pauses in system interactivity or responsiveness until the transaction is concluded."
The problem cropped up on PCs powered by Microsoft Windows, and was resolved in a BIOS update that fixed the fTPM to ensure it behaved better. The issue also impacted Linux, and while it appeared that a kernel-level patch had resolved the bug, the slowdown has cropped up again, attracting Torvalds' ire.
As we understand it, that kernel patch from February attempted to identify whether the PC was using a buggy version of AMD's fTPM and disabled the random number generator if so. The justification is that not everyone has installed, or is able to install, the necessary BIOS update, as they're relying on motherboard makers to distribute the fix.
Fast forward to this month, and it seems the patch doesn't catch all iterations of the buggy firmware, or the firmware isn't completely fixed, so for some users the stuttering persists. Hence the kernel chief's suggestion to just disable the fTPM's number generator regardless of version.
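The gap between the February patch and Torvalds' suggestion can be sketched as two small policy functions. This is only an illustrative Python sketch, not the kernel's actual code: the `TpmInfo` fields, the `"AMD"` vendor string, and the `FIXED_FIRMWARE` threshold are all hypothetical stand-ins, since the real version checks aren't described here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical threshold: the first firmware version assumed to carry the fix.
# The real kernel check uses its own vendor-specific values.
FIXED_FIRMWARE: Tuple[int, int] = (2, 0)


@dataclass(frozen=True)
class TpmInfo:
    vendor: str                # assumed identifier, e.g. "AMD"
    is_firmware_tpm: bool      # True for an fTPM, False for a discrete module
    firmware: Tuple[int, int]  # (major, minor), a made-up format for illustration


def is_amd_ftpm(tpm: TpmInfo) -> bool:
    """Return True if this TPM is AMD's firmware-based implementation."""
    return tpm.vendor == "AMD" and tpm.is_firmware_tpm


def use_rng_version_check(tpm: TpmInfo) -> bool:
    """February-style policy: skip the RNG only on firmware believed to be buggy."""
    if is_amd_ftpm(tpm) and tpm.firmware < FIXED_FIRMWARE:
        return False
    return True


def use_rng_blanket(tpm: TpmInfo) -> bool:
    """Torvalds-style policy: never use the AMD fTPM RNG, whatever the version."""
    return not is_amd_ftpm(tpm)
```

Under the version-check policy, a machine whose firmware claims to be fixed still has its fTPM RNG used, which is exactly the case that bites if the fix is incomplete or the check misses a buggy release. The blanket policy sidesteps that by not trusting the version at all.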
Torvalds' argument is fairly straightforward and amounts to: if fTPM is causing so many problems, why not just use the processor's rdrand instruction to offer random numbers instead? At best, the fTPM could be used during system startup to provide entropy to the kernel's random number generation service, where uneven performance may not be that annoying, but during normal use the fTPM is not to be used as a random number source, he suggested.
"Why would anybody use that crud when any machine that has it supposedly fixed — which apparently didn't turn out to be true after all — would also have the CPU rdrand instruction that doesn't have the problem," Torvalds wrote. "I don't see any downside to just saying that fTPM thing is not working. Even if it ends up working in the future, there are alternatives that aren't any worse."
Torvalds acknowledged that rdrand can be slow, but compared to the stuttering users are seeing as a result of the fTPM, it would seem to be the better alternative. "So rdrand — and rdseed in particular — can be rather slow, but I think we're talking hundreds of CPU cycles — maybe low thousands. Nothing like the stuttering reports we've seen from fTPM," he wrote.
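To put those cycle counts in perspective, here's a rough back-of-the-envelope conversion to wall-clock time. The 4 GHz clock speed is an assumed round figure for illustration, not something from Torvalds' message.

```python
# Assumed clock speed for illustration only; real Ryzen clocks vary by model and load.
ASSUMED_CLOCK_HZ = 4e9


def cycles_to_microseconds(cycles: int, clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Convert a CPU cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


if __name__ == "__main__":
    # "Hundreds" and "low thousands" of cycles, per Torvalds' estimate.
    for cycles in (300, 3000):
        print(f"{cycles} cycles is about {cycles_to_microseconds(cycles):.3f} microseconds")
```

At that assumed clock, even a 3,000-cycle rdseed call finishes in under a microsecond, far too brief for a user to notice as a stall.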
The actual cause of the bug isn't clear at this point, though Torvalds offered a few theories as to what could be going on.
"I can easily imagine a BIOS fTPM code using some absolutely horrid global EFI synchronization lock or whatever, which could then cause random problems just based on some entirely unrelated activity," he wrote. "I would not be surprised, for example, if [it] wasn't the fTPM hwrnd code itself that decided to read some random number from SPI, but that it simply got serialized with something else that the BIOS was involved with."
"It's not like BIOS people are famous for their scalable code that is entirely parallel," he added.
You can find Torvalds' full comments here.
The Register reached out to AMD for comment on the issue and to get a better idea of the consequences associated with disabling the fTPM's random number generator.
fTPM can be toggled off within the BIOS; however, doing so can limit the functionality of the system, particularly with regard to hardware encryption and security. That said, the TPM's functionality is likely more relevant to users of Windows 11. Regardless of whether they actually use any services that rely on the TPM, Redmond's latest operating system does technically require it.
AMD has previously suggested using a physical TPM module as an alternative to the firmware TPM used by many motherboards. You'll want to disable any encryption that relies on the TPM first, of course, and you'll also need a motherboard that has the appropriate header to accept such a module, which isn't guaranteed. ®