Retbleed slugs VM performance by up to 70 percent in kernel 5.19
VMware ran tests and saw some nasty numbers. Performance of next kernel otherwise uncontroversial
VMware engineers have tested the Linux kernel's fix for the Retbleed speculative execution bug, and report it can impact compute performance by a whopping 70 percent.
In a post to the Linux Kernel Mailing List titled "Performance Regression in Linux Kernel 5.19", VMware performance engineering staffer Manikandan Jagatheesan reports that the virtualization giant's internal testing found Linux VMs running version 5.19 of the kernel on the ESXi hypervisor saw compute performance dip by up to 70 percent when using a single vCPU, networking performance fall by 30 percent, and storage performance dip by up to 13 percent.
Jagatheesan said VMware's testers turned off the Retbleed remediation in version 5.19 of the kernel and ESXi performance returned to levels experienced under version 5.18.
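For readers who want to see what their own guests report, here is a minimal Python sketch that reads the kernel's Retbleed status from sysfs. It assumes a Linux system whose kernel carries the Retbleed patches, which expose a `retbleed` entry under `/sys/devices/system/cpu/vulnerabilities`. The function names are illustrative and do not come from VMware's post.

```python
from pathlib import Path

# Standard sysfs directory where Linux reports CPU vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def retbleed_status(vuln_dir: Path = VULN_DIR) -> str:
    """Return the kernel's Retbleed status line, or a note if it is absent."""
    entry = vuln_dir / "retbleed"
    try:
        return entry.read_text().strip()
    except OSError:
        # Kernels without the Retbleed patches (or non-Linux systems) lack this file.
        return "unknown (no retbleed entry reported)"


def is_mitigated(status: str) -> bool:
    """Rough classification: the kernel prefixes active fixes with 'Mitigation'."""
    return status.startswith("Mitigation")


if __name__ == "__main__":
    status = retbleed_status()
    print(f"retbleed: {status}")
    print(f"mitigated: {is_mitigated(status)}")
```

The post as described here does not say how VMware's testers switched the fix off. The kernel documents a `retbleed=off` boot parameter for that purpose, and a guest booted that way on an affected CPU would be expected to report a "Vulnerable" status instead of a "Mitigation" one.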
Because speculative execution exists to speed processing, it is no surprise that curbing it impacts performance. A 70 percent decrease in computing performance will, however, have a major impact on application performance that could lead to unacceptable delays for some business processes.
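To put those percentages in terms of waiting time, the rough calculation below converts a throughput drop into a time multiplier for a fixed amount of work. It assumes the reported figures are reductions in throughput, which the article does not spell out, and the function name is purely illustrative.

```python
def slowdown_factor(drop_percent: float) -> float:
    """Time multiplier for a fixed job when throughput falls by drop_percent."""
    if not 0 <= drop_percent < 100:
        raise ValueError("drop_percent must be in [0, 100)")
    # Remaining throughput is (1 - drop); time for fixed work scales as its inverse.
    return 1.0 / (1.0 - drop_percent / 100.0)


if __name__ == "__main__":
    # Worst-case figures quoted from VMware's post.
    for label, drop in [("compute", 70), ("networking", 30), ("storage", 13)]:
        factor = slowdown_factor(drop)
        print(f"{label}: {drop}% lower throughput -> {factor:.2f}x the time")
```

Under that assumption, a job that loses 70 percent of its throughput takes about 3.3 times as long to finish, while the 30 percent and 13 percent drops work out to roughly 1.4 and 1.15 times.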
VMware's tests were run on Intel Skylake CPUs – silicon released between 2015 and 2017 that will still be present in many server fleets. Subsequent CPUs addressed the underlying issues that allowed Retbleed and other Spectre-like attacks.
But many VMware users will likely have Skylake CPUs in production, or (perhaps unwittingly) use them in clouds. Assuming those users have adopted version 5.19 of the kernel – which may not be likely – they have a choice to make. Do they take the performance hit, or do they decide that Retbleed, like its predecessors, is not easy to exploit and wear the risk of running without mitigation?
Or might the issues caused by the fix hasten some migration decisions? Maybe even migration to VMware's shiny new DPU-accelerated future?
Jagatheesan's post ends as follows:
"We believe these findings would be useful to the Linux community and wanted to document the same."
Might that be a call for the community to revisit the Retbleed fixes and make them a little more subtle?
One key member of the Linux community, emperor penguin Linus Torvalds, appears not to be concerned by the situation. He's not commented on Jagatheesan's thread – which is not unusual – and his weekly state of the kernel post announces the debut of release candidate five for version 6.0 of the kernel.
Progress on that release is "fairly normal," Torvalds wrote. "Nothing looks particularly scary, so jump right in." ®