IBM on Saturday slipped out news of a nasty bug in VIOS, the Virtual I/O Server that offers virtualisation services on Power Systems under AIX.
Issue IV91339 strikes when moving virtual machines and means “there is a very small timing window where the VIOS may report to the client LPAR that some I/Os have completed before they actually do.”
IBM advises that “This could cause applications running on the client [logical partition] LPAR to read the wrong data from the virtual device. It's also possible that data written by the client LPAR to the virtual device may be written incorrectly.”
Hence the issue's title: “possible data corruption after LPM failure.”
Of course, data corruption is precisely what Power Systems and AIX are not supposed to do. The platforms are promoted as exceptionally stable and resilient, just the ticket for mission-critical applications that can't afford many maintenance windows, never mind unplanned ones.
So IBM's guidance that “Installation of the ifix requires a reboot” will not go down well with users.
There's some upside in the fact that the problem only affects VIOS 2.2.3.x through 2.2.5.x, and even then has a couple of other caveats.
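For admins who want to triage a fleet quickly, the version range above can be sketched as a simple check. This is only an illustration: the function names are hypothetical, the level string is assumed to be a dotted numeric value such as the one the VIOS `ioslevel` command reports, and the check covers the version range alone, not the other caveats in IBM's advisory.

```python
# Minimal sketch: does a VIOS level fall in the 2.2.3.x to 2.2.5.x range?
# Assumes a purely numeric dotted level string, e.g. "2.2.4.10".
# Function names are hypothetical, not part of any IBM tooling.

def parse_level(level: str) -> tuple:
    """Turn '2.2.4.10' into (2, 2, 4, 10)."""
    return tuple(int(part) for part in level.strip().split("."))


def in_affected_range(level: str) -> bool:
    """True if the first three components lie between 2.2.3 and 2.2.5 inclusive."""
    first_three = parse_level(level)[:3]
    return (2, 2, 3) <= first_three <= (2, 2, 5)


if __name__ == "__main__":
    for sample in ("2.2.2.6", "2.2.3.0", "2.2.4.10", "2.2.5.20", "2.2.6.0"):
        print(sample, in_affected_range(sample))
```

A match here means only that the level is in the reported range; IBM's advisory remains the authority on whether a given system is actually exposed.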
But the bottom line here is that Power and VIOS users have an interesting resilience vs. convenience choice to make.
If you're one of those users, El Reg wishes you good luck! ®
UPDATE: IBM's now released a fix and updated its advice on this issue.
Big Blue now also says "The risk of hitting this exposure outside of the IBM test lab has had extensive evaluation and is considered extremely small. The controlled test environment where this problem was observed makes use of a high-precision test injection tool that was able to inject a specific error within a tiny window."
"The chances of hitting this window outside of the IBM test lab are highly unlikely and there is no known occurrence of this issue outside of the IBM test lab."
The Reg is nonetheless aware that IBM has recommended users implement the patch.