Data-destroying defect found after OpenZFS 2.2.0 release
Earlier and later versions may be affected – worth your while reading the advisories
Updated A data-destroying bug has been discovered following the release of OpenZFS 2.2.0, the file system found in FreeBSD 14 among other OSes.
This file-trashing flaw is believed to be present in multiple versions of OpenZFS, not just version 2.2.0. It was initially thought that block cloning, a feature new to that release, was the primary cause of the data loss. However, it now appears, as of 1945 UTC, November 27, that the cloning feature simply exacerbates a previously unknown underlying bug. We're told the corruption is quite rare in real-world operation.
Ed Maste of the FreeBSD Foundation posted this note to users of that OS:
We want to bring your attention to a potential data corruption issue affecting multiple versions of OpenZFS. It was initially reported against OpenZFS 2.2.0 but also affects earlier and later versions.
This issue can be reproduced with targeted effort but it has not been observed frequently in real-world scenarios. This issue is not related to block cloning, although it is possible that enabling block cloning increases the probability of encountering the issue.
He added "it is unclear if the issue is reproducible on the version of ZFS in FreeBSD 12.4," and continued on Twitter that "a variation of the buggy code may exist in FreeBSD 12.4 and Illumos but the issue is masked for other reasons."
Maste's mailing list note includes a suggested workaround, though he stressed that this mitigation "does not deterministically prevent the issue but does drastically reduce the likelihood of encountering it."
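For context, the mitigation circulating in the public bug reports involves turning off ZFS's dmu_offset_next_sync behavior, trading a little hole-reporting accuracy for a much smaller race window. A rough sketch of what that looks like – check the advisories for the exact guidance for your platform and version:

    # Linux: disable dmu_offset_next_sync at runtime
    # (persist it via /etc/modprobe.d if you want it to survive a reboot)
    echo 0 | sudo tee /sys/module/zfs/parameters/zfs_dmu_offset_next_sync

    # FreeBSD: the equivalent sysctl
    sudo sysctl vfs.zfs.dmu_offset_next_sync=0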
Part of the ongoing confusion around this data-destroying bug is that it seems to happen with and without block cloning enabled. A proposed patch is under review, and it's suggested this may be the fix the file system needs. Upgrading to OpenZFS 2.2.1 turns off the cloning feature, which may well reduce your chances of encountering the flaw.
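If you want to see whether block cloning is even in play on a given system, the feature's state and usage are visible through zpool. A quick check – "tank" here is just a stand-in for your pool's name:

    # disabled, enabled (available but unused), or active (has been used)
    zpool get feature@block_cloning tank

    # how much data, if any, has actually been cloned on the pool
    zpool get bcloneused,bclonesaved tank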
What follows is our original article, which was written and published when it was believed the data corruption was primarily linked to the new block cloning feature in version 2.2.0. As we rightly warned earlier, though, that feature "may have uncovered an underlying, different, and pre-existing bug."
OpenZFS 2.2.0 was released just last month with a new feature called block cloning, as we reported when we looked at release candidate 3. Unfortunately, there appears to be a file-corrupting flaw in that code somewhere, as found by Gentoo-toting Terin Stock, who reported bug #15526. As a result, OpenZFS 2.2.1 is already out, which disables the new feature.
This is a bit of an embarrassment for OpenZFS, a project with an enviable reputation for data integrity. It's also less than ideal for fixed-release-cycle OSes that have the new version of OpenZFS, including the newly released FreeBSD 14. Fortunately for FreeBSD, though, version 14.0 ships with the feature disabled by default.
We have mentioned the work of BSD boffin Colin Percival before, but anyone brave enough to have already installed this point-zero release should heed his warning on X, formerly Twitter: "FreeBSD 14's ZFS code supports 'block cloning'. This is turned off by default. DO NOT ENABLE THIS FEATURE UNLESS YOU WANT TO LOSE DATA."
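On FreeBSD 14.0 the runtime switch Percival is referring to is exposed as a sysctl; a quick way to confirm it is still off – assuming the tunable carries its usual name on your build:

    # 0 means block cloning is disabled, 1 means it has been enabled
    sysctl vfs.zfs.bclone_enabled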
The bug manifests as corruption of the contents of files when they're copied; instead of their expected contents, there are stretches of zeroes mixed with blocks of what looks like Base64-encoded data. It showed up when using Gentoo's portage package-management tool – an operation that typically involves copying lots of data. Worse still, the file system's own health checks don't detect any problem. For now, release 2.2.1 simply disables the feature.
At the time of writing, it's not certain exactly what causes it. It seems to be an extremely specific (and therefore unlikely) combination of circumstances, which means it almost never happens, as Bronek Kozicki spells out on GitHub:
You need to understand the mechanism that causes corruption. It might have been there for a decade and only caused issues in very specific scenarios, which do not normally happen. Unless you can match your backup mechanism to the conditions described below, you are very unlikely to have been affected by it.
- a file is being written to (typically it would be asynchronously – meaning the write is not completed at the time the writing process "thinks" it is)
- at the same time as ZFS is still writing the data, the modified part of the file is being read from. "The same time" means hitting a very specific window, only microseconds (that is, millionths of a second) wide. Admittedly, as a non-developer on the ZFS project, I do not know whether using an HDD as opposed to an SSD would extend that time frame
- if it is being read at this very specific moment, the reader will see zeros where the data being written is actually something else
- if the reader then stores the incorrectly read zeroes somewhere else, that's where the data becomes corrupted
One of the bug hunters has written a tiny script, reproducer.sh, which hammers ZFS volumes and checks to see if files are getting corrupted. One of the problems around this issue is that there's no way to write a program that can tell whether a file has been corrupted simply by inspecting its contents: it's perfectly normal for some types of file to contain long stretches of zeroes. The only way to be sure is to compare checksums from before and after copy operations – so concerned users who lack backups held on other types of file system cannot readily tell. OpenZFS's built-in scrub tool for checking the validity of storage pools cannot detect the problem.
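The principle behind that kind of checking is simple enough to sketch in a few lines of shell. This is not the project's reproducer.sh – just an illustration of checksumming files across a copy, assuming a working directory on a ZFS pool at /tank/test:

    # write files and immediately copy them, then compare checksums;
    # looping many times increases the chance of hitting the race
    cd /tank/test
    for i in $(seq 1 1000); do
        dd if=/dev/urandom of=src.$i bs=1M count=4 2>/dev/null
        before=$(sha256sum < src.$i)
        cp src.$i dst.$i
        after=$(sha256sum < dst.$i)
        [ "$before" = "$after" ] || echo "checksum mismatch on copy $i"
    done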
A possible fix is open, and the investigation looks like it may have uncovered an underlying, different, and pre-existing bug, which could have been present as long ago as 2013. The bug revolves around ZFS dnodes, and the logic of how the code checks whether a dnode is "dirty" or not, which governs whether the file system must flush it – that is, sync any outstanding changes to disk.
It's possible that this single cause was deeply hidden, and so very unlikely to be hit. Unfortunately, the new faster copy functionality meant that what used to be a bug that would corrupt data only once in tens of millions of file copies suddenly became more likely, especially on machines with lots of processor cores all in simultaneous use.
For Linux users, an additional condition seems to be that the OS has a recent version of the coreutils package – one from the 9.x series. This is the package that provides the cp command. So far, we have not been able to verify whether Ubuntu 23.10 has the block cloning feature enabled by default in its recently returned (but still experimental) support for installation onto ZFS, but at least one comment on the original bug report is from someone who has reproduced the issue on Ubuntu.
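Checking which coreutils provides your cp is a one-liner, and on a system with strace installed you can watch cp probing for holes with SEEK_DATA/SEEK_HOLE – the sparse-detection behavior implicated here. The file names below are placeholders:

    # report the coreutils version behind cp
    cp --version | head -n1

    # observe cp's hole-detection lseek calls (seen with coreutils 9.x)
    strace -e trace=lseek cp srcfile dstfile 2>&1 | grep -E 'SEEK_(DATA|HOLE)'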
It seems very likely that OpenZFS 2.2.1, which simply turns off block cloning, will quickly be followed by a 2.2.2 release to fix the underlying dnode handling. ®