'Disappearing' data under ZFS on Linux sparks small swift tweak
Fix took three days and a rescue for stranded data is on the way
Updated Maintainers of ZFS on Linux have hustled out a new version after the previous release created the impression of data loss.
ZFS on Linux 0.7.7 only landed on March 21st, but as this GitHub thread titled “Unlistable and disappearing files” reports, users experienced “Data loss when copying a directory with large-ish number of files.”
The bug meant that attempted copies produced errors that claimed the filesystem was full, and resulted in files just not arriving at their intended destinations.
Users verified the problem under a few Linuxes and quickly debated whether to roll back or wait for relief.
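For readers wondering how a copy like that might be checked, here is a minimal sketch, not the method used in the GitHub thread. It walks a source tree and looks up each file by name under the destination, reporting any that did not arrive. The function names and the `demo` helper are hypothetical, and the dropped file is simulated rather than produced by the bug.

```python
import os
import tempfile


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_after_copy(src, dst):
    """List files that exist under src but cannot be found by name under dst."""
    missing = []
    for rel in sorted(relative_files(src)):
        # lexists() does a lookup by name, so it does not rely on the
        # destination directory listing being complete.
        if not os.path.lexists(os.path.join(dst, rel)):
            missing.append(rel)
    return missing


def demo():
    """Build a tiny source tree, 'copy' only part of it, and report the gap."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        os.makedirs(src)
        os.makedirs(dst)
        for name in ("a.txt", "b.txt", "c.txt"):
            with open(os.path.join(src, name), "w") as fh:
                fh.write("data")
        # Simulate a copy that silently dropped b.txt.
        for name in ("a.txt", "c.txt"):
            with open(os.path.join(dst, name), "w") as fh:
                fh.write("data")
        return missing_after_copy(src, dst)
```

Calling `missing_after_copy("/path/to/source", "/path/to/copy")` on real directories returns the relative paths that need a second look. It checks names only, not file contents.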
While they chatted, the culprit was found in the form of this commit. It’s since been excised, and ZFS on Linux 0.7.8 was released with its removal the only change.
The new version was created with impressive speed: the thread reporting the bug was started on April 7th 2018 and the fix landed three days later. So even though three reviewers signed off on the cruddy commit, the speedy response may mean it’s possible to consider this a triumph of sorts for open source. ®
Updated to add
ZFS on Linux developers have been in touch to tell us that "it turns out that no data is actually lost."
But it looks like data is lost, as developer Richard Yao explained: "The regression caused us to lose hard links in some directories where new hard links were being made. Not every system that upgraded made enough new hard links in directories to actually trigger an issue."
The good news is that "we can get all of the files back." The bad news is they won't have "names and directory paths."
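Why can files come back without their names? On POSIX filesystems generally, a name is a hard link: a directory entry pointing at the file's data. The sketch below illustrates that separation with ordinary system calls. It does not reproduce the ZFS regression, and the file names and the `demo_hard_links` helper are hypothetical.

```python
import os
import tempfile


def demo_hard_links():
    """Show that removing one name leaves the underlying data intact."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "report.txt")
        second = os.path.join(tmp, "alias.txt")
        with open(first, "w") as fh:
            fh.write("still here")
        # Create a second directory entry (hard link) for the same file.
        os.link(first, second)
        links_before = os.stat(first).st_nlink
        # Remove only the first name; the data is still referenced by the second.
        os.unlink(first)
        links_after = os.stat(second).st_nlink
        with open(second) as fh:
            content = fh.read()
        return links_before, links_after, content
```

On a filesystem that supports hard links, the link count drops from two to one and the contents stay readable through the surviving name. In the case Yao describes, the data survives but the name does not, which is why recovered files would need somewhere like lost+found to live.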
But the project's developers are working on "a tool integrated into the driver that will let people repair affected systems. The missing files would go into a lost+found directory."
"There are a couple caveats though," Yao said. "Any snapshots containing damaged directories will need to be destroyed to restore the pool to pristine condition. If anyone cloned those snapshots, the clones will need to be destroyed too (but you can copy data off).
"The tool will provide a list of things to destroy that require manual destruction by the system administrator. It is possible to optionally just leave things as is if the tool suggests destroying things, and nothing bad should happen aside from having an annoying message about the bad snapshots."