[Tfug] Should ZFS have an fsck tool? I thought this was interesting

Eric Gearhart eric at nixwizard.net
Tue Nov 3 18:02:33 MST 2009


On Tue, Nov 3, 2009 at 5:50 PM, Zack Williams <zdwzdw at gmail.com> wrote:
>
> http://blogs.sun.com/bill/entry/zfs_and_the_all_singing
>
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/zfstestsuite
>
> But obviously none of these provide coverage like a fsck tool would.
>
> - Zack

I've read that ZFS makes some assumptions as well, which can be a
problem. For example, apparently ZFS assumes that when it asks a disk
controller to sync its cache, the controller absolutely complies and
isn't flaky about it.

Apparently someone corrupted their pool because they set up
OpenSolaris + ZFS in a VirtualBox guest, and the VirtualBox virtual
disk layer didn't honor the cache sync commands...

"This means that if you design something for PC hardware, you need to
at least acknowledge that crappy hardware does exist and is going to
make your software abstractions leaky. A good design should not
exclude worst-case scenarios. For ZFS, this means they need to
acknowledge that disks are going to break things and corrupt the data
in ways that the ZFS design isn't going to be able to avoid. And when
that happens, your users will want to have a good fsck tool to fix the
mess or recover the data.

It's somewhat contradictory that the ZFS developers worked really hard
to design those anti-corruption mechanisms, but they left the extreme
cases of data corruption where a fsck is necessary uncovered. There's
an interesting case of ZFS corruption on a virtualized Solaris guest:
VirtualBox didn't honor the sync cache commands of the virtualized
drive. In a virtualized world, ZFS doesn't only have to cope with
hardware, but also with the software and the file system running in
the underlying host. In those scenarios, ZFS can only be as safe as
the host filesystem is, and that means that, again, users will face
corruption cases that require a fsck."
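
To make the cache-sync assumption concrete, here is a minimal toy sketch
in Python. It is not ZFS or VirtualBox code: ToyDisk, commit,
is_consistent and crash_after_commit are all hypothetical names, and the
"only the last queued write survives" power-loss behaviour is an assumed
worst case chosen purely for illustration. It models a copy-on-write
commit that writes new data, flushes, then updates a root pointer, and
shows what can go wrong if the device acknowledges the flush without
actually doing it.

```python
class ToyDisk:
    """Toy disk with a volatile write cache (hypothetical model only)."""

    def __init__(self, honors_flush):
        self.honors_flush = honors_flush
        self.platter = {}  # durable storage: survives power loss
        self.cache = []    # volatile cache: pending (block, data) writes

    def write(self, block, data):
        # Writes land in the volatile cache first.
        self.cache.append((block, data))

    def flush(self):
        # An honest device drains its cache to stable storage.
        # A lying device reports success and leaves data in the cache.
        if self.honors_flush:
            for block, data in self.cache:
                self.platter[block] = data
            self.cache.clear()

    def power_loss(self):
        # Assumed worst case: the device reordered its cache and only the
        # most recently queued write reached the platter before power died.
        if self.cache:
            block, data = self.cache[-1]
            self.platter[block] = data
        self.cache.clear()


def commit(disk, txg, payload):
    """Copy-on-write style commit: new data first, then the root pointer."""
    data_block = f"data-{txg}"
    disk.write(data_block, payload)
    disk.flush()  # barrier: data should be durable before the root moves
    disk.write("root", data_block)
    disk.flush()


def is_consistent(disk):
    """The on-disk root must point at a block that actually exists."""
    root = disk.platter.get("root")
    return root is None or root in disk.platter


def crash_after_commit(honors_flush):
    """Run two commits, cut the power, and report on-disk consistency."""
    disk = ToyDisk(honors_flush)
    commit(disk, 1, "old contents")
    commit(disk, 2, "new contents")
    disk.power_loss()
    return is_consistent(disk)


if __name__ == "__main__":
    print("honest flush, consistent after crash:", crash_after_commit(True))
    print("ignored flush, consistent after crash:", crash_after_commit(False))
```

With an honest flush the root pointer only ever reaches the platter
after the data it refers to, so the toy disk stays consistent. With the
flush ignored, the root can land on the platter while its data block
never does, which is the kind of dangling-pointer damage the quoted
article argues would call for a repair tool.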

--
Eric



