[HN Gopher] Filesystems can experience at least three different ...
       ___________________________________________________________________
        
       Filesystems can experience at least three different sorts of errors
        
       Author : zdw
       Score  : 83 points
       Date   : 2022-03-12 13:38 UTC (9 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | diegocg wrote:
        | It's surprising how many people I have found who are incapable
        | of conceiving the notion of what this post calls "structural
        | error" in modern file systems such as ZFS. They have this idea
        | that checksums and scrubbing make ZFS invulnerable to
        | corruption, and thus that the notion of a fsck does not make
        | sense in ZFS. The idea probably sticks for many because it fits
        | with the fact that ZFS does not have a fsck, but that does not
        | make that kind of corruption any less real.
        
         | csdvrx wrote:
         | ZFS Scrub is like a fsck in a crontab...
        
           | diegocg wrote:
            | No, it is not. ZFS scrub only checks that the checksums are
            | valid. A scrub will not check in detail for what this post
            | calls "structural error" (it will only find errors during
            | its normal walk of the structures, which is not very
            | detailed - doing a detailed file system check is so
            | resource intensive that some fsck implementations have a
            | "low memory" mode in order to avoid OOM).
            | 
            | The checksum/mirroring mechanisms cannot fix any structural
            | error when the filesystem is doing something and finds an
            | inconsistency.
           | 
           | ZFS chose to not have a fsck out of pure arrogance, not
           | because scrub is a proper substitute. ZFS developers believed
            | that corruption bugs produced by code could be fixed by
            | providing Bug Free Code (tm). That, and the fact that errors
            | due to media corruption would be fixed with checksums and
            | mirroring, made them believe that they could make fsck a
            | thing of the past. Other modern file systems mimicking ZFS
           | are developing a fsck, despite having scrub-like
           | functionality.
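            | 
            | As a rough illustration of the difference (hypothetical
            | Python, nothing like the real ZFS code):
            | 
            |     import hashlib
            | 
            |     def scrub(blocks, stored_checksums):
            |         # Scrub-style pass: is each block's content exactly
            |         # what we wrote? (integrity only)
            |         for addr, data in blocks.items():
            |             digest = hashlib.sha256(data).hexdigest()
            |             if digest != stored_checksums[addr]:
            |                 print(f"checksum error at block {addr}")
            | 
            |     def fsck_like(inodes, directories):
            |         # Structural pass: do the metadata objects make
            |         # sense *together*? A block can checksum perfectly
            |         # and still describe an impossible filesystem.
            |         for name, ino in directories.items():
            |             if ino not in inodes:
            |                 print(f"{name!r} -> missing inode {ino}")
            |         for ino, meta in inodes.items():
            |             if meta["size"] < 0 or meta["nlink"] < 0:
            |                 print(f"inode {ino} is impossible: {meta}")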
           | 
           | ...as I said, and your reply proves again:
           | 
            | > It's surprising how many people I have found who are
            | incapable of conceiving the notion of what this post calls
            | "structural error" in modern file systems such as ZFS
           | 
            | People _really_ want to believe ZFS has some kind of magic.
        
           | chungy wrote:
           | scrub is exactly the "zfs fsck" that people claim they pine
           | for. It is part of the file system's design philosophy that
           | every administrative action happens on online pools, which
           | even includes correcting errors in the file system. Making it
           | so you don't have to take your system offline is a feature,
           | not a bug.
        
             | siebenmann wrote:
             | Unfortunately ZFS scrubs are not as complete as fsck on a
             | regular filesystem. ZFS scrubs only verify that checksums
             | are intact. They don't verify that filesystem level
             | metadata is correct (although they do verify ZFS structural
             | metadata as part of walking everything, which isn't the
             | same thing). For example, a ZFS scrub will not detect that
             | a filesystem inode has certain sorts of crazy or invalid
             | contents, or damaged ACLs. It doesn't even necessarily
             | verify that the filesystem directory structure is correct
             | and intact.
             | 
              | For more on this, see
              | https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSScrubLim...
             | 
             | (The tl;dr is that a fsck on an ordinary filesystem has to
             | walk the directory tree to find everything. However, ZFS
             | maintains a separate list of active inodes and a scrub can
             | just walk over them and check the checksums of all of their
             | data blocks. It doesn't have to, for example, read a
             | directory's contents to find further files to scrub.)
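              | 
              | A crude way to picture the difference in what has to be
              | walked (hypothetical Python, not how either tool is
              | actually implemented):
              | 
              |     def fsck_walk(root, read_dir, check_inode):
              |         # fsck must parse each directory just to find
              |         # out what exists
              |         pending = [root]
              |         while pending:
              |             for entry in read_dir(pending.pop()):
              |                 check_inode(entry)
              |                 if entry.get("is_dir"):
              |                     pending.append(entry["inode"])
              | 
              |     def scrub_walk(allocated_objects, verify_checksums):
              |         # ZFS keeps a flat list of allocated objects; a
              |         # scrub just iterates it and verifies each one's
              |         # block checksums, never interpreting directory
              |         # contents along the way
              |         for obj in allocated_objects:
              |             verify_checksums(obj)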
        
             | rincebrain wrote:
             | I think the difficulty in these conversations is that
             | people say they want a "fsck for ZFS", when often they
             | don't mean they want "periodic sanity checks for ZFS", they
             | mean they want something like "xfs_repair -d* for ZFS" or
             | "extundelete for ZFS" - that is, a tool for salvaging
             | beyond-the-pale mangling such that your pool is no longer
             | able to be imported.
             | 
             | * - not a perfect analogy, but I'm hard-pressed to think of
             | good off the shelf tools for what they're looking for. I
             | guess reiserfsck's infamously side-effect laden --rebuild-
             | tree would probably be closest...
             | 
             | (I am acquainted with zdb -r and import -T; neither helps
             | you if there's not enough metadata consistent to get enough
             | of a pool structure in memory to 'import', but one could
             | still conceivably salvage some data in that case.)
        
       | cmurf wrote:
        | Btrfs does report most of this, tracked per device and stored
        | in the device tree: `btrfs device stats`. The counters can be
        | reset. They are cumulative, so if you have one defect and the
        | affected file is read three times, the respective counter
        | increments by three, assuming a persistent error.
       | 
       | https://www.man7.org/linux/man-pages/man8/btrfs-device.8.htm...
       | 
       | Man page includes definitions of the 5 kinds of errors tracked.
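        | 
        | A rough sketch of pulling those per-device counters out of the
        | command's output (the counter names are from the man page; the
        | exact output format may differ between versions):
        | 
        |     import subprocess
        | 
        |     # the five per-device counters from btrfs-device(8)
        |     COUNTERS = ("write_io_errs", "read_io_errs",
        |                 "flush_io_errs", "corruption_errs",
        |                 "generation_errs")
        | 
        |     def device_stats(mount="/"):
        |         cmd = ["btrfs", "device", "stats", mount]
        |         out = subprocess.run(cmd, capture_output=True,
        |                              text=True, check=True).stdout
        |         totals = dict.fromkeys(COUNTERS, 0)
        |         for line in out.splitlines():
        |             # lines look roughly like:
        |             #   [/dev/sda1].corruption_errs   0
        |             parts = line.split()
        |             if len(parts) != 2:
        |                 continue
        |             name, value = parts
        |             for counter in COUNTERS:
        |                 if counter in name:
        |                     totals[counter] += int(value)
        |         return totals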
       | 
        | The article mentions structural errors. This sounds like the
        | detection of an inconsistency. These aren't counted on btrfs,
        | but they are logged. Any time the read-time or write-time tree
        | checker finds a problem, the filesystem goes read-only to
        | prevent (further) confusion from ending up on disk. These are
        | exceptionally rare; I've never seen one on any of my
        | filesystems, but I have seen the checker catch things like bit
        | flips due to bad RAM, i.e. the checksum was computed correctly
        | on already corrupted (meta)data.
        
       | rwmj wrote:
        | It's a fair point, but couldn't you derive this information
        | from the kernel log? The different types of errors would look
        | very different in the log. I think for the last type
        | ("structural", caused by errors in the code) you'd likely find
        | further subclasses in the log, since there would be various
        | places in the kernel code raising errors.
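        | 
        | Something along these lines, say (the patterns are only
        | illustrative; real kernel messages vary a lot by filesystem
        | and driver):
        | 
        |     import re
        |     import subprocess
        | 
        |     CLASSES = {
        |         # illustrative patterns only
        |         "io": r"I/O error|blk_update_request",
        |         "integrity": r"checksum (error|mismatch)|crc error",
        |         "structural": r"corrupt|inconsistent|bad inode",
        |     }
        | 
        |     def classify_dmesg():
        |         log = subprocess.run(["dmesg"], capture_output=True,
        |                              text=True).stdout
        |         counts = dict.fromkeys(CLASSES, 0)
        |         for line in log.splitlines():
        |             for kind, pattern in CLASSES.items():
        |                 if re.search(pattern, line, re.IGNORECASE):
        |                     counts[kind] += 1
        |                     break
        |         return counts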
        
       | amelius wrote:
       | And how about network filesystems?
        
       | wyldfire wrote:
        | Note that many storage devices themselves track errors and
        | provide a well-defined interface for reporting them: SMART.
        | This is all the more critical for devices with moving parts.
        | 
        | IIRC they do distinguish between command/interface errors and
        | medium errors, which is somewhat analogous to the filesystem
        | I/O and integrity errors discussed here.
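        | 
        | E.g. with smartctl you could split a drive's attributes into
        | "medium" vs "interface" buckets roughly like this (the
        | attribute names are the common ATA ones; which a drive reports
        | varies by vendor, and NVMe uses a different log format):
        | 
        |     import subprocess
        | 
        |     MEDIUM = {"Reallocated_Sector_Ct",
        |               "Current_Pending_Sector",
        |               "Offline_Uncorrectable"}
        |     INTERFACE = {"UDMA_CRC_Error_Count"}
        | 
        |     def smart_errors(device="/dev/sda"):
        |         cmd = ["smartctl", "-A", device]
        |         out = subprocess.run(cmd, capture_output=True,
        |                              text=True).stdout
        |         counts = {"medium": 0, "interface": 0}
        |         for line in out.splitlines():
        |             fields = line.split()
        |             # attribute rows have 10 columns; the raw value
        |             # is the last one
        |             if len(fields) < 10 or not fields[-1].isdigit():
        |                 continue
        |             if fields[1] in MEDIUM:
        |                 counts["medium"] += int(fields[-1])
        |             elif fields[1] in INTERFACE:
        |                 counts["interface"] += int(fields[-1])
        |         return counts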
        
       | bediger4000 wrote:
       | Good article. This article, and it looks like almost the entire
       | blog it's a part of, are about computer engineering and operation
       | detached from "the business". So much about computers is written
       | from a business point of view, where anything that doesn't pay
       | for itself in the next quarter is considered worthless. This blog
       | doesn't take "business concerns" into account at all.
        
       | jandrewrogers wrote:
        | One of the more insidious types of errors is phantom writes,
       | which are thankfully rarely seen in current storage hardware.
       | This can cause data loss where everything otherwise looks correct
       | -- I/O, integrity, and structure -- because what you read back
       | may be a valid but old version of what was written to storage.
       | 
       | This type of error can be detected with sufficiently robust
        | integrity checking, e.g. some type of durable Merkle tree, but
       | ensuring that integrity checking can reliably detect phantom
       | writes has a relatively high performance cost so many storage
       | systems just assume it will not happen, given the low prevalence
       | and high cost. FWIW, I think this is the correct tradeoff for
       | many storage systems, where it is not a highly probable source of
        | data loss in practice and some types of replication
       | architectures make it relatively straightforward to detect after
       | it has occurred even if not immediately.
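        | 
        | A toy illustration of why parent-held (Merkle-style) checksums
        | catch a phantom write while a checksum stored next to the block
        | does not (hypothetical Python, not from any real storage
        | engine):
        | 
        |     import hashlib
        | 
        |     def h(data):
        |         return hashlib.sha256(data).hexdigest()
        | 
        |     old_block = b"balance=100"  # stale but well-formed
        |     new_block = b"balance=250"  # what was supposedly written
        | 
        |     # A checksum written alongside the block travels with the
        |     # stale copy, so the old data still verifies:
        |     stored_with_block = h(old_block)
        |     assert h(old_block) == stored_with_block  # still passes
        | 
        |     # A Merkle-style design stores the child's checksum in the
        |     # parent, which *was* rewritten; reading back the stale
        |     # child then fails verification:
        |     checksum_in_parent = h(new_block)
        |     print(h(old_block) == checksum_in_parent)  # False: caught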
        
       ___________________________________________________________________
       (page generated 2022-03-12 23:00 UTC)