[HN Gopher] Filesystems can experience at least three different ...
___________________________________________________________________

Filesystems can experience at least three different sorts of errors

Author : zdw
Score  : 83 points
Date   : 2022-03-12 13:38 UTC (9 hours ago)

(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)

| diegocg wrote:
| It's surprising how many people I have found who are incapable of
| conceiving of what this post calls a "structural error" in modern
| file systems such as ZFS. They have the idea that checksums and
| scrubbing make ZFS invulnerable to corruption, and thus that the
| notion of a fsck makes no sense for ZFS. That probably rings true
| for many because it fits with the fact that ZFS does not have a
| fsck, but it does not make that kind of corruption any less real.

| csdvrx wrote:
| ZFS Scrub is like a fsck in a crontab...

| diegocg wrote:
| No, it is not. A ZFS scrub only checks that the checksums are
| valid. A scrub will not check in detail for what this post calls
| "structural errors" (it will only find errors during a normal
| walk of the structures, which is not very thorough; detailed file
| system checking is so resource-intensive that some fsck
| implementations have a "low memory" mode in order to avoid OOM).
|
| The checksum/mirroring mechanisms cannot fix a structural error
| when the filesystem is doing something and finds an
| inconsistency.
|
| ZFS chose not to have a fsck out of pure arrogance, not because
| scrub is a proper substitute. ZFS developers believed that
| corruption bugs produced by code could be fixed by providing Bug
| Free Code (tm). That, and the fact that errors due to media
| corruption would be fixed with checksums and mirroring, made them
| believe that they could make fsck a thing of the past. Other
| modern file systems mimicking ZFS are developing a fsck, despite
| having scrub-like functionality.
|
| ...as I said, and your reply proves again:
|
| > It's surprising how many people I have found who are incapable
| > of conceiving of what this post calls a "structural error" in
| > modern file systems such as ZFS
|
| People _really_ want to believe ZFS has some kind of magic.

| chungy wrote:
| scrub is exactly the "zfs fsck" that people claim to pine for. It
| is part of the file system's design philosophy that every
| administrative action happens on online pools, which even
| includes correcting errors in the file system. Making it so you
| don't have to take your system offline is a feature, not a bug.

| siebenmann wrote:
| Unfortunately, ZFS scrubs are not as complete as fsck on a
| regular filesystem. ZFS scrubs only verify that checksums are
| intact. They don't verify that filesystem-level metadata is
| correct (although they do verify ZFS structural metadata as part
| of walking everything, which isn't the same thing). For example,
| a ZFS scrub will not detect that a filesystem inode has certain
| sorts of crazy or invalid contents, or damaged ACLs. It doesn't
| even necessarily verify that the filesystem directory structure
| is correct and intact.
|
| For more on this, see
| https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSScrubLim...
|
| (The tl;dr is that a fsck on an ordinary filesystem has to walk
| the directory tree to find everything. However, ZFS maintains a
| separate list of active inodes, and a scrub can just walk over
| them and check the checksums of all of their data blocks. It
| doesn't have to, for example, read a directory's contents to find
| further files to scrub.)
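To make the scrub-versus-fsck distinction in the comments above
concrete, here is a minimal, hypothetical Python sketch (toy data
structures, not ZFS code or its on-disk format): every block matches
the checksum recorded when it was written, so a checksum-only scrub
passes, yet a directory entry points at an inode that does not
exist, which only a structural, fsck-style check notices.

    # Minimal, hypothetical sketch (not ZFS code): a checksum "scrub"
    # only proves each block still matches its recorded checksum; a
    # structural check cross-references the metadata itself.
    import hashlib
    import json

    def checksum(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    # Toy on-disk state: an inode table and one directory, each stored
    # as a block together with the checksum computed at write time.
    inodes = {"1": {"type": "dir"}, "2": {"type": "file"}}
    directory = {"a.txt": "2", "b.txt": "3"}  # "b.txt" points at inode
                                              # 3, which does not exist

    blocks = {}
    for name, obj in (("inode_table", inodes), ("rootdir", directory)):
        data = json.dumps(obj, sort_keys=True).encode()
        blocks[name] = {"data": data, "sum": checksum(data)}

    def scrub(blocks):
        """Checksum-only pass: is every block exactly as written?"""
        return all(checksum(b["data"]) == b["sum"] for b in blocks.values())

    def structural_check(inodes, directory):
        """fsck-style pass: do directory entries reference real inodes?"""
        return [name for name, ino in directory.items() if ino not in inodes]

    print("scrub clean:", scrub(blocks))                  # True
    print("dangling entries:", structural_check(inodes, directory))  # ['b.txt']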
| rincebrain wrote:
| I think the difficulty in these conversations is that people say
| they want a "fsck for ZFS", when often they don't mean they want
| "periodic sanity checks for ZFS"; they mean they want something
| like "xfs_repair -d* for ZFS" or "extundelete for ZFS" - that is,
| a tool for salvaging data after beyond-the-pale mangling that
| leaves your pool no longer able to be imported.
|
| * - not a perfect analogy, but I'm hard-pressed to think of good
| off-the-shelf tools for what they're looking for. I guess
| reiserfsck's infamously side-effect-laden --rebuild-tree would
| probably be closest...
|
| (I am acquainted with zdb -r and import -T; neither helps you if
| there's not enough consistent metadata to get enough of a pool
| structure in memory to 'import', but one could still conceivably
| salvage some data in that case.)

| cmurf wrote:
| Btrfs does report most of this, tracked per device and stored in
| the device tree: `btrfs device stats`. The counters can be reset.
| They are cumulative, so if you have one defect and the affected
| file is read three times, the respective counter increments by
| three, assuming a persistent error.
|
| https://www.man7.org/linux/man-pages/man8/btrfs-device.8.htm...
|
| The man page includes definitions of the 5 kinds of errors
| tracked.
|
| The article mentions structural errors. It sounds like this is
| the detection of an inconsistency. These aren't counted on btrfs,
| but they are logged. Any time the read-time or write-time tree
| checker finds a problem, the filesystem goes read-only to prevent
| (further) confusion from ending up on disk. These are
| exceptionally rare; I've never seen one on any of my filesystems,
| but I have seen it catch things like bit flips due to bad RAM,
| i.e. the checksum was computed correctly over already-corrupted
| (meta)data.
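A rough sketch of watching those per-device counters from a script.
This assumes the usual output format of `btrfs device stats` (one
counter per line, like `[/dev/sdX].write_io_errs N`) and sufficient
privileges on a mounted btrfs path; treat it as an illustration, not
a reference for the tool's interface.

    # Rough sketch: surface nonzero btrfs per-device error counters.
    # Assumes `btrfs device stats` prints lines of the form
    # "[/dev/sdX].counter_name  N" and is run with enough privilege.
    import subprocess

    def btrfs_error_counters(path: str) -> dict[str, int]:
        out = subprocess.run(
            ["btrfs", "device", "stats", path],
            capture_output=True, text=True, check=True,
        ).stdout
        counters = {}
        for line in out.splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[1].isdigit():
                counters[parts[0]] = int(parts[1])
        return counters

    if __name__ == "__main__":
        for name, count in btrfs_error_counters("/").items():
            if count:
                print(f"nonzero btrfs error counter: {name} = {count}")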
| rwmj wrote:
| It's a fair point, but couldn't you derive this information from
| the kernel log? The different types of errors would look very
| different in the log. I think for the last type ("structural",
| caused by errors in the code) you'd likely find different
| subclasses in the log, since there would be various places in the
| kernel code raising errors.

| amelius wrote:
| And how about network filesystems?

| wyldfire wrote:
| Note that many storage devices themselves track errors and
| provide a well-defined interface for reporting them: SMART. This
| is all the more critical for devices with moving parts.
|
| IIRC they do distinguish between command/interface errors and
| medium errors, which is somewhat analogous to the filesystem I/O
| and integrity errors discussed.

| bediger4000 wrote:
| Good article. This article, and it looks like almost the entire
| blog it's a part of, is about computer engineering and operation
| detached from "the business". So much about computers is written
| from a business point of view, where anything that doesn't pay
| for itself in the next quarter is considered worthless. This blog
| doesn't take "business concerns" into account at all.

| jandrewrogers wrote:
| One of the more insidious types of error is the phantom write,
| which is thankfully rarely seen in current storage hardware. It
| can cause data loss where everything otherwise looks correct --
| I/O, integrity, and structure -- because what you read back may
| be a valid but old version of what was written to storage.
|
| This type of error can be detected with sufficiently robust
| integrity checking, e.g. some type of durable Merkle tree, but
| ensuring that integrity checking can reliably detect phantom
| writes has a relatively high performance cost, so many storage
| systems just assume it will not happen, given the low prevalence
| and high cost. FWIW, I think this is the correct tradeoff for
| many storage systems, where it is not a highly probable source of
| data loss in practice, and some types of replication architecture
| make it relatively straightforward to detect after it has
| occurred, even if not immediately.
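A hypothetical sketch of the detection idea described in that last
comment (toy in-memory structures, not any real storage engine): a
parent record durably stores the checksum it expects for a child
block, so a write that was acknowledged but never persisted is
caught on the next read, even though the stale data still checksums
correctly against itself.

    # Hypothetical sketch: detecting a phantom write via a parent
    # checksum (a one-level "Merkle tree"). Assumes the parent record
    # itself is persisted reliably.
    import hashlib

    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    disk = {}    # block_id -> bytes that actually reached the media
    parent = {}  # block_id -> checksum recorded at write time

    def write_block(block_id: str, data: bytes, phantom: bool = False):
        parent[block_id] = digest(data)
        if not phantom:            # a phantom write is acknowledged,
            disk[block_id] = data  # but the data never lands on disk

    def read_block(block_id: str) -> bytes:
        data = disk[block_id]
        if digest(data) != parent[block_id]:
            raise IOError(f"block {block_id} is stale or corrupt")
        return data

    write_block("b1", b"version 1")                # initial write lands
    write_block("b1", b"version 2", phantom=True)  # update silently lost
    try:
        read_block("b1")
    except IOError as err:
        # The old version checksums fine on its own, but not against
        # the parent's recorded expectation.
        print(err)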