[HN Gopher] XFS Metadata Corruption on Linux 6.3 Tracked Down to...
       ___________________________________________________________________
        
       XFS Metadata Corruption on Linux 6.3 Tracked Down to One Missing
       One-Line Patch
        
       Author : LinuxBender
       Score  : 89 points
       Date   : 2023-05-29 13:38 UTC (9 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | sp332 wrote:
       | Is it a little worrying that even with all the attention, no one
       | seems to know what this line of code actually does?
        
         | juujian wrote:
         | Glad I am not the only one who was thinking that.
        
         | _a_a_a_ wrote:
         | Agreed, the tone of the quotes is scarily relaxed. This should
         | not be how good software dev is done. Maybe they are being more
         | rigorous than I give them credit for, but it doesn't sound good.
        
           | pengaru wrote:
           | The transparency of FOSS conferring exceptionally high
           | visibility into how the sausage is made often creates this
           | kind of impression.
           | 
           | But in reality what's happening here is that folks who choose
           | to run bleeding-edge kernel development snapshots are lucky
           | to get such quick access to patches, even before the scope
           | of new bugs is entirely understood by the developers. Note
           | there's nothing preventing these affected users from simply
           | running a prior known-stable kernel version until the bug is
           | better understood; they're opting in to the chaos.
           | 
           | It's unfair to assume Dave Chinner et al won't be running the
           | issue seemingly fixed by this one-line change fully to
           | ground.
           | 
           | If you're not interested in playing the role of kernel QA and
           | interacting with the upstream devs when things break in not
           | yet understood ways, don't run bleeding edge kernel versions.
           | LTS and -stable releases are offered for a reason.
        
             | jeffbee wrote:
             | You're not the first person to propose this, but like all
             | those other people, you are wrong. 6.3 is the latest
             | "stable" release. It is the version front and center on
             | kernel.org. There is nothing "bleeding-edge" about it.
        
               | pengaru wrote:
               | Ah I didn't notice 6.3 had already been promoted to
               | stable, that's unfortunate.
               | 
               | Relative to a kernel version you'd encounter in something
               | like RHEL or Debian stable, however, tracking mainline's
               | "stable" branch is still pretty damn aggressive.
        
       | jeffbee wrote:
       | Giant refactor + no unit tests = data loss. The history of Linux
       | in a nutshell.
        
         | patrakov wrote:
         | I wouldn't say "no unit tests". There are xfstests, the problem
         | is that nobody runs them on stable backports to verify their
         | correctness and completeness.
        
           | jeffbee wrote:
           | xfstests are not unit tests, they are integration stress
           | tests, and their coverage is quite poor. Nothing in that
           | suite exercises `xfs_bmap_btalloc_at_eof` particularly.
           | That's the kind of unit test you want before undertaking a
           | large refactor. There are several testable postconditions
           | that would be trivial to test, if this code had an easy way
           | to add and run unit tests. It has two mutable (in-out)
           | parameters and a comment that says allocation returns as if
           | the function was never called. And that is where the bug
           | lies, according to the patch (which also adds or modifies no
           | tests).
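
A minimal sketch of the kind of postcondition test described above, written in C. Every name here (`alloc_args`, `extent_state`, `try_alloc_at_eof`, the check functions) is a hypothetical stand-in invented for illustration. None of it is the real XFS code or the logic of `xfs_bmap_btalloc_at_eof`. The sketch only shows the shape of the check: a function with two in-out parameters must, on failure, leave both exactly as they were on entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the real XFS structures. */
struct alloc_args {
    unsigned long start_block; /* in-out: block at which to try allocating */
    unsigned long min_len;     /* in-out: minimum acceptable extent length */
};

struct extent_state {
    unsigned long free_blocks; /* in-out: blocks still available */
    bool aligned;              /* in-out: set once alignment has been applied */
};

/*
 * Hypothetical allocator. It speculatively adjusts both in-out parameters
 * for an aligned attempt. Documented postcondition: on failure, both
 * parameters are restored as if the function had never been called.
 */
static bool try_alloc_at_eof(struct alloc_args *args, struct extent_state *st)
{
    const struct alloc_args saved_args = *args;
    const struct extent_state saved_st = *st;

    /* Round the start up to a multiple of 8 and pad the length request. */
    args->start_block = (args->start_block + 7) & ~7UL;
    args->min_len += 8;
    st->aligned = true;

    if (st->free_blocks >= args->min_len) {
        st->free_blocks -= args->min_len;
        return true;
    }

    /* Failure path: undo every speculative change. */
    *args = saved_args;
    *st = saved_st;
    return false;
}

/* Postcondition 1: a failed attempt leaves both parameters untouched. */
bool check_failure_rolls_back(void)
{
    struct alloc_args args = { .start_block = 13, .min_len = 4 };
    struct extent_state st = { .free_blocks = 10, .aligned = false };

    if (try_alloc_at_eof(&args, &st))
        return false; /* 10 free blocks cannot satisfy a padded request of 12 */

    return args.start_block == 13 && args.min_len == 4 &&
           st.free_blocks == 10 && !st.aligned;
}

/* Postcondition 2: a successful attempt consumes exactly the padded length. */
bool check_success_consumes_blocks(void)
{
    struct alloc_args args = { .start_block = 13, .min_len = 4 };
    struct extent_state st = { .free_blocks = 100, .aligned = false };

    if (!try_alloc_at_eof(&args, &st))
        return false;

    return args.start_block == 16 && args.min_len == 12 &&
           st.free_blocks == 88 && st.aligned;
}

/* Entry point a test runner could call. */
void run_postcondition_checks(void)
{
    assert(check_failure_rolls_back());
    assert(check_success_consumes_blocks());
}
```

If the rollback on the failure path were dropped, `check_failure_rolls_back` would fail immediately, which is the sort of regression a refactor can introduce without any integration test noticing.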
        
       | garganzol wrote:
       | This is why I always see the code as a math sheet - if every
       | little expression is perfect then the combined result is
       | guaranteed to be perfect too. This rule never fails.
        
       | malkia wrote:
       | I wonder if unit testing was ever considered (or possible?) for
       | the Linux source code?
        
         | speed_spread wrote:
         | Code that does I/O has a lot of interplay that's hard to
         | replicate and impossible to cover entirely. The physical world
         | is nothing but shared mutable state.
        
       | hnarn wrote:
       | FLOSS developers are real heroes, but so are the people willing
       | to spend time testing newer non-LTS versions of the code and
       | report their issues.
       | 
       | I have enough on my plate just dealing with the issues arising
       | from using stable code. I think it's admirable that people find
       | the time to look ahead to future releases and help us all
       | enjoy a less panic-inducing experience.
        
         | talhah wrote:
         | Bleeding-edge Arch Linux user here; I've barely come across any
         | major bugs in the last couple of years. Whenever I find
         | something I do report it and it usually gets fixed really
         | quickly.
         | 
         | In fact, many of these bugs were on stable releases too.
        
           | awill wrote:
           | Exactly. A RHEL kernel is likely a lot more stable than the
           | kernel.org LTS kernel. Often bugfixes and security patches
           | are backported to the LTS kernel, meaning both can be
           | affected by similar bugs.
        
         | georgyo wrote:
         | In my experience, bleeding edge and stable are about the same
         | amount of pain. Breakage isn't actually that common, and fixes
         | come a lot faster.
         | 
         | And even if you prefer stable, the latest will become stable
         | eventually. Not trying your workload out on the next releases
         | has pretty much the same risk profile as just running latest.
         | 
         | Many problems can only be found by running your particular
         | workload.
        
           | ilyt wrote:
           | For us, the amount of work seems to follow a bathtub curve
           | for most software.
           | 
           | Running on "latest commit from master" from many projects
           | (not Linux) will just get you code nobody has even tested,
           | and so a lot of bugs, albeit ones that get fixed quickly.
           | 
           | Running on "latest stable" (whatever that means for the
           | project) means fixes from time to time when it updates, but in
           | the vast majority of cases not that much work.
           | 
           | Anything behind that, like LTS releases? Extra work.
           | 
           | Now any doc you find might be about a newer release or a
           | feature that has changed. "Bugs" might not get fixed if they
           | are not big enough to backport.
           | 
           | Upgrading to a new LTS version will also get you years of
           | changes in the app that you then have to apply to the system,
           | vs. doing it "change by change" when keeping up to date.
           | 
           | If you use configuration management, that also often means
           | multiple different configs to manage, at the very least until
           | the previous LTS version finally gets upgraded.
        
         | drewg123 wrote:
         | We run bleeding edge FreeBSD at Netflix and are never more than
         | a few weeks behind the FreeBSD main branch. This has worked out
         | quite well for us.
         | 
         | We used to run -stable, and update every few years, like from
         | FreeBSD 9.x to FreeBSD 10.x. We found that when we did that, we
         | would often encounter some small subtle bug that was tickled in
         | our environment, and which was incredibly hard to track down.
         | That sort of bug was hard to track down because the diff
         | between branches was enormous, because there were thousands
         | of commits to sift through, and because the person responsible
         | for the bug may have committed it months or years ago and
         | forgotten about it.
         | 
         | We eventually decided to track the main branch, updating
         | frequently. This means that while we find more bugs, they
         | are far easier to fix because they were introduced more
         | recently, and there are a lot fewer commits to look through to
         | find where they came from.
        
           | hpb42 wrote:
           | Is there a position open on your team? This sounds like the
           | stuff I'm into!
        
       ___________________________________________________________________
       (page generated 2023-05-29 23:00 UTC)