[HN Gopher] XFS Metadata Corruption on Linux 6.3 Tracked Down to...
___________________________________________________________________

XFS Metadata Corruption on Linux 6.3 Tracked Down to One Missing
One-Line Patch

Author : LinuxBender
Score  : 89 points
Date   : 2023-05-29 13:38 UTC (9 hours ago)

(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)

| sp332 wrote:
| Is it a little worrying that even with all the attention, no one
| seems to know what this line of code actually does?
| juujian wrote:
| Glad I am not the only one who was thinking that.
| _a_a_a_ wrote:
| Agreed, the tone of the quotes is scarily relaxed. This should
| not be how good software dev is done. Maybe they are being more
| rigorous than I give them credit for, but it doesn't sound good.
| pengaru wrote:
| The transparency of FOSS conferring exceptionally high
| visibility into how the sausage is made often creates this
| kind of impression.
|
| But in reality what's happening here is that folks who choose
| to run these kernel versions are getting access to
| bleeding-edge kernel development snapshots, and are lucky to get
| such quick access to patches even before the scope of new
| bugs is entirely understood by the developers. Note there's
| nothing preventing these affected users from simply running a
| prior known-stable kernel version until the bug is better
| understood; they're opting in to the chaos.
|
| It's unfair to assume Dave Chinner et al won't be running the
| issue seemingly fixed by this one-line change fully to
| ground.
|
| If you're not interested in playing the role of kernel QA and
| interacting with the upstream devs when things break in not
| yet understood ways, don't run bleeding edge kernel versions.
| LTS and -stable releases are offered for a reason.
| jeffbee wrote:
| You're not the first person to propose this, but like all
| those other people, you are wrong. 6.3 is the latest
| "stable" release. It is the version front and center on
| kernel.org.
| There is nothing "bleeding-edge" about it.
| pengaru wrote:
| Ah, I didn't notice 6.3 had already been promoted to
| stable; that's unfortunate.
|
| Relative to a kernel version you'd encounter in something
| like RHEL or Debian stable, however, tracking mainline's
| "stable" branch is still pretty damn aggressive.
| jeffbee wrote:
| Giant refactor + no unit tests = data loss. The history of Linux
| in a nutshell.
| patrakov wrote:
| I wouldn't say "no unit tests". There are xfstests; the problem
| is that nobody runs them on stable backports to verify their
| correctness and completeness.
| jeffbee wrote:
| xfstests are not unit tests, they are integration stress
| tests, and their coverage is quite poor. Nothing in that
| suite exercises `xfs_bmap_btalloc_at_eof` particularly.
| That's the kind of unit test you want before undertaking a
| large refactor. There are several testable postconditions
| that would be trivial to test, if this code had an easy way
| to add and run unit tests. It has two mutable (in-out)
| parameters and a comment that says allocation returns as if
| the function was never called. And that is where the bug
| lies, according to the patch (which also adds or modifies no
| tests).
| garganzol wrote:
| This is why I always see the code as a math sheet - if every
| little expression is perfect then the combined result is
| guaranteed to be perfect too. This rule never fails.
| malkia wrote:
| I wonder if unit testing was ever considered (or possible?) for
| the Linux source code?
| speed_spread wrote:
| Code that does I/O has a lot of interplay that's hard to
| replicate and impossible to cover entirely. The physical world
| is nothing but shared mutable state.
| hnarn wrote:
| FLOSS developers are real heroes, but so are the people willing
| to spend time testing newer non-LTS versions of the code and
| report their issues.
|
| I have enough on my plate just dealing with the issues arising
| from using stable code, so I think it's admirable that people find
| the time to raise their gaze to future releases and help us
| all enjoy a less panic-inducing experience.
| talhah wrote:
| Bleeding edge Arch Linux user here, I've barely come across any
| major bugs in the last couple of years. Whenever I find
| something I do report it and it usually gets fixed really
| quickly.
|
| In fact, many of these bugs were on stable releases too.
| awill wrote:
| Exactly. A RHEL kernel is likely a lot more stable than the
| kernel.org LTS kernel. Often bugfixes and security patches
| are backported to the LTS kernel, meaning both can be
| affected by similar bugs.
| georgyo wrote:
| In my experience, bleeding edge and stable are about the same
| amount of pain. Breakage isn't actually that common, and fixes
| come a lot faster.
|
| And even if you prefer stable, the latest will become stable
| eventually. Not trying your workload out on the next releases
| has pretty much the same risk profile as just running latest.
|
| Many problems can only be found by running your particular
| workload.
| ilyt wrote:
| That seems to be mostly a bathtub curve for most of the
| software for us when it comes to the amount of work.
|
| Running on "latest commit from master" for many projects
| (not Linux) will just get you code nobody even tested, and so
| a lot of bugs that get fixed quickly.
|
| Running on "latest stable" (whatever that means for the project)
| means fixes from time to time when it updates, but in the vast
| majority of cases not that much work.
|
| Anything behind that, like LTS releases? Extra work.
|
| Now any doc you find might be about a newer release or a feature
| that changed. "Bugs" might not get fixed if they are not big
| enough to backport.
|
| Upgrading to a new LTS version will also get you years of changes
| in the app that you then have to apply to the system, vs. having
| to do it "change by change" when keeping up to date.
|
| If you use configuration management, that also often means
| multiple different configs to manage, at the very least until
| the previous LTS version finally gets upgraded.
| drewg123 wrote:
| We run bleeding edge FreeBSD at Netflix and are never more than
| a few weeks behind the FreeBSD main branch. This has worked out
| quite well for us.
|
| We used to run -stable, and update every few years, like from
| FreeBSD 9.x to FreeBSD 10.x. We found that when we did that, we
| would often encounter some small subtle bug that was tickled in
| our environment, and which was incredibly hard to track down.
| That sort of bug was hard to track down because the diff
| between branches was enormous, and because there were thousands
| of commits to sift through, and because the person responsible
| for the bug may have committed it months or years ago, and has
| forgotten about it.
|
| We eventually decided to track the main branch, updating
| frequently. This means that we find more bugs, but they
| are far easier to fix because they were introduced more
| recently, and there are a lot fewer commits to look through to
| find where they came from.
| hpb42 wrote:
| Is there a position open on your team? This sounds like the
| stuff I'm into!
___________________________________________________________________
(page generated 2023-05-29 23:00 UTC)