[HN Gopher] Uncovering a 24-year-old bug in the Linux Kernel (2021) ___________________________________________________________________ Uncovering a 24-year-old bug in the Linux Kernel (2021) Author : endorphine Score : 399 points Date : 2022-10-15 13:08 UTC (9 hours ago) (HTM) web link (engineering.skroutz.gr) (TXT) w3m dump (engineering.skroutz.gr) | sponaugle wrote: | This was a cool example of a class of bugs that are both hard to | find with no active example, and hard to prevent in complex | systems. The optimization that was added many years ago for | performance didn't update something that had a use case that was | incompatible with not being updated in a very small number of | circumstances. | | It is an interesting thought experiment to consider what kind of | tool or automated detection could have found this. Some type of | dependency linking between variables might have shed some light, | but I'm not sure that would have really highlighted this kind of | issue. | | Great description of both the bug and the path to the solution! | gizmo686 wrote: | Probably the only way to prevent this type of issue in an | automated fashion is to change your perspective from proving | that a bug exists, to proving that it doesn't exist. That is, | you define some properties that your program must satisfy to be | considered correct. Then, when you make optimizations such as | the bulk receiver fast-path, you must prove (to the static analysis | tool) that your optimizations do not break any of the required | properties. You also need to properly specify the required | properties in a way that they are actually useful for what | people want the code to do. | | All of this is incredibly difficult, and an open area of | research. Probably the biggest example of this approach is the | seL4 microkernel. To put the difficulty in perspective, I | checked out some of the seL4 repositories and did a quick line | count. 
| | The repository for the microkernel itself [0] has 276,541 lines | | The test suite [1] has 26,397 lines | | The formal verification repo [2] has 1,583,410 lines, over 5 times as | much as the source code. | | That is not to say that formal verification takes 5x the work. | You also have to write your source code in such a way that it | is amenable to being formally verified, which makes it more | difficult to write, and limits what you can reasonably do. | | Having said that, this approach can be done in a less severe | way. For instance, type systems are essentially a simple form | of formal verification. There are entire classes of bugs that | are simply impossible in a properly typed program; and more | advanced type systems can eliminate a larger class of bugs. | Although, to get the full benefit, you still need to go out of | your way to encode some invariant into the type system. You | also find that mainstream languages that try to go in this | direction always contain some sort of escape hatch to let the | programmer assert a portion of code is correct without needing | to convince the verifier. | | [0] https://github.com/seL4/seL4 | | [1] https://github.com/seL4/sel4test | | [2] https://github.com/seL4/l4v | xani_ wrote: | > That is not to say that formal verification takes 5x the | work. You also have to write your source code in such a way | that it is amenable to being formally verified, which makes | it more difficult to write, and limits what you can | reasonably do. | | Also hire significantly more skilled people. Write formal | verification in the job requirements and the pool of candidates | will shrink massively. | | Explains why it is so rare really. "Spend 5-10x on developers | to have some bugs not happen" is not a great sell. | simtel20 wrote: | It's a great question! Thinking back... | | At the time this bug was introduced it would probably have been | cost prohibitive to create a test case. 
We were proud of | 100mbit networks, had flaky nics the vendors didn't help | maintain much of the time (and which were often broken in | hardware) and the filesystem max file size was something like | 2tb, and most drives were in the handful of gbs. Conceiving | of testing for something like this would have been expensive. | And none of the big system vendors took Linux seriously then. | | Though perhaps flooding zeros across a TCP socket could work, I | really think that a kernel hacker would have found a lot of | other hardware and driver issues before ever being able to | trigger this. | aposm wrote: | Awesome breakdown - as someone who is fairly familiar with TCP | theoretically but not with the details of the TCP implementation | in the Linux kernel, this was just the right balance of detail. | Great technical writing IMO! | myself248 wrote: | Okay so I got to the wrap-up at the end, about "why did nobody | else find this", the author sets up some logical dominoes but | doesn't knock them down. Allow me to try: | | Earlier in the article, the author mentions that they recently | upgraded some network hardware, and the problem seemed to become | more frequent after that. | | Packet loss or other network issues would force the stack to fall | out of fast-path and update the counter, avoiding the bug. | | Running over ssh would avoid the bug. The only time you'd run | rsync not over ssh would be within your own network. | | So it sounds like (this is my conjecture here) this would only | appear to someone running rsync internally, over a high- | performance network with no packet loss, and upgrading the | switches might've finally gotten the network good enough to | expose the bug? | cryptonector wrote: | One might expect this to have been hit by HPN (high performance | networking) users, but perhaps if they are storage I/O bound | rather than CPU or network I/O bound, then probably not. | ardel95 wrote: | That sounds plausible. 
But also, most software (browsers, web | service SDKs, RPC frameworks) treat TCP connections as fallible | by setting read/write timeouts and aggressively reopening | broken connections. So, I'm totally not surprised this issue | went unnoticed for this many years. | verisimilitudes wrote: | _How is it possible for a TCP bug that leads to stuck connections | to go unnoticed for 24 years?_ | | It's because the fools responsible never rewrite their code, use | a broken language, and don't even try to prove half of the broken | garbage they write. Then, when it turns out to have been broken | for decades, they chuckle and shove another finger into another | crack, never understanding how they misuse computers. | layer8 wrote: | This is a good case for formal verification. | mdaniel wrote: | I struggle because I want to upvote these comments, because | that's the world I want to live in. But the opposite side of | that coin is who is going to author the incredibly arcane | _specification_ of TCP against which any such implementation is | formally verified? | | Maybe TCP stacks are one of the few cases where that makes | sense, but I'd suspect if it was "worth the cost" it would have | already been done | layer8 wrote: | There are certain guarantees you want such a formal | specification to give, like for example not getting | permanently stuck in some state as with the present bug. You | can formalize the proofs for those guarantees and have their | correctness machine-checked. Something like TLA+/PlusCal is | likely suitable for that. | | A formal specification is less ambiguous than a prose | specification. Formalizing the TCP specification will, if | anything, expose aspects where the specification is unclear, | or corner cases where the specification actually leads to | unwanted behavior and doesn't provide the desired guarantees. 
| | So, while you can't prove that the formal specification | matches the prose specification 100%, you _can_ prove that | it provides all the guarantees the original prose | specification was aiming for (once you've formalized those | desired guarantees), which is something you can't do for the | prose specification. | sneak wrote: | > _These snapshots are updated daily through a pipeline that | involves taking an LVM snapshot of production data, anonymizing | the dataset by stripping all personal data, and transferring it | via rsync to the development database servers._ | | I don't know what sort of data these people process, but most | datasets about people are not anonymized by simply removing the | PII. | abraae wrote: | Yes they are. Any information that can be used to identify a | person by definition is PII. | | Once all the PII is removed, by definition the dataset is | anonymized. | capitol_ wrote: | This is obviously true, as you are stating an axiom. But what | I think the grandparent is trying to say is that databases | with PII can often be deanonymized by looking at the other | data that isn't obviously PII. | | Take for example a database of all mobile phone positions | over time, this can be 'anonymized' by removing all | connections from the phones to information on who owns the | phones. | | But it can still be trivially deanonymized by analyzing where | the phones are at night and during office hours, not very | many persons work in the same building and sleep in the same | house. | omginternets wrote: | Which kernel version has this patch? | MatthiasPortzel wrote: | I remember when this was originally posted, but I voted it up | again because I think it's such an excellent story, and excellent | programming. We need more people and companies like this, who are | willing to go beyond "oh it fails randomly sometimes" and track | down the underlying issues. 
| | => https://news.ycombinator.com/item?id=26102241 Previous | Discussion (497 points - 41 comments) | [deleted] | c0mptonFP wrote: | > We need more people and companies like this, who are willing | to go beyond "oh it fails randomly sometimes" and track down | the underlying issues. | | I absolutely disagree. Most capable engineers I know have this | urge to go down rabbit holes and fix any issue, this is nothing | special. | | Everyone wants to be the hero that found a bug deep in the | stack, make a glorious pull request, and be celebrated in the | community. | | I much more value people who have enough self-control to pick | meaningful battles, and follow the right priorities. | jackmott wrote: | black_puppydog wrote: | Eh, right, many bugs we have don't really matter. | | Oh what is that you say, security vulnerabilities are also | just bugs that get exploited? Oh well... | [deleted] | [deleted] | rrss wrote: | In my experience, the "oh it fails randomly sometimes" bugs | are often in some random dull legacy infrastructure component | where there is zero attention or celebration for fixing them, | and so engineers tend to tolerate losing a bit of time once a | week due to them for years rather than someone spending half | a day to fix it for everyone. | robertlagrant wrote: | Exactly. I could fix any complex bug. I just choose not to. | KolmogorovComp wrote: | At the company level, it is indeed more expensive to fix | upstream rather than work around it, but on a macro scale it | is much more beneficial. | | In my opinion fixing upstream whenever possible even if not | the best short-term solution should be considered the price | to pay for using OSS. | CSSer wrote: | GP's comment is also odd because the article notes they took | your approach. They documented the problem when they first | noticed it happening infrequently and moved on to higher | priorities. When it started happening every single day it | became mission critical to investigate. 
| HenrikB wrote: | I think this was well prioritized; they struggled with the | issue at times, found a temporary workaround, but when that | workaround stopped being efficient and the bug hit them | every day, they decided to track down the source. Then they | reported upstream, it was reproduced, and someone patched it, | and rolled out new, fixed kernels. | | That is a perfect example of how things work and should | work. They contributed to the community. I think it was a | great prioritization. | | I'm certain there were lots of other people hitting this bug | and killing processes or rebooting to get around it. The | troubleshooting and reporting done here silently saved a lot | of other people a lot of effort - now and in the future. | I don't think they were after it to be heroes; they just | shared their story, which I'm sure will encourage others to | maybe do the same one day. | freedomben wrote: | This opinion is a popular one these days (particularly since | it complements the demands of business nicely by maximizing | personal/company profit), but it is a big part of the reason | why the majority of software these days is so unreliable and | buggy. It results in hacks on top of hacks to paper over | problems in the lower levels of the abstraction tower that is | modern software, and it results in tons of "WTF" bugs that | are just accepted and never fixed. | trasz wrote: | This _is_ the meaningful stuff. Engineers might have the | urge, but most don't have the opportunity, because they need | to focus on the currently fashionable framework. | | A good rule of thumb regarding meaningful battles is to | ignore everything promoted by companies like Google or | Facebook - everything they do is either going to be abandoned | in five years, or makes sense only in the context of solving | problems nobody else has. | stjohnswarts wrote: | seems like something an engineer might fix on their own | time if they were feeling feisty about the matter. 
| Something tells me if it went on for 20 years it was an | edge case that only very rarely came up and was mostly a | non-issue. | trasz wrote: | I suspect it was definitely an issue, it's just that most | companies like Google don't care about reliability, only | availability, and it might just not show up in their | stats. | digiou wrote: | For the record, this is one of the top Greek employers. This is | Greece's Amazon essentially. The C-team are intact since day-1 | and AFAIK still writing (some) code. | | It is not unheard of to have 4-day weeks and developer-first | mindset at that place. | charcoalhobo wrote: | Love deep dive troubleshooting like this. I haven't heard of | systemtap before; looks nice. When I had to troubleshoot a kernel | bug [1] I used perf [2] probes which are also really nice for | this kind of debugging. | | [1] https://www.spinics.net/lists/xdp-newbies/msg01231.html | | [2] https://www.brendangregg.com/perf.html | thow232329 wrote: | "This setup has worked rather well for the better part of a | decade and has managed to scale from 15 developers to 150" | | LOL | dang wrote: | Could you please stop creating accounts for every few comments | you post? We ban accounts that do that. This is in the site | guidelines: https://news.ycombinator.com/newsguidelines.html. | | You needn't use your real name, of course, but for HN to be a | community, users need some identity for other users to relate | to. Otherwise we may as well have no usernames and no | community, and that would be a different kind of forum. | https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme... | | Also, could you please stop posting unsubstantive and/or snarky | and/or flamebait comments? It's not what this site is for, and | it destroys what it is for. If you wouldn't mind reviewing | https://news.ycombinator.com/newsguidelines.html and taking the | intended spirit of the site more to heart, we'd be grateful. 
| halukakin wrote: | Could someone provide link(s) on how regular snapshots of | databases can be taken like this? (Googling didn't help much, | maybe I'm googling for the wrong keywords.) For me, backing up | the database is a few-hour-long process. Restoring it for a | developer again is a few hours process. I read about snapshots | before but haven't realized they could be this effective. | rrdharan wrote: | It's the lack of clarity on how they manage access control for | what should be regulated data that surprises me, more than the | technology achievement. | nick__m wrote: | for mariadb : | | 0) make sure the database data volume is on lvm or zfs | | in a sql prompt: 1) BACKUP STAGE START; BACKUP | STAGE BLOCK_COMMIT; 2) \! the shell command to take the | snapshot 3) BACKUP STAGE END; | | you can now mount your snapshot, copy it offsite and delete it. | The restore procedure is left as an exercise! | halukakin wrote: | Very helpful. Thank you! | ClumsyPilot wrote: | can't most COW filesystems like BTRFS or ZFS take a snapshot at | a point in time instantly? | abdulocracy wrote: | LVM does the same but at the block level. | | https://wiki.archlinux.org/title/Create_root_filesystem_snap. | .. | mauvehaus wrote: | Because it isn't a backup. They put the database into a | quiescent state on disk, take a file system snapshot, let the | dbms resume working, and send the snapshot data via rsync. | | This requires the cooperation of the dbms software to get the | on-disk data quiesced. Then your snapshot has to go fast enough | that the dbms doesn't end up with too many spinning plates | before you let it start writing normally. | halukakin wrote: | Got it. Thank you! | justin_oaks wrote: | I love when you're using open source software and can find the | bug yourself, even if it's deep down the stack. | | Imagine if this bug were somewhere in closed source software. | You'd have to reach out to the software's customer support team. 
| Every time I reach out to customer support I expect to have an | unpleasant experience. It is rarely otherwise. | [deleted] | xani_ wrote: | Kinda why I'm not a fan of cloud, same black box problem. | perth wrote: | And even if you did reach out to customer support, it would | rarely ever get dev attention unless most people have the | issue. Even in that case, it sometimes still gets a fat | wontfix, like the famous OneDrive file corruption bug. | themoonisachees wrote: | Raising this bug in windows (how? Microsoft sells support, | barely, but you can't talk to the ipv4 stack dev anyway) would | get you laughed out of the chat room because it can't possibly | be the ip stack's fault. | didgetmaster wrote: | As someone who thrives on tracking down rare but annoying bugs in | a debugger, I love stories like this. It is not just bugs that | cause real failures which can be headaches; but also bugs that | just slow things down unexpectedly. They can sometimes go | undetected for decades like this one. | | I wrote an article this past year that talks about silent bugs | that slowly eat resources and collectively can be very expensive | in terms of wasted time and energy: | https://didgets.substack.com/p/finding-and-fixing-a-billion-... | xani_ wrote: | > As someone who thrives on tracking down rare but annoying | bugs in a debugger, | | As someone that is cursed to inevitably find some obscure bug | the second I start using some piece of software I'm happy I'm | not the only one | | > I wrote an article this past year that talks about silent | bugs that slowly eat resources and collectively can be very | expensive in terms of wasted time and energy | | "Using JS for backend is ecoterrorism" lmao | myself248 wrote: | Okay but where's the bug story? Did I miss the story? | didgetmaster wrote: | I wrote the article right after I fixed a huge inefficiency | problem in a function within my own project. 
I neglected to | give the specifics in the article, but here they are since | you asked. | | My Didgets tool lets you create pivot tables against | relational database tables, even very large ones. For the | pivot values, you can choose to just count the occurrence of | each value or if it is a number type you can add them up. You | can also add up the values in a separate number column. Here | is a quick demo video: | https://www.youtube.com/watch?v=2ScBd-71OLQ | | When adding up numbers in a separate column, I had just a few | lines of unnecessary code that ended up being called | exponentially. For smaller tables it was barely noticeable, | but for tables with 30 million+ rows it really bogged down. | | A simple fix to the affected lines caused a certain test | against a large table to go from over 10 minutes down to | under 20 seconds. The effects of just a few lines of code | when applied to a big enough data set can really impact | performance. It is the old Einstein equation E=mc2 in effect | which is discussed here: | https://didgets.substack.com/p/musings-from-an-old- | programme... | shurane wrote: | I guess there is a lost art of writing for optimal | code/memory/execution time, especially as our resources | increase. | | I think the idea here is to write code quickly that's | inefficient, and re-write it to be efficient if the | performance is required down the line. For companies where | there's bigger fish to fry, i.e. customer acquisition, it's | more useful to pump out more features (even at the expense | of bugs) because that draws customers. | | But in places where performance is important, you do see | developers squeeze out more cycles/memory. I.e. kernel/OS | development, database servers, video games. It's just that | most developers aren't in those areas of specialty anymore. | | Btw, have you heard of https://handmade.network/ and | https://en.wikipedia.org/wiki/Demoscene ? Wondering what | your thoughts are in those areas. 
There are probably more | communities like the ones I mentioned, where developers are | interested in writing the kind of code that you are talking | about. | pvillano wrote: | > but also bugs that just slow things down unexpectantly. They | can sometimes go undetected for decades like this one. | | Reminds me of the GTA Online quadratic time JSON parsing bug | itismetheidiot wrote: | how odd to see a write up from skroutz.gr blog being at the first | page of HN... | dang wrote: | Also these! | | _Speeding Up Our Build Pipelines_ - | https://news.ycombinator.com/item?id=20775297 - Aug 2019 (24 | comments) | | _The infrastructure behind one of the most popular sites in | Greece_ - https://news.ycombinator.com/item?id=9982361 - July | 2015 (5 comments) | | _Working with the ELK stack_ - | https://news.ycombinator.com/item?id=9008119 - Feb 2015 (35 | comments) | NKosmatos wrote: | Yeap, it's a bit strange, but the post was very well written, | with a nice breakdown and easily understandable steps that can | be followed by most software engineers. | | There have been some sporadic posts from Skroutz in the past, | but nothing that gained so much attention. | | For those that don't know it, Skroutz is the biggest Greek | online price aggregator/e-commerce market/price comparison | site. | [deleted] ___________________________________________________________________ (page generated 2022-10-15 23:00 UTC)