[HN Gopher] Finding the "second bug" in glibc's condition variable ___________________________________________________________________ Finding the "second bug" in glibc's condition variable Author : ingve Score : 246 points Date : 2022-09-18 12:55 UTC (10 hours ago) (HTM) web link (probablydance.com) (TXT) w3m dump (probablydance.com) | jart wrote: | If you need really good synchronization primitives while Glibc is | fixing things then I've had a lot of success with *NSYNC | https://github.com/google/nsync which is what Cosmopolitan Libc | uses to implement pthread_cond_t. | jeffbee wrote: | Its design is similar to absl::Mutex, which is available to c++ | users. They may also have been written initially by Mike | Burrows? | [deleted] | userbinator wrote: | Concurrency is bug-prone. Complex code is bug-prone. glibc's CV | implementation combines both. | | In my experience, both writing and teaching concurrency, there's | a very strong sentiment of "I'll just use another CV/lock/etc." | whenever people encounter a bug and try to fix it, which seldom | fully solves the problem but usually makes it even more subtle | and compounds the number of resulting states. | | One of my tricks to increase chances for race conditions and | deadlocks when testing is to use a modified sampling profiler to | randomly delay and pause threads, but with such a long sequence | of steps as this bug requires, that would've been unlikely to | help. Yet Murphy's Law says that it will happen in practice... | sbf501 wrote: | What bothers me is that the OCaml code looks so harmless:
|
|         static void st_masterlock_acquire(st_masterlock *m)
|         {
|             pthread_mutex_lock(&m->lock);
|             while (m->busy) {
|                 m->waiters++;
|                 custom_condvar_wait(&m->is_free, &m->lock);
|                 m->waiters--;
|             }
|             m->busy = 1;
|             pthread_mutex_unlock(&m->lock);
|         }
|
| How is this bug not more common with such a simple invocation?
| asveikau wrote: | Probably you can get away with that simple invocation "most of | the time", but what matters is whether the higher level code that | calls it triggers the bug where other access patterns would not. | | Also, I do wonder why they roll their own mutex using condition | variables. Most people probably would use a mutex and thus | wouldn't hit the condition variable at all. Condition variables | are more niche than mutexes. | mbac32768 wrote: | According to Xavier Leroy (OCaml language dev): | | > It is possible to use a plain POSIX mutex as master lock, | but fairness is awful. | | https://discuss.ocaml.org/t/is-there-a-known-recent-linux- | lo... | jwatte wrote: | That kinda depends on what custom_condvar_wait() does, right? | grogers wrote: | Just 6 days ago they merged a custom condvar implementation | based on futex directly to work around the glibc bug. Before | that it was just pthread_cond_wait | ncmncm wrote: | Someone should be very embarrassed about how they have handled | this whole process. | | Surely code to operate condition variables correctly is in the | public domain, that does not involve mitigations or stolen | signals. | pengaru wrote: | In 2022 a POSIX OS with a broken pthread_cond_t implementation is | a broken OS, full stop. This is _the_ POSIX API for implementing | sleeping synchronization mechanisms in threaded programs. | | The glibc project should be incredibly embarrassed by this, | assuming there's anyone even still paying attention there. | nix23 wrote: | Well it's gnu/linux, a multiuser unix that is not usable as a | multiuser system (read the sdf.org story). It's a cheap | hw-framework that runs more or less reliably...but no problem we have | kubernetes and containers to cover/perfume the stinking parts. | ajross wrote: | Skimming this, this isn't something glibc can do reliably. This | has to be a kernel fix. The underlying FUTEX_OP operation is... 
Futexes were originally designed for simple locking and | semaphores, and only later extended to condition variables, and | the result isn't pretty. (To be fair: it's a very hard problem on | kernels that need to scale to hundreds of cores across many | levels of memory interconnect.) | | Trying to fix this in userspace is just about tuning to try to | suppress the underlying issues, and recovering cleanly when | things break. | mannykannot wrote: | I'm not sure whether to read your comment as disputing the | author's claim of having a fix for this particular problem, or | that there will inevitably be other cases where condition | variables fail unless futexes are fixed in the kernel. | jart wrote: | How is this the kernel's fault? The article says | "pthread_cond_signal() will signal on the wrong futex". Yes I | understand futexes are tricky, but I don't see how they're | broken. | ajross wrote: | To be clear: I'm not assigning "fault", I'm saying the design | is flawed. The current futex abstraction forces userspace to | do things like "pick a thread to wake up", which is | inherently racy when viewed from outside the kernel. So yes, | you have to fix this particular issue in glibc because the | microbug is in glibc. But glibc shouldn't have tried to solve | the problem in the first place. | | IPC is just really hard, we struggle with exactly this in | Zephyr too (though we have mostly managed to avoid this | particular kind of mistake as in an RTOS the kernel/API | boundary is thinner), at a much smaller scale. Add NUMA and | cache-locality-concern to the mix and things get harder | still. | keepquestioning wrote: | Would macOS see this issue too? | saagarjha wrote: | macOS uses its own pthreads implementation: | https://github.com/apple-oss-distributions/libpthread | mort96 wrote: | This seems like some really good debugging effort, kudos. 
| | But, my takeaway from this is that there's a glibc bug which | causes deadlocks in correct code, and a proposed fix (even if | partial) has been available for years, and glibc still hasn't | shipped a version with the fix so it's up to distros and users to | patch the broken glibc code. | | Reminds me of how a glibc update broke GNU m4 and GNU didn't ship | a fix to m4 for years, so it was up to distros and users to patch | the latest release of GNU m4 to make it compile with the latest | version of GNU glibc. | | And then a couple of years later GNU shipped another version of | glibc which broke m4. | | GNU doesn't strike me as an organisation which places a lot of | value on software quality and reliability. | stefan_ wrote: | It's hilarious how many of these stories there are. I remember | debugging a DNS issue on an embedded system; the resolv.conf | changed but the program continued trying the old DNS server. | Turns out GLIBC will read the resolv.conf once on startup and | never again after (unless specifically invalidated). Except | most people will never notice this behavior because (1) most | distributions carry a patch from 2004 that reloads it | transparently or (2) use libraries that just always invalidate | it, just to be sure. | | This stupidity surfaces everywhere; here is a Go bug report for | it: | | https://github.com/golang/go/issues/21083 | rekado wrote: | > GNU doesn't strike me as an organisation which places a lot | of value on software quality and reliability. | | This stems from a misunderstanding of what GNU is. Ideally all | projects under the GNU umbrella would work towards a unified | operating system, but that's sadly not what it's like. The GNU | project is not much of a project in the common sense of the | word, nor is it much of an organisation. This is one of the | many reasons why https://gnu.tools exists, a subset of GNU with | common goals and values, including collaborative project | management. 
| xwolfi wrote: | It sounds silly for deep theoreticians and a somewhat | politically active organization like GNU to have wanted to go | all the way to an OS, the kind of software where you really | need to throw all the beauty out of the window and solve dumb | users' tangible problems, like how to read a bluray from an | obscure proprietary drive or how to run a game with a | monopolistic corporation's binary drivers. | | GNU can never succeed at it because it can never compromise | and maybe that's fine. It's a shame that it's considered a | failure when they accomplished so much in the OS scaffolding | part. | mort96 wrote: | I kind of get that. And if a change in GNU glibc caused a bug | in GNU wget, I would hope that wget would release a fix | relatively quickly, but I wouldn't necessarily have any | special expectations just because glibc and wget are both | part of GNU. | | But I wish that the projects in the GNU toolchain would at | the very least collaborate. The combination of glibc + gcc + | autotools + m4 makes such a core part of any GNU system that | you'd hope that GNU cares to keep them mutually compatible. | So when they release an update to glibc which breaks m4, which | makes it impossible to compile the current version of the GNU | toolchain using the current version of the GNU toolchain, | that's extremely disappointing. | yjftsjthsd-h wrote: | It does naively seem like a central library like glibc | would want to run regression integration tests on every new | version to make sure they're not breaking the utterly | massive ocean of software that depends on them, and that | the very obvious first step in that direction would be to | run tests using software from the same group and/or core | libraries/software that they know are required for a lot of | base systems to work (ex. GNU coreutils, GCC, clang, | busybox). 
I suspect it does in fact mostly boil down to | resourcing problems - although that just raises further | questions, since I would expect that organizations like | Google or Red Hat would consider that kind of thing | important enough to support. | dezgeg wrote: | Using GNU Guix (the distribution) should be a logical and | simple test-bed to see how exactly glibc/gcc pre-release | would affect downstream packages. | [deleted] | userbinator wrote: | _GNU doesn't strike me as an organisation which places a lot of | value on software quality and reliability._ | | That's not surprising, because their main purpose is to promote | the GPL and "free as in freedom" software. I've heard theories | that their software is deliberately overly complex and uniquely | different from other implementations in order to make it easier | to find GPL violations. In other words, they're optimising for | something other than quality. | | I've inspected the glibc condition variable implementation | (while looking at what may be a slightly different bug, or it | could be another manifestation of this one; not sure at this | point since it was years ago) and the first thing I noticed was | that it is disturbingly complex. | pengaru wrote: | > That's not surprising, because their main purpose is to | promote the GPL and "free as in freedom" software. | | You're conflating the FSF and the GNU project. | userbinator wrote: | Same difference; this is in contrast to e.g. BSD, which is | far less politically motivated. (And AFAIK they have no such | problems in their CV implementation...) | chrsig wrote: | Awesome to see the author post TLA+ instructions & model source. | One day I'll get around to actually learning it. In the meantime, | what's posted is too involved for me to grok at first glance, but | I'm still glad to see more real world uses pop up. 
| aaaaaaaaaaab wrote: | ComputerGuru wrote: | I maintain my own FOSS threads signals/events libraries in C++ | [0] and in rust [1] (atop of parking lot as a futex stand-in) | and I disagree. This has nothing to do with the language and | everything to do with the semantics of the locks. Writing | concurrency primitives is HARD and the more functionality your | API exposes, the more room there is for nuanced bugs in how | everything interplays with everything else. | | [0]: https://github.com/neosmart/pevents | | [1]: https://github.com/neosmart/rsevents | galangalalgol wrote: | I'm a rust beginner, but it seems like a mutex or lock is | shared mutable state, and abstracting this with unsafe code | to implement events is just creating a loophole by which to | call a function on another thread without having a mutable | reference? I'm sure I'm missing something, or a lot of | things. It just seems like code I wouldn't write in c++ even | anymore. I typically don't do signalling between threads or | have them share mutable state. If they do signal it is by | a read or write to a value small enough to be atomic. | ComputerGuru wrote: | Rust's synchronization types are built on the semantics of | shared objects, with the synchronization object owning the | data, access to which is channeled exclusively via the | synchronization type. | | Events and signals aren't intended to be used if you're | trying to share ownership of memory or data - you should | use the std types in that case. They're for signaling | between threads or for building your own synchronization | objects atop of. For example, an auto reset event can be | used in lieu of a one-bit channel (sort of like Mpmc<()>) | and manual reset events can be used to implement the same | in broadcast mode, much more efficiently than the channel | crates. 
A common usage is to block until an event is | available or a shutdown has been requested - in lieu of | manually polling an AtomicBool and aborting if it is true, | making your code more responsive (no delay between polls) | and more efficient (no repeated polling, no busy waits, | etc). They can be used to signal state transitions or | readiness, which doesn't have anything to "own", for | example, the rsevents-extra crate [0] implements a | countdown event (event becomes available when n outstanding | tasks have finished in parallel threads) or a semaphore | (restricting concurrency to a region of code, not limiting | access to a variable or object). | | [0]: https://github.com/neosmart/rsevents-extra | asveikau wrote: | It's worth noting that condition variables are a finicky | interface. Speaking personally, I would typically much rather | work with something like a semaphore. I also feel like I've | seen a lot of misuse of condition variables out there. | ComputerGuru wrote: | Condition variables were the POSIX equivalent to Win32's | WaitForMultipleObjects (and IOCP apis like it). Now both | are available on Windows (and my library makes WFMO | available on *nix/Linux/macOS). | | Semaphores alone can't mimic the same semantics (juggling | between multiple, disparate locks/sync objects) without | running into a possible race condition unless you have an | API that can somehow wait on one of two or more semaphores | to become available. | enjoy-your-stay wrote: | WaitForMultipleObjects (been in Win32 forever) is not | really related to IO completion Ports, whose API came a | lot later. | | My understanding is that condition variables were | designed to be a superior alternative to Windows Event | objects. | | I'm glad to read here that others find them fiddly and | overly awkward to use since that was also my impression | vs the consistent and usually easy to use Windows thread | synchronization HANDLE based objects. 
Dave Cutler | definitely did a good job there IMHO. | gpderetta wrote: | Condition variables predate Windows and its predecessor | VMS by many years (they do not predate unix itself but | certainly they predate pthreads by decades). | | In particular pthreads borrowed them from Mesa, but they | originally were part of the monitor abstraction designed | by Hoare. | | So they weren't designed to be a superior alternative to | windows events; they might be considered an alternative | to events in general but I couldn't easily find any | history of the events abstraction. | asveikau wrote: | I totally disagree with your assessment. | WaitForMultipleObjects enters the kernel every time and | blocks on kernel objects. The nearest equivalent is | select, poll, epoll, kqueue, etc. Newer innovations like | eventfd start to introduce synchronization objects into | the mix of what type of kernel object to block on. | | A Win32 mutex or semaphore is very heavy and typically | avoided in a sane high performance application unless you | need it to cross process boundaries. A lot of the time | (most?) you will want a win32 critical section, which is | similar to the modern futex based pthread mutex | implementation in that it does not enter the kernel | unless there is contention. | | A pthread condition variable is clumsy. You need to give | it an extra mutex and loop etc. It doesn't have any of | the simplicity of WaitForMultipleObjects, even with all | the faults of that interface of which there are many. | ComputerGuru wrote: | I was not talking about implementation details but | semantics. I agree with what you are saying. My own | implementations are user-mode unless sleeping. | eloff wrote: | Rust doesn't help with implementing low level primitives like | this, you'd have to use unsafe code. 
| cesarb wrote: | > Rust doesn't help with implementing low level primitives | like this | | A charitable reading of the parent comment might be that Rust | _already implements_ these particular low-level primitives, | so that by using Rust, you are not using the problematic | glibc pthread code. See https://blog.rust- | lang.org/2022/06/30/Rust-1.62.0.html#thinn... for the | relevant announcement: "Previously, Mutex, Condvar, and | RwLock were backed by the pthreads library on Linux. [...] | Rust's standard library now ships with a raw futex-based | implementation of these locks on Linux, [...]" | jstimpfle wrote: | In either case you need to use some implementation of | synchronisation primitives, and that could contain bugs. | Where's the difference? | dundarious wrote: | glibc's impl is a "raw futex-based" one as well. I'm no | great fan of glibc as an organization or body of code, but | the comparative benefits of Rust here are not well | explained. | tialaramex wrote: | Although glibc's implementation is indeed "raw futex- | based" it is not merely a raw futex, of course. What you | get from glibc isn't a futex (tricky to use properly) but | instead a C structure type named pthread_mutex_t. It's | quite big. 40 bytes! | | All Rust wanted was a futex, which for reference is just | any 32-bit aligned value. The pthread_mutex_t is less | useful to Rust, yet it's ten times bigger! | | So, Mara's new Mutex<T> in the Linux builds of Rust is | just a futex (32-bits) plus a poison boolean plus your | type T which is protected by the mutex. If your type is | Zero Size then this rounds up to 8 bytes, five times | smaller than pthread_mutex_t. If your type was just a u16 | (unsigned 16-bit integer) then it rounds up to the same | size, but you're also protecting your 16-bit unsigned | integer which is nice. | | Now, if you're protecting huge data structures with | mutexes, you don't care, it makes no measurable | difference. 
But if you heard futex is small (it is) and | protected loads of tiny objects with one each (say, | individual 32-bit floating point X,Y co-ordinate pairs) | Rust now makes your choice cost roughly what you expected | instead of bloating your program for no reason. | | It's also a bit faster (but probably not enough that you | should care) and simplifies some other book keeping in | Rust itself. | | [ Above I wrote about Mutex, for Condvar the argument is | similar but I haven't spent lots of time understanding | the innards of Condvar whereas I spent a week reading | Mutex stuff ] | gpderetta wrote: | Condvar and mutexes are for very different use cases | though, and condvars are notoriously hard to get right, | especially when supporting the full set of POSIX | behaviours with good performance. | asveikau wrote: | Which means they have plenty of opportunities to introduce | their own bugs. | kibwen wrote: | That doesn't help here, as Rust also uses glibc for plenty of | things (although recently it did move away from pthreads in | certain areas for performance reasons, see | https://twitter.com/m_ou_se/status/1526211117651050497 ). | wizeman wrote: | Just a word of caution: I tried replacing the mitigation patch | (which I had been using for 18 months) with the author's minimal | fix (the first out of the 6 patches, which is supposed to fix the | issue) and immediately ran into a deadlock while building | Poly/ML 5.9 (3 times, in fact), whereas I hadn't encountered | this bug before, despite having built Poly/ML many times. | | This was on a 32-core machine with lots of background compilation | going on at the same time (i.e. with a high load). | | It could be a coincidence or an unrelated bug, but it's also | possible that the author's patch might not fix the issue | completely. | KyleSanderson wrote: | Apply the entire series... That's what was likely tested the | most. 
| oldtimer1901 wrote: | Old timers will recall that when Java first became popular in the | mid 90s it brought to light all sorts of bugs in various | operating systems' thread implementations - particularly in | SunOS/Solaris. It is discouraging that such basic bugs still | exist in GNU/Linux after 25 years. But it's not surprising - if | anyone tells you that they understand all possible multi-threaded | scenarios and race conditions in the implementation of these | libraries - they are mistaken. It's all trial and error. | [deleted] | dgan wrote: | Can we have a "formal semantics for multi-threading"? Why isn't | it a thing? | tsimionescu wrote: | Another poster shared the Java memory model, and C++ and C have | them as well. The article we're discussing shows another kind | of formal model for multithreaded code in TLA+, and actually | uses it to investigate this bug. There is also Tony Hoare's | Communicating Sequential Processes formal model, first | presented in 1978; and an entire field of CS research called | process calculus. | | So, pick a formal model and use it - though note that formal | methods tend to add significant extra effort to use, and may or | may not pay off depending on your domain and desired outcomes. | kjeetgill wrote: | There are. They're called memory models. I know Java's is | pretty readable (see below). I think C and C++ have them too | now. But when you're building on these things there's always | lots of room to get them wrong! | | [0]: https://en.wikipedia.org/wiki/Java_memory_model | | [1]: | https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.ht... | baq wrote: | It's super duper hard, mostly. | metadat wrote: | Is there a way to bundle and use a more up-to-date version of | glibc inside my docker containers? Is it a straightforward matter | of creating an intermediate build-base-image where glibc gets | compiled and copying over the resulting glibc .so artifact files | to the final image? 
| | I suspect there may be additional concerns, but I'm not a glibc | expert and have not needed to mess with it much. | | Also, does musl libc [0] have fewer bugs, on average? | | [0] https://musl.libc.org/ ___________________________________________________________________ (page generated 2022-09-18 23:00 UTC)