[HN Gopher] Finding the "second bug" in glibc's condition variable
       ___________________________________________________________________
        
       Finding the "second bug" in glibc's condition variable
        
       Author : ingve
       Score  : 246 points
       Date   : 2022-09-18 12:55 UTC (10 hours ago)
        
 (HTM) web link (probablydance.com)
 (TXT) w3m dump (probablydance.com)
        
       | jart wrote:
       | If you need really good synchronization primitives while Glibc is
       | fixing things then I've had a lot of success with *NSYNC
       | https://github.com/google/nsync which is what Cosmopolitan Libc
       | uses to implement pthread_cond_t.
        
         | jeffbee wrote:
          | Its design is similar to absl::Mutex, which is available to
          | C++ users. Both may also have been written initially by Mike
          | Burrows?
        
       | [deleted]
        
       | userbinator wrote:
       | Concurrency is bug-prone. Complex code is bug-prone. glibc's CV
       | implementation combines both.
       | 
       | In my experience, both writing and teaching concurrency, there's
       | a very strong sentiment of "I'll just use another CV/lock/etc."
       | whenever people encounter a bug and try to fix it, which seldom
       | fully solves the problem but usually makes it even more subtle
       | and compounds the number of resulting states.
       | 
       | One of my tricks to increase chances for race conditions and
       | deadlocks when testing is to use a modified sampling profiler to
       | randomly delay and pause threads, but with such a long sequence
       | of steps as this bug requires, that would've been unlikely to
       | help. Yet Murphy's Law says that it will happen in practice...
        
       | sbf501 wrote:
       | What bothers me is that the OCaml code looks so harmless:
        |     static void st_masterlock_acquire(st_masterlock * m)
        |     {
        |       pthread_mutex_lock(&m->lock);
        |       while (m->busy) {
        |         m->waiters++;
        |         custom_condvar_wait(&m->is_free, &m->lock);
        |         m->waiters--;
        |       }
        |       m->busy = 1;
        |       pthread_mutex_unlock(&m->lock);
        |     }
       | 
       | How is this bug not more common with such a simple invocation?
        
         | asveikau wrote:
          | Probably you can get away with that simple invocation "most of
          | the time", but what matters is that the higher-level code
          | calling it triggers the bug where other access patterns would
          | not.
         | 
         | Also, I do wonder why they roll their own mutex using condition
         | variables. Most people probably would use a mutex and thus
         | wouldn't hit the condition variable at all. Condition variables
          | are more niche than mutexes.
        
           | mbac32768 wrote:
           | According to Xavier Leroy (OCaml language dev):
           | 
           | > It is possible to use a plain POSIX mutex as master lock,
           | but fairness is awful.
           | 
           | https://discuss.ocaml.org/t/is-there-a-known-recent-linux-
           | lo...
        
         | jwatte wrote:
         | That kinda depends on what custom_condvar_wait() does, right?
        
           | grogers wrote:
            | Just 6 days ago they merged a custom condvar implementation
            | based directly on futexes to work around the glibc bug.
            | Before that it was just pthread_cond_wait.
        
       | ncmncm wrote:
       | Someone should be very embarrassed about how they have handled
       | this whole process.
       | 
       | Surely code to operate condition variables correctly is in the
       | public domain, that does not involve mitigations or stolen
       | signals.
        
       | pengaru wrote:
       | In 2022 a POSIX OS with a broken pthread_cond_t implementation is
       | a broken OS, full stop. This is _the_ POSIX API for implementing
       | sleeping synchronization mechanisms in threaded programs.
       | 
       | The glibc project should be incredibly embarrassed by this,
       | assuming there's anyone even still paying attention there.
        
         | nix23 wrote:
          | Well, it's GNU/Linux, a multiuser unix that is not usable as a
          | multiuser system (read the sdf.org story). It's a cheap
          | hw-framework that runs more or less reliably... but no problem,
          | we have kubernetes and containers to cover/perfume the stinking
          | parts.
        
       | ajross wrote:
       | Skimming this, this isn't something glibc can do reliably. This
       | has to be a kernel fix. The underlying FUTEX_OP operation is...
       | obtuse. Futexes were originally designed for simple locking and
       | semaphores, and only later extended to condition variables, and
       | the result isn't pretty. (To be fair: it's a very hard problem on
       | kernels that need to scale to hundreds of cores across many
       | levels of memory interconnect.)
       | 
       | Trying to fix this in userspace is just about tuning to try to
       | suppress the underlying issues, and recovering cleanly when
       | things break.
        
         | mannykannot wrote:
         | I'm not sure whether to read your comment as disputing the
         | author's claim of having a fix for this particular problem, or
         | that there will inevitably be other cases where condition
         | variables fail unless futexes are fixed in the kernel.
        
         | jart wrote:
         | How is this the kernel's fault? The article says
         | "pthread_cond_signal() will signal on the wrong futex". Yes I
         | understand futexes are tricky, but I don't see how they're
         | broken.
        
           | ajross wrote:
           | To be clear: I'm not assigning "fault", I'm saying the design
           | is flawed. The current futex abstraction forces userspace to
           | do things like "pick a thread to wake up", which is
           | inherently racy when viewed from outside the kernel. So yes,
           | you have to fix this particular issue in glibc because the
           | microbug is in glibc. But glibc shouldn't have tried to solve
           | the problem in the first place.
           | 
            | IPC is just really hard; we struggle with exactly this in
           | Zephyr too (though we have mostly managed to avoid this
           | particular kind of mistake as in an RTOS the kernel/API
           | boundary is thinner), at a much smaller scale. Add NUMA and
           | cache-locality-concern to the mix and things get harder
           | still.
        
       | keepquestioning wrote:
       | Would macOS see this issue too?
        
         | saagarjha wrote:
         | macOS uses its own pthreads implementation:
         | https://github.com/apple-oss-distributions/libpthread
        
       | mort96 wrote:
       | This seems like some really good debugging effort, kudos.
       | 
       | But, my takeaway from this is that there's a glibc bug which
       | causes deadlocks in correct code, and a proposed fix (even if
       | partial) has been available for years, and glibc still hasn't
       | shipped a version with the fix so it's up to distros and users to
       | patch the broken glibc code.
       | 
       | Reminds me of how a glibc update broke GNU m4 and GNU didn't ship
       | a fix to m4 for years, so it was up to distros and users to patch
       | the latest release of GNU m4 to make it compile with the latest
       | version of GNU glibc.
       | 
       | And then a couple of years later GNU shipped another version of
       | glibc which broke m4.
       | 
       | GNU doesn't strike me as an organisation which places a lot of
       | value on software quality and reliability.
        
         | stefan_ wrote:
         | It's hilarious how many of these stories there are. I remember
         | debugging a DNS issue on an embedded system; the resolv.conf
         | changed but the program continued trying the old DNS server.
         | Turns out GLIBC will read the resolv.conf once on startup and
         | never again after (unless specifically invalidated). Except
         | most people will never notice this behavior because (1) most
         | distributions carry a patch from 2004 that reloads it
         | transparently or (2) use libraries that just always invalidate
         | it, just to be sure.
         | 
         | This stupidity surfaces everywhere; here is a Go bug report for
         | it:
         | 
         | https://github.com/golang/go/issues/21083
        
         | rekado wrote:
         | > GNU doesn't strike me as an organisation which places a lot
         | of value on software quality and reliability.
         | 
         | This stems from a misunderstanding of what GNU is. Ideally all
         | projects under the GNU umbrella would work towards a unified
         | operating system, but that's sadly not what it's like. The GNU
         | project is not much of a project in the common sense of the
         | word, nor is it much of an organisation. This is one of the
         | many reasons why https://gnu.tools exists, a subset of GNU with
         | common goals and values, including collaborative project
         | management.
        
           | xwolfi wrote:
            | It sounds silly for deep theoreticians and a somewhat
           | politically active organization like GNU to have wanted to go
           | all the way to an OS, the kind of software where you really
           | need to throw all the beauty out of the window and solve dumb
           | users tangible problems, like how to read a bluray from an
           | obscure proprietary drive or how to run a game with a
           | monopolistic corporation's binary drivers.
           | 
           | GNU can never succeed at it because it can never compromise
            | and maybe that's fine. It's a shame that it's considered a
           | failure when they accomplished so much in the OS scaffolding
           | part.
        
           | mort96 wrote:
           | I kind of get that. And if a change in GNU glibc caused a bug
           | in GNU wget, I would hope that wget would release a fix
           | relatively quickly, but I wouldn't necessarily have any
           | special expectations just because glibc and wget are both
           | part of GNU.
           | 
           | But I wish that the projects in the GNU toolchain would at
           | the very least collaborate. The combination of glibc + gcc +
           | autotools + m4 makes such a core part of any GNU system that
           | you'd hope that GNU cares to keep them mutually compatible.
           | So when they release an update to glibc which breaks m4 which
           | makes it impossible to compile the current version of the GNU
           | toolchain using the current version of the GNU toolchain,
           | that's extremely disappointing.
        
             | yjftsjthsd-h wrote:
             | It does naively seem like a central library like glibc
             | would want to run regression integration tests on every new
             | version to make sure they're not breaking the utterly
             | massive ocean of software that depends on them, and that
             | the very obvious first step in that direction would be to
             | run tests using software from the same group and/or core
             | libraries/software that they know are required for a lot of
             | base systems to work (ex. GNU coreutils, GCC, clang,
             | busybox). I suspect it does in fact mostly boil down to
              | resourcing problems - although that just raises further
             | questions, since I would expect that organizations like
             | Google or Red Hat would consider that kind of thing
             | important enough to support.
        
               | dezgeg wrote:
               | Using GNU Guix (the distribution) should be a logical and
               | simple test-bed to see how exactly glibc/gcc pre-release
               | would affect downstream packages.
        
         | [deleted]
        
         | userbinator wrote:
         | _GNU doesn't strike me as an organisation which places a lot of
         | value on software quality and reliability._
         | 
         | That's not surprising, because their main purpose is to promote
         | the GPL and "free as in freedom" software. I've heard theories
         | that their software is deliberately overly complex and uniquely
         | different from other implementations in order to make it easier
         | to find GPL violations. In other words, they're optimising for
         | something other than quality.
         | 
          | I've inspected the glibc condition variable implementation
         | (while looking at what may be a slightly different bug, or it
         | could be another manifestation of this one; not sure at this
         | point since it was years ago) and the first thing I noticed was
         | that it is disturbingly complex.
        
           | pengaru wrote:
           | > That's not surprising, because their main purpose is to
           | promote the GPL and "free as in freedom" software.
           | 
           | You're conflating the FSF and the GNU project.
        
             | userbinator wrote:
              | Same difference; this is in contrast to e.g. BSD, which is
              | far less politically motivated. (And they have no problems
              | in their CV implementation that I know of...)
        
       | chrsig wrote:
       | Awesome to see the author post TLA+ instructions & model source.
       | One day I'll get around to actually learning it. In the meantime,
       | what's posted is too involved for me to grok at first glance, but
       | I'm still glad to see more real world uses pop up.
        
       | aaaaaaaaaaab wrote:
        
         | ComputerGuru wrote:
         | I maintain my own FOSS threads signals/events libraries in C++
          | [0] and in Rust [1] (atop parking_lot as a futex stand-in)
         | and I disagree. This has nothing to do with the language and
         | everything to do with the semantics of the locks. Writing
         | concurrency primitives is HARD and the more functionality your
         | API exposes, the more room there is for nuanced bugs in how
         | everything interplays with everything else.
         | 
         | [0]: https://github.com/neosmart/pevents
         | 
         | [1]: https://github.com/neosmart/rsevents
        
           | galangalalgol wrote:
           | I'm a rust beginner, but it seems like a mutex or lock is
           | shared mutable state, and abstracting this with unsafe code
           | to implement events is just creating a loophole by which to
           | call a function on another thread without having a mutable
           | reference? I'm sure I'm missing something, or a lot of
            | things. It just seems like code I wouldn't write even in C++
            | anymore. I typically don't do signalling between threads or
           | have them share mutable state. If they do signal it is by
           | read or write to a value small enough to be atomic.
        
             | ComputerGuru wrote:
              | Rust's synchronization types are built on the semantics of
              | shared objects, with the synchronization object owning the
              | data; access to the data is channeled exclusively through
              | the synchronization type.
             | 
              | Events and signals aren't intended to be used if you're
              | trying to share ownership of memory or data - you should
             | use the std types in that case. They're for signaling
             | between threads or for building your own synchronization
             | objects atop of. For example, an auto reset event can be
             | used in lieu of a one-bit channel (sort of like Mpmc<()>)
             | and manual reset events can be used to implement the same
             | in broadcast mode, much more efficiently than the channel
             | crates. A common usage is to block until an event is
             | available or a shutdown has been requested - in lieu of
             | manually polling an AtomicBool and aborting if it is true,
             | making your code more responsive (no delay between polls)
             | and more efficient (no repeated polling, no busy waits,
             | etc). They can be used to signal state transitions or
             | readiness, which doesn't have anything to "own", for
             | example, the rsevents-extra crate [0] implements a
             | countdown event (event becomes available when n outstanding
             | tasks have finished in parallel threads) or a semaphore
             | (restricting concurrency to a region of code, not limiting
             | access to a variable or object).
             | 
             | [0]: https://github.com/neosmart/rsevents-extra
        
           | asveikau wrote:
           | It's worth noting that condition variables are a finicky
           | interface. Speaking personally, I would typically much rather
           | work with something like a semaphore. I also feel like I've
           | seen a lot of misuse of condition variables out there.
        
             | ComputerGuru wrote:
             | Condition variables were the POSIX equivalent to Win32's
              | WaitForMultipleObjects (and IOCP APIs like it). Now both
             | are available on Windows (and my library makes WFMO
             | available on *nix/Linux/macOS).
             | 
             | Semaphores alone can't mimic the same semantics (juggling
             | between multiple, disparate locks/sync objects) without
             | running into a possible race condition unless you have an
             | API that can somehow wait on one of two or more semaphores
             | to become available.
        
               | enjoy-your-stay wrote:
                | WaitForMultipleObjects (which has been in Win32 forever)
                | is not really related to I/O Completion Ports, whose API
                | came a lot later.
               | 
               | My understanding is that condition variables were
               | designed to be a superior alternative to Windows Event
               | objects.
               | 
               | I'm glad to read here that others find them fiddly and
               | overly awkward to use since that was also my impression
               | vs the consistent and usually easy to use Windows thread
               | synchronization HANDLE based objects. Dave Cutler
               | definitely did a good job there IMHO.
        
               | gpderetta wrote:
               | Condition variables predate Windows and its predecessor
               | VMS by many years (they do not predate unix itself but
               | certainly they predate pthreads by decades).
               | 
               | In particular pthreads borrowed them from Mesa, but they
               | originally were part of the monitor abstraction designed
               | by Hoare.
               | 
               | So they weren't designed to be a superior alternative to
               | windows events; they might be considered an alternative
               | to events in general but I couldn't easily find any
               | history of the events abstraction.
        
               | asveikau wrote:
               | I totally disagree with your assessment.
               | WaitForMultipleObjects enters the kernel every time and
               | blocks on kernel objects. The nearest equivalent is
               | select, poll, epoll, kqueue, etc. Newer innovations like
               | eventfd start to introduce synchronization objects into
               | the mix of what type of kernel object to block on.
               | 
               | A Win32 mutex or semaphore is very heavy and typically
               | avoided in a sane high performance application unless you
               | need it to cross process boundaries. A lot of the time
               | (most?) you will want a win32 critical section, which is
               | similar to the modern futex based pthread mutex
               | implementation in that it does not enter the kernel
               | unless there is contention.
               | 
               | A pthread condition variable is clumsy. You need to give
               | it an extra mutex and loop etc. It doesn't have any of
               | the simplicity of WaitForMultipleObjects, even with all
               | the faults of that interface of which there are many.
        
               | ComputerGuru wrote:
               | I was not talking about implementation details but
               | semantics. I agree with what you are saying. My own
               | implementations are user-mode unless sleeping.
        
         | eloff wrote:
         | Rust doesn't help with implementing low level primitives like
         | this, you'd have to use unsafe code.
        
           | cesarb wrote:
           | > Rust doesn't help with implementing low level primitives
           | like this
           | 
           | A charitable reading of the parent comment might be that Rust
           | _already implements_ these particular low-level primitives,
           | so that by using Rust, you are not using the problematic
           | glibc pthread code. See https://blog.rust-
           | lang.org/2022/06/30/Rust-1.62.0.html#thinn... for the
           | relevant announcement: "Previously, Mutex, Condvar, and
           | RwLock were backed by the pthreads library on Linux. [...]
           | Rust's standard library now ships with a raw futex-based
           | implementation of these locks on Linux, [...]"
        
             | jstimpfle wrote:
             | In either case you need to use some implementation of
             | synchronisation primitives, and that could contain bugs.
             | Where's the difference?
        
             | dundarious wrote:
             | glibc's impl is a "raw futex-based" one as well. I'm no
             | great fan of glibc as an organization or body of code, but
             | the comparative benefits of Rust here are not well
             | explained.
        
               | tialaramex wrote:
               | Although glibc's implementation is indeed "raw futex-
               | based" it is not merely a raw futex, of course. What you
               | get from glibc isn't a futex (tricky to use properly) but
               | instead a C structure type named pthread_mutex_t. It's
               | quite big. 40 bytes!
               | 
               | All Rust wanted was a futex, which for reference is just
               | any 32-bit aligned value. The pthread_mutex_t is less
               | useful to Rust, yet it's ten times bigger!
               | 
               | So, Mara's new Mutex<T> in the Linux builds of Rust is
               | just a futex (32-bits) plus a poison boolean plus your
               | type T which is protected by the mutex. If your type is
               | Zero Size then this rounds up to 8 bytes, five times
               | smaller than pthread_mutex_t. If your type was just a u16
               | (unsigned 16-bit integer) then it rounds up to the same
               | size, but you're also protecting your 16-bit unsigned
               | integer which is nice.
               | 
               | Now, if you're protecting huge data structures with
               | mutexes, you don't care, it makes no measurable
               | difference. But if you heard futex is small (it is) and
               | protected loads of tiny objects with one each (say,
               | individual 32-bit floating point X,Y co-ordinate pairs)
               | Rust now makes your choice cost roughly what you expected
               | instead of bloating your program for no reason.
               | 
               | It's also a bit faster (but probably not enough that you
               | should care) and simplifies some other book keeping in
               | Rust itself.
               | 
               | [ Above I wrote about Mutex, for Condvar the argument is
               | similar but I haven't spent lots of time understanding
               | the innards of Condvar whereas I spent a week reading
               | Mutex stuff ]
        
               | gpderetta wrote:
               | Condvar and mutexes are for very different use cases
               | though, and condvars are notoriously hard to get right,
               | especially when supporting all the full set of posix
               | behaviours with good performance.
        
             | asveikau wrote:
             | Which means they have plenty of opportunities to introduce
             | their own bugs.
        
         | kibwen wrote:
         | That doesn't help here, as Rust also uses glibc for plenty of
         | things (although recently it did move away from pthreads in
         | certain areas for performance reasons, see
         | https://twitter.com/m_ou_se/status/1526211117651050497 ).
        
       | wizeman wrote:
        | Just a word of caution: I've tried replacing the mitigation patch
        | (which I had been using for 18 months) with the author's minimal
        | fix (the first of the 6 patches, which is supposed to fix the
        | issue) and I immediately ran into a deadlock while building
        | Poly/ML 5.9 (3 times, in fact), whereas I had never encountered
        | this bug before, despite having built Poly/ML many times.
       | 
       | This was on a 32-core machine with lots of background compilation
       | going on at the same time (i.e. with a high load).
       | 
       | It could be a coincidence or an unrelated bug, but it's also
       | possible that the author's patch might not fix the issue
       | completely.
        
         | KyleSanderson wrote:
         | Apply the entire series... That's what was likely tested the
         | most.
        
       | oldtimer1901 wrote:
       | Old timers will recall that when Java first became popular in the
       | mid 90s it brought to light all sorts of bugs in various
       | operating systems' thread implementations - particularly in
       | SunOS/Solaris. It is discouraging that such basic bugs still
       | exist in GNU/Linux after 25 years. But it's not surprising - if
       | anyone tells you that they understand all possible multi-threaded
       | scenarios and race conditions in the implementation of these
       | libraries - they are mistaken. It's all trial and error.
        
         | [deleted]
        
       | dgan wrote:
       | can we have a "formal semantics for multi-threading" ? why it's
       | not a thing?
        
         | tsimionescu wrote:
         | Another poster shared the Java memory model, and C++ and C have
         | them as well. The article we're discussing shows another kind
         | of formal model for multithreaded code in TLA+, and actually
         | uses it to investigate this bug. There is also Tony Hoare's
         | Communicating Sequential Processes formal model, first
         | presented in 1978; and an entire field of CS research called
         | process calculus.
         | 
         | So, pick a formal model and use it - though note that formal
         | methods tend to add significant extra effort to use, and may or
         | may not pay off depednig on your domain and desired outcomes.
        
         | kjeetgill wrote:
         | There are. They're called memory models. I know Java's is
         | pretty readable (see below). I think C and C++ have them too
         | now. But when you're building on these things there's always
         | lots of room to get them wrong!
         | 
         | [0]: https://en.wikipedia.org/wiki/Java_memory_model
         | 
         | [1]:
         | https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.ht...
        
         | baq wrote:
         | It's super duper hard, mostly.
        
       | metadat wrote:
        | Is there a way to bundle and use a more up-to-date version of
       | glibc inside my docker containers? Is it a straightforward matter
       | of creating an intermediate build-base-image where glibc gets
       | compiled and copying over the resulting glibc .so artifact files
       | to the final image?
       | 
       | I suspect there may be additional concerns, but I'm not a glibc
       | expert and have not needed to mess with it much.
       | 
        | Also, does musl libc [0] have fewer bugs, on average?
       | 
       | [0] https://musl.libc.org/
        
       ___________________________________________________________________
       (page generated 2022-09-18 23:00 UTC)