[HN Gopher] Must be this tall to write multi-threaded code (2015) ___________________________________________________________________ Author : luu Score : 81 points Date : 2022-06-09 18:53 UTC (1 day ago) (HTM) web link (bholley.net) (TXT) w3m dump (bholley.net) | bena wrote: | I think warnings like this are good and necessary. A lot of | people are going to ignore this warning simply because they | wrongly believe they're good enough to ignore it. And they're | going to try and write this massive project simply to prove that | the warning doesn't apply to them. And they will serve as great | examples of why that warning is there in the first place. | | But there's another group. This group heeds the warning. They do | their best to not violate the warning. But then they run into an | issue. They exhaust every possible option trying to resolve the | issue. But eventually, they realize that they can violate the | warning, just a little bit. And they're careful. And they violate | the warning in the least way possible and take every action | deliberately. And it works. | | Because the warning doesn't mean it's impossible, it just means | that it is incredibly difficult and should be approached with the | utmost care. | | Eventually, we do learn the pitfalls, internalize them, abstract | them away, etc. And then we can even do away with the warning. | Because it's no longer the edge of the universe, it's just | another stop along the way. | bee_rider wrote: | Just sprinkle in some OMP PARALLEL DO's and it'll be fine. | bitwize wrote: | I knew I was right to trust Steve Summit with multithreaded stuff | back when I worked with him! | robocat wrote: | > Buggy multi-threaded code creates race conditions, which are | the most dangerous and time-consuming class of bugs in software. | > safe and scalable parallelism is achievable by minimizing or | eliminating concurrent access to shared mutable state.
> Rust is | designed from the ground up to facilitate this | | Does async code with mutable state sometimes result in race | conditions in Rust? | | I have had that experience with JavaScript. For example, old-school | short timeouts with a function written as a closure: the closure | implicitly expects the state to remain the same, but a race | condition happens asynchronously to change the state, and then | the closure is called, and then the closure code fails. A problem | of unintuitive surprise to the programmer that can happen with | async code (even without thread concurrency). | buybackoff wrote: | I would change it to _to edit_ MT code. It usually starts with a clear | picture in mind... But even one's own code quickly becomes | cryptic without lots of comments and hidden assumptions. I used | to have this picture printed and put above a server in a small | startup-like team. In the end, the whole thing wasn't related to any lock | or interlocked call, though. | dang wrote: | Discussed at the time: | | _Must Be This Tall to Write Multi-Threaded Code_ - | https://news.ycombinator.com/item?id=9905374 - July 2015 (112 | comments) | eftychis wrote: | The problem with concurrency and parallelism is ownership -- | mainly write access rather than coherence. I have seen again and | again the bad habit of people to bypass well-defined setups of | data ownership because they feel they are smarter, it is | unnecessary, we will figure it out down the road, and my | favourite: it doesn't solve deadlocks, so why use anything, since | some bug families remain. | | The issue is not one of frameworks or type systems but of companies | and people. Engineers know this stuff by training, but either are | new and commit hubris or management makes them (or maybe some of | them are simply untrained -- and somehow never went to a bank | (queues)). | | P.S. For a fun example, I have seen unsafe used to bypass Rust's | type and ownership system to keep track of db snapshots in a | "nice setup." Rust was not wrong.
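The closure-over-mutable-state surprise robocat describes in JavaScript has a close Go analogue: a goroutine's closure reads shared state at run time, not at scheduling time. A minimal sketch of the fix — passing the state in as an argument so the deferred code sees a snapshot (the `snapshot` helper name is invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot schedules work that uses state, then mutates state
// before the work runs. Because the goroutine receives a copy of
// the value as an argument (evaluated when `go` executes), later
// writes to `state` cannot race with it.
func snapshot(state string) (result string) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func(s string) { // s is a copy, immune to later writes
		defer wg.Done()
		result = s
	}(state)
	state = "changed after scheduling" // no longer affects the goroutine
	wg.Wait()                          // happens-before: safe to read result
	return result
}

func main() {
	// Had the goroutine closed over `state` directly, this could
	// print "changed after scheduling" -- or race outright.
	fmt.Println(snapshot("initial")) // prints "initial"
}
```

Closing over the variable itself instead of passing it would reintroduce exactly the timeout-closure hazard described above, and `go run -race` flags that version.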
| YesBox wrote: | I recently wrote some multi-threaded code (for the first time) | for a game I'm working on. Luckily the data I need to process is | in a (C++) vector, so I was able to break the data up by dividing | the vector size by available cores. I certainly felt 10 ft. tall | when it worked! | | Another area of my code that would greatly benefit from concurrency | requires updating an unordered_map. Since it's position-based | and not unit_id-based, I'll need to figure out a way to prevent | data races. The best option I've thought of so far is assigning | thread numbers to each edge in my graph, and storing the units in | each edge in a queue, which should prevent data races and car | collisions. (edit: ha) | | That being said, I'm the sole developer on this project and I | cannot imagine working on a multi-threaded application with | dozens of other developers. It is nice dipping my toes in the | water though. | kerblang wrote: | Most of us work in information systems, which is to say, doing | database stuff all day, usually with an RDBMS like MySQL, | Postgres etc. And what are we doing? Grabbing a bunch of | transactional locks. Against what? Shared mutable state, and not | in-memory state, but database state counting into the thousands, | millions, billions or more records. And if we do it wrong? | Deadlock, often deadlock across multiple processes. And what do | we do about that? We deal with it, because what else can you do? | | You could of course throw transactions away and declare them more | trouble than they're worth, but that probably won't go over well. | I've found it helps to have a "consistent first lock", so for | example in a customer-based system you'd lock the customer record | first, because most transactional operations happen in that | context. If you always have that consistent first lock then | deadlock can't happen.
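The "consistent first lock" kerblang describes is one instance of a global lock ordering: if every transaction acquires locks in the same agreed order, no cycle of waiters can form. A minimal in-memory Go sketch, using account IDs as the ordering key (the `Account`/`transfer` names are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// Account IDs give a global lock order: always acquire the
// lower-ID lock first, so two concurrent transfers can never each
// hold the lock the other needs.
type Account struct {
	id      int
	mu      sync.Mutex
	balance int
}

func transfer(from, to *Account, amount int) {
	first, second := from, to
	if to.id < from.id { // consistent order: lowest ID first
		first, second = to, from
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()
	from.balance -= amount
	to.balance += amount
}

func main() {
	a := &Account{id: 1, balance: 100}
	b := &Account{id: 2, balance: 100}
	var wg sync.WaitGroup
	// Transfers in both directions at once: with naive from-then-to
	// locking this is the classic deadlock; with ordered locking it
	// always completes.
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	fmt.Println(a.balance, b.balance) // 100 100
}
```

Locking the customer record first, as in the comment, is the same idea with "customer" playing the role of the globally-first lock.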
| | My point is that if I assert "multithreading is unacceptable!", | most of business information systems goes out the window on the | same principle, because the locking is even more dastardly - | multi-process and persistent state instead of in-process | in-memory state. I don't think you could throw | actors/schmactors/coroutines/goroutines/etc at such things usefully | either. And if you said "Well only the best programmers need | apply," well BIS is not exactly hotshot heaven, by and large. | It's a bunch of grunts. | | So I agree multithreading is hard when you are locking many | things at once, which I try to avoid. But I just don't get this | thing of trying to banish it or make it the exclusive domain of | geniuses. | quantified wrote: | 2015 | gumby wrote: | Not that anything has improved since then. | xwdv wrote: | Aye, it feels like we've mostly reached a plateau and arrived | here through short-termism. | hedora wrote: | Well, Rust is replacing C++ incredibly quickly. | spacechild1 wrote: | Citation needed. | hedora wrote: | Amazon somewhat quietly replaced the S3 service | implementation with a Rust reimplementation, then | announced it after the transition landed without making | any bad headlines. | | I've heard Microsoft has moved to Rust for all new | projects (there is an exception process for C++, of | course). | | Parts of Chrome use it now. | | Support for using Rust in the mainline Linux kernel | should land soon. | Kab1r wrote: | Go, Erlang/Elixir maybe, but as much as I like Rust, it has | not made writing _concurrent_ programs easier imo. | hedora wrote: | I write concurrent systems code for a living. Rust is | much easier to get right than Go, because it uses the | borrow checker to statically check for correct | synchronization. (Check out tokio.) | | I haven't used Erlang/Elixir. | ntoskrnl wrote: | Seems to be a theme at Mozilla.
| | > Must be this tall to use `unsafe` | | https://i.stack.imgur.com/w9XMh.jpg | khuey wrote: | That's more or less the same spot in Mozilla's SF office. | kcl wrote: | I've written on this problem before: | https://kevinlawler.com/parallel | | One issue is that the pthread model is too low level (think AND | and OR gates), but you must include it for completeness. | | I agree with the comments here that say higher-level | multithreading support needs to be "baked in" to the language. | Grand Central Dispatch did a good job of raising the bar here, | but has its own problems, and is arguably too low level. | | Another issue is that existing, widespread languages are too old | to have higher-level threading support cooked in. You really have | to do it from the start. C++ bolt-on parallelism might as well be | syntactic sugar. Newer languages, like Rust, have attempted to | solve concurrency problems but failed - it's not clear that the | designers genuinely understand concurrency at all. Goroutines | solve one limited issue of many and seem about as difficult as | asking programmers to write threaded programs. | | To do this correctly you need a new language that cooks it in | from the start. There is a laundry list of things that have to be | done precisely. It might be possible to do this by rewriting a | reference implementation of an existing popular language, but the | development overhead and performance concerns involved in | backwards-compatibility make that probably infeasible. (Python | can't even escape its global interpreter lock.) So likely the | language that solves this problem is either unwritten or unknown. | | Other mentions in the comments here: | | ThreadSanitizer: It's not very good, nor can it be. | | Erlang: solves the problem, but the actor formulation is too | awkward for all but Erlang-specific use-cases. | lostcolony wrote: | >> Erlang: solves the problem, but the actor formulation is too | awkward for all but Erlang-specific use-cases.
| | Err? I mean, it requires rethinking a bunch of the lessons you | learn when concurrency is hard and expensive and to be | minimized, sure. But when concurrency is easy and cheap and to | be embraced, a bunch of things become a lot easier and more | natural to write. | | My go to example, from a production system, was a complicated | task scheduling system (with user and hardware interrupts to | handle). Sure, we could have written it using threads and a | priority queue or something...instead we just had one Erlang | process per task, with interrupts routed to that process as | messages (and linked timer processes that fired messages to | those processes for the scheduled stuff, essentially just | another type of interrupt/message). Super simple to write, | super simple to reason about, never had any sort of race | condition or locking issues, across years of development and | production use. | waynesonfire wrote: | > Erlang: solves the problem, but the actor formulation is too | awkward for all but Erlang-specific use-cases. | | Play with Erlang a bit and you'll find the modern band-aid | pattern of async / await awkward. | tialaramex wrote: | > it's critical to explore ways to incrementally add safe | concurrency in C++. | | I don't buy it. Some things are foundational and attempts to add | them incrementally are doomed because your foundations won't | support the work. After building them easy things seem easy - but | then hard things tear out your inadequately founded "incremental | addition" and make a huge mess everywhere. | | Google has a team doing this sort of thing (indeed David Baron | might even be on it for all I know), for exactly the same reason, | they wrote a web browser in C++ and it has threading problems and | they wish they could fix that. So Mozilla isn't exploring terra | incognita here, and if there's something to be found I hope they | find it, but I expect nothing. 
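lostcolony's one-process-per-task design translates fairly directly to goroutines: give each task its own goroutine and inbox channel, and deliver interrupts as messages, so the task's state is never shared and never locked. A rough sketch under those assumptions (the `msg`/`task` shapes are invented, not from the comment):

```go
package main

import "fmt"

// One goroutine per task; interrupts arrive as messages on the
// task's own inbox. All task state lives inside the goroutine,
// so there are no locks and no shared mutable state.
type msg struct {
	kind string // e.g. "tick", "interrupt", "stop"
}

func task(id int, inbox <-chan msg, done chan<- string) {
	handled := 0 // private to this goroutine
	for m := range inbox {
		switch m.kind {
		case "stop":
			done <- fmt.Sprintf("task %d handled %d events", id, handled)
			return
		default:
			handled++ // user/hardware interrupts are just messages
		}
	}
}

func main() {
	inbox := make(chan msg, 8)
	done := make(chan string)
	go task(1, inbox, done)
	inbox <- msg{kind: "tick"}
	inbox <- msg{kind: "interrupt"}
	inbox <- msg{kind: "stop"}
	fmt.Println(<-done) // task 1 handled 2 events
}
```

A timer goroutine that sends a message into the same inbox reproduces the "linked timer process" trick from the comment without any additional synchronization.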
| nemothekid wrote: | > _So Mozilla isn't exploring terra incognita here, and if | there's something to be found I hope they find it, but I expect | nothing._ | | This post was written in 2015. Looking back I think we can | infer: | | 1. They found a solution (create a new language) | | 2. From the success of that language, it seems to deliver on | its promise (although it has its warts) | | 3. They failed to incorporate that into the web browser because | rewriting a web browser was a momentous task that had little | measurable business impact; and they decided to lay off the | engineers instead. | | Google is a wealthier company, but they may decide that trying | to "fix" C++ is a more realistic endeavor. | tialaramex wrote: | Rust is mentioned in the blog post. | | Firefox has a whole bunch of Rust in it, much more than it | did when that blog post was written. Mozilla let go some core | Rust people, but of course it still employs numerous Rust | programmers. | dblohm7 wrote: | (Former Mozilla employee here) | | Rust is encouraged to be used by all developers who are | writing native code (it's not supposed to be just the | domain of a select few). As I recall, the message was, "Not | knowing Rust must not be an excuse for why you're not using | Rust." Of course developers must still contend with the | practicalities of integrating Rust into an existing C++ | codebase, so some guidelines were developed: | | https://wiki.mozilla.org/Oxidation | steveklabnik wrote: | > They failed to incorporate that into the web browser | | Rust is just over 10% of Firefox at the time of writing this | comment https://4e6.github.io/firefox-lang-stats/ | | > and decided to lay off the engineers instead. | | They didn't lay off people writing Rust in Firefox, they laid | off people working on Rust itself. | nlewycky wrote: | I don't have a photo but we used to have the same sign at Google, | with an added note "unless accompanied by ThreadSanitizer".
| | At runtime: https://clang.llvm.org/docs/ThreadSanitizer.html | | At compile-time: | https://clang.llvm.org/docs/ThreadSafetyAnalysis.html | | The article points out that the resulting invariants end up being | documented in comments: // These variables are | protected by monitor X | | but you can write those in C++ code and have the compiler check | them. That's what thread safety annotations are for. | kubanczyk wrote: | TIL that, so predictably, "[Go's] race detector is based on the | C/C++ ThreadSanitizer runtime library". So, I can definitely | confirm its usefulness. https://go.dev/blog/race-detector | new_stranger wrote: | Assuming you don't need bare-metal performance and Go is okay, I | love how easy and approachable parallel code is with Go. Locks | and channels each offer trade-offs you can easily benchmark using | the built-in go test. | | I've written so many different concurrent approaches to workers, | pipelines, and multi-stage processing systems and found it always | easy to reason through ways to limit the number of workers, | prevent duplicate work, and implement backoffs. | anonymousDan wrote: | I don't know, maybe it's the sadist in me but I enjoy the | challenge of writing fast and correct multi-threaded code. It's a | valuable skill if you can master it. | SilasX wrote: | Skill? It can't be reduced to following easily checkable | heuristics and rules? | anonymousDan wrote: | I don't claim to have mastered it! | layer8 wrote: | You mean masochist? ;) | | I'm similar, though I wish programming languages would give us | more facilities to statically double-check the correctness | reasoning. | oriolid wrote: | Masochist? Think about the poor programmer who has to | maintain GP's code :) | tanyar wrote: | This is the way. That code breaks when someone new comes | along and makes several modifications that have implications | which take a long time to rear their ugly heads.
| | Broken multi-threaded code behaves a lot like perfectly | coded multi-threaded code....until you get a lot of traffic | through that code, you know, like on a busy holiday weekend | when lots of revenue is flowing through the business! | ChrisMarshallNY wrote: | Low-level thread programming is a minefield. I've been doing | IRQ/concurrent programming since the 1980s, and still hate it. | Thread bugs are a nightmare. | | Concurrency needs to be "baked into" higher-level languages, | development APIs, and standard libraries. Basically, covered in | bubble-wrap. That's starting to happen. It won't be as efficient | as low-level coding, but it is likely to still have a significant | impact on the performance of most code. | | And folks that are good at low-level threading would be well- | served to work to support this. | hinkley wrote: | I studied distributed computing in college, and I spent a lot | of time not quite internalizing the fact that since this was an | elective at one of the highest ranked state schools in the | nation, probably most other people didn't have the same | information I did. | | I ended up doing a lot of concurrency work early on because I | was good at it, but over time the evidence just kept piling up | that nobody else could really follow it, and so I've used it | less and less over time, and in particular try very hard to | keep it away from the main flow of the application. It's more | something I pull out for special occasions, or to explain bugs. | | Where a lot of frameworks fail is that while re-entrance and | concurrency are different problems, they share a lot of | qualities, both computationally and with the limits of human | reasoning. Recursive code with side effects looks and acts a | lot like concurrent code, because here I am looking at some | piece of data and the moment I look away some other asshole | changed it out from under me. 
Most frameworks end up being a | bad study in recursion, usually in the name of reducing code | duplication. | | Pure functional people love recursive code, but it's the pure | part that avoids the problems with trying to interlace multiple | activities at once. Without that, you're trying to eat your | dessert without eating your vegetables first. | [deleted] | nick_ wrote: | Pony has concurrency baked in via a high-perf actor | implementation. It's really nice. I believe Go has this as | well, though it also exposes a low-level concurrency API? | skohan wrote: | Idk I think it is not as difficult as it is sometimes framed. I | think the key is to design your program/separation of concerns | in such a way as to make it easy to reason about concurrency, | and minimize the need for synchronization points. | | I think a lot of the problems arise when you just try to write | normal synchronous code, or to parallelize code which was | originally synchronous, and don't realize the implicit | constraints you had been relying on which no longer hold when | concurrency is introduced. | ChrisMarshallNY wrote: | FP is good for adapting to threading, but it has | difficulties when applied to things like GUI programming, | async communications, or device control (places that need | threading). | | A lot of languages are getting things like async/await/actor, | etc., but even that is still too low-level for a lot of | programmers. It can easily turn into a lot of single-threaded | code. | | It needs to be completely under the surface. Swift does that | with generics (I suspect other languages do it, as well). For | example, you can say Array<Int>, or [Int]. They mean the same | thing, but one does not have the generic syntax. | | If we can do similar stuff with things like mutexes and | syncing, then it will go a long way towards highly | performant, safe code.
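The "mutexes under the surface" idea can be sketched with generics in Go: a wrapper type that never exposes Lock/Unlock, only a scoped callback that runs with exclusive access. The `Guarded` type below is invented for illustration, not a standard-library API:

```go
package main

import (
	"fmt"
	"sync"
)

// Guarded hides the mutex entirely: callers never see Lock/Unlock,
// they hand in a function that runs with exclusive access to the
// protected value. Forgetting to unlock becomes impossible.
type Guarded[T any] struct {
	mu  sync.Mutex
	val T
}

func (g *Guarded[T]) With(f func(*T)) {
	g.mu.Lock()
	defer g.mu.Unlock()
	f(&g.val)
}

func main() {
	var counter Guarded[int]
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.With(func(n *int) { *n++ }) // safe concurrent increment
		}()
	}
	wg.Wait()
	counter.With(func(n *int) { fmt.Println(*n) }) // 1000
}
```

The syncing is "completely under the surface" in the sense above: the only way to touch the value is through the guarded scope, so the type system enforces the locking discipline.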
| skohan wrote: | It's interesting that you choose Swift as an example, | because I think in some ways Swift does a poor job of | accommodating async using the "bubble wrap" approach. | | Specifically I am talking about the default of using atomic | reference counting on all references. This makes it | somewhat idiot proof at least with respect to avoiding | NPE's, but it comes at a huge performance cost (especially | on x86, which I guess is becoming less of a concern for | Swift code). | | I think this is a fundamental issue: a big part of the | benefit of concurrency and parallelism is supposed to be | increased performance. However a lot of the ways to make | concurrency "generally safe" tend to hamstring performance, | because you have to protect against all the different types | of errors programmers can make, which basically means a lot | of expensive synchronization. | | Maybe there is a way to hide this completely, as you say, | in a performant way. To do this, I think you would almost | have to have either some kind of really clever lock-free | data structures, or else a very smart compiler. | | Another approach might be to keep concurrency at arm's | length from the average programmer. I.e. hide 90% of it | inside of libraries, and expose safe interfaces which the | programmer interacts with in a mostly synchronous way. | ChrisMarshallNY wrote: | _> I think in some ways Swift does a poor job of | accommodating async using the "bubble wrap" approach._ | | I agree. Async/await has a "kludgy" feel to it (to me). | It's "added on," not "embedded in." But I'm not a | language writer, so I don't have alternatives to offer. | | I was talking about how it does generics. I think it does | that quite well. | dgb23 wrote: | To me it feels like I'm folding and knotting code. | skohan wrote: | I agree with you there. Swift has some really excellent | syntactic sugar.
| hinkley wrote: | Based on a non-scientific study, I think the spatial thinkers | do great with concurrency, the visual thinkers do okay, and | everyone else is in real trouble. Which reminds me, I need to | interview my developer friend with aphantasia about how he | feels about concurrency. | | Concurrency should be the sizzle and not the steak, otherwise | you're reducing the diversity of your team rather | substantially. Good people are hard enough to find. Driving | the people you have away doesn't generally end well. | [deleted] | SleepyMyroslav wrote: | just my 2c Friday rant on an old article. | | Games had to adapt to threading before I even joined. I got lucky | to work with some very efficient threaded systems. They were | never correct or safe xD. | | But I can share that if you see a comment that these variables are | protected by this lock more than a few times per million lines of | code, performance is not the reason why it is threaded. | | From public videos, I think the best explanation available is Sean | Parent's "Better Code: Concurrency" [1]. | | I really would like to see high-level or safer languages or | systems come to future game platforms. It should be possible. | It just needs economic factors to kick in. Like people finally | noticing that slow and safe is not going to get any faster next year | with new hardware. | | 1. https://youtu.be/32f6JrQPV8c | muglug wrote: | I'm glad that Rust has changed this, at least in my very specific | case: I wrote a command-line application in PHP and adding | parallelisation was a long journey of trial and error and | concurrency bugs that took over a week to get right. | | I rewrote the same application in Rust. I followed the Arc/Mutex | documentation and parallelised it over a few hours. It worked | perfectly. | chasil wrote: | "If the fastest chips have N cores, a mostly-single-threaded | program can only harness (1/N)th of the available resources.
As N | grows (and it is growing), parallel programs will win." | | This win will be constrained by Amdahl's law. A single | program/process that is multithreaded can effectively use only a | finite number of cores. | | To use more resources, the task must be broken into multiple | processes. | | It can be faster to run $(nproc) gzip processes in | parallel than to use pigz, for example. | | https://en.wikipedia.org/wiki/Amdahl's_law | spacechild1 wrote: | > This win will be constrained by Amdahl's law. A single | program/process that is multithreaded can effectively use only | a finite number of cores. | | This assumes that the workload is fixed. A classical example | would be compiling source code: you have a fixed number of | source files and the compiler is only able to parallelize so | much. | | On the other hand, there are applications where more threads | allow you to increase the workload itself. For example, a | modern DAW (digital audio workstation) is able to render tracks | in parallel; if you have more threads, you can use more tracks | with more plugins. In a computer game, more threads might mean | more complex simulations, more objects, better graphics, etc. | cortesoft wrote: | Also, the major change with having so many cores is the ability | to do lots of different things at once on a single machine. You | don't need any single process to utilize all the cores on a | machine; you take advantage of all those cores by having a | single machine do a lot of things. | ajuc wrote: | Doesn't help you at all if you want the game to run at 60 fps | and instead you can run it 8 times in parallel at 30 fps | each. | chasil wrote: | It remains important not to buy cores that you cannot use, | especially if there are other constrained components that | would have a larger performance impact. | | Sophie Wilson (creator of the ARM instruction set) spends | some time on these issues, also stressing core throttling | and shutdown due to heat.
| | https://www.youtube.com/watch?v=zX4ZNfvw1cw | JoeAltmaier wrote: | There was a paper long ago that showed duality between | semaphore/locking code and message-queuing code. So folks figured | they were the same. | | Not so! semaphore/locking is very hard if more than one semaphore | exists. Look at the 'dining philosophers problem' and so on. | | But queuing! Done right, that can be proved correct through | static analysis. Do threads that consume one queue, block on | another queue? Draw the graph of queue-blocking - does it loop | anywhere? Then you could have a problem. | | I.e. if your message-consumers don't block at all, then you | cannot have a problem with deadlock. | | You CAN have a problem with messages stalling however - | languishing on a list waiting for something to complete that | might never complete. But at runtime this can be debugged fairly | easily - if all your queues are visible to the debugger. | | In the semaphore implementation, the locking situation depends on | the state of threads and their stacks, which all have to be | frisked and back-executed to find out who holds what lock etc. | Not always debugger-friendly to do. | | I favor a multi-threading environment of threads dedicated to | queues, and all the queues visible to the debugger. That kind of | setup has never done me dirty. | btilly wrote: | _I favor a multi-threading environment of threads dedicated to | queues, and all the queues visible to the debugger. That kind | of setup has never done me dirty._ | | You mean like Go does with channels? (Not sure how good their | visibility is to the debugger though.) | fmajid wrote: | It's based on Tony Hoare's _Communicating Sequential | Processes_ , and is far more comprehensible and possible to | reason about than primitives like mutexes and semaphores that | are too close to the underlying hardware implementation. 
| | http://www.usingcsp.com | lostcolony wrote: | Probably more like Elixir/Erlang, if we're talking | programming language. Though even there, BEAM processes are | the unit of execution, and are multiplexed on threads. Parent | references OS development elsewhere. | | Go has a couple of deviations; channels aren't queues unless | you are careful to size them > the number of things you'll | ever put in them (else the thing trying to put something on a | channel deadlocks; you can of course timeout on it, but it | presents tight coupling in that case: the goroutine sending | has to care about the state of the thing receiving), | goroutines aren't dedicated to channels (that is, there is an | m-n relationship between channels and goroutines, which can | lead to a lot of incidental complexity), and you can see | what is on a channel if you have a breakpoint, but that | assumes you can get the code to break inside of something | containing a reference to the channel. | convolvatron wrote: | it's a _little_ more costly depending, but otherwise I agree | with you. it also clearly delineates what data is shared, and | everything else can be assumed private...so it helps make the | overall architecture more explicit | Etheryte wrote: | This sounds interesting and I've implemented simple ideas | similar to the pattern you describe before, however I haven't | read about its use in depth. Do you happen to know of an | article/book/resource that describes this along with real world | experiences? If not, would you mind writing a blog post or | article on it please? | JoeAltmaier wrote: | I had the benefit of cutting my teeth at my first job, | working on a message-passing OS. CTOS, based on RMX/86, used | messages for everything. It was a very early networked | computing system. Diskless workstations where file messages | simply got forwarded to a server, etc. And all the messages | and queues were visible in the debugger! | | So I learned good thread hygiene right out of school.
| dekhn wrote: | the only model I use with threads is either pre-static (start a | function on each thread with unique parameters, result is | returned to a driving function at the end) or message-queue | (each thread is blocking, waiting for a message). For the | former, I use shared memory, but it's read only and guaranteed | to live longer than any of the worker threads that use it | (passing a consted ref_ptr). For the latter, I don't share any | memory at all; the message itself contains the parameters, | fully materialized. | | I can ensure all my queues are loop-free (directed acyclic | graph of workers reading from their parent nodes and writing to | their children nodes). IIRC one of the complaints about the | Petri net model is that it was unprovable that any problem | could finish, even trivial ones. | | This has worked pretty well for me and I usually don't have to | debug any truly nasty thread problems. My biggest problem tends | to be thread pools where one of the workers gets wedged, and | there's no way to cleanly kill the worker thread and steal the | work onto another thread. | JoeAltmaier wrote: | Particularly network activity! At the lowest levels, the | creaky old TCP/IP stack is not amenable to non-blocking. | anonymousDan wrote: | One trick I used to use when writing multithreaded Java code | for debugging purposes was to never use the built-in | synchronized/implicit object lock, since that made it quite | difficult to understand from a stack trace what object was | being locked. Instead, we would define an explicit inner lock | and lock on that explicitly. The class name would then show up | in stack traces, making it much easier to track down. | hinkley wrote: | In cases where not all operations on the objects necessitate | the same lock semantics, this is also handy because you can | split or nest locks to reduce write-blocks-read and | read-blocks-write situations.
| | But if you're finding yourself using this more than a couple | of times in a project, you're probably uncovering violations | of the Single Responsibility Principle. Especially if you're | splitting locks. | itsmemattchung wrote: | One of my _favorite_ threading papers published in 1982, | written by Andrew Birrell: | | https://s3.amazonaws.com/content.udacity-data.com/courses/ud... | | Reading about that paper in graduate school cleared up a lot of | misconceptions I had about threads | morelisp wrote: | I mean that's a nice theoretical argument, but the empirical | results are not so clear. | | https://dl.acm.org/doi/10.1145/3297858.3304069 | | _Surprisingly, our study shows that it is as easy to make | concurrency bugs with message passing as with shared memory, | sometimes even more. For example, around 58% of blocking bugs | are caused by message passing._ | | ... | | _Our study found that message passing does not necessarily | make multithreaded programs less error-prone than shared | memory. In fact, message passing is the main cause of blocking | bugs. To make it worse, when combined with traditional | synchronization primitives or with other new language features | and libraries, message passing can cause blocking bugs that are | very hard to detect._ | | I'd also add that in the modern world we've "solved" many | multi-threading problems by splitting the queues and data | stores out into multiple services. Now we don't have any | threading problems - only "distributed system" problems, and if | there's a bug there that's the architect's fault, not the | programmer's! | JoeAltmaier wrote: | Yeah so many services APIs are blocking, it's hard to keep | threads non-blocking. Often there is a way, but you have to | dig for it. | syrrim wrote: | Dining philosophers is solved by always acquiring locks in the | same order. | [deleted] | chrisseaton wrote: | ...if it's possible to do that for your application! 
It isn't always, if what lock you need next depends on the value behind a previously taken lock. | JoeAltmaier wrote: | Right, perhaps in a nested call or library, maybe even a semaphore you didn't know about! | | Multithreading with semaphores/locks is about as dirty as it can get. | syrrim wrote: | The top-level comment's strategy with queues seems similar, in that it isn't always possible. It requires the queues form a DAG, which is another word for a partial ordering. Thus, requiring the locks form a partial ordering doesn't seem like a comparatively excessive restriction. | chrisseaton wrote: | > doesn't seem like a comparatively excessive restriction | | I think there are some applications and algorithms that we just don't know a way to create an ordering for with realistically fine-grained locks. | | For example, airline seat reservation. Someone clicking on seats to reserve them. You can't apply locks in a total order because... you can't read the person's mind on what seat they're going to click next. | | Or Lee's algorithm. | | (I wrote the first half of a PhD on this.) | | https://chrisseaton.com/research-symposium-2012/seaton-irreg... | jerf wrote: | I think it's gotten easier over time, actually, and even when this was written it was theoretically out of date. | | There is a famous paper about how bad threads are from the 1990s. I believe it is valid on its own terms but misdiagnoses the underlying issue. The underlying issue is this: Taking multiple locks at once is not a sane operation at scale. Any system built on this will fail. | | There is no magic bullet that solves all concurrency problems, but a combination of techniques, used with discipline, takes it from insanely impossible to merely challenging: | | 1. Assign data to a thread, with only that thread able to access it.
Offer concurrency via message passing to that thread alone, | and don't allow references to escape from the thread (pointers, | whatever you want to call it). Language support for this is nice | (Erlang and Pony do it one way, Haskell another, Rust yet a | third) but not technically necessary. | | 2. Using locks is valid, and even quite helpful and useful, but | never let yourself take more than one. If you need transactions, | see #1. Don't write code that encourages people to take more than | one. Don't forget about things that may take locks without you | thinking about it (e.g., in Go, taking a mutex _and then_ trying | to send on a channel is taking two locks); this is one thing that | can be a bit of a challenge. | | 3. For more complicated things, use libraries as much as possible | that have solved the super complicated cases already. For | instance, databases that offer transactional security. You can do | this even if you don't otherwise "need" such databases. Or | libraries that offer a "parallel map" within your language, or | other canned solutions to parallelism/concurrency. | | 4. Use whatever support your language provides for "linting" your | conditions. The Go race condition checker is theoretically | insufficient but practically useful (passing it doesn't mean that | you have a safe program but it's a good start). Your language's | mileage may vary but take advantage of what you can. | | Honestly, just these things taken together can take you quite | far. Again, these aren't magical solutions. There is an | irreducible sense in which threading is harder than not | threading, and I don't expect that to ever change. There will | always be programmers that are more expert in it than others. But | it doesn't have to be an impossible challenge anymore. 
And to a | large degree, the challenge becomes less "solving the multithread | problem" and more just figuring out how to architect systems | under the restrictions of the patterns that work and compose | together. | | Although all that said, at a browser scale concurrency is always | going to be a problem. They gotta go fast, but that fights the | safe patterns pretty hard because they do involve a certain | amount of runtime slowdown (e.g., passing a message is always | going to be slower than just _not_ passing a message). Rust is | the only practical solution at that scale. But just as You Are | Not Google, You Are Not Writing A Browser. More normal programs | can go a long way with just what I laid out above. | dllthomas wrote: | > Language support for this is nice (Erlang and Pony do it one | way, Haskell another, Rust yet a third) but not technically | necessary. | | At one point I did some name mangling magic with the C | preprocessor for "this can only be accessed from the thread | (statically) named foo" for a project where that was relevant. | It was pleasant, although limited in how much it could | generalize - all the threads very much needed to be statically | known. | dandelany wrote: | This is an excellent comment that clearly comes from a lot of | experience. Can you say any more about why having multiple | locks is such a red flag? Is it because you start getting into | deadlock/hungry philosophers territory? Are there rules of | thumb for using multiple locks that make it more tractable or | is it just always a bad idea? | dgb23 wrote: | Not OP, but that might be an implication. Another, more | general reason is that this kind of concurrency requires | design. It can't be thought of as a defensive programming | technique. Locks are too powerful and they are by definition | a point of contention. | | I think providing a lock is like giving up control to someone | else. It's inherent coupling. 
To contrast, when I first | learned about them I thought of them as a protection | mechanism of an isolated part, but they are an execution | contract that everyone participates in. | jerf wrote: | "It's inherent coupling." | | This matches well with one of the beliefs I've been | polishing up over the past few years which I don't think is | well understood, which is that structured programming tends | to create more coupling than we realize in deep call | stacks. When you end up with a 2000-line call stack in | Java, that's 2000 function invocations coupled together by | the way structured programming works; for instance, at a | bare minimum, that's 1999 functions that can't progress | until the innermost one does, and that's not the only form | of coupling. | | I think the coupling induced by structured programming is | fairly light per stack frame on average, but they tend to | add up quickly because they're just so darned easy to add | more of. | | Locks are another thing that makes this really come to | light. The more stack frames between some function and | something above it in the stack that has taken a lock, the | more likely it is for execution to eventually wander back | into something trying to take a lock inadvisably. | | I mean, I've had a number of deadlocks just within an | object where it has some internal lock, and a method takes | that lock, then tries to call a method that takes the same | lock. Whoops. Easy to fix, of course, but that's the easy | case. The cases get harder in real code. | ntoskrnl wrote: | The rule of thumb for multiple locks is always acquire them | in the same order. Lock A, lock B, unlock B, unlock A. | Otherwise you risk a deadlock. If another thread locks B, | then tries to lock A, you're in trouble. | mandevil wrote: | And keeping that discipline over many developers, over many | years, is in practice impossible, which is why multiple | locks seem attractive but are actually a major problem for | complex systems. 
| layer8 wrote: | You can, of course, encapsulate such a lock order with | dedicated classes (or functions/closures) modeling linear | types enforcing the sequence of lock A > lock B > unlock | B > unlock A. The resulting objects can still be | misapplied, but should fail immediately with a runtime | error, making it much harder to inadvertently misuse. | jerf wrote: | But that still doesn't scale up to more locks of that | size very well, because you start trying to take even | more of them, and in dynamic orders, and etc etc. | | I know about the theoretical solution of taking locks | always in a defined order, but I don't consider it a | practical solution. If taking locks willy-nilly lets you | scale to program size X, this might give you 2X or 3X, | but that's not nearly large enough in practice. | | & dandelany, the community has chipped in and answered | for me. I endorse this subthread, at least, as I write | this comment. :) | anarazel wrote: | IME there's plenty problems where nested locks will lead | to a simpler solution than trying to avoid them. It's | important to avoid when reasonably doable, but it's also | not uncommon to take it too far and end up with a slower | and way more complicated solution. | secondcoming wrote: | What is 'browser scale'? I'd imagine most browsers spend most | of their time idle. | Slartie wrote: | If you want to see some of the most complex multithreading | systems, take a look inside modern browsers. That's what he | means by "browser scale". | | I have some knowledge into working with Chromium's internals, | and it is crazy what's going on inside there. Several | different job and message queue systems interlocking with | each other, tasks spread across multiple processes getting | split and regrouped and synced and recombined, practically | everything running async, just so a few pixels are rendered a | little faster when your multicore machine updates a small | section of a website. 
| | Browsers are indeed idling a lot, but if you want them to do | something, they must be as crazy fast as possible. The | competition in that space is ridiculous. | jerf wrote: | In the tens of millions to hundreds of millions of lines of | code. Browsers are some of the largest things out there. See | also desktop office suites, multi-decade CAD programs, a few | other things. | | Few programs are at that scale. IIRC, even the Linux kernel | isn't _really_ at that scale, once you clean away a few files | in the drivers that are just literally millions of #defines | or something. | | If you're writing at that scale, by all means copy the | necessary practices, but don't just casually assume your need | is super large and you've just gotta use the hard-core lock- | based concurrency because it's the only option for you. You | can get a _long_ way with much nicer architecture that is | only slightly slower. | | For instance, I write a lot of Go. I've sometimes gotten a | bit nervous about writing a goroutine (/thread) that owns | some particular data structure because I'm like "wow, a lot | of things are going to access this, is this going to become a | bottleneck in the system?" But every time I've run the | analysis carefully, the answer generally comes back not just | "no" but "heck no, not even close". Your intuition can | deceive you very easily. Locking a single structure behind | the limit that "we can only dedicate at most 1 CPU full time | to maintaining this structure" is, in practice, far less | scary than it sounds. 1 CPU is a lot of power and you | generally need a lot of other CPUs trying to access it | simultaneously to overwhelm it. 
| | Amdahl's Law kinda works in your favor on this matter (for once); for something like this to be the bottleneck, it needs to approach accounting for (1/(CPUs - 1)) of your execution time, and for that to be the case, those other processes hammering this 1-CPU-locked data structure need to be doing whatever else they are doing very quickly. As with anything in discussions like this, it absolutely can happen! But always check your intuition concretely before going crazy adopting something more complicated and "higher performance"; odds are good this isn't a problem for a given "random" data structure. | macintux wrote: | > In this approach, threads own their data, and communicate with message-passing. This is easier said than done, because the language constructs, primitives, and design patterns for building system software this way are still in their infancy. | | "Infancy"? Erlang was invented in the 80s. | tonnydourado wrote: | OP does say system software, and although Erlang can have amazing uptime, I'm not so sure about its raw performance story (on the other hand, I do remember seeing some insane benchmarks from an Elixir framework, so, maybe I'm wrong). | | Also, other than Erlang and Elixir, which other reasonably mainstream language has this as first-class features? Even Rust doesn't really put queues and message passing front and center, it just makes it easier to get away with doing _traditional_ thread programming. | | Things can be old in years since discovery/invention, but still in their infancy in usability and adoption. | Lammy wrote: | > Also, other than Erlang and Elixir, which other reasonably mainstream language has this as first-class features? | | Ruby 3.0+ with Ractors: https://docs.ruby-lang.org/en/master/ractor_md.html#label-Co... | kubanczyk wrote: | Go has channels (`chan`) which it considers more native than e.g. Mutex.
The latter is in the stdlib; the former is a feature of the language proper. | | Alas, the Go ecosystem could use more channels. I'd say that the adoption is still at the infancy stage. I wonder whether there are any practical shortcomings or whether it's just a readability tradeoff (the Mutex tends to be super-readable, whereas the channels not so much). | moonchild wrote: | And the CSP paper dates to 1978. | bjourne wrote: | Truth be told, it is trivially easy to create deadlocks even in a pure message-passing environment such as Erlang. Message-passing is way oversold and doesn't solve as many problems as people think. | TheMagicHorsey wrote: | Erlang made it pretty easy. | | But then not everyone wants to learn Erlang. And if you start writing NIFs all crazy, you might get yourself into trouble again. | | But it seems tractable. | | Am I being naive? | hereforphone wrote: | No, you're absolutely right: Erlang is magnificent and superior. | ghufran_syed wrote: | I have very little experience of writing concurrent code, but this seems relevant: https://clojure.org/about/concurrent_programming | xigency wrote: | I don't really support this view. | | While doing anything creative with threading is in fact challenging, in Java it's easy to safely boost performance with threading using parallel streams and concurrent data structures. | | Even in other languages like C/C++, it's mostly worth it to design something up-front that is naturally parallelizable and find a safe and correct way to do it. | | And the incentive to parallelize is only increasing with the increasing thread counts of newer hardware. Sure, it's scary, but there's really no excuse to avoid running with threading, multiple processes, or multiple services. | jayd16 wrote: | In my experience I saw a bunch of novice devs using parallel streams for N<=100 lists and such, for much worse performance. It's certainly not foolproof either, or it would just be the default.
| marginalia_nu wrote: | To be fair, whether parallel streams are useful isn't a function of N alone, but also of the time each operation takes. If it's a nontrivial calculation, then by all means. | | Although parallel streams are seldom the best option in terms of performance even for large N. They may often be faster than sequential processing for large N, but a more thought-out approach is almost always faster still, often significantly so. ___________________________________________________________________ (page generated 2022-06-10 23:00 UTC)