[HN Gopher] Must be this tall to write multi-threaded code (2015)
       ___________________________________________________________________
        
       Must be this tall to write multi-threaded code (2015)
        
       Author : luu
       Score  : 81 points
        Date   : 2022-06-09 18:53 UTC (1 day ago)
        
 (HTM) web link (bholley.net)
 (TXT) w3m dump (bholley.net)
        
       | bena wrote:
       | I think warnings like this are good and necessary. A lot of
       | people are going to ignore this warning simply because they
       | wrongly believe they're good enough to ignore it. And they're
       | going to try and write this massive project simply to prove that
       | the warning doesn't apply to them. And they will serve as great
       | examples of why that warning is there in the first place.
       | 
       | But there's another group. This group heeds the warning. They do
       | their best to not violate the warning. But then they run into an
       | issue. They exhaust every possible option trying to resolve the
        | issue. But eventually, they realize that they can violate the
       | warning, just a little bit. And they're careful. And they violate
       | the warning in the least way possible and take every action
       | deliberately. And it works.
       | 
       | Because the warning doesn't mean it's impossible, it just means
       | that it is incredibly difficult and should be approached with the
       | utmost care.
       | 
       | Eventually, we do learn the pitfalls, internalize them, abstract
       | them away, etc. And then we can even do away with the warning.
       | Because it's no longer the edge of the universe, it's just
       | another stop along the way.
        
       | bee_rider wrote:
       | Just sprinkle in some OMP PARALLEL DO's it'll be fine.
        
       | bitwize wrote:
       | I knew I was right to trust Steve Summit with multithreaded stuff
       | back when I worked with him!
        
       | robocat wrote:
        | > Buggy multi-threaded code creates race conditions, which are
        | the most dangerous and time-consuming class of bugs in software.
        | 
        | > safe and scalable parallelism is achievable by minimizing or
        | eliminating concurrent access to shared mutable state.
        | 
        | > Rust is designed from the ground up to facilitate this
        | 
        | Does async code with mutable state sometimes result in race
        | conditions in Rust?
       | 
        | I have had that experience with JavaScript. For example, an old-
        | school short timeout with a function written as a closure
        | implicitly expects the state to remain the same, but a race
        | condition asynchronously changes the state before the closure is
        | called, and then the closure code fails. An unintuitive surprise
        | to the programmer that can happen with async code (even without
        | thread concurrency).
        
       | buybackoff wrote:
        | I would change it to _must be this tall to edit_ MT code. It
        | usually starts with a clear picture in mind... But even one's own
        | code quickly becomes cryptic, full of hidden assumptions, without
        | lots of comments. I used to have this picture printed and put
        | above a server in a small startup-like team. The whole thing ended
        | for reasons unrelated to any lock or interlocked call, though.
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _Must Be This Tall to Write Multi-Threaded Code_ -
       | https://news.ycombinator.com/item?id=9905374 - July 2015 (112
       | comments)
        
       | eftychis wrote:
       | The problem with concurrency and parallelism is ownership --
       | mainly that is write access than coherence. I have seen again and
       | again the bad habit of people to bypass well defined set ups of
       | data ownership because they feel they are smarter, it is
       | unnecessary, we will figure it out down the road, and my
       | favourite: it doesn't solve deadlocks so why use anything as some
       | bug families remain.
       | 
       | The issue is not of frameworks or type systems but of companies
       | and people. Engineers know this stuff by training, but either are
       | new and commit hubris or management makes them (or maybe some of
       | them are simply untrained -- and somehow never went to a bank
       | (queues)).
       | 
       | P.S. For fun instance, I have seen unsafe used to bypass Rust's
       | type and ownership system to keep track of db snapshots in a
       | "nice setup." Rust was not wrong.
        
       | YesBox wrote:
       | I recently wrote some multi-threaded code (for the first time)
       | for a game I'm working on. Luckily the data I need to process is
       | in a (C++) vector, so I was able to break the data up by dividing
       | the vector size by available cores. I certainly felt 10 ft. tall
       | when it worked!
       | 
        | Another area of my code where concurrency will greatly benefit
        | requires updating an unordered_map. Since it's position-based and
        | not unit_id-based, I'll need to figure out a way to prevent data
        | races. The best option I've thought of so far is assigning thread
        | numbers to each edge in my graph, and storing the units in each
        | edge in a queue, which should prevent data races and car
        | collisions. (edit: ha)
       | 
       | That being said, I'm the sole developer on this project and I
       | cannot imagine working on a multi-threaded application with
       | dozens of other developers. It is nice dipping my toes in the
       | water though.
        
       | kerblang wrote:
       | Most of us work in information systems, which is to say, doing
       | database stuff all day, usually with an RDBMS like MySQL,
       | Postgres etc. And what are we doing? Grabbing a bunch of
       | transactional locks. Against what? Shared mutable state, and not
       | in-memory state, but database state counting into the thousands,
       | millions, billions or more records. And if we do it wrong?
       | Deadlock, often deadlock across multiple processes. And what do
       | we do about that? We deal with it, because what else can you do?
       | 
       | You could of course throw transactions away and declare them more
       | trouble than they're worth, but that probably won't go over well.
       | I've found it helps to have a "consistent first lock", so for
       | example in a consumer-based system you'd lock the customer record
       | first, because most transactional operations happen in that
       | context. If you always have that consistent first lock then
       | deadlock can't happen.
       | 
       | My point is that if I assert "multithreading is unacceptable!",
       | most of business information systems goes out the window on the
       | same principle, because the locking is even more dastardly -
       | multi-process and persistent state instead of in-process in-
        | memory state. I don't think you could usefully throw
        | actors/schmactors/coroutines/goroutines/etc. at such a system
        | either. And if you said "Well, only the best programmers need
       | apply," well BIS is not exactly hotshot heaven, by and large.
       | It's a bunch of grunts.
       | 
       | So I agree multithreading is hard when you are locking many
       | things at once, which I try to avoid. But I just don't get this
       | thing of trying to banish it or make it the exclusive domain of
       | geniuses.
        
       | quantified wrote:
       | 2015
        
         | gumby wrote:
         | Not that anything has improved since then.
        
           | xwdv wrote:
            | Aye, it feels like we've mostly reached a plateau and arrived
            | here through short-termism.
        
           | hedora wrote:
           | Well, Rust is replacing C++ incredibly quickly.
        
             | spacechild1 wrote:
             | Citation needed.
        
               | hedora wrote:
               | Amazon somewhat quietly replaced the S3 service
               | implementation with a rust reimplementation, then
               | announced it after the transition landed without making
               | any bad headlines.
               | 
               | I've heard Microsoft has moved to Rust for all new
               | projects (there is an exception process for C++, of
               | course).
               | 
               | Parts of Chrome use it now.
               | 
               | Support for using Rust in the mainline Linux kernel
               | should land soon.
        
             | Kab1r wrote:
             | Go, Erlang/Elixir maybe, but as much as I like Rust, it has
             | not made writing _concurrent_ programs easier imo.
        
               | hedora wrote:
               | I write concurrent systems code for a living. Rust is
               | much easier to get right than go, because it uses the
               | borrow checker to statically check for correct
               | synchronization. (Check out tokio.)
               | 
               | I haven't used erlang/elixir.
        
       | ntoskrnl wrote:
       | Seems to be a theme at Mozilla.
       | 
       | > Must be this tall to use `unsafe`
       | 
       | https://i.stack.imgur.com/w9XMh.jpg
        
         | khuey wrote:
         | That's more or less the same spot in Mozilla's SF office.
        
       | kcl wrote:
       | I've written on this problem before:
       | https://kevinlawler.com/parallel
       | 
       | One issue is that the pthread model is too low level (think AND
       | and OR gates), but you must include it for completeness.
       | 
       | I agree with the comments here that say higher-level
       | multithreading support needs to be "baked in" to the language.
       | Grand Central Dispatch did a good job of raising the bar here,
       | but has its own problems, and is arguably too low level.
       | 
       | Another issue is that existing, widespread languages are too old
       | to have cooked higher-level threading support in. You really have
       | to do it from the start. C++ bolt-on parallelism might as well be
       | syntactic sugar. Newer languages, like Rust, have attempted to
       | solve concurrency problems but failed - it's not clear that the
       | designers genuinely understand concurrency at all. Goroutines
       | solve one limited issue of many and seem about as difficult as
       | asking programmers to write threaded programs.
       | 
       | To do this correctly you need a new language that cooks it in
       | from the start. There is a laundry list of things that have to be
       | done precisely. It might be possible to do this by rewriting a
       | reference implementation of an existing popular language, but the
       | development overhead and performance concerns involved in
        | backwards-compatibility make that probably infeasible. (Python
        | can't even escape its global interpreter lock.) So likely the
       | language that solves this problem is either unwritten or unknown.
       | 
       | Other mentions in the comments here:
       | 
       | ThreadSanitizer: It's not very good, nor can it be.
       | 
       | Erlang: solves the problem, but the actor formulation is too
       | awkward for all but Erlang-specific use-cases.
        
         | lostcolony wrote:
         | >> Erlang: solves the problem, but the actor formulation is too
         | awkward for all but Erlang-specific use-cases.
         | 
         | Err? I mean, it requires rethinking a bunch of the lessons you
         | learn when concurrency is hard and expensive and to be
         | minimized, sure. But when concurrency is easy and cheap and to
         | be embraced, a bunch of things become a lot easier and more
         | natural to write.
         | 
          | My go-to example, from a production system, was a complicated
          | task scheduling system (with user and hardware interrupts to
         | handle). Sure, we could have written it using threads and a
         | priority queue or something...instead we just had one Erlang
         | process per task, with interrupts routed to that process as
         | messages (and linked timer processes that fired messages to
         | those processes for the scheduled stuff, essentially just
         | another type of interrupt/message). Super simple to write,
         | super simple to reason about, never had any sort of race
         | condition or locking issues, across years of development and
         | production use.
        
         | waynesonfire wrote:
         | > Erlang: solves the problem, but the actor formulation is too
         | awkward for all but Erlang-specific use-cases.
         | 
         | Play with Erlang a bit and you'll find the modern band-aid
         | pattern of async / await awkward.
        
       | tialaramex wrote:
       | > it's critical to explore ways to incrementally add safe
       | concurrency in C++.
       | 
       | I don't buy it. Some things are foundational and attempts to add
       | them incrementally are doomed because your foundations won't
       | support the work. After building them easy things seem easy - but
       | then hard things tear out your inadequately founded "incremental
       | addition" and make a huge mess everywhere.
       | 
       | Google has a team doing this sort of thing (indeed David Baron
       | might even be on it for all I know), for exactly the same reason,
       | they wrote a web browser in C++ and it has threading problems and
       | they wish they could fix that. So Mozilla isn't exploring terra
       | incognita here, and if there's something to be found I hope they
       | find it, but I expect nothing.
        
         | nemothekid wrote:
         | > _So Mozilla isn 't exploring terra incognita here, and if
         | there's something to be found I hope they find it, but I expect
         | nothing._
         | 
         | This post was written in 2015. Looking back I think we can
         | infer:
         | 
         | 1. They found a solution (create a new language)
         | 
         | 2. From the success of that language, it seems to deliver on
         | its promise (although it has its warts)
         | 
            | 3. They failed to incorporate that into the web browser
            | because rewriting a web browser is a momentous task that had
            | little measurable business impact; and they decided to lay off
            | the engineers instead.
         | 
         | Google is a wealthier company, but they may decide that trying
         | to "fix" C++ is a more realistic endeavor.
        
           | tialaramex wrote:
           | Rust is mentioned in the blog post.
           | 
           | Firefox has a whole bunch of Rust in it, much more than it
           | did when that blog post was written. Mozilla let go some core
           | Rust people, but of course it still employs numerous Rust
           | programmers.
        
             | dblohm7 wrote:
             | (Former Mozilla employee here)
             | 
              | All developers writing native code are encouraged to use
              | Rust (it's not supposed to be just the
              | domain of a select few). As I recall, the message was, "Not
             | knowing Rust must not be an excuse for why you're not using
             | Rust." Of course developers must still contend with the
             | practicalities of integrating Rust into an existing C++
             | codebase, so some guidelines were developed:
             | 
             | https://wiki.mozilla.org/Oxidation
        
           | steveklabnik wrote:
           | > They failed to incorporate that into the web browser
           | 
           | Rust is just over 10% of Firefox at the time of writing this
           | comment https://4e6.github.io/firefox-lang-stats/
           | 
           | > and decided to layoff the engineers instead.
           | 
           | They didn't lay off people writing Rust in Firefox, they laid
           | off people working on Rust itself.
        
       | nlewycky wrote:
       | I don't have a photo but we used to have the same sign at Google,
       | with an added note "unless accompanied by ThreadSanitizer".
       | 
       | At runtime: https://clang.llvm.org/docs/ThreadSanitizer.html
       | 
       | At compile-time:
       | https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
       | 
        | The article points out that:
        | 
        |     The resulting invariants end up being documented in
        |     comments:
        | 
        |         // These variables are protected by monitor X:
        | 
       | but you can write those in C++ code and have the compiler check
       | them. That's what thread safety annotations are for.
        
         | kubanczyk wrote:
         | TIL that, so predictably, "[Go's] race detector is based on the
         | C/C++ ThreadSanitizer runtime library". So, I can definitely
         | confirm its usefulness. https://go.dev/blog/race-detector
        
       | new_stranger wrote:
        | Assuming you don't need bare-metal performance and Go is okay, I
        | love how easy and approachable parallel code is with Go. Locks
        | and channels each offer trade-offs you can easily benchmark using
        | the built-in go test.
       | 
       | I've written so many different concurrent approaches to workers,
       | pipelines, and multi-stage processing systems and found it always
       | easy to reason through ways to limit the number of workers,
       | prevent duplicate work, and implement backoffs.
        
       | anonymousDan wrote:
       | I don't know, maybe it's the sadist in me but I enjoy the
       | challenge of writing fast and correct multi-threaded code. It's a
       | valuable skill if you can master it.
        
         | SilasX wrote:
         | Skill? It can't be reduced to following easily checkable
         | heuristics and rules?
        
           | anonymousDan wrote:
           | I don't claim to have mastered it!
        
         | layer8 wrote:
         | You mean masochist? ;)
         | 
          | I'm similar, though I wish programming languages would give us
         | more facilities to statically double-check the correctness
         | reasoning.
        
           | oriolid wrote:
           | Masochist? Think about the poor programmer who has to
           | maintain GP's code :)
        
             | tanyar wrote:
              | This is the way. That code breaks when someone new comes
              | along and makes several modifications whose implications
              | take a long time to rear their ugly heads.
             | 
             | Broken multi-threaded code behaves a lot like perfectly
             | coded multi-threaded code....until you get a lot of traffic
             | through that code, you know, like on a busy holiday weekend
             | when lots of revenue is flowing through the business!
        
       | ChrisMarshallNY wrote:
       | Low-level thread programming is a minefield. I've been doing
       | IRQ/concurrent programming since the 1980s, and still hate it.
       | Thread bugs are a nightmare.
       | 
       | Concurrency needs to be "baked into" higher-level languages,
       | development APIs, and standard libraries. Basically, covered in
        | bubble-wrap. That's starting to happen. It won't be as efficient
        | as low-level coding, but it should still significantly improve
        | the performance of most code.
       | 
       | And folks that are good at low-level threading would be well-
       | served to work to support this.
        
         | hinkley wrote:
         | I studied distributed computing in college, and I spent a lot
         | of time not quite internalizing the fact that since this was an
         | elective at one of the highest ranked state schools in the
         | nation, probably most other people didn't have the same
         | information I did.
         | 
         | I ended up doing a lot of concurrency work early on because I
         | was good at it, but over time the evidence just kept piling up
         | that nobody else could really follow it, and so I've used it
         | less and less over time, and in particular try very hard to
         | keep it away from the main flow of the application. It's more
         | something I pull out for special occasions, or to explain bugs.
         | 
         | Where a lot of frameworks fail is that while re-entrance and
         | concurrency are different problems, they share a lot of
         | qualities, both computationally and with the limits of human
         | reasoning. Recursive code with side effects looks and acts a
         | lot like concurrent code, because here I am looking at some
         | piece of data and the moment I look away some other asshole
         | changed it out from under me. Most frameworks end up being a
         | bad study in recursion, usually in the name of reducing code
         | duplication.
         | 
         | Pure functional people love recursive code, but it's the pure
         | part that avoids the problems with trying to interlace multiple
         | activities at once. Without that, you're trying to eat your
         | dessert without eating your vegetables first.
        
           | [deleted]
        
         | nick_ wrote:
          | Pony has concurrency baked in via a high-performance actor
          | implementation. It's really nice. I believe Go has this too,
          | but it also exposes a low-level concurrency API?
        
         | skohan wrote:
         | Idk I think it is not as difficult as it is sometimes framed. I
         | think the key is to design your program/separation of concerns
         | in such a way as to make it easy to reason about concurrency,
         | and minimize the need for synchronization points.
         | 
         | I think a lot of the problems arise when you just try to write
         | normal synchronous code, or to parallelize code which was
         | originally synchronous, and don't realize the implicit
         | constraints you had been relying on which no longer hold when
         | concurrency is introduced.
        
           | ChrisMarshallNY wrote:
           | FP is good for adapting to threading, but it has
           | difficulties, when applied to things like GUI programming,
           | async communications, or device control (places that need
           | threading).
           | 
           | A lot of languages are getting things like async/await/actor,
           | etc., but even that is still too low-level for a lot of
           | programmers. It can easily turn into a lot of single-threaded
           | code.
           | 
           | It needs to be completely under the surface. Swift does that
           | with generics (I suspect other languages do it, as well). For
           | example, you can say Array<Int>, or [Int]. They mean the same
           | thing, but one does not have the generic syntax.
           | 
           | If we can do similar stuff with things like mutexes and
           | syncing, then it will go a long way towards highly
           | performant, safe, code.
        
             | skohan wrote:
             | It's interesting that you choose Swift as an example,
             | because I think in some ways Swift does a poor job of
             | accommodating async using the "bubble wrap" approach.
             | 
             | Specifically I am talking about the default of using atomic
             | reference counting on all references. This makes it
             | somewhat idiot proof at least with respect to avoiding
             | NPE's, but it comes at a huge performance cost (especially
             | on x86 which I guess is becoming less of a concern for
             | Swift code).
             | 
             | I think this is a fundamental issue: a big part of the
             | benefit of concurrency and parallelism is supposed to be
             | increased performance. However a lot of the ways to make
             | concurrency "generally safe" tend to hamstring performance,
             | because you have to protect against all the different types
              | of errors programmers can make, which basically means a lot
             | of expensive synchronization.
             | 
             | Maybe there is a way to hide this completely, as you say,
             | in a performant way. To do this, I think you would almost
              | have to have either some kind of really clever lock-free
             | data structures, or else a very smart compiler.
             | 
             | Another approach might be to keep concurrency at arms
             | length from the average programmer. I.e. hide 90% of it
             | inside of libraries, and expose safe interfaces which the
             | programmer interacts with in a mostly synchronous way.
        
               | ChrisMarshallNY wrote:
               | _> I think in some ways Swift does a poor job of
               | accommodating async using the  "bubble wrap" approach._
               | 
               | I agree. Async/await has a "kludgy" feel to it (to me).
               | It's "added on," not "embedded in." But I'm not a
               | language writer, so I don't have alternatives to offer.
               | 
               | I was talking about how it does generics. I think it does
               | that quite well.
        
               | dgb23 wrote:
               | To me it feels like I'm folding and knotting code.
        
               | skohan wrote:
               | I agree with you there. Swift has some really excellent
               | syntactic sugar.
        
           | hinkley wrote:
           | Based on a non-scientific study, I think the spatial thinkers
           | do great with concurrency, the visual thinkers do okay, and
           | everyone else is in real trouble. Which reminds me, I need to
           | interview my developer friend with aphantasia about how he
           | feels about concurrency.
           | 
           | Concurrency should be the sizzle and not the steak, otherwise
           | you're reducing the diversity of your team rather
           | substantially. Good people are hard enough to find. Driving
           | the people you have away doesn't generally end well.
        
             | [deleted]
        
       | SleepyMyroslav wrote:
        | just my 2c Friday rant on an old article.
       | 
       | Games had to adapt to threading before I even joined. I got lucky
       | to work with some very efficient threaded systems. They were
       | never correct or safe xD.
       | 
        | But I can share that if you see the comment "these variables are
        | protected by this lock" more than a few times per million lines
        | of code, performance is not the reason why it is threaded.
        | 
        | From public videos, I think the best explanation is Sean Parent's
        | "Better Code: Concurrency" [1].
       | 
        | I really would like to see higher-level or safer languages or
        | systems come to future game platforms. It should be possible. It
        | just needs economic factors to kick in, like people finally
        | noticing that slow-and-safe is not going to get any faster next
        | year with new hardware.
       | 
       | 1. https://youtu.be/32f6JrQPV8c
        
       | muglug wrote:
       | I'm glad that Rust has changed this, at least in my very specific
       | case: I wrote a command-line application in PHP and adding
       | parallelisation was a long journey of trial and error and
       | concurrency bugs that took over a week to get right.
       | 
       | I rewrote the same application in Rust. I followed the Arc/Mutex
       | documentation and parallelised it over a few hours. It worked
       | perfectly.
        
       | chasil wrote:
       | "If the fastest chips have N cores, a mostly-single-threaded
       | program can only harness (1/N)th of the available resources. As N
       | grows (and it is growing), parallel programs will win."
       | 
       | This win will be constrained by Amdahl's law. A single
       | program/process that is multithreaded can effectively use only a
       | finite number of cores.
       | 
       | To use more resources, the task must be broken into multiple
       | processes.
       | 
       | It can be faster to run $(nproc) number of gzip processes in
       | parallel than to use pigz, for example.
       | 
       | https://en.wikipedia.org/wiki/Amdahl's_law
        
         | spacechild1 wrote:
         | > This win will be constrained by Amdahl's law. A single
         | program/process that is multithreaded can effectively use only
         | a finite number of cores.
         | 
         | This assumes that the workload is fixed. A classical example
         | would be compiling source code: you have a fixed number of
         | source files and the compiler is only able to parallelize so
         | much.
         | 
         | On the other hand, there are applications where more threads
         | allow you to increase the workload itself. For example, a
         | modern DAW (digital audio workstation) is able to render tracks
         | in parallel; if you have more threads, you can use more tracks
         | with more plugins. In a computer game, more threads might mean
         | more complex simulations, more objects, better graphics, etc.
        
         | cortesoft wrote:
         | Also, the major change with having so many cores is the ability
         | to do lots of different things at once on a single machine. You
         | don't need any single process to utilize all the cores on a
          | machine, you take advantage of all those cores by having a
          | single machine do a lot of things.
        
           | ajuc wrote:
           | Doesn't help you at all if you want the game to run at 60 fps
           | and instead you can run it 8 times in parallel at 30 fps
           | each.
        
             | chasil wrote:
             | It remains important not to buy cores that you cannot use,
             | especially if there are other constrained components that
             | would have a larger performance impact.
             | 
             | Sophie Wilson (creator of the ARM instruction set) spends
             | some time on these issues, also stressing core throttling
             | and shutdown due to heat.
             | 
             | https://www.youtube.com/watch?v=zX4ZNfvw1cw
        
       | JoeAltmaier wrote:
       | There was a paper long ago that showed duality between
       | semaphore/locking code and message-queuing code. So folks figured
       | they were the same.
       | 
       | Not so! semaphore/locking is very hard if more than one semaphore
       | exists. Look at the 'dining philosophers problem' and so on.
       | 
        | But queuing! Done right, that can be proved correct through
        | static analysis. Do threads that consume one queue block on
        | another queue? Draw the graph of queue-blocking - does it loop
        | anywhere? Then you could have a problem.
       | 
       | I.e. if your message-consumers don't block at all, then you
       | cannot have a problem with deadlock.
       | 
       | You CAN have a problem with messages stalling however -
       | languishing on a list waiting for something to complete that
       | might never complete. But at runtime this can be debugged fairly
       | easily - if all your queues are visible to the debugger.
       | 
       | In the semaphore implementation, the locking situation depends on
       | the state of threads and their stacks, which all have to be
       | frisked and back-executed to find out who holds what lock etc.
       | Not always debugger-friendly to do.
       | 
       | I favor a multi-threading environment of threads dedicated to
       | queues, and all the queues visible to the debugger. That kind of
       | setup has never done me dirty.
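A minimal Python sketch of the setup described above: each worker thread is dedicated to exactly one queue, and every queue lives in a global registry so a debugger (or a watchdog) can inspect stalled messages. The queue graph forms a DAG (parse -> render -> done), so the workers cannot deadlock. All names here (`QUEUES`, `worker`, `run_pipeline`) are invented for illustration, not taken from the comment.

```python
import queue
import threading

# Registry of all queues, visible to a debugger in one place.
QUEUES = {
    "parse": queue.Queue(),
    "render": queue.Queue(),
    "done": queue.Queue(),
}

def worker(inbox_name, outbox_name, handle):
    """Consume exactly one queue forever; never block on anything else."""
    inbox, outbox = QUEUES[inbox_name], QUEUES[outbox_name]
    while True:
        msg = inbox.get()
        if msg is None:              # shutdown sentinel, forwarded downstream
            outbox.put(None)
            return
        outbox.put(handle(msg))

def run_pipeline(items):
    # The queue-blocking graph is parse -> render -> done: a DAG, no loops.
    t1 = threading.Thread(target=worker, args=("parse", "render", str.upper))
    t2 = threading.Thread(target=worker, args=("render", "done", lambda s: s + "!"))
    t1.start()
    t2.start()
    for item in items:
        QUEUES["parse"].put(item)
    QUEUES["parse"].put(None)
    results = []
    while (msg := QUEUES["done"].get()) is not None:
        results.append(msg)
    t1.join()
    t2.join()
    return results
```

Because no worker ever blocks on anything but its own inbox, the only failure mode is a stalled message, which shows up directly when you inspect `QUEUES` at runtime.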
        
         | btilly wrote:
         | _I favor a multi-threading environment of threads dedicated to
         | queues, and all the queues visible to the debugger. That kind
         | of setup has never done me dirty._
         | 
         | You mean like Go does with channels? (Not sure how good their
         | visibility is to the debugger though.)
        
           | fmajid wrote:
            | It's based on Tony Hoare's _Communicating Sequential
            | Processes_, and is far more comprehensible and possible to
           | reason about than primitives like mutexes and semaphores that
           | are too close to the underlying hardware implementation.
           | 
           | http://www.usingcsp.com
        
           | lostcolony wrote:
           | Probably more like Elixir/Erlang, if we're talking
           | programming language. Though even there, BEAM processes are
           | the unit of execution, and are multiplexed on threads. Parent
           | references OS development elsewhere.
           | 
           | Go has a couple of deviations: channels aren't queues unless
           | you are careful to size them larger than the number of
           | things you'll ever put in them (else the thing trying to put
           | something on a channel blocks, which can deadlock; you can
           | of course time out on it, but that introduces tight
           | coupling, since the goroutine sending has to care about the
           | state of the thing receiving); goroutines aren't dedicated
           | to channels (that is, there is an m-n relationship between
           | channels and goroutines, which can lead to a lot of
           | incidental complexity); and you can see what is on a channel
           | if you have a breakpoint, but that assumes you can get the
           | code to break inside of something containing a reference to
           | the channel.
        
         | convolvatron wrote:
         | its a _little_ more costly depending, but otherwise I agree
         | with you. it also clearly delineates what data is shared and
         | everything else can be assumed private...so it helps make the
         | overall architecture more explicit
        
         | Etheryte wrote:
         | This sounds interesting and I've implemented simple ideas
         | similar to the pattern you describe before, however I haven't
         | read about its use in depth. Do you happen to know of an
         | article/book/resource that describes this along with real world
         | experiences? If not, would you mind writing a blog post or
         | article on it please?
        
           | JoeAltmaier wrote:
           | I had the benefit of cutting my teeth at my first job,
           | working on a message-passing OS: CTOS, based on RMX/86, used
           | messages for everything. It was a very early networked
           | computing system. Diskless workstations where file messages
           | simply got forwarded to a server etc. And all the messages
           | and queues were visible in the debugger!
           | 
           | So I learned good thread hygiene right out of school.
        
         | dekhn wrote:
         | the only model I use with threads is either pre-static (start a
         | function on each thread with unique parameters, result is
         | returned to a driving function at the end) or message-queue
         | (each thread is blocking, waiting for a message). For the
         | former, I use shared memory, but it's read only and guaranteed
         | to live longer than any of the worker threads that use it
         | (passing a consted ref_ptr). For the latter, I don't share any
         | memory at all; the message itself contains the parameters,
         | fully materialized.
         | 
         | I can ensure all my queues are loop-free (directed acyclic
         | graph of workers reading from their parent nodes and writing to
         | their children nodes). IIRC one of the complaints about the
         | Petri net model is that it was unprovable that any problem
         | could finish, even trivial ones.
         | 
         | This has worked pretty well for me and I usually don't have to
         | debug any truly nasty thread problems. My biggest problem tends
         | to be thread pools where one of the workers gets wedged, and
         | there's no way to cleanly kill the worker thread and steal the
         | work onto another thread.
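The first ("pre-static") model the commenter describes can be sketched briefly: shared data is built before any worker starts, treated as read-only, and guaranteed to outlive the workers; each thread gets unique parameters and writes its result into a slot no other thread touches. The names `fan_out` and `job` are invented for this example.

```python
import threading

def fan_out(shared, chunks):
    """Static fan-out: shared is read-only; each thread owns one result slot."""
    results = [None] * len(chunks)   # one slot per thread, no contention

    def job(i, chunk):
        # Reads shared (never writes it), writes only results[i].
        results[i] = sum(shared[j] for j in chunk)

    threads = [threading.Thread(target=job, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since every mutable location is written by exactly one thread and only read after `join()`, there is nothing to lock.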
        
           | JoeAltmaier wrote:
           | Particularly network activity! At the lowest levels, the
           | creaky old TCP/IP stack is not amenable to non-blocking use.
        
         | anonymousDan wrote:
         | One trick I used to use when writing multithreaded java code
         | for debugging purposes was to never use the built in
         | synchronized/implicit object lock, since that made it quite
         | difficult to understand from a stack trace what object was
         | being locked. Instead, we would define an explicit inner lock
         | and lock on that explicitly. The class name would then show up
         | in stack traces making it much easier to track down.
        
           | hinkley wrote:
           | In cases where not all operations on the objects necessitate
           | the same lock semantics, this is also handy because you can
           | split or nest locks to reduce write-blocks-read and read-
           | blocks-write situations.
           | 
           | But if you're finding yourself using this more than a couple
           | of times in a project, you're probably uncovering violations
           | of the Single Responsibility Principle. Especially if you're
           | splitting locks.
        
         | itsmemattchung wrote:
         | One of my _favorite_ threading papers published in 1982,
         | written by Andrew Birrell:
         | 
         | https://s3.amazonaws.com/content.udacity-data.com/courses/ud...
         | 
         | Reading that paper in graduate school cleared up a lot of
         | misconceptions I had about threads.
        
         | morelisp wrote:
         | I mean that's a nice theoretical argument, but the empirical
         | results are not so clear.
         | 
         | https://dl.acm.org/doi/10.1145/3297858.3304069
         | 
         |  _Surprisingly, our study shows that it is as easy to make
         | concurrency bugs with message passing as with shared memory,
         | sometimes even more. For example, around 58% of blocking bugs
         | are caused by message passing._
         | 
         | ...
         | 
         |  _Our study found that message passing does not necessarily
         | make multithreaded programs less error-prone than shared
         | memory. In fact, message passing is the main cause of blocking
         | bugs. To make it worse, when combined with traditional
         | synchronization primitives or with other new language features
         | and libraries, message passing can cause blocking bugs that are
         | very hard to detect._
         | 
         | I'd also add that in the modern world we've "solved" many
         | multi-threading problems by splitting the queues and data
         | stores out into multiple services. Now we don't have any
         | threading problems - only "distributed system" problems, and if
         | there's a bug there that's the architect's fault, not the
         | programmer's!
        
           | JoeAltmaier wrote:
           | Yeah, so many service APIs are blocking; it's hard to keep
           | threads non-blocking. Often there is a way, but you have to
           | dig for it.
        
         | syrrim wrote:
         | Dining philosophers is solved by always acquiring locks in the
         | same order.
        
           | [deleted]
        
           | chrisseaton wrote:
           | ...if it's possible to do that for your application! It
           | isn't always, if which lock you need next depends on a value
           | read under a previously acquired lock.
        
             | JoeAltmaier wrote:
             | Right, perhaps in a nested call or library, maybe even a
             | semaphore you didn't know about!
             | 
             | Multithreading with semaphores/locks is about as dirty as
             | it can get.
        
             | syrrim wrote:
             | The top-level comment's strategy with queues seems
             | similar, in that it isn't always possible. It requires the
             | queues form a DAG, which is another word for a partial
             | ordering. Thus, requiring the locks form a partial
             | ordering doesn't seem like a comparatively excessive
             | restriction.
        
               | chrisseaton wrote:
               | > doesn't seem like a comparatively excessive restriction
               | 
               | I think there are some applications and algorithms that
               | we just don't know a way to create an ordering for
               | realistically fine-grained locks.
               | 
               | For example airline seat reservation. Someone clicking on
               | seats to reserve them. You can't apply locks in a total
               | order because... you can't read the person's mind on what
               | seat they're going to click next.
               | 
               | Or Lee's algorithm.
               | 
               | (I wrote the first half a PhD on this.)
               | 
               | https://chrisseaton.com/research-symposium-2012/seaton-
               | irreg...
        
       | jerf wrote:
       | I think it's gotten easier over time, actually, and even when
       | this was written it was theoretically out of date.
       | 
       | There is a famous paper from the 1990s about how bad threads
       | are. I believe it is valid on its own terms but misdiagnoses the
       | underlying issue, which is this: taking multiple locks at once
       | is not a sane operation at scale. Any system built
       | on this will fail.
       | 
       | There is no magic bullet that solves all concurrency problems,
       | but a combination of techniques, used with discipline, take it
       | from insanely impossible to merely challenging:
       | 
       | 1. Assign data to a thread, with only that thread able to access
       | it. Offer concurrency via message passing to that thread alone,
       | and don't allow references to escape from the thread (pointers,
       | whatever you want to call it). Language support for this is nice
       | (Erlang and Pony do it one way, Haskell another, Rust yet a
       | third) but not technically necessary.
       | 
       | 2. Using locks is valid, and even quite helpful and useful, but
       | never let yourself take more than one. If you need transactions,
       | see #1. Don't write code that encourages people to take more than
       | one. Don't forget about things that may take locks without you
       | thinking about it (e.g., in Go, taking a mutex _and then_ trying
       | to send on a channel is taking two locks); this is one thing that
       | can be a bit of a challenge.
       | 
       | 3. For more complicated things, use libraries as much as possible
       | that have solved the super complicated cases already. For
       | instance, databases that offer transactional security. You can do
       | this even if you don't otherwise "need" such databases. Or
       | libraries that offer a "parallel map" within your language, or
       | other canned solutions to parallelism/concurrency.
       | 
       | 4. Use whatever support your language provides for "linting" your
       | conditions. The Go race condition checker is theoretically
       | insufficient but practically useful (passing it doesn't mean that
       | you have a safe program but it's a good start). Your language's
       | mileage may vary but take advantage of what you can.
       | 
       | Honestly, just these things taken together can take you quite
       | far. Again, these aren't magical solutions. There is an
       | irreducible sense in which threading is harder than not
       | threading, and I don't expect that to ever change. There will
       | always be programmers that are more expert in it than others. But
       | it doesn't have to be an impossible challenge anymore. And to a
       | large degree, the challenge becomes less "solving the multithread
       | problem" and more just figuring out how to architect systems
       | under the restrictions of the patterns that work and compose
       | together.
       | 
       | Although all that said, at a browser scale concurrency is always
       | going to be a problem. They gotta go fast, but that fights the
       | safe patterns pretty hard because they do involve a certain
       | amount of runtime slowdown (e.g., passing a message is always
       | going to be slower than just _not_ passing a message). Rust is
       | the only practical solution at that scale. But just as You Are
       | Not Google, You Are Not Writing A Browser. More normal programs
       | can go a long way with just what I laid out above.
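Technique 1 above can be sketched in a few lines of Python (used here as neutral illustration; the comment's own examples are Erlang, Pony, Haskell, Rust, and Go). One thread owns a piece of data outright, and every other thread touches it only by sending a message carrying a private reply queue; the dict never escapes the owner thread, so it needs no lock. The message shapes and names are invented for the example.

```python
import queue
import threading

def counter_owner(inbox):
    counts = {}                        # owned by this thread alone
    while True:
        msg = inbox.get()
        if msg is None:                # shutdown sentinel
            return
        op, key, reply = msg
        if op == "incr":
            counts[key] = counts.get(key, 0) + 1
            reply.put(counts[key])
        elif op == "get":
            reply.put(counts.get(key, 0))

def incr(inbox, key):
    reply = queue.Queue()              # per-request reply channel
    inbox.put(("incr", key, reply))
    return reply.get()

def get_count(inbox, key):
    reply = queue.Queue()
    inbox.put(("get", key, reply))
    return reply.get()
```

Callers get transactional updates (read-modify-write happens entirely inside the owner thread) without ever taking a lock themselves.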
        
         | dllthomas wrote:
         | > Language support for this is nice (Erlang and Pony do it one
         | way, Haskell another, Rust yet a third) but not technically
         | necessary.
         | 
         | At one point I did some name mangling magic with the C
         | preprocessor for "this can only be accessed from the thread
         | (statically) named foo" for a project where that was relevant.
         | It was pleasant, although limited in how much it could
         | generalize - all the threads very much needed to be statically
         | known.
        
         | dandelany wrote:
         | This is an excellent comment that clearly comes from a lot of
         | experience. Can you say any more about why having multiple
         | locks is such a red flag? Is it because you start getting into
         | deadlock/hungry philosophers territory? Are there rules of
         | thumb for using multiple locks that make it more tractable or
         | is it just always a bad idea?
        
           | dgb23 wrote:
           | Not OP, but that might be an implication. Another, more
           | general reason is that this kind of concurrency requires
           | design. It can't be thought of as a defensive programming
           | technique. Locks are too powerful and they are by definition
           | a point of contention.
           | 
           | I think providing a lock is like giving up control to someone
           | else. It's inherent coupling. To contrast, when I first
           | learned about them I thought of them as a protection
           | mechanism of an isolated part, but they are an execution
           | contract that everyone participates in.
        
             | jerf wrote:
             | "It's inherent coupling."
             | 
             | This matches well with one of the beliefs I've been
             | polishing up over the past few years which I don't think is
             | well understood, which is that structured programming tends
             | to create more coupling than we realize in deep call
             | stacks. When you end up with a 2000-line call stack in
             | Java, that's 2000 function invocations coupled together by
             | the way structured programming works; for instance, at a
             | bare minimum, that's 1999 functions that can't progress
             | until the innermost one does, and that's not the only form
             | of coupling.
             | 
             | I think the coupling induced by structured programming is
             | fairly light per stack frame on average, but they tend to
             | add up quickly because they're just so darned easy to add
             | more of.
             | 
             | Locks are another thing that makes this really come to
             | light. The more stack frames between some function and
             | something above it in the stack that has taken a lock, the
             | more likely it is for execution to eventually wander back
             | into something trying to take a lock inadvisably.
             | 
             | I mean, I've had a number of deadlocks just within an
             | object where it has some internal lock, and a method takes
             | that lock, then tries to call a method that takes the same
             | lock. Whoops. Easy to fix, of course, but that's the easy
             | case. The cases get harder in real code.
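The easy case described here can be reproduced in miniature. In Python, a plain `threading.Lock` is non-reentrant, so a method that holds the lock and then calls another method taking the same lock waits on itself forever, while `threading.RLock` makes the nested acquire legal for the owning thread. The timeout below exists only so the example terminates instead of hanging.

```python
import threading

def nested_acquire(lock):
    """Outer 'method' holds the lock; inner 'method' tries to take it too."""
    with lock:
        # With a plain Lock this second acquire can never succeed;
        # with an RLock the owning thread may re-acquire immediately.
        got = lock.acquire(timeout=0.1)
        if got:
            lock.release()
        return got
```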
        
           | ntoskrnl wrote:
           | The rule of thumb for multiple locks is always acquire them
           | in the same order. Lock A, lock B, unlock B, unlock A.
           | Otherwise you risk a deadlock. If another thread locks B,
           | then tries to lock A, you're in trouble.
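The rule can be shown as code: any path that needs two locks sorts them by a fixed global key before acquiring, so two concurrent operations in opposite directions cannot deadlock. The `Account` class and `transfer` function below are invented for illustration.

```python
import threading

class Account:
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always acquire the lower-numbered account's lock first, regardless
    # of transfer direction: a globally consistent order.
    first, second = sorted((src, dst), key=lambda a: a.number)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```

Without the `sorted` call, a thread transferring A to B and another transferring B to A could each grab their first lock and wait forever for the other's.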
        
             | mandevil wrote:
             | And keeping that discipline over many developers, over many
             | years, is in practice impossible, which is why multiple
             | locks seem attractive but are actually a major problem for
             | complex systems.
        
               | layer8 wrote:
               | You can, of course, encapsulate such a lock order with
               | dedicated classes (or functions/closures) modeling linear
               | types enforcing the sequence of lock A > lock B > unlock
               | B > unlock A. The resulting objects can still be
               | misapplied, but should fail immediately with a runtime
               | error, making it much harder to inadvertently misuse.
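One way to sketch this idea in plain code: give every lock a rank, track the ranks the current thread holds, and fail fast with a runtime error, rather than deadlocking later, the moment any thread acquires out of order. `OrderedLock` and its rank scheme are invented for this example, not taken from any library.

```python
import threading

_held = threading.local()   # per-thread stack of ranks currently held

class OrderedLock:
    def __init__(self, rank):
        self.rank = rank
        self._lock = threading.Lock()

    def __enter__(self):
        stack = getattr(_held, "ranks", [])
        if stack and stack[-1] >= self.rank:
            # Misuse fails immediately and loudly, as layer8 describes.
            raise RuntimeError(
                f"lock order violation: holding rank {stack[-1]}, "
                f"acquiring rank {self.rank}")
        self._lock.acquire()
        _held.ranks = stack + [self.rank]
        return self

    def __exit__(self, *exc):
        _held.ranks.pop()
        self._lock.release()
```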
        
               | jerf wrote:
               | But that still doesn't scale up to more locks of that
               | size very well, because you start trying to take even
               | more of them, and in dynamic orders, and etc etc.
               | 
               | I know about the theoretical solution of taking locks
               | always in a defined order, but I don't consider it a
               | practical solution. If taking locks willy-nilly lets you
               | scale to program size X, this might give you 2X or 3X,
               | but that's not nearly large enough in practice.
               | 
               | & dandelany, the community has chipped in and answered
               | for me. I endorse this subthread, at least, as I write
               | this comment. :)
        
               | anarazel wrote:
               | IME there's plenty problems where nested locks will lead
               | to a simpler solution than trying to avoid them. It's
               | important to avoid when reasonably doable, but it's also
               | not uncommon to take it too far and end up with a slower
               | and way more complicated solution.
        
         | secondcoming wrote:
         | What is 'browser scale'? I'd imagine most browsers spend most
         | of their time idle.
        
           | Slartie wrote:
           | If you want to see some of the most complex multithreading
           | systems, take a look inside modern browsers. That's what he
           | means by "browser scale".
           | 
           | I have some knowledge into working with Chromium's internals,
           | and it is crazy what's going on inside there. Several
           | different job and message queue systems interlocking with
           | each other, tasks spread across multiple processes getting
           | split and regrouped and synced and recombined, practically
           | everything running async, just so a few pixels are rendered a
           | little faster when your multicore machine updates a small
           | section of a website.
           | 
           | Browsers are indeed idling a lot, but if you want them to do
           | something, they must be as crazy fast as possible. The
           | competition in that space is ridiculous.
        
           | jerf wrote:
           | In the tens of millions to hundreds of millions of lines of
           | code. Browsers are some of the largest things out there. See
           | also desktop office suites, multi-decade CAD programs, a few
           | other things.
           | 
           | Few programs are at that scale. IIRC, even the Linux kernel
           | isn't _really_ at that scale, once you clean away a few files
           | in the drivers that are just literally millions of #defines
           | or something.
           | 
           | If you're writing at that scale, by all means copy the
           | necessary practices, but don't just casually assume your need
           | is super large and you've just gotta use the hard-core lock-
           | based concurrency because it's the only option for you. You
           | can get a _long_ way with much nicer architecture that is
           | only slightly slower.
           | 
           | For instance, I write a lot of Go. I've sometimes gotten a
           | bit nervous about writing a goroutine (/thread) that owns
           | some particular data structure because I'm like "wow, a lot
           | of things are going to access this, is this going to become a
           | bottleneck in the system?" But every time I've run the
           | analysis carefully, the answer generally comes back not just
           | "no" but "heck no, not even close". Your intuition can
           | deceive you very easily. Locking a single structure behind
           | the limit that "we can only dedicate at most 1 CPU full time
           | to maintaining this structure" is, in practice, far less
           | scary than it sounds. 1 CPU is a lot of power and you
           | generally need a lot of other CPUs trying to access it
           | simultaneously to overwhelm it.
           | 
           | Amdahl's Law kinda works in your favor on this matter (for
           | once); for something like this to be the bottleneck, it needs
           | to approach accounting for (1/(CPUs - 1)) of your execution
           | time, and for that to be the case, those other processes
           | hammering this 1-CPU-locked data structure need to be doing
           | whatever else they are doing very quickly. As with anything
           | in discussions like this, it absolutely can happen! But
           | always check your intuition concretely before going crazy
           | adopting something more complicated and "higher performance";
           | odds are good this isn't a problem for a given "random" data
           | structure.
        
       | macintux wrote:
       | > In this approach, threads own their data, and communicate with
       | message-passing. This is easier said than done, because the
       | language constructs, primitives, and design patterns for building
       | system software this way are still in their infancy.
       | 
       | "Infancy"? Erlang was invented in the 80s.
        
         | tonnydourado wrote:
         | OP does say system software, and although Erlang can have
         | amazing uptime, I'm not so sure about its raw performance
         | story (on the other hand, I do remember seeing some insane
         | benchmarks from an Elixir framework, so maybe I'm wrong).
         | 
         | Also, other than Erlang and Elixir, which other reasonably
         | mainstream language has this as a first-class feature? Even
         | Rust doesn't really put queues and message passing front and
         | center; it just makes it easier to get away with doing
         | _traditional_ thread programming.
         | 
         | Things can be old in years since discovery/invention, but
         | still in their infancy in usability and adoption.
        
           | Lammy wrote:
           | > Also, other than Erlang and Elixir, which other reasonably
           | mainstream language has this as first class features?
           | 
           | Ruby 3.0+ with Ractors: https://docs.ruby-
           | lang.org/en/master/ractor_md.html#label-Co...
        
           | kubanczyk wrote:
           | Go has channels (`chan`) which it considers more native than
           | e.g. Mutex. The latter is in a stdlib, the former is a
           | feature of the language proper.
           | 
           | Alas, the Go ecosystem could use more channels. I'd say that
           | the adoption is still at the infancy stage. I wonder whether
           | there are any practical shortcomings or if it's just a
           | readability tradeoff (the Mutex tends to be super-readable,
           | whereas channels are not so much).
        
         | moonchild wrote:
         | And the CSP paper dates to 1978.
        
         | bjourne wrote:
         | Truth be told, it is trivially easy to create deadlocks even in
         | a pure message-passing environment such as Erlang. Message-
         | passing is way oversold and doesn't solve as many problems as
         | people think.
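The point is easy to demonstrate with no shared memory at all: two workers that each insist on receiving a message before sending one. With blocking gets this hangs forever; the timeouts below exist only so the demo terminates and the stall is observable. All names are invented for the sketch.

```python
import queue
import threading

def stubborn(inbox, outbox, report):
    """Waits for the other side first, and only then would it send."""
    try:
        msg = inbox.get(timeout=0.2)
        outbox.put("reply to " + msg)
        report.put("ok")
    except queue.Empty:
        report.put("stalled")

def run_deadlock_demo():
    a_to_b, b_to_a, report = queue.Queue(), queue.Queue(), queue.Queue()
    t1 = threading.Thread(target=stubborn, args=(b_to_a, a_to_b, report))
    t2 = threading.Thread(target=stubborn, args=(a_to_b, b_to_a, report))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(report.get() for _ in range(2))
```

Neither worker touches a lock, yet the system deadlocks: each is blocked waiting for a message the other will never send.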
        
       | TheMagicHorsey wrote:
       | Erlang made it pretty easy.
       | 
       | But then not everyone wants to learn Erlang. And if you start
       | writing NIFs all crazy, you might get yourself into trouble
       | again.
       | 
       | But it seems tractable.
       | 
       | Am I being naive?
        
         | hereforphone wrote:
         | No, you're absolutely right: Erlang is magnificent and
         | superior.
        
       | ghufran_syed wrote:
       | I have very little experience of writing concurrent code, but
       | this seems relevant:
       | https://clojure.org/about/concurrent_programming
        
       | xigency wrote:
       | I don't really support this view.
       | 
       | While doing anything creative with threading is in fact
       | challenging, in Java it's easy to safely boost performance with
       | threading using parallel streams and concurrent data structures.
       | 
       | Even in other languages like C/C++, it's mostly worth it to
       | design something up-front that is naturally parallelizable and
       | find a safe and correct way to do it.
       | 
       | And the incentive to parallelize is only increasing with the
       | increasing thread counts of newer hardware. Sure, it's scary, but
       | there's really no excuse to avoid running with threading,
       | multiple processes, or multiple services.
        
         | jayd16 wrote:
         | In my experience I saw a bunch of novice devs using parallel
         | streams for N<=100 lists and such, for much worse performance.
         | It's certainly not foolproof either, or it would just be the
         | default.
        
           | marginalia_nu wrote:
           | To be fair, whether parallel streams is useful isn't a
           | function of N alone, but the time each operation takes. If
           | it's a nontrivial calculation, then by all means.
           | 
           | Although parallel streams are seldom the best option in terms
           | of performance even for large N. They may often be faster
           | than sequential processing for large N, but a more
           | carefully designed approach is almost always faster still,
           | often significantly so.
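The per-operation-cost point can be made concrete. The thread's examples are Java parallel streams; the sketch below uses Python's `ThreadPoolExecutor` as a stand-in. The parallel map is correct for any function, but for a trivial per-item function on a small list, pool scheduling overhead typically exceeds the work itself (and in CPython, the GIL further limits CPU-bound thread parallelism).

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, workers=4):
    """Parallel map: pays off only when fn is expensive relative to the
    cost of handing each item to a worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```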
        
       ___________________________________________________________________
       (page generated 2022-06-10 23:00 UTC)