[HN Gopher] The Downsides of C++ Coroutines
       ___________________________________________________________________
        
       The Downsides of C++ Coroutines
        
       Author : msz-g-w
       Score  : 56 points
       Date   : 2023-08-11 06:09 UTC (1 days ago)
        
 (HTM) web link (reductor.dev)
 (TXT) w3m dump (reductor.dev)
        
       | ninepoints wrote:
       | Eh, I use them and am quite productive with them. Some of the
       | downsides I don't really buy, for example, the argument regarding
       | allocations. In a typical task engine, you're allocating state
       | per task anyways. Sure you could have custom arenas and such, but
       | you can do that with C++ coroutines also by overriding operator
       | new/delete on the promise object. The lifetime concerns are par
       | for the course when it comes to async stuff (assuming that's how
       | you're using C++ coroutines).
        
         | Conscat wrote:
         | Generators, which already exist in the stdlib, are an example
         | where heap elision would be useful, but it is currently
         | unreliable in C++. There is a paper "Explicit Coroutine
         | Allocation" that will likely solve this in C++26. The Clang IR
         | project will also improve HALO for the future of (Clang) C++
         | projects.
        
       | shadowgovt wrote:
       | Besides the fact that they are yet another feature in the bloated
       | hash mess that is C++?
        
       | cshokie wrote:
       | This article hits on a number of interesting points. There is a
       | lot of complexity to be aware of when using C++ coroutines. And a
       | number of "normal" practices become dangerous in them, such as
       | pass by reference.
       | 
       | That said, I think they are still very much worth it. Older
       | asynchronous programming libraries in C++ are so verbose and so
       | much worse than coroutines that it's an obvious choice to use
       | coroutines.
       | 
       | Also, there is another hazard that the author does not mention in
       | this article: RAII lock wrappers. Holding a lock across
       | suspension points is super dangerous. At best it wastes
       | performance to leave it locked when blocked. At worst it can
       | create deadlocks or corrupt the lock if it is released on a
       | different thread than it was acquired.
        
         | rockwotj wrote:
         | As someone who writes C++ and uses coroutines every day for
         | work, I find that for our use case this is actually helpful.
         | 
         | We use seastar.io, a thread-per-core framework, where locks
         | are "async"-friendly in that they yield for access instead of
         | blocking. Also, embracing fully async message passing between
         | threads simplifies the programming model a ton.
        
         | mgaunard wrote:
         | I don't understand how it is any more verbose.
        
           | nazcan wrote:
           | Because you have to keep explicitly passing state between
           | each callback, rather than just using the same context (which
           | still has the ability to delete things if needed).
        
         | meindnoch wrote:
         | >RAII lock wrappers
         | 
         | You don't even need coroutines for this to be dangerous.
         | Holding locks over callback invocations is a pet peeve of mine
         | in PR reviews. Callback invocations, like suspension points,
         | can inject arbitrary operations into our code, which can easily
         | break prior invariants, yet look innocuous to the casual
         | reader.
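
The same point can be shown without coroutines. A common remedy,
sketched here with a hypothetical EventSource class, is to copy the
callback list while holding the lock and invoke the copies only after
releasing it, so a callback that re-enters the object cannot deadlock
on the non-recursive mutex.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical publisher used only to illustrate the locking pattern.
class EventSource {
public:
    void subscribe(std::function<void(int)> cb) {
        std::lock_guard<std::mutex> guard(mutex_);
        callbacks_.push_back(std::move(cb));
    }

    void publish(int value) {
        std::vector<std::function<void(int)>> snapshot;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            snapshot = callbacks_;  // copy under the lock
        }
        // Invoke with no lock held: callbacks may run arbitrary code,
        // including calls back into this object.
        for (auto& cb : snapshot) cb(value);
    }

    std::size_t subscriber_count() {
        std::lock_guard<std::mutex> guard(mutex_);
        return callbacks_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void(int)>> callbacks_;
};

int run_demo() {
    int seen = 0;
    EventSource source;
    // This callback re-enters the source. It would deadlock if publish()
    // still held the mutex while invoking it.
    source.subscribe([&](int v) {
        seen += v;
        source.subscribe([](int) {});
    });
    source.publish(5);
    // One callback ran with value 5, and there are now two subscribers.
    return seen * 10 + static_cast<int>(source.subscriber_count());
}
```

The trade-off is that a callback may still fire once after it has been
removed from the live list, which callers need to tolerate.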
        
         | commonlisp94 wrote:
         | > it's an obvious choice to use coroutines.
         | 
         | I agree, but the other choice is to have traditional threads of
         | execution that block. This simple strategy has delivered more
         | successful projects than any other.
        
       | mkoubaa wrote:
       | Experience teaches me that the worst time to use a new design
       | pattern or technique is _right after you learn about it_. The
       | problem in your code base you thought about while learning the
       | pattern was a useful proxy for where it could be applied, but
       | that doesn't mean it's the right fit.
       | 
       | Do it in a scratch refactoring, and wait a week or two before you
       | consider merging it. And make sure you are emotionally as ready
       | to discard as you are to land it.
        
         | TillE wrote:
         | I agree that you should always be ready and happy to discard or
         | refactor code as needed. Requirements change, your assumptions
         | may be wrong.
         | 
         | But in practice I've more often seen the opposite problem,
         | where organizations end up stuck on C++11 for a decade for no
         | technical reason. It's good to explore the new stuff and
         | eventually adopt what you can use.
        
         | tomcam wrote:
         | All true, but another trick is to do extensive web searches to
         | see what kind of problems people have had with the new
         | approach.
        
       | 7e wrote:
       | It's in fashion to dislike fibers, but they're a simpler solution
       | that, IMHO, beats coroutines for the vast majority of cases. Even
       | threads are a better solution for most cases. Coroutines are like
       | the checked exceptions of C++.
        
       | germandiago wrote:
       | I do not yet understand (open to explanations!) the difference
       | between stackless and stackful coroutines: stackless ones are
       | supposed to be cheap and even "collapsible" when their lifetimes
       | nest, but when that does not happen... _stackful is cheaper_.
       | 
       | Aren't stackless coroutines supposed to be more performant? In
       | which cases? Yes, I know about their virality, potential heap
       | allocations, etc.
        
         | comex wrote:
         | Two differences:
         | 
         | First, stackful coroutines use the coroutine stack for
         | everything they do. Stackless coroutines can use the normal
         | thread stack for synchronous calls, and that stack can be
         | shared across any number of coroutines. Per-coroutine
         | allocation is only needed for asynchronous calls.
         | 
         | Second, for stackful coroutines you need to allocate the entire
         | stack up front, and usually you have no way of knowing how much
         | stack might be needed, so you need a conservative upper bound.
         | Normal thread stacks have sizes in megabytes. (That doesn't
         | necessarily correspond to actual memory consumption, since the
         | OS will only reserve physical memory as needed, but the
         | physical reservation for a given stack can only grow, not
         | shrink. And even just allocating the virtual space has a cost.)
         | Most of the time you can get away with stacks that are _much_
         | smaller, only a few kilobytes, but at the cost of potentially
         | crashing when you've consumed too much stack; it's hard to
         | statically analyze maximum stack usage.
         | 
         | Stackless coroutines will, in general, only allocate memory as
         | needed for each coroutine invocation, so not only are you
         | wasting less memory, you don't have to worry about hitting an
         | arbitrary limit. Allocation elision makes things more
         | complicated since, as the blog post notes, you can end up
         | wasting some memory, but compared to stackful coroutines it's
         | peanuts. But they have the downside that heap allocations and
         | deallocations are expensive; plus, splitting a "stack" of
         | nested calls into separate heap allocations, usually far away
         | from each other in memory, is worse for cache locality.
        
       | yakubin wrote:
       | > Just like a normal function arguments are passed using
       | registers and the stack, coroutines are using the same ABI as
       | previously specified, however the code different is vastly
       | different.
       | 
       | > Finally at the end of the function the stack space initially
       | reserved get's reset to where it was initially when the function
       | first call happens then returns to the caller.
       | 
       | This post could use some editing. I'm having to reread each
       | paragraph several times to figure out its intended meaning. Most
       | sentences are separate paragraphs with careless mistakes that
       | make me feel the author was being chased by someone when writing
       | them and couldn't take a breath.
        
         | i-use-nixos-btw wrote:
         | It is due a bit of proofreading. There are some readability
         | issues.
         | 
         | That being said, it is a great article. C++ and coroutines is a
         | story that has been going on for a long time, and the result
         | surprised me. In a bad way.
         | 
         | One bit me right from the start. I copied out an example and it
         | crashed, and it turned out (after hours of searching, reading -
         | the compiler and sanitisers sure weren't any help) that the
         | problem was that I'd inadvertently made a parameter const&
         | (force of habit) and bound a temporary to it.
         | 
         | My answer to this is simply that I choose not to use
         | coroutines. If I can't force a compilation failure when I do
         | something dumb, that spooks me.
         | 
         | For a feature released in 2020 it has far too many footguns.
         | Ranges was similar when it came to lifetime footguns. It's just
         | something that makes it hard to take seriously the claims that
         | it is legacy code that is the reason C++ has a bad rap for
         | safety. Coroutines and ranges are modern features that can
         | shoot your foot off if you don't know the implementation, which
         | is kind of contrary to the point of making a friendly wrapper
         | over it all.
        
       | mgaunard wrote:
       | The best solution remains writing asynchronous code in a way that
       | is explicitly asynchronous.
       | 
       | Who would have thought?
        
       | sshaw wrote:
       | "C++" and "Coroutines". Who would have thought.
       | 
       | But, considering the accelerated releases post-C++11, I guess
       | I'm not surprised.
        
         | pjmlp wrote:
         | Anyone who has read "The Design and Evolution of C++".
        
       | jupp0r wrote:
       | There is nothing inherently asynchronous about coroutines. You
       | can use them to model concurrency or even parallelism, but that's
       | only a subset of their use cases.
        
         | Calavar wrote:
         | What are some of the other use cases?
        
           | spacechild1 wrote:
           | Generators come to my mind.
        
           | Jtsummers wrote:
           | Many coroutine uses are not asynchronous but _synchronous_:
           | they block when resumed and do not execute in parallel. This
           | permits cooperative multitasking, versus preemptive (or
           | preemptive with a bunch of locks to imitate cooperative which
           | is, of course, a waste). Since they can, in principle,
           | execute within the same thread (with C++'s implementation and
           | some others _you_ the programmer can send them off to other
           | threads for execution, but that's an explicit choice) this
           | can simplify concurrent system design and execution (in the
           | concurrency is not parallelism sense). In the single threaded
           | case, it's also faster than multithreaded asynchronous code
           | since the context switching (modulo cache misses) is greatly
           | reduced. Especially useful in the case where you want
           | synchrony and not asynchrony.
           | 
           | They're also very useful if you've ever had to create a bare
           | metal multitasking system. Much easier for state management
           | than older style "while (true)" loops with a million state
           | variables so functions can resume via a switch/case as
           | pseudo-coroutines. (Well, easier if you don't have to
           | implement the coroutine mechanism yourself.)
        
       ___________________________________________________________________
       (page generated 2023-08-12 23:00 UTC)