[HN Gopher] C++ Coroutines ___________________________________________________________________ C++ Coroutines Author : todsacerdoti Score : 98 points Date : 2023-02-22 16:31 UTC (6 hours ago) (HTM) web link (nigeltao.github.io) (TXT) w3m dump (nigeltao.github.io) | jmull wrote: | > Asynchronous code (e.g. callbacks) is more efficient (letting | you do other work while waiting for things) but is also more | complicated (manually saving and restoring state). | | This is a persistent misunderstanding about async... it is _not_ | generally more efficient than non-async code. In fact, the | overhead of async may slow things down. (There can be other | downsides as well. E.g. debugging may be more difficult.) | | For it to help, you need to have something better for the current | thread to be doing than waiting for whatever it is you're waiting | for. | | Whatever that is, another worker thread could be doing it, so | what we're really talking about is that async may let you reduce | the number of threads you're using. | | (Async is so useful in JS because it lets you reduce the number | of threads to one, which lets you sidestep all kinds of hairy | issues.) | | That might be good enough, depending on your situation, to | make the downsides of async worthwhile. But it may very well | not. | | Anyway: don't use async because it's "more efficient". | the_duke wrote: | You are leaving out an important aspect: context switches are | expensive, especially with all the mitigations introduced after | Spectre et al. | | Doing more work in a single thread saves a lot of context | switches, especially if IO makes up a large portion of the | workload. And even more so with other mechanisms to minimize | syscalls, like io_uring on Linux. | | Async overhead is highly dependent on the way the async runtime | is implemented.
| | It's true that async requires scheduling and task switching to | be implemented in userspace, but assuming that async is used as | an alternative to threads, then the OS would be doing a similar | amount of work anyway. | | The OS may be able to do that more efficiently because it has | more information about the system. Or it may be the opposite, | because the application has more information about the | workloads. | | "async = overhead" as a general statement is not correct. | otabdeveloper4 wrote: | Context switches happen regardless of whether you're using | kernel mode ("threads") or user mode ("async") scheduling. | | And kernel mode actually has much more efficient and | performant context switches, because the kernel can access | hardware directly. | | (It is sometimes useful to be able to do custom user mode | scheduling, but certainly not because of "context switches".) | [deleted] | the_duke wrote: | I don't quite follow your argument there. | | This is unrelated to kernel threading. | | If you have 1 thread handling 1000 requests with some async | io mechanism (epoll, io_uring, ...), instead of 1000 | threads each handling one request, there are far fewer | threads fighting over CPU cores and the 1 thread can stay | active much longer, hence reducing the number of context | switches. | | Especially with a mechanism like io_uring, which helps | minimize syscalls (and hence switching to kernel threads). | otherjason wrote: | Do you have a citation for kernel mode having more | efficient context switches? What kind of direct hardware | access are you referring to that would be better than | pushing the register context onto the stack? | | In my experience, the exact opposite is true, particularly | in the era of CPU mitigations that require TLB flushes upon | every kernel-mode context switch. | messe wrote: | You're right, kernel-level context switching is much | slower than user-level context switching.
| | User-level can also have the advantage of having more | actual context about the task that is running, meaning | that it's often able to avoid saving/restoring as much | data as a kernel-level switch would. See Go's green | threads for a great example of this kind of cooperation | between runtime and language. | | > Do you have a citation for kernel mode having more | efficient context switches? What kind of direct hardware | access are you referring to that would be better than | pushing the register context onto the stack? | | The closest thing to this that I can think of is on | 32-bit x86, which did have hardware-assisted context | switching via the TSS (task state segment). | | As it happens, everybody stopped using it because it was | too slow, and a bit painful unless you fully bought into | x86's awful segmentation model. Early Linux kernels used | it, if you want to see it in action. | jmull wrote: | Thread context switches are about scheduling threads on | physical cores. Async is about executing in a scope on a | thread. There's no direct conversion here. Packing a thread | with async tasks could let you reduce the number of threads | you have, which will probably reduce the thread context | switching if there is contention for cores. Whether that's | significantly better than the cost of async context switches | really depends on specifics (even if async context switch | cost is near zero, though we're no longer talking about C++ | coroutines in that case). But keep in mind: If your threads | are mostly waiting, they aren't triggering a lot of context | switches. More async means fewer busier threads, less async | means more less busy threads. To run into a problem with less | async your threads need to be forcing switching at a pretty | fine-grained level. (BTW, most blocking for IO isn't | fine-grained from the perspective of instructions executing on a | core) That kind of thing happens, but has solutions other | than async.
| naasking wrote: | > Anyway: don't use async because it's "more efficient". | | I don't think efficiency has only one meaning. For instance, | you can create millions of coroutines while you're limited to | only thousands of threads. Doesn't that mean your program is | making better use of your hardware and thus is more efficient? | otabdeveloper4 wrote: | Coroutines and threads are not that different. | | Threads usually preallocate some stack space, while | coroutines usually don't. But that isn't such a big deal. | josephg wrote: | Coroutines also don't need to flush the TLB when they get | scheduled. This is a pretty big deal for performance, when | you have a lot of them. | naasking wrote: | That's not the only difference. Coroutines also typically | don't have nearly as many concurrency hazards because you | can control when you yield. | dymk wrote: | coroutines can be either cooperative or not, depending on | implementation, so it depends | jmull wrote: | Sure, they _can be_ more efficient in _particular | situations_. | | But coroutines have overhead (so do threads). Your millions | of coroutines might be worse than your thousands of threads | in terms of making efficient use of the cpu cores your app or | service has available. | nmilo wrote: | No, your processor can only do n things at a time, where n is | the number of cores (probably like 8) (simplifying hugely of | course). It's pretty easy to fill out 8 cores with 1000s of | threads, the coroutines only become useful when almost all of | them are blocking on a socket or something, periodically | switching on and off to handle it. | [deleted] | naasking wrote: | Yes, but contexts where you're creating a million | coroutines are almost certainly I/O bound (like a web | server). | KMag wrote: | Agreed in general, that async code is generally less efficient | than epoll/kqueue/poll/select/etc. non-blocking I/O. 
However, | note that async code is often quite a bit more efficient than | naive Java 1.3-style (pre-NIO) tons-of-blocked-threads code. | | Now, io_uring async code is yet another story. | | I was working at LimeWire when Apple finally shipped a 1.4 JVM | as their system JVM. The whole thing got much lighter weight | when we could use non-blocking I/O. | junon wrote: | Eh? Async code almost always uses the APIs you describe. | KMag wrote: | The async runtime is almost always built on top of | non-blocking I/O APIs, and the extra layer of abstraction adds | overhead. You're rarely calling epoll in your async code, | but rather your runtime is handling all of that for you. | | That is, unless you're using io_uring, POSIX AIO, etc., in | which case your async code might not have the extra layer | of abstraction on top of the system APIs. | mr_00ff00 wrote: | But a single thread with async can, and for many tasks, is more | efficient than a single thread without async, because you | aren't waiting on requests. | | One point also is that until recently most personal computers | did not have multiple cores, aka you could think at the lowest | level that even multiple threads were just more dangerous async | lol | | Even making new threads in the JVM isn't guaranteed to result | in more than a single OS thread. | jcelerier wrote: | > One point also is that until recently most personal | computers did not have multiple cores, | | the Core 2 Duo was released 17 years ago | jcranmer wrote: | Dual core _started_ coming out ~2005. It took a few more | years for it to dominate the product lineup. Computers tend | to have a replacement cycle of ~5 years. Consequently, | it's not until about 2012-ish that you could reasonably | expect a personal computer to have a multicore processor. | | (Yeah, that's still a decade ago, but that's a good deal | more recent than 17 years.)
| dymk wrote: | A decade is a long time for computers | duped wrote: | Asynchronous execution is more efficient if and only if the | time spent waiting for a call to complete is longer than the | time spent checking if the call has completed plus the time | spent dispatching to another asynchronous call (and waiting vs | checking if it is completed, and so on). | | Remember that concurrency is not parallelism, it's not about | the number of threads being used. | danuker wrote: | Why implement coroutines when you have multithreading? | | Coroutines are to me a brain-twisty way to make the most out of a | single core, because your language doesn't support | multithreading. For example: JavaScript with a single thread in | the browser, Python with the GIL. | | C++ has threads which ACTUALLY run in parallel on the CPU. Why | bother complicating the language further? | eestrada wrote: | Because native multithreading is expensive when it comes to (a) | memory consumption per thread of execution and (b) context | switching. Also, if your program spends a lot of time just | waiting on outside resources (most notably slow/blocking IO), | coroutine concurrency is both faster and more lightweight. If | your program is CPU bound, then coroutines/async have | little-to-no value over native multithreading. | | That being said, yeah, most async workflows are obtuse and | difficult to follow and they force all consumers of your | interface to also use the async primitives if they want the | benefits. Go seems to be one of the few languages that does | async/coroutines right. | | See: https://eli.thegreenplace.net/2018/go-hits-the-concurrency-n... | rcme wrote: | What I like about Go is that each co-routine is entirely | synchronous, and the only concurrency primitives you're given | are channels and mutexes. This makes your code so simple and | easy to understand.
| | I just looked up Kotlin's co-routines, and there are a number | of annoying hoops you need to jump through: | | 1. Co-routines can only be built by calling runBlocking or | coroutineScope. | | 2. Functions that are async but should act like blocking | calls in a co-routine need to be marked with `suspend` | | 3. There are a bunch of primitives you need to use like | `launch` or `async`. | | 4. There are built-in options to make the co-routines lazy, | but it makes the API feel sloppy. | | 5. Rather than locks, Kotlin encourages explicit thread | dispatching. Sometimes you need to do this in Go too, like | when using a C library, but in general it's rare. | ackfoobar wrote: | `launch` is just the `go` keyword. `async` stores the | result in a `Deferred` (or `Promise`), which is an | abstraction for parallel decomposition. | | Other than confining UI work to the main thread, I don't | think I have seen thread confinement as an alternative to | locks. | rcme wrote: | That's kind of my point. Go has the `go` keyword. Kotlin | has runBlocking, coroutineScope, launch, async, and | suspend. Go's simplicity feels nicer to me. | ackfoobar wrote: | `runBlocking` and `suspend` are the coloured function | point. Some hate it, I don't mind it. It has its | benefits. | | --- | | A quick search on SO gives this example of parallel | decomposition. | https://stackoverflow.com/questions/57662739/pattern-for-fet... | | Starting several `async` tasks, then awaiting them is | cleaner for me. But that's subjective. | | --- | | Kotlin's structured concurrency handles the context stuff | by default. When you handle those concerns (cancellation | for example) in Go, it's just as complex, but more | verbose. | rcme wrote: | Go's context object is used for cancellation. I think | this is pretty simple and has the benefit of avoiding any | type of exception handling while still giving you the | ability to clean up anything you were doing.
| | In general, if you like Kotlin, I can see why you'd like | their approach to concurrency. Kotlin really likes a | large standard library with generic blocks that act as | syntactic sugar. I used to like that too, but as I've | gotten older I've gotten lazier. Kotlin now has too much | mental overhead for my old brain. | int_19h wrote: | The other consequence, however, is that Go coroutines only | work with other Go code, and bespoke Go threads and stacks | require the runtime to jump through a lot of hoops to do | FFI correctly, with the result that the ecosystem tends to | rewrite rather than reuse existing code in C and other | languages. | | In contrast, the approach with explicit promises and await | can be mapped all the way down to a C struct with a | function and a data pointer representing a callback. That | is, it is C ABI compatible, and thus it can be easily | composed across any boundary that respects that ABI. | rcme wrote: | The only runtime hoop I've ever had to jump through was | to invoke runtime.LockOSThread() inside a go-routine. In | a lot of ways:

        go func() {
            runtime.LockOSThread()
            // ...
        }()

is simpler than creating a thread in other languages. | int_19h wrote: | The runtime jumps through those hoops for you as needed, | mostly, but there are consequences to that wrt | performance. | PaulDavisThe1st wrote: | The cost of context switching between threads sharing the | same address space is much lower than between tasks, because | there's no need for a TLB flush. It just comes down to the | cost of saving/restoring register state (more or less), and | that's a nice known fixed cost on any given processor. | | Context switching between tasks has a variable cost, because | the implications of the TLB flush depend on the working set | size of the switched-to task. If it doesn't touch much | memory, the variable cost is low; if it touches a lot, the | variable cost can be much, much higher than the register | save/restore (fixed cost).
| nordsieck wrote: | > It just comes down to the cost of saving/restoring | register state (more or less), and that's a nice known | fixed cost on any given processor. | | There's also the cost of cache eviction. Not something you | have to manually manage, but it's a cost you pay | nonetheless. Maybe. | PaulDavisThe1st wrote: | Indeed, I ought to have mentioned the cost of cache | misses post-context switch too. It too is working set | size dependent, but the relationship looks different from | the cost of TLB misses. However, my understanding is that | an intra-task thread switch would also not cause cache | invalidation (there's no need; the address space remains | the same). | GuB-42 wrote: | Coroutines are for when you don't want to run in parallel, because | sequence matters. A typical example is iterators, you want the | loop and iterator (routine and coroutine) to run in lockstep. | | You could use threads instead, but then you risk getting in all | sorts of concurrency problems. And if you synchronize | everything, then great, you have reinvented coroutines. In | fact, you can think of coroutines as threads with built-in | synchronization, which may be how they are implemented. | [deleted] | layer8 wrote: | Coroutines are for when you would normally use callbacks or a | state machine, but handling the callback's state or the state | machine is complex enough that it becomes much simpler when | expressed as a coroutine (simpler to reason about). It isn't | really an alternative to multithreading. | okamiueru wrote: | To more elegantly write code that is async in nature like | IO/Web requests? Let stuff happen and just await where needed? | | Seems like a different type of concern than what you need | multiprocessing for. | injidup wrote: | Coroutines don't really have anything to do with threads. They | are just a way for you to model suspending and resuming | functions.
| | If you have ever tried to implement a complicated iterator you | will find that you model a for loop in your head but that the | iteration of the loop is controlled from outside using the ++ | operator. The ++ operator allows the iterator loop to go one | more time round. Normally you have to manage this with state | stored in some object. With coroutines you can actually write a | loop and co_yield from the middle of the loop to implement the | ++ operator. | grogers wrote: | This is actually really convenient. | | In our code we have an abstraction that wraps a set using a | variant. Instead of implementing an iterator/begin/end based | on which data the variant holds, we have an iter() method | that unwraps the variant and loops through and co_yields. | Since you can consume a generator in a for-each loop, you use | it the exact same way (except calling iter). | | In our case it doesn't save that much complexity, but it's | definitely some. I could imagine more complicated cases where | it is even more convenient. | 112233 wrote: | Coroutines can be used as another way to manage object | lifetime, and make some control flow much easier to write. | This is another language construct like lambdas. It is | still possible to use threads if you have coroutines. | [deleted] | [deleted] | bmurphy1976 wrote: | Good for embedded devices (e.g. a single core ESP32). | [deleted] | jcelerier wrote: | even without any threads or async, coroutines are very nice for | writing generators and super useful when you want generic code | which can iterate, without leaking the container implementation | to client code | steveklabnik wrote: | Why use a hammer when you already have screwdrivers? | com2kid wrote: | > Why implement coroutines when you have multithreading? | | Because coroutines are easier to think about than threads, and | avoid entire (large) classes of bugs. | | Because coroutines (typically) have a lower overhead than | threads.
| | Because if you are IO bound and not CPU bound, and therefore | you don't need to spread work across cores, threads may not | provide a given workload with much, or any, benefit. | demindiro wrote: | For an I/O heavy application (not in C++) I avoid threads for | the main "loop" as there is a central cache that is accessed | often. Synchronizing access to it is likely more expensive and | error-prone than using just a single thread. It is also | possible a few hundred or thousand tasks are running | concurrently, which becomes very expensive very quickly with | threads. | | There are a few CPU-intensive tasks but those are easy to | isolate and send to a threadpool. | akmittal wrote: | Real threads can be too expensive sometimes. A single coroutine in | Go uses just a few KB of memory, so we can easily spawn hundreds of | thousands of coroutines, which is not possible using threads | dragontamer wrote: | Your point is correct but your magnitudes are off. | | IIRC, a thread takes up ~32kB or so worth of data in | pthreads/Linux, or at least... somewhere around there. So on | a rather cheap server with 64GB of RAM, you can easily fit | 1 million threads. More expensive servers have 512GB, 1TB or | more RAM and can easily handle this kind of load. | | Of course, coroutines are even smaller/more efficient and can | go into 10 million, 100 million or higher. | Matheus28 wrote: | A thread on Windows is MUCH heavier than on Linux. 1 MB stack | from what I remember, plus it takes a lot longer to create. | dragontamer wrote: | dwStackSize can be set as low as 64kB though. So if | you're in the habit of making millions-of-threads, maybe | make them smaller on Windows? | | It does default to 1MB, but these sorts of things are | configurable. | com2kid wrote: | Ugh I really need to finish my blog post where we had async | with only a few dozen bytes of overhead per task. | | Cooperative async runtimes are obscenely efficient.
| dragontamer wrote: | > C++ has threads which ACTUALLY run in parallel on the CPU. | Why bother complicating the language further? | | Actual parallelism is surprisingly slow and resource heavy in | practice. When everything is on one core, you keep things local | in L3, L2, L1. | | However, add a 2nd core and you suddenly have "ping | ponging". Let's say Core#0 has data in L1 cache, and now Core#1 | needs to read-modify-write to it. That means the cores need to: | | 1. Core#1 begins to mark that cache-line as exclusive. | | 2. Core#0 needs to then respond by marking the line as invalid. | Then Core#0 ejects the data from L1, and passes it to Core#1. | | 3. Core#1 can finally begin to work on the data. | | 4. If Core#0 uses the data again, Core#1 must do step 2 again. | | -------- | | This is called "ping-ponging". L1 cache reads/writes take about | 1 nanosecond, but ping-pongs can be 30 nanoseconds or slower, | 30x slower for no reason. | | You don't want to add another core to your problem unless | you're _actually_ getting a speed benefit. It's more complex | than it may seem at first glance. You very well could add a | bunch of cores and then suddenly your program is way slower | because of these kinds of issues. | | Another problem: False sharing. Core#0 is working on "int x", | and Core#1 is working on "int y", but x and y are on the same | cacheline. So they have to ping-pong even though they never | actually touch the same data. | PaulDavisThe1st wrote: | Your example implies some combination of (a) excessive data | sharing between threads and (b) possible need for core pinning. | | If thread A is going to need to read _all_ the data touched | by thread B, then it's unclear why you've split the task | across threads. If there are still good reasons, then | probably pin A and B to core N (never pin anything to core 0, | unrelated story), and let them interleave there. | | If that doesn't make sense, then yep, you'll have to face the | cost of ping-ponging.
| Night_Thastus wrote: | EDIT: I think I follow how this works now. I didn't understand | the sieve itself, so the co-routines were throwing me for a loop. | | We get all the numbers from 2 to 40, print out 2, then eliminate | all the entries divisible by 2. Now we're left with 3, 5, 7, 9, | etc. | | We print out 3, then remove all the entries divisible by 3. | | This repeats until we've hit the end (40). The next number after | a round of eliminations will always be a prime. | layer8 wrote: | Yeah, the sieve use-case is a bit too clever IMO to be a good | introductory example. | zabzonk wrote: | perhaps the take here is that many features of modern C++ are not | intended for application writers, but more for library writers? | ghostwriter wrote: | They serve both to a great extent, as long as code owners are | willing to refactor their codebases. The changes do actually | make code significantly safer and more pleasant to work with. | My favourite ones are the concepts and class template argument | deductions. | steveklabnik wrote: | I can't speak to modern C++ specifically, but in many, many | languages, more complex features are often for library writers | than application writers. | corysama wrote: | That's definitely a thing. A whole lot of the features added | since C++11 have been specifically intended to enable library | writers to do very complex things under the hood so they can | present powerful yet simple interfaces to users. | mrfox321 wrote: | Users of coroutine-friendly libraries will have to adopt the | async function coloring (co_await, co_yield, co_return, | Task<T>). | | So saying that coroutines are low-level / not application code | is misguided. | nly wrote: | Not necessarily. Coroutines are invisible at the ABI level | (that is coroutines are just functions). | | It's pretty easy to write a library with them (using your own | executor) and not reveal their use to the user.
| | Obviously sharing an executor between application and library | is another matter. | Joker_vD wrote: | > sharing an executor between application and library is | another matter. | | Especially when there is not one, but several libraries, | and the application itself doesn't actually have or use any | executors, now that's a fun challenge. | mrfox321 wrote: | Good point. | | Although doesn't that defeat the purpose of structured | concurrency? | | i.e. creating tasks that are plumbed to some background | thread/executor. | | By directly invoking the coroutine, lifetime issues become | much easier to avoid. | artemonster wrote: | The fact that they went for stackless is a testament to how bad | the committee-driven design process is. Just a bunch of bureaucrats | playing important actors, similarly to stackoverflow mods. | colejohnson66 wrote: | Can you ELI5 why stackless is bad? | artemonster wrote: | For coroutines to be _really_ useful, they have to be | stackful. The guy that originally did the proposal for C++ is | also an author of multiple coro libraries and did research on | this. You can emulate stackful coros with stackless via | trampoline, but heck, you can emulate stackless coros with | closures and trampoline too! The requirement of this extra | trampoline makes their use extra convoluted and negates the | very slight advantages that such functionality may really | bring. Making stackful "right" was hard, so a useless | compromise was made, which is basically like adding | syntactic sugar to the language. | [deleted] | int_19h wrote: | It may be a compromise, but what exactly makes it | "useless"? A similar style of async has been in use in e.g. | C# for over a decade now, and it has been very successful | there. Yes, it is syntactic sugar - but so are e.g. | lambdas. What matters in practice is that it makes some | kinds of code much shorter and easier to read.
| artemonster wrote: | Stackful coroutines are a feature that is on par with | the power of delimited continuations (proven fact in | academia, you can easily find somewhat easy to follow | papers on this topic), stackless coroutines are a stupid | gimmick. You see the "compromise"? You argued the case | for adding a car, but after long debate, politics | and "compromise" you got a _TOY_ car instead. | [deleted] | malkia wrote: | I'm only sitting here, waiting for my thread_locals to get | broken.... or maybe they are handled magically somehow | jcelerier wrote: | ..why would they be broken? Coroutines are not related to | threads | grogers wrote: | I think the point is that if the coroutine suspends it could | be resumed on a different thread. So you don't really want to | mix thread locals and coroutines. | jcelerier wrote: | I mean, for sure but that's not a property of coroutines, | any callback system can do that | | e.g. if you have this code with boost:

        #include <boost/asio.hpp>
        #include <chrono>
        #include <iostream>
        #include <mutex>
        #include <thread>
        #include <vector>

        std::mutex print_mutex;

        int main() {
            using namespace boost::asio;
            using namespace std::literals;
            io_context io_context(20);
            auto work_guard = make_work_guard(io_context);
            for (int i = 0; i < 100; i++) {
                post(io_context, [&] {
                    thread_local int x = 0;
                    int z = x++;
                    std::lock_guard _{print_mutex};
                    std::cout << z << "\n";
                });
            }
            std::vector<std::jthread> v;
            for (int i = 0; i < 20; i++)
                v.emplace_back([&] {
                    io_context.run_for(2s);
                });
        }

 | | I hope you don't expect the output to be "0, 1, 2, ..., 99" ___________________________________________________________________ (page generated 2023-02-22 23:01 UTC)