[HN Gopher] Virtual Threads: New Foundations for High-Scale Java...
       ___________________________________________________________________
        
       Virtual Threads: New Foundations for High-Scale Java Applications
        
       Author : axelfontaine
       Score  : 85 points
       Date   : 2022-09-29 18:03 UTC (4 hours ago)
        
 (HTM) web link (www.infoq.com)
 (TXT) w3m dump (www.infoq.com)
        
       | samsquire wrote:
       | This is good.
       | 
        | I implemented a userspace 1:M:N timeslicing multiplexer (one
        | scheduler thread, M kernel threads, N lightweight threads) in
        | Java, Rust and C.
       | 
       | I preempt hot for and while loops by setting the looping variable
       | to the limit from the kernel multiplexing thread.
       | 
        | It means threads cannot suffer resource starvation.
       | 
       | https://github.com/samsquire/preemptible-thread
       | 
       | The design is simple. But having native support as in Loom is
       | really useful.
        
         | mattgreenrocks wrote:
         | I like it! Do you have any sense for what the perf hit is for
         | making those loops less hot to enable pre-emption?
        
           | samsquire wrote:
           | There is no if statement in the hot loop or in the kernel
           | thread so there is no performance cost there.
           | 
            | The multiplexing thread is separate from the kernel thread,
            | so you could say it's 1:M:N thread scheduling. I should have
            | been clearer in my comment. There are three types of threads.
           | 
            | The multiplexing thread timeslices the preemption of the
            | lightweight threads and kernel threads every 10 milliseconds.
            | That is, it stops all the loops in the running lightweight
            | thread and causes the next lightweight thread to execute.
           | 
            | So there is no overhead except for a structure variable
            | retrieval in the loop body.
           | 
            | Rather than
            | 
            |     for (int i = 0; i < 1000000; i++) {
            |     }
            | 
            | we have
            | 
            |     register_loop(thread_loops, 0, 0, 1000000);
            |     for (; thread_loops[0].index < thread_loops[0].limit;
            |            thread_loops[0].index++) {
            |     }
            |     handle_virtual_interrupt();
            | 
            | And in the thread multiplexer scheduler, we do this:
            | 
            |     thread_loops[0].index = thread_loops[0].limit;
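The trick described above can be sketched in Java. This is an illustrative reconstruction, not code from the linked repository; here the scheduler collapses the shared limit rather than writing the index, which sidesteps a lost-update race with the worker's non-atomic `index++`:

```java
// Sketch of loop preemption via shared loop state: the hot loop's
// bound lives in a shared structure, and a scheduler thread collapses
// that bound to stop the loop without any extra branch in the body.
public class LoopPreemption {
    static class LoopState {
        volatile long index;  // written by the worker
        volatile long limit;  // written by the scheduler
        LoopState(long limit) { this.limit = limit; }
    }

    // The worker's hot loop: the condition reads shared state each pass,
    // so another thread can terminate it early. No if statement needed.
    static long run(LoopState s) {
        long iterations = 0;
        for (; s.index < s.limit; s.index++) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws Exception {
        LoopState s = new LoopState(Long.MAX_VALUE); // would never finish
        Thread scheduler = new Thread(() -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { }
            s.limit = 0; // the "virtual interrupt": preempt the loop
        });
        scheduler.start();
        long done = run(s);
        scheduler.join();
        System.out.println("preempted after " + done + " iterations");
    }
}
```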
        
       | lenkite wrote:
       | Really hope this makes it to Android. (probably need to wait for
       | a decade or two though)
        
       | Blackthorn wrote:
       | So happy this is finally coming out! After years of using the
       | library that inspired this (fibers), I'm so stoked this is coming
       | to the wide outside world of Java. There's just no comparison in
       | how understandable and easy to program and debug this is compared
       | to callback and event based programming.
        
       | jeffbee wrote:
       | " Operating systems typically allocate thread stacks as
       | monolithic blocks of memory at thread creation time that cannot
       | be resized later. This means that threads carry with them
       | megabyte-scale chunks of memory to manage the native and Java
       | call stacks."
       | 
       | This extremely common misconception is not true of Linux or
       | Windows. Both Windows and Linux have demand-paged thread stacks
       | whose real size ("committed memory" in Windows) is minimal
       | initially and grows when needed.
        
         | uluyol wrote:
         | Do they shrink too? How many threads can be created before
         | address space is exhausted (even if the memory isn't backed by
         | pages, the address space is still reserved)?
        
           | jeffbee wrote:
           | You'll run out of physical memory for the first page of the
           | stack long before you run out of room in the virtual address
           | space.
        
           | tedunangst wrote:
           | The stack for any thread other than the first is just memory
           | like any other allocation. You can free it, resize it, copy
           | it elsewhere, whatever you want to do. Literally just a
           | pointer in a register. People work up weird mythologies about
           | it, but the stack can be anything you want if you're willing
           | to write code to manage it.
        
       | ccooffee wrote:
       | This is a great writeup, and reignites my interest in Java. (I've
       | long considered "Java Concurrency in Practice" to be the _best_
       | Java book ever written.)
       | 
       | I haven't been able to figure out how the "unmount" of a virtual
       | thread works. As stated in this article:
       | 
       | > Nearly all blocking points in the JDK have been adapted so that
       | when encountering a blocking operation on a virtual thread, the
       | virtual thread is unmounted from its carrier instead of blocking.
       | 
       | How would I implement this logic in my own libraries? The
       | underlying JEP 425[0] doesn't seem to list any explicit APIs for
       | that, but it does give other details not in the OP writeup.
       | 
       | [0] https://openjdk.org/jeps/425
        
         | ecshafer wrote:
         | Java Concurrency in Practice is a fantastic book. I had DL as a
         | professor for about a half dozen courses in undergrad,
         | including Concurrent and Parallel Programming. Absolutely
         | fantastic professor, with a lot of insight into how parallel
         | programming really works at the language level. One of the best
         | courses I've taken.
        
           | anonymousDan wrote:
           | Yeah Java gets a lot of grief, but I learned a lot about
           | concurrent programming from making sure I really understood
           | every line of code in this book.
        
         | pron wrote:
         | > How would I implement this logic in my own libraries?
         | 
         | There's no need to if your code is in Java. We had to change
         | low-level I/O in the JDK because it drops down to native.
         | 
         | That's not to say every Java library is virtual-thread-
         | friendly. For one, there's the issue of pinning (see the JEP)
         | that might require small changes (right now the problem is most
         | common in JDBC drivers, but they're already working on
         | addressing it). The bigger issue, mostly in low-level
         | frameworks, is implicit assumptions about a small number of
         | shared threads, whereas virtual threads are plentiful and are
         | never pooled, so they're never shared. An example of such an
         | issue is in Netty, where they allocate very large _native_
         | buffers and cache them in ThreadLocals, which assumes that the
         | number of threads is low, and that they 're reused by lots of
         | tasks.
        
           | _benedict wrote:
            | Conversely, some applications would like a leaky abstraction
            | they have some control over. It will likely remain beneficial
            | to tie some caching to a carrier thread.
           | 
           | As a member of the Cassandra community I'm super excited to
           | get my hands on virtual threads come the next LTS (and
           | Cassandra's upgrade cycle), as it will permit us to solve
           | many outstanding problems much more cheaply.
           | 
           | I hope by then we'll also have facilities for controlling the
           | scheduling of virtual threads on carrier threads. I would
           | rather not wait another LTS cycle to be able to make proper
           | use of them.
        
         | Nzen wrote:
         | I don't know how they did it, but you could use that jep id as
         | a query in the jdk issue tracker [0], and then use the issue
         | tracker id to find the corresponding github issue [1]. (I had
         | hoped for commits with that prefix, but there don't seem to be
         | any for that issue.)
         | 
         | [0]
         | https://bugs.openjdk.org/browse/JDK-8277131?jql=issuetype%20...
         | 
         | [1] https://github.com/openjdk/jdk/pull/8787
        
         | chrisseaton wrote:
         | > I haven't been able to figure out how the "unmount" of a
         | virtual thread works.
         | 
         | The native stack is just memory like any other, pointed to by
         | the stack pointer. You can unmount one stack and mount another
         | by changing the stack pointer. You can also do it by copying
         | the stack out to a backing store, and copying the new thread's
         | stack back in. I think the JVM does the latter, but not an
         | expert.
        
         | galaxyLogic wrote:
          | Seems like a good development. I've been doing Node.js for the
          | last few years after letting go of Java. But there's something
          | uneasy about async/await. For one thing, it's difficult to
          | debug how the async functions interact.
        
         | geodel wrote:
          | Well, one way is to replace "synchronized" blocks with
          | ReentrantLocks wherever you can.
        
           | cvoss wrote:
           | I would guess that LockSupport.park() and friends have also
           | been adapted to support virtual thread unmounting.
        
         | cypressious wrote:
         | Does you library use any of the JDK's blocking APIs like
         | Thread.sleep, Socket or FileInputStream directly or
         | transitively? If so, it is already compatible. The only thing
         | you should check is if you're using monitors for
         | synchronization which are currently causing the carrier thread
         | to get pinned. The recommendation is to use locks instead.
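A minimal before/after sketch of that recommendation (the `Counter` class is hypothetical; under the JDK 19 preview, blocking inside `synchronized` pins the carrier thread, while parking on a `java.util.concurrent.locks` lock lets the virtual thread unmount):

```java
import java.util.concurrent.locks.ReentrantLock;

// Swapping a monitor for a ReentrantLock to be virtual-thread-friendly.
public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count = 0;

    // Before: synchronized void increment() { count++; }
    void increment() {
        lock.lock();          // parks without pinning the carrier
        try {
            count++;
        } finally {
            lock.unlock();    // always release in finally
        }
    }

    long get() {
        lock.lock();
        try { return count; } finally { lock.unlock(); }
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.increment(); });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(c.get()); // prints 2000
    }
}
```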
        
       | mikece wrote:
       | How does this compare to Processes in Elixir/Erlang -- is Java
       | now as lightweight and performant?
        
       | geodel wrote:
       | I think it is really important development in Java space. One
       | reason I plan to use it soon is because it does not bring in
       | complex programing model of "reactive world" and hence dependency
       | on tons of reactive libraries.
       | 
       | I tried moving plain old Tomcat based service to scalable netty
       | based reactive stack but it turned out to be too much work and an
       | alien programing model. With Loom/Virtual thread, the only thing
       | I will be looking for server supporting Virtual threads natively.
       | Helidon Nima would fit the bill here as all other frameworks/app
       | servers have so far just slapping virtual threads on their thread
       | pool based system. And unsurprisingly it is not leading to great
       | perf expected from Virtual thread based system.
        
         | RcouF1uZ4gsC wrote:
         | > How long until OS vendors introduce abstractions to make this
         | easier? Why aren't there OS-native green threads, or at the
         | very least user-space scheduling affordances for runtimes that
         | want to implement them without overhead in calling blocking
         | code?
         | 
          | Windows has had Fibers[0] for decades (IIRC since 1996 with
          | Windows NT 4.0).
         | 
         | 0. https://learn.microsoft.com/en-
         | us/windows/win32/procthread/f...
        
       | anonymousDan wrote:
       | Copying virtual stacks on a context switch sounds kind of
       | expensive. Any performance numbers available? Maybe for very deep
       | stacks there are optimizations whereby you only copy in deeper
       | frames lazily under the assumption they won't be used yet? Also,
       | what is the story with preemption - if a virtual thread spins in
       | an infinite loop, will it effectively hog the carrier thread or
       | can it be descheduled? Finally, I would be really interested to
        | see the impact on debuggability. I did some related work where we
       | were trying to get the JVM to run on top of a library operating
       | system and a libc that contained a user level threading library.
       | Debugging anything concurrency related became a complete
       | nightmare since all the gdb tooling only really understood the
       | underlying carrier threads.
       | 
       | Having said all that, this sounds super cool and I think is 100%
       | the way to go for Java. Would be interesting to revisit the
       | implementation of something like Akka in light of this.
        
       | thom wrote:
       | So right now it seems like you can replace the thread pool
       | Clojure uses for futures etc with virtual threads and go ham. You
       | could even write an alternative go macro to replace the bits of
       | core.async where you're not supposed to block. Feels like Clojure
       | could be poised to benefit the most here, and what a delight it
       | is to have such a language on a modern runtime that still gets
       | shiny new features!
        
       | gigatexal wrote:
       | Reading through the source code examples has me rethinking my
       | dislike for Java. It sure seems far less verbose and kinda nice
       | actually.
        
         | marginalia_nu wrote:
         | Modern Java is _a lot_ less boilerplaty than old enterprise
         | Java.
        
       | mgraczyk wrote:
       | The section "What about async/await?", which compares these
       | virtual threads to async/await is very weak. After reading this
       | article, I came away with the impression that this is a
       | dramatically worse way to solve this problem than async/await.
       | The only benefit I see is that this will be simpler to use for
       | the (increasingly rare) programmers who are not used to async
       | programming.
       | 
        | The first objection in the article is that with async/await you
        | may forget to use an async operation and could instead use a
        | synchronous operation. This is not a real problem. Languages like
       | JavaScript do not have any synchronous operations so you can't
       | use them by mistake. Languages like python and C# solve this with
       | simple lint rules that tell you if you make this mistake.
       | 
       | The second objection is that you have to reimplement all library
       | functions to support await. This is a bad objection because you
       | also have to do this for virtual threads. Based on how long it
        | took to add virtual threads to Java vs adding async/await to
       | other languages, it seems like virtual threads were much more
       | complicated to implement.
       | 
       | The programming model here sounds analogous to using gevent with
       | python vs python async/await. My opinion is that the gevent
       | approach will die out completely as async/await becomes better
       | supported and programmers become more familiar.
       | 
        | EDIT: Looking more at the "Related Work" section at the bottom, I
        | think I understand the problem here. The "Structured Concurrency"
        | examples are unergonomic versions of async/await. I'm not sure
       | what I'm missing but this seems like a strictly worse way to
       | write structured concurrent code.
       | 
        | Java example:
        | 
        |     Response handle() throws ExecutionException, InterruptedException {
        |         try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        |             Future<String>  user  = scope.fork(() -> findUser());
        |             Future<Integer> order = scope.fork(() -> fetchOrder());
        |             scope.join();           // Join both forks
        |             scope.throwIfFailed();  // ... and propagate errors
        |             // Here, both forks have succeeded, so compose their results
        |             return new Response(user.resultNow(), order.resultNow());
        |         }
        |     }
       | 
        | Python equivalent:
        | 
        |     async def handle() -> Response:
        |         # scope is implicit, throwing on failure is implicit.
        |         user, order = await asyncio.gather(findUser(), findOrder())
        |         return Response(user, order)
        | 
        | You could probably implement a similar abstraction in Java, but
        | you would need to pass around and manage the scope object, which
        | seems cumbersome.
        
         | wtetzner wrote:
         | I can see you having objections to their arguments against
         | async/await, but what makes you say async/await is somehow the
         | better solution?
        
           | mgraczyk wrote:
           | There are a few reasons.
           | 
           | async/await allows you to do multiple things in parallel. I
           | don't see how you can do that in the virtual threading model,
           | although I haven't used it and only read this article. You
           | would have to spin up threads and wait for them to finish,
           | which IMO is much more complicated and hard to read.
           | 
            | javascript
            | 
            |     async function doTwoThings() {
            |       await Promise.all([
            |         doThingOne(),
            |         doThingTwo(),
            |       ]);
            |     }
            | 
            | python
            | 
            |     async def do_two_things():
            |         await asyncio.gather(
            |             do_thing_one(),
            |             do_thing_two(),
            |         )
           | 
           | Another issue is building abstractions on top of this. For
           | example how do you implement "debounce" using virtual
           | threads? You end up unnaturally reimplementing async/await
           | anyway.
           | 
            | Finally, it's generally much easier to implement new libraries
            | with a promise/future based async/await system than with a
           | system based on threads, but I'm not familiar enough with
           | Java to know whether this is actually a good objection. It's
           | possible they make it really easy.
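For what it's worth, a thread-based debounce doesn't have to reimplement async/await; a sketch using the stock `ScheduledExecutorService` (the `Debouncer` class itself is hypothetical):

```java
import java.util.concurrent.*;

// Thread-based debounce: keep one pending task and reschedule it on
// every call, so the action only runs after the calls go quiet.
public class Debouncer {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    // Run `action` only after `delayMs` ms with no further calls.
    synchronized void call(Runnable action, long delayMs) {
        if (pending != null) pending.cancel(false); // drop the old timer
        pending = timer.schedule(action, delayMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() { timer.shutdown(); }

    public static void main(String[] args) throws Exception {
        Debouncer d = new Debouncer();
        java.util.concurrent.atomic.AtomicInteger runs =
            new java.util.concurrent.atomic.AtomicInteger();
        for (int i = 0; i < 5; i++) {   // five rapid calls...
            d.call(runs::incrementAndGet, 50);
            Thread.sleep(5);
        }
        Thread.sleep(300);              // ...then let things settle
        d.shutdown();
        System.out.println("action ran " + runs.get() + " time(s)"); // 1
    }
}
```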
        
             | Jtsummers wrote:
             | > async/await allows you to do multiple things in parallel.
             | I don't see how you can do that in the virtual threading
             | model, although I haven't used it and only read this
             | article.
             | 
             | The description of this is that the virtual threads can
             | move between platform threads, quoting from the article:
             | 
             | > The operating system only knows about platform threads,
             | which remain the unit of scheduling. To run code in a
             | virtual thread, the Java runtime arranges for it to run by
             | mounting it on some platform thread, called a carrier
             | thread. Mounting a virtual thread means temporarily copying
             | the needed stack frames from the heap to the stack of the
              | carrier thread, and borrowing the carrier's stack while it
             | is mounted.
             | 
             | > When code running in a virtual thread would otherwise
             | block for IO, locking, or other resource availability, it
             | can be unmounted from the carrier thread, and any modified
              | stack frames are copied back to the heap, freeing the
             | carrier thread for something else (such as running another
             | virtual thread.) Nearly all blocking points in the JDK have
             | been adapted so that when encountering a blocking operation
             | on a virtual thread, the virtual thread is unmounted from
             | its carrier instead of blocking.
             | 
             | This allows for parallelism so long as the system is
             | multicore and the JVM has access to multiple parallel
             | threads to distribute the virtual threads across.
        
               | mgraczyk wrote:
               | Two separate threads run in parallel, but one thread
               | cannot do two subtasks in parallel without submitting
               | parallel jobs to an executor or a StructuredTaskScope
               | subtask manager. It's basically forcing the developer to
               | do all the hard work and boilerplate that async/await
               | saves you.
        
               | pron wrote:
               | > It's basically forcing the developer to do all the hard
               | work and boilerplate that async/await saves you.
               | 
               | It doesn't. Both require the exact same kind of
               | invocation by the user. Neither automatically
               | parallelises operations that aren't explicitly marked for
               | parallelisation.
        
             | e63f67dd-065b wrote:
             | > async/await allows you to do multiple things in parallel.
             | I don't see how you can do that in the virtual threading
             | model, although I haven't used it and only read this
             | article. You would have to spin up threads and wait for
             | them to finish, which IMO is much more complicated and hard
             | to read
             | 
             | I think there's a fundamental point of confusion here. In
             | both python and JS, you can't do _anything_ in parallel,
             | since node /v8 and cpython are single-threaded (yes if you
             | dip down into C you can spawn threads to your heart's
             | content). You can only do them concurrently, since only
             | when a virtual thread blocks can you move on and schedule
             | another thread in your runtime.
             | 
              | In c++ (idk the java syntax, imagine these are runtime
              | threads):
              | 
              |     std::thread t1(doThingOne, arg1);
              |     std::thread t2(doThingTwo, arg2);
              |     t1.join();
              |     t2.join();
              |     // boost has a join_all
             | 
             | I'm sure there's some kind of `join_all` function in Java
             | somewhere. Imo this is even more clear than your async
             | await example: we have a main thread, it spawns two
             | children, and then waits until they're done before
             | proceeding.
             | 
             | The traditional problem with async/await is that it forces
              | an "are your functions red or blue" decision up-front (see
             | classic essay
             | https://journal.stuffwithstuff.com/2015/02/01/what-color-
             | is-...).
             | 
              | > Finally it's generally much easier to implement new
             | libraries with a promise/future based async/await system
             | than with a system based on threads
             | 
             | How so? Having written a bunch of libraries myself, I have
             | to say that not worrying about marking functions as async
             | or not is a great boon to development. Just let the runtime
             | handle it.
        
               | mgraczyk wrote:
               | The first part is semantics, yes I understand that python
               | is running one OS thread at a time with a GIL (for now).
               | Just pretend I used the word "concurrent" instead of
               | parallel in all the places necessary to remove the
               | semantic disagreement.
               | 
               | Whether threads and joining vs async/await is clearer is
               | a matter of taste and familiarity. I find async/await
               | much more clear because that's what I am more used to.
               | Others will disagree, that's fine. I suspect more people
               | will prefer async/await as time goes on but that's my
               | opinion.
               | 
               | > not worrying about marking functions as async or not is
               | a great boon to development.
               | 
               | I don't really see why this is a big deal. You can change
               | the function and callers can change their callsite. There
               | are automated lint steps for this in python and
               | javascript that I use all the time. It's not any
               | different to me than adding an argument or changing a
               | function name.
        
               | gbear605 wrote:
               | Part of the difference with Java is that a lot of
               | libraries haven't changed in twenty years because they
               | already work. Adding async/await would probably mean
               | writing an entirely new library and scrapping the old
               | already working code, while green threads allow the old
               | libraries to silently become better.
        
             | spullara wrote:
              | The only difference between how virtual threads work and
              | how async/await works is that you don't need to use await
              | and don't need to declare async. Just call .get() on a
              | Future when you need a value - that is basically "await".
              | 
              |     void doTwoThings() {
              |       var f1 = doThingOne();
              |       var f2 = doThingTwo();
              |       var thingOne = f1.get();
              |       var thingTwo = f2.get();
              |     }
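A runnable version of that pattern, with a fixed pool standing in for `Executors.newVirtualThreadPerTaskExecutor()` (a JDK 19 preview API) so the sketch runs on older JDKs; `doThingOne`/`doThingTwo` are placeholder tasks:

```java
import java.util.concurrent.*;

// No async/await keywords: submit work, then call get() where you
// need the value. get() blocks this thread; with Loom, a virtual
// thread unmounts here and frees its carrier for other work.
public class TwoThings {
    static String doThingOne() { return "one"; }
    static String doThingTwo() { return "two"; }

    static String doTwoThings(ExecutorService pool) throws Exception {
        Future<String> f1 = pool.submit(TwoThings::doThingOne);
        Future<String> f2 = pool.submit(TwoThings::doThingTwo);
        return f1.get() + "+" + f2.get(); // "await" both results
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(doTwoThings(pool)); // prints one+two
        } finally {
            pool.shutdown();
        }
    }
}
```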
        
               | mgraczyk wrote:
               | How do you implement doThingOne?
               | 
               | You should read the "structured concurrency" link in the
               | article. You have to explicitly wrap the call to
               | doThingOne in a future under a structured concurrency
               | scope. The code example you wrote is not going to be
               | possible in Java without implementing doThingOne in a
               | complicated way.
        
               | merb wrote:
               | great, now you have futures AND virtual threads. soo much
               | better!
        
             | Scarbutt wrote:
              | _async/await allows you to do multiple things in parallel.
              | I don't see how you can do that in the virtual threading
              | model_
             | 
             | When comparing to JS, it is the other way around. Unless
             | you are talking about IO bound tasks only where nodejs
             | delegates to a thread pool (libuv).
        
               | mgraczyk wrote:
               | With virtual threads, you need to write fork/join code to
               | do two subtasks. With async await, you call two async
               | functions and await them. So the virtual threading model
               | ends up requiring something that looks like a worse
               | version of async await to me.
        
               | Jtsummers wrote:
                |     t1 = async(Task1)
                |     t2 = async(Task2)
                |     await t1
                |     await t2
                | 
                |     t1 = fork(Task1)
                |     t2 = fork(Task2)
                |     t1.join()
                |     t2.join()
                | 
                | What's the difference?
        
               | mgraczyk wrote:
               | If Java adds some nice standardized helpers like this,
               | they will look equivalent. The current proposal is not
               | this clean but that doesn't mean it won't be possible.
               | The key difference is that async/await implies
               | cooperative multitasking. Nothing else happens on the
               | thread until you call await. I find that an easier model
               | to think about, and I opt into multithreading when I need
               | it.
               | 
                | Anyway, Rust does this using roughly the syntax
                | you described (except no need to call "fork"). Languages
               | that use async/await do not require you to say "async" at
               | the call site.
        
         | Eduard wrote:
         | > Languages like JavaScript do not have any synchronous
         | operations so you can't use them by mistake.
         | 
         | Can you explain what you mean by this? Isn't it the opposite -
         | Javascript has a synchronous execution model?
        
           | vlovich123 wrote:
           | Aside from NodeJS-specific APIs, JS as a whole does not
           | generally have any synchronous I/O, locks, threads etc.
           | SharedArrayBuffer is probably the notable exception as it can
           | be used to build synchronous APIs that implement that
           | functionality if I'm not mistaken.
           | 
           | Unless by synchronous you meant single threaded in which case
           | JS is indeed single threaded normally (unless you're using
           | things like Web Workers).
        
           | mgraczyk wrote:
           | I mean what the article calls a "synchronous blocking
           | method", which javascript (mostly) does not have.
        
         | pron wrote:
         | async/await require yet another world that's parallel to the
         | "thread" world but requires its own "colour" and set of APIs.
          | So now you have two kinds of threads, two kinds of respective
         | APIs, and two kinds of the same concept that has to be known by
         | all of your tools (debuggers, profilers, stacktraces).
         | 
         | > This is a bad objection because you also have to do this for
         | virtual threads
         | 
         | No. We had to change _a bit_ of the implementation -- at the
         | very bottom -- but none of the APIs, as there is no viral async
         | colour that requires doubling all the APIs.
         | 
         | You're right that implementing user-mode threads is much more
         | work than async/await, which could be done in the frontend
         | compiler if you don't care about tool support (although we very
         | much do), but the result dominates async/await in languages
         | that already have threads (there are different considerations
         | in JS) as you keep all your APIs and don't need a duplicate
         | set, and a lot of existing code tools just work (with
         | relatively easy changes to accommodate for a very high number
         | of threads).
         | 
         | > The "Structured Concurrency" examples are unergonomical
         | versions of async/await.
         | 
         | They're very similar, actually.
         | 
          | We've made the Java example very explicit, but that code
          | would normally be written as:
          | 
          |     Response handle() throws ExecutionException, InterruptedException {
          |         try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
          |             var user  = scope.fork(() -> findUser());
          |             var order = scope.fork(() -> fetchOrder());
          |             scope.join().throwIfFailed();
          |             return new Response(user.resultNow(), order.resultNow());
          |         }
          |     }
         | 
          | But when the operations are homogeneous, i.e. all of the same
          | type rather than different types as in the example above (Java
          | is typed), you'll do something like:
          | 
          |     try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
          |         var fs = myTasks.stream().map(scope::fork).toList();
          |         scope.join().throwIfFailed();
          |         return fs.stream().map(Future::resultNow).toList();
          |     }
         | 
         | Of course, you can wrap this in a higher level `gather`
         | operation, but we wanted to supply the basic building blocks in
         | the JDK. You're comparing a high-level library to built-in JDK
         | primitives.
         | 
         | Work is underway to simplify the simple cases further so that
         | you can just use the stream API without an explicit scope.
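         | 
         | (For comparison, a rough asyncio analogue of the homogeneous
         | case; `gather` plays the role of the higher-level operation
         | mentioned above, and the `fetch` tasks are made-up stand-ins:)

```python
import asyncio

# Hypothetical homogeneous tasks; stand-ins for real I/O-bound work.
async def fetch(i):
    await asyncio.sleep(0)  # pretend to do some I/O
    return i * 2

async def main():
    # gather() forks all the tasks, joins them, and propagates the first
    # failure: roughly what the explicit scope code above does by hand.
    return await asyncio.gather(*(fetch(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # [0, 2, 4]
```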
        
           | mgraczyk wrote:
           | This makes sense, especially the bit about tooling. I'm
           | unfamiliar with the state of Java tooling besides very simple
           | tasks.
           | 
           | On the other hand using things like debuggers and reading
           | stack traces in python/js "just work" for me. Maybe because
           | the tooling and the language have evolved together over a
           | longer period of time.
           | 
           | I also feel like the reimplementation of all functions to
           | support async is not a big deal because the actual pattern is
           | generally very simple. You can start by awaiting every async
           | function at the call site. New libraries can be async only.
        
             | pron wrote:
             | > On the other hand using things like debuggers and reading
             | stack traces in python/js "just work" for me. Maybe because
             | the tooling and the language have evolved together over a
             | longer period of time.
             | 
             | Well, Python and JS don't have threads, so async/await are
             | their only concurrency construct, and it's supported by
             | tools. But Java has had tooling that works with threads for
             | a very long time. Adding async/await would have required
             | teaching all of them about this new construct, not to
             | mention the need for duplicate APIs.
             | 
             | > I also feel like the reimplementation of all functions to
             | support async is not a big deal because the actual pattern
             | is generally very simple. You can start by awaiting every
             | async function at the call site.
             | 
             | First, you'd still need to duplicate existing APIs. Second,
             | the async/await (cooperative) model is inherently inferior
             | to the thread model (non-cooperative) because scheduling
             | points must be statically known. This means that adding a
             | blocking (i.e. async) operation to an existing subroutine
             | requires changing all of its callers, who might be
             | implicitly assuming there can't be a scheduling point. The
             | non-cooperative model is much more composable, because any
             | subroutine can enforce its own assumptions on scheduling:
             | If it requires mutual exclusion, it can use some kind of
             | mutex without affecting any of the subroutines it calls or
             | any that call it.
             | 
              | Of course, locks have their own composability issues, but
              | they're not as bad as async/await (which corresponds to a
              | single global lock everywhere except around blocking, i.e.
              | async, calls).
             | 
             | So when is async/await more useful than threads? When you
             | add it to an existing language that didn't have threads
             | before, and so already had an implicit assumption of no
             | scheduling points anywhere. That is the case of JavaScript.
             | 
             | > New libraries can be async only.
             | 
             | But why if you already have threads? New libraries get to
             | enjoy high-scale concurrency and old libraries too!
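              | 
              | (A concrete sketch of that caller-rewrite problem, in
              | Python; the function names are illustrative:)

```python
import asyncio

# Before: a plain subroutine; callers can assume no scheduling point inside.
def get_config_sync():
    return {"timeout": 5}

# After: the leaf now contains a blocking (async) operation, so it must
# become async...
async def get_config():
    await asyncio.sleep(0)  # the newly added scheduling point
    return {"timeout": 5}

# ...and every caller has to change its signature and call sites as well.
async def build_client():
    cfg = await get_config()  # caller forced to await
    return f"client(timeout={cfg['timeout']})"

print(get_config_sync()["timeout"])  # 5: the sync world is unchanged
print(asyncio.run(build_client()))   # the async world needed rewrites
```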
        
               | mgraczyk wrote:
               | I agree with your point that for CPU bound tasks, the
               | threading model is going to result in better performing
               | code with less work.
               | 
               | As for the point about locks, I think this one is also a
               | question of IO-bound vs CPU bound work. For work that is
               | CPU bottlenecked, there is a performance advantage to
               | using threads vs async/await.
               | 
               | As for the tooling stuff, I'm still not really convinced.
               | Python has almost always had threads and I've worked on
               | multimillion line codebases that were in the process of
               | migrating from thread based concurrency to async/await.
               | Now JS also has threads (workers). I also use coroutines
               | in C++ where threads have existed for a long time. I've
               | never had a problem debugging async/await code in these
               | languages, even with multiple threads. I guess I just
               | have had good experiences with tooling but It doesn't
               | seem that hard to retrofit a threaded language like
               | C++/Python.
        
               | pron wrote:
                | > I guess I've just had good experiences with tooling,
                | but it doesn't seem that hard to retrofit a threaded
                | language like C++/Python.
               | 
               | But why would you want to if you can make threads
               | lightweight (which, BTW, is not the case for C++)? By
               | adding async/await on top of threads you're getting
               | another incompatible and disjoint world that provides --
               | at best -- the same abstraction as the one you already
               | have.
        
               | mgraczyk wrote:
               | I think the async/await debugging experience is easier to
               | understand. For example in the structured concurrency
               | example, it seems like it would require a lot of tooling
               | support to get a readable stack trace for something like
               | this (in python)
               | 
                | Code
                | 
                |     import asyncio
                | 
                |     async def right(directions):
                |         await call_tree(directions)
                | 
                |     async def left(directions):
                |         await call_tree(directions)
                | 
                |     async def call_tree(directions):
                |         if len(directions) == 0:
                |             raise Exception("call stack");
                |         if directions[0]:
                |             await left(directions[1:])
                |         else:
                |             await right(directions[1:])
                | 
                |     directions = [0, 1, 0, 0, 1]
                |     asyncio.run(call_tree(directions))
                | 
                | Trace
                | 
                |     Traceback (most recent call last):
                |       File "/Users/mgraczyk/tmp/test.py", line 19, in <module>
                |         asyncio.run(call_tree(directions))
                |       File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
                |         return loop.run_until_complete(main)
                |       File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
                |         return future.result()
                |       File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
                |         await right(directions[1:])
                |       File "/Users/mgraczyk/tmp/test.py", line 4, in right
                |         await call_tree(directions)
                |       File "/Users/mgraczyk/tmp/test.py", line 14, in call_tree
                |         await left(directions[1:])
                |       File "/Users/mgraczyk/tmp/test.py", line 7, in left
                |         await call_tree(directions)
                |       File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
                |         await right(directions[1:])
                |       File "/Users/mgraczyk/tmp/test.py", line 4, in right
                |         await call_tree(directions)
                |       File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
                |         await right(directions[1:])
                |       File "/Users/mgraczyk/tmp/test.py", line 4, in right
                |         await call_tree(directions)
                |       File "/Users/mgraczyk/tmp/test.py", line 14, in call_tree
                |         await left(directions[1:])
                |       File "/Users/mgraczyk/tmp/test.py", line 7, in left
                |         await call_tree(directions)
                |       File "/Users/mgraczyk/tmp/test.py", line 11, in call_tree
                |         raise Exception("call stack");
                |     Exception: call stack
        
               | pron wrote:
               | No, the existing tooling will give you such a stack trace
               | already (and you don't need any `async` or `await`
               | boilerplate, and you can even run code written and
               | compiled 25 years ago in a virtual thread). But you do
               | realise that async/await and threads are virtually the
               | same abstraction. What makes you think implementing
               | tooling for one would be harder than for the other?
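                | 
                | (For contrast, a threaded sketch of the same experiment:
                | no async/await boilerplate, and the ordinary traceback
                | machinery reports the full call chain.
                | threading.excepthook is used here only to capture it for
                | inspection:)

```python
import threading
import traceback

captured = []

def hook(args):
    # The thread's unhandled exception arrives with a normal, complete
    # stack trace: no event-loop frames, no await machinery.
    captured.append("".join(traceback.format_exception(
        args.exc_type, args.exc_value, args.exc_traceback)))

threading.excepthook = hook

def call_tree(directions):
    # Plain recursive calls; no function needs an async variant.
    if len(directions) == 0:
        raise Exception("call stack")
    call_tree(directions[1:])

t = threading.Thread(target=call_tree, args=([0, 1, 0, 0, 1],))
t.start()
t.join()
print("call stack" in captured[0])  # True
```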
        
               | mgraczyk wrote:
               | How does the tooling know to hide the call to "fork" in
               | the scoped task example?
        
       | smasher164 wrote:
       | After all the hoopla surrounding concurrency models, it seems
       | that languages are conceding that green threads are more
       | ergonomic to work with. Go and Java have it, and now .NET is even
       | experimenting with it.
       | 
       | How long until OS vendors introduce abstractions to make this
       | easier? Why aren't there OS-native green threads, or at the very
       | least user-space scheduling affordances for runtimes that want to
       | implement them without overhead in calling blocking code?
        
         | Jtsummers wrote:
         | > Why aren't there OS-native green threads, or at the very
         | least user-space scheduling affordances for runtimes that want
         | to implement them without overhead in calling blocking code?
         | 
         | Green threads are, definitionally, _not_ OS threads; they are
         | user-space threads. So you will _never_ see OS-native green
         | threads, as that's an oxymoron. Many green-thread systems
         | work in one of two ways: either they lie to you (you really
         | only have one OS thread; the green threads let you write
         | concurrent code, which can be much simpler, but not _parallel_
         | code, using Pike's distinction), or they introduce multiple OS
         | threads ("carrier threads" in the terms of this article)
         | across which the green threads are distributed (this is what
         | Java is doing here, what Go and the BEAM languages have done
         | for a long time, and many others).
         | 
         | EDIT:
         | 
         | To extend this, many people think of "green threads" as
         | lightweight threading mechanisms. That's kind of accurate for
         | many systems, but not always true. If that's the sense that's
         | meant, then OS-native lightweight threads are certainly
         | possible in the future. But there's probably not much reason to
         | add them when user space lightweight concurrency mechanisms
         | already exist, and there's no consensus on _which_ ones are
         | "best" (by whatever metric).
        
           | smasher164 wrote:
           | > If that's the sense that's meant
           | 
           | Yeah that's what I meant, a lightweight threading mechanism
           | provided by the OS.
           | 
           | > there's probably not much reason to add them when user
           | space lightweight concurrency mechanisms already exist
           | 
           | Yeah... I don't think there's consensus on that. It seems
           | that many people find OS threads to be an understandable
           | concurrency model, but find them too heavyweight. So the
           | languages end up introducing other abstractions at either the
           | type-level (which has other benefits mind you!) or runtime to
           | compensate.
        
           | Sakos wrote:
           | > To extend this, many people think of "green threads" as
           | lightweight threading mechanisms. That's kind of accurate for
           | many systems, but not always true. If that's the sense that's
           | meant, then OS-native lightweight threads are certainly
           | possible in the future. But there's probably not much reason
           | to add them when user space lightweight concurrency
           | mechanisms already exist, and there's no consensus on which
           | ones are "best" (by whatever metric).
           | 
           | Wouldn't it make sense to implement them kernel-side when
           | looking at how every programming language seems to have to
           | reinvent the wheel regarding green threads?
        
             | Jtsummers wrote:
             | Green threads (today) aren't a singular thing; the
             | definition is only that they live in user space rather
             | than kernel space. They are implemented in a variety of
             | ways:
             | 
             | https://en.wikipedia.org/wiki/Green_thread
             | 
             | Do you imitate a more traditional OS-thread style with
             | preemption, do you use cooperating tasks, coroutines, what?
             | Since there is no singular _best_ or consensus model, there
             | is little reason for an OS to adopt wholesale one of these
             | variations at this time.
             | 
             | The original green threads (from that page) shared one OS
             | thread and used cooperative multitasking (most coroutine
             | approaches would be analogous to this). But today, like
             | with Go and BEAM languages, they're distributed across real
             | OS threads to get parallelism. Which approach should an OS
             | adopt? And if it did, would other languages/runtimes
             | abandon their own models if it were significantly
             | different?
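             | 
             | (The original single-OS-thread, cooperative style can be
             | sketched with Python generators standing in for
             | coroutines; the scheduler and task bodies here are
             | illustrative:)

```python
from collections import deque

trace = []

def task(name, steps):
    # Each `yield` is an explicit cooperative scheduling point.
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield

def run(tasks):
    # Round-robin over "green threads" on a single OS thread: resume
    # each task until its next yield, drop it when it finishes.
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)
        except StopIteration:
            pass

run([task("a", 2), task("b", 3)])
print(trace)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```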
        
               | smasher164 wrote:
               | Preemptive threads with growable stacks. There was some
               | discussion around getting segmented stacks into the
               | kernel, but I'm not sure that's the best approach. There
               | might have to be some novel work done in making
               | contiguous stacks work in a shared address space.
        
             | wtetzner wrote:
             | I think the reason green threads can work in languages is
             | that the runtime understands the language semantics and
             | can take advantage of them. The OS doesn't understand the
             | language and its concurrency semantics; it only has a blob
             | of machine code to work with.
        
               | smasher164 wrote:
               | Not really tbh. The Go runtime has a work-stealing
               | scheduler and does a lot of work to provide the same
               | abstractions that pthreads have, but for goroutines.
        
         | zozbot234 wrote:
         | > How long until OS vendors introduce abstractions to make this
         | easier?
         | 
         | The OS-level abstraction is called M:N threads. It has always
         | been supported by Java on Solaris. But it's not really popular
         | elsewhere.
        
       ___________________________________________________________________
       (page generated 2022-09-29 23:00 UTC)