[HN Gopher] Achieving 5M persistent connections with Project Loo...
       Achieving 5M persistent connections with Project Loom virtual
       Author : genzer
       Score  : 271 points
       Date   : 2022-04-30 08:07 UTC (14 hours ago)
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
       | deepsun wrote:
       | How does that compare to Kotlin suspend functions?
         | jillesvangurp wrote:
         | Loom will make a great backend for kotlin's co-routines. Roman
         | Elizarov (kotlin language lead & person who is behind Kotlin's
         | co-routine framework) has already confirmed that will happen
         | and it makes a lot of sense.
         | For those who don't understand this, Kotlin's co-routine
         | framework is designed to be language neutral and already works
         | on top of the major platforms that have kotlin compilers (native,
         | javascript, jvm, and soon wasm). So, it doesn't really compete
         | with the "native" way of doing concurrent, asynchronous, or
         | parallel computing on any of those platforms but simply
         | abstracts the underlying functionality.
         | It's actually a multi platform library that implements all the
         | platform specific aspects in the platform appropriate way. It's
         | also very easy to adapt existing frameworks in this space via
         | Kotlin extension functions and the JVM implementation actually
         | ships out of the box with such functions for most common
         | solutions on the JVM for this (Java's threads, futures,
         | threadpools, etc., Spring Flux, RxJava, Vert.x, etc.). Loom
         | will be just another solution in this long list.
         | If you use Spring Boot with Kotlin for example, rather than
         | dealing with Spring's Flux, you simply define your asynchronous
         | resources as suspend functions. Spring does the rest.
         | With Kotlin-js in a browser you can call Promise.toCoroutine()
         | and async { ... }.asPromise(). That makes it really easy to
         | write asynchronous event handling in a web application for
         | example or work with javascript APIs that expect promises from
         | Kotlin. And if you use web-compose, fritz2, or even react with
         | kotlin-js, you'd likely be dealing with anything asynchronous
         | via some kind of co-routine and suspend functions.
         | Once Loom ships, it basically will enable some nice, low level
         | optimization to happen in the JVM implementation for co-
         | routines and there will likely be some new extension functions
         | to adapt the various new Java APIs for this. Not a big deal but
         | it will probably be nice for situations with extremely large
         | amounts of co-routines and IO. Not that it's particularly
         | struggling there of course but all little bits help. It's not
         | likely to require any code updates either. When the time comes,
         | simply update your jvm and co-routine library and you should be
         | good to go.
         | richdougherty wrote:
         | I made a comment about this above:
         | https://news.ycombinator.com/item?id=31218826
         | I won't repeat it all, but the main point is that having
         | runtime support is much better than relying on compiler
         | support, even if compiler support is pretty fantastic.
         | Note that the two aren't mutually exclusive, you should still be
         | able to use coroutines after Project Loom ships, and it still
         | might make sense in many places.
         | torginus wrote:
         | While I can't answer the question directly, there is an article
         | about C#'s async/await vs Go's goroutines, which compares the
         | two approaches, and while some of the stuff is probably stack-
         | specific, a lot of it is probably intrinsic to the approach:
         | - Green threads scale somewhat better, but both scale
         | ridiculously well, meaning probably you won't run into scaling
         | issues.
         | - async/await generators use way less memory than a dedicated
         | green thread, this affects both memory consumption and startup
         | time, since the process has to run around asking the OS for
         | more memory
         | - green threads are faster to execute
         | Here's the link:
         | https://alexyakunin.medium.com/go-vs-c-part-1-goroutines-vs-...
       | Andrew_nenakhov wrote:
       | Sounds like a job for Erlang.
         | speed_spread wrote:
         | Sounds like Erlang's out of a job.
       | cheradenine_uk wrote:
       | I think a lot of people are missing the point.
       | Go look at the sourcecode. Look at how simple it is - anyone who
       | has created a thread with java knows what's happening. With only
       | minor tweaks, this means your pre-existing code can take
       | advantage of this with, basically, no effort. And it retains all
       | the debuggability of traditional java thread (I.e: a stack trace
       | that makes sense!)
       | If you've spent any time at all dealing with the horrors of c#
       | async/await (Why am I here? Oh, no idea) and its doubling of
       | your APIs to support function colouring - or, you've fought with
       | the complexities of reactive solutions in the Java space --
       | often, frankly, in the name of "scalability" that will never be
       | practically required -- this is a big deal.
       | You no longer have to worry about any of that.
         | pjmlp wrote:
         | Or inserting the occasional Task.Run() calls, as means to
         | avoiding changing the whole call stack up to Main().
           | gavinray wrote:
           | This hasn't been that much of a problem, IME
           | If you decide somewhere deep in your program you want to use
           | async operations, most languages allow you to keep the
           | invoking function/closure synchronous and return some kind of
           | Promise/Future-like value
             | pjmlp wrote:
             | Which is exactly the workaround with Task.Run(), being able
             | to integrate a library written with async/await in
             | codebases older than the feature, where no one is paying
             | for a full rewrite.
         | SemanticStrengh wrote:
         | Except Kotlin coroutines already work, can be very easily
         | integrated in existing java codebases and are far superior
         | to loom (structured concurrency, flow, etc)
           | richdougherty wrote:
           | Kotlin coroutines are amazing. They're built on very clever
           | tech that converts fairly normal source code into a state
           | machine when compiled. This has huge benefits and allows the
           | programmer to break their code up without the hassle of
           | explicitly programming callbacks, etc.
           | https://kotlinlang.org/spec/asynchronous-programming-with-
           | co...
           | However... an unavoidable fact is that converted code works
           | differently to other code. The programmer needs to know the
           | difference. Normal and converted code compose together
           | differently. The Kotlin compiler and type system helps keep
           | track, but it can't paper over everything.
           | Having lightweight thread and continuations support directly
           | in the VM makes things very much simpler for programmers (and
           | compiler writers!) since the VM can handle the details of
           | suspending/resuming and code composes together effortlessly,
           | even without compiler support, so it works across languages
           | and codebases.
           | I don't want to be critical about Kotlin. It's amazing what
           | it achieves and I'm a big fan of this stuff. Here are some
           | notes I wrote on something similar, Scala's experiments with
           | compile-time delimited continuations:
           | https://rd.nz/2009/02/delimited-continuations-in-
           | scala_24.ht...
           | I think this is a general principle about compiler features
           | vs runtime features. Having things in the runtime makes life
           | a lot easier for everyone, at the cost of runtime complexity,
           | of course.
           | Another one I'd like to see is native support for tail calls
           | in Java. Kotlin, Scala, etc have to do compile-time tricks to
           | get basic tail call support, but it doesn't work across
           | functions well.
           | Scala and Kotlin both ask the programmer to add annotations
           | where tail calls are needed, since the code gen so often
           | fails.
           | https://kotlinlang.org/docs/functions.html#tail-recursive-
           | fu...
           | https://www.scala-
           | lang.org/api/3.x/scala/annotation/tailrec....
           | https://rd.nz/2009/04/tail-calls-tailrec-and-
           | trampolines.htm...
           | As a side note, I can see that tail calls are planned for
           | Project Loom too, but I haven't heard if that's implemented
           | yet. Does anyone know the status?
           | "Project Loom is to intended to explore, incubate and deliver
           | Java VM features and APIs built on top of them for the
           | purpose of supporting easy-to-use, high-throughput
           | lightweight concurrency and new programming models on the
           | Java platform. This is accomplished by the addition of the
           | following constructs:
           | * Virtual threads
           | * Delimited continuations
           | * Tail-call elimination"
           | https://wiki.openjdk.java.net/display/loom/Main
             | SemanticStrengh wrote:
             | Coroutines are _much less_ coloured than async await
             | programming though since functions return resolved types
             | directly instead of futures. But yes there is the notion of
             | coroutine scope but I don't see how to suppress it without
             | making it less expressive.
             | Very few people know it but Oracle is developing an
             | alternative to Loom, in parallel.
             | https://github.com/oracle/graal/pull/4114
             | BTW i expect Kotlin coroutines to leverage loom eventually.
             | As for the tailrecursive keyword, it is not a constraint
             | but a feature since it guarantees at the type level that
             | this function cannot stack overflow. Few people know there
             | is an alternative to tailrecursive, that can make any
             | function stackoverflow safe by leveraging the heap via
             | continuations
             | https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/-deep-
             | re...
             | As for Java, there is universal support for tail recursion
             | at the bytecode level https://github.com/Sipkab/jvm-tail-
             | recursion
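A minimal Java sketch of the general idea of trading stack depth for heap allocation, as discussed above. This is a plain trampoline, not Kotlin's DeepRecursiveFunction and not what the linked jvm-tail-recursion library does; the class and method names (Trampoline, Step, sum) are made up for illustration.

```java
import java.util.function.Supplier;

class Trampoline {
    // A step is either a finished value (Done) or a deferred next step (More).
    interface Step<T> {
        static <T> Step<T> done(T value) { return new Done<>(value); }
        static <T> Step<T> more(Supplier<Step<T>> next) { return new More<>(next); }
    }

    static final class Done<T> implements Step<T> {
        final T value;
        Done(T value) { this.value = value; }
    }

    static final class More<T> implements Step<T> {
        final Supplier<Step<T>> next;
        More(Supplier<Step<T>> next) { this.next = next; }
    }

    // Drive the computation in a loop, so the call stack stays flat
    // no matter how many steps there are. Pending work lives on the heap.
    static <T> T run(Step<T> step) {
        while (step instanceof More) {
            step = ((More<T>) step).next.get();
        }
        return ((Done<T>) step).value;
    }

    // Sum of 1..n in tail-call form: each "recursive call" is returned
    // as a deferred step instead of being invoked directly.
    static Step<Long> sum(long n, long acc) {
        if (n == 0) {
            return Step.done(acc);
        }
        return Step.more(() -> sum(n - 1, acc + n));
    }

    public static void main(String[] args) {
        // A directly recursive version would likely overflow the stack here.
        System.out.println(run(sum(1_000_000, 0))); // prints 500000500000
    }
}
```

The cost is one small allocation per step, which is the usual trade-off when the runtime itself does not eliminate tail calls.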
               | ohgodplsno wrote:
               | > Coroutines are much less coloured than async await
               | programming though since functions returns resolved types
               | directly instead of futures
               | Only because the compiler does its magic behind the
               | scenes and transforms it into bytecode that takes a
               | lambda with a continuation. Try calling a suspend
               | function from java or starting a job and surprise, it's
               | continuations all the way down
               | SemanticStrengh wrote:
               | yes interfacing with java is generally made via RxJava
               | and reactor. Interfacing is easy but yes nobody wants to
               | use rxjava and reactor in the first place.. I wonder
               | whether loom will enable easier interop and make the magic
               | work from java side POV
               | gavinray wrote:
               | Thanks for posting that link to Java tail recursion
               | library, super handy + didn't know about it. You need
               | tail recursion for writing expression evaluators/visitors
               | frequently.
               | I've been using an IntelliJ extension that can do magic
               | by rewriting recursive functions to stateful stack-based
               | code for performance, but it spits out very ugly code:
               | https://github.com/andreisilviudragnea/remove-recursion-
               | insp...
               | > "This inspection detects
               | methods containing recursive calls (not just tail
               | recursive calls) and removes the recursion from the
               | method body, while preserving the original semantics of
               | the code. However, the resulting code becomes rather
               | obfuscated if the control flow in the recursive method is
               | complex."
               | It was this guy's whole Bachelor thesis I guess:
               | https://github.com/andreisilviudragnea/remove-recursion-
               | insp...
         | bullen wrote:
         | Agreed it's simpler, but using NIO with one OS thread per core
         | also has its benefits.
         | The context switch (however small) will cause latency when
         | this solution is at saturation.
         | I think they should write four tests: fiber, NIO and each with
         | userspace networking (no kernel copying network memory) and
         | compare them.
         | Why Oracle is stalling on removing the kernel from Java networking
         | is surprising to me, they already have a VM.
           | blibble wrote:
           | there's still a context switch with NIO, you're just doing it
           | manually
           | pron wrote:
           | https://github.com/ebarlas/project-loom-comparison
             | vlovich123 wrote:
             | Shouldn't you be able to send authorization and
             | authentication requests in parallel in the async and
             | virtual threads cases?
               | threeseed wrote:
               | It is just an example so they could do anything.
               | But in the real world it is common to need information
               | from the authentication stage to use in the authorization
               | stage. For example you may have a user login with an
               | email address/password which you then pass to an LDAP
               | server in order to get a userId. This userId is then used
               | in a database to determine which objects/groups they have
               | access to.
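A rough Java sketch of the dependency described above: the second lookup needs the userId produced by the first, so the two calls cannot simply run in parallel. The in-memory maps stand in for the LDAP server and the database, and every name here (LoginFlow, authenticate, authorize, the sample user) is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class LoginFlow {
    // Hypothetical stand-ins for an LDAP directory and a permissions database.
    static final Map<String, String> DIRECTORY = Map.of("ann@example.com", "u-17");
    static final Map<String, List<String>> GROUPS =
            Map.of("u-17", List.of("admins", "billing"));

    // Step 1: resolve credentials to a userId (password check omitted in this sketch).
    static String authenticate(String email, String password) {
        String userId = DIRECTORY.get(email);
        if (userId == null) {
            throw new IllegalArgumentException("unknown user: " + email);
        }
        return userId;
    }

    // Step 2: needs the userId from step 1.
    static List<String> authorize(String userId) {
        return GROUPS.getOrDefault(userId, List.of());
    }

    // Blocking, thread-per-request style: plain sequential code.
    static List<String> loginBlocking(String email, String password) {
        String userId = authenticate(email, password);
        return authorize(userId);
    }

    // Async style: the same data dependency, expressed as a chain of stages.
    static CompletableFuture<List<String>> loginAsync(String email, String password) {
        return CompletableFuture
                .supplyAsync(() -> authenticate(email, password))
                .thenCompose(userId -> CompletableFuture.supplyAsync(() -> authorize(userId)));
    }

    public static void main(String[] args) {
        System.out.println(loginBlocking("ann@example.com", "secret"));
        System.out.println(loginAsync("ann@example.com", "secret").join());
    }
}
```

Both variants run the two steps one after the other; the difference is only in how the sequencing is written down.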
       | the8472 wrote:
       | net.netfilter.nf_conntrack_buckets = 1966050
       | net.netfilter.nf_conntrack_max = 7864200
       | or avoid conntrack entirely
         | LinuxBender wrote:
         | For completeness' sake I would add that one must also set
         | options nf_conntrack expect_hashsize=X hashsize=X
         | in /etc/modules.d/nf_conntrack.conf, X being 1/4 the size of
         | conntrack_max
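As a quick check of the rule of thumb above, here is a small Java sketch of the arithmetic. The helper name hashSizeFor is made up; the input is the nf_conntrack_max value quoted earlier in the thread.

```java
class ConntrackSizing {
    // Rule of thumb from the comment above: hashsize is a quarter of conntrack_max.
    static long hashSizeFor(long conntrackMax) {
        return conntrackMax / 4;
    }

    public static void main(String[] args) {
        long conntrackMax = 7_864_200L; // value quoted in the thread
        System.out.println("nf_conntrack_max = " + conntrackMax);
        System.out.println("hashsize         = " + hashSizeFor(conntrackMax)); // 1966050
    }
}
```

The result, 1966050, matches the nf_conntrack_buckets value given in the parent comment.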
       | metabrew wrote:
       | API for the server example looks... actually good, wow. Nice job!
       | Also tickled to see my erlang 1M comet blog post referenced. A
       | lifetime ago now, pre-websockets.
       | alberth wrote:
       | Is this a test of just having 5M people knock on your door?
       | Or is this a test where something actually happens (data
       | exchanges) with each connection?
       | I ask because those are two totally different workloads and
       | typically the latter test is where Erlang shines.
         | bufferoverflow wrote:
         | It's an echo server. The client sends the data, the server
         | responds with the same data.
       | newskfm wrote:
       | sgtnoodle wrote:
       | I'm not a java programmer. I tried clicking 3 layers deep of
       | links, but still have no idea what virtual threads are in this
       | context. Is it a userspace thread implementation?
       | I've used explicit context switching syscalls to "mock out"
       | embedded real time OS task switching APIs. It's pretty fun and
       | useful. The context switching itself may not be any faster than
       | if the kernel does it, but the fact that it's synchronous to your
       | program flow means that you don't have to spend any overhead
       | synchronizing to mutexes, queues, etc. (You still have them, they
       | just don't have to be thread safe.)
         | grishka wrote:
         | > Is it a userspace thread implementation?
         | Yes.
       | zinxq wrote:
       | Loom sets out to give you a sane programming paradigm similar to
       | what threads do (i.e. as opposed to programming asynchronous I/O
       | in Java with some type of callback) without the overhead of
       | Operating System threads.
       | That's a very cool and a noble pursuit. But the title of this
       | article might as well have been "5M persistent connections with
       | Linux" because that's where the magic 5M connections happen.
       | I could also attempt 5M connections at the Java level using Netty
       | and asynchronous IO - no threads or Loom. Again, it'd take more
       | Linux configuration than anything else. If that configuration did
       | happen though now you can also do it in C# async/await,
       | javascript, I'm sure Erlang and anything else that does
       | Asynchronous I/O whether it's masked by something like
       | Loom/Async/Await or not.
         | simulate-me wrote:
         | As the GP said, what's cool about this is how simple the code
         | is. You might be able to achieve 5M connections in Java using
         | an event loop based solution (eg Netty), but if the connection
         | handlers need to do any async work, then they also need to be
         | written using an event loop, which is not how most people write
         | Java. Simply put, 5M connections was not possible using Java in
         | the way most people write Java.
         | [deleted]
         | pron wrote:
         | It is true that the experiment exercises the OS, but that's
         | only _part_ of the point. The other part is that it uses a
         | simple, blocking, thread-per-request model with Java 1.0
         | networking APIs. So this is  "achieving 5M persistent
         | connections with (essentially) 26-year-old code that's fully
         | debuggable and observable by the platform." This stresses both
         | the OS and the Java runtime.
         | So while you could achieve 5M in other ways, those ways would
         | not only be more complex, but also not really
         | observable/debuggable by Java platform tools.
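For readers who have not seen the style being described, here is a minimal sketch of a blocking, thread-per-connection echo server using only java.net and java.io. It is not the code from the linked repository. It uses ordinary platform threads so that it runs on any recent JDK; on a Loom-enabled build one would presumably pass a virtual-thread factory (for example Thread.ofVirtual().factory()) instead, which is an assumption and is not exercised here. The class name and the roundTrip helper are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadFactory;

class BlockingEchoServer {
    // Accept loop: one thread per connection, every call blocks.
    static void serve(ServerSocket server, ThreadFactory threads) {
        while (!server.isClosed()) {
            try {
                Socket socket = server.accept();
                threads.newThread(() -> echo(socket)).start();
            } catch (IOException e) {
                return; // server socket was closed; stop accepting
            }
        }
    }

    // Copy bytes back to the sender until the peer closes the connection.
    static void echo(Socket socket) {
        try (socket;
             InputStream in = socket.getInputStream();
             OutputStream out = socket.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException ignored) {
            // connection dropped; try-with-resources already closed the socket
        }
    }

    // Demo helper: start a server on a free loopback port, send one message, read the echo.
    static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // Platform threads here; swapping the factory is the only change needed.
            new Thread(() -> serve(server, Thread::new)).start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                byte[] sent = message.getBytes(StandardCharsets.UTF_8);
                client.getOutputStream().write(sent);
                client.getOutputStream().flush();
                byte[] received = client.getInputStream().readNBytes(sent.length);
                return new String(received, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The handler is straight-line code, so a stack trace or debugger shows exactly where each connection is waiting.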
           | cheradenine_uk wrote:
           | This.
           | Writing the sort of applications that I get involved with,
           | it's frequently the case that, whilst 1 OS thread per java
           | thread was a theoretical scalability limitation, in practice
           | we were never likely to hit it (and there was always the
           | 'get a bigger computer' option).
           | But: the complexity mavens inside our company and projects we
           | rely upon get bitten by an obsessive need to chase
           | 'scalability' /at all costs/. Which is fine, but the downside
           | to that is that the negative consequences of coloured
           | functions come into play. We end up suffering having to deal
           | with vert.x or kotlin or whatever the flavour-of-the-month
           | solution is, which is /inherently/ harder to reason about
           | than a linear piece of code. If you're in a c# project, then
           | you get a library that's async, and boom, game over.
           | If loom gets even within performance shouting distance of
           | those other models, it ought to kill (for all but the
           | edgiest of edge-cases) reactive programming in the java space
           | dead. You might be able to make a case - obviously depending
           | on your use cases which are not mine - that extracting, say,
           | 50% more scalability is worth the downsides. If that number
           | is, say, 5%, then for the vast majority of projects the
           | answer is going to be 'no'.
           | I say 'ought to', as I fear the adage that "developers love
           | complexity the way moths love flames - and often with the
           | same results". I see both engineers and projects (Hibernate
           | and keycloak, IIRC) have a great deal of themselves invested
           | in their Rx position, and I already sense that they're not
           | going to give it up without a fight.
           | So: the headline number is less important than "for virtually
           | everyone you will no longer have to trade simplicity for
           | scalability". I can't wait!
             | amluto wrote:
             | Threads (whether lightweight or heavyweight) can't fully
             | replace reactive/proactive/async programming even ignoring
             | performance and scalability. Sometimes network code simply
             | needs to wait for more than one event as a matter of
             | functionality. For example, a program might need to handle
             | the availability of outgoing buffer space and _also_ handle
             | the availability of incoming data. And it might also need
             | to handle completion of a database query or incoming data
             | on a separate connection. Sure, using extra threads might
             | do it, but it's awkward.
               | pron wrote:
               | > Sure, using extra threads might do it, but it's
               | awkward.
               | It's simpler and nicer, actually -- and definitely offers
               | better tooling and observability -- especially with
               | structured concurrency: https://download.java.net/java/ea
               | rly_access/loom/docs/api/jd...
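A small Java sketch of the "extra threads" approach being debated: each blocking source gets its own thread, and the handler waits on a single queue for whichever event arrives first. It uses plain platform threads and a BlockingQueue rather than the structured concurrency API linked above, and the names (WaitForAny, fanIn, after) and the simulated sources are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;

class WaitForAny {
    // Run each blocking source on its own thread; every result lands in one queue.
    static BlockingQueue<String> fanIn(List<Callable<String>> sources) {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        for (Callable<String> source : sources) {
            Thread t = new Thread(() -> {
                try {
                    events.put(source.call()); // blocks until this source has an event
                } catch (Exception e) {
                    events.offer("error: " + e);
                }
            });
            t.setDaemon(true);
            t.start();
        }
        return events;
    }

    // Simulated blocking source: produces the given event after a delay.
    static Callable<String> after(long millis, String event) {
        return () -> {
            Thread.sleep(millis);
            return event;
        };
    }

    // The handler is ordinary blocking code: take events in arrival order.
    static List<String> take(BlockingQueue<String> events, int count) {
        List<String> taken = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                taken.add(events.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return taken;
    }

    public static void main(String[] args) {
        BlockingQueue<String> events = fanIn(List.of(
                after(200, "database query completed"),
                after(50, "incoming data available")));
        System.out.println(take(events, 2));
    }
}
```

Whether this reads as simpler or more awkward than an event loop is exactly the point under discussion; the sketch only shows what the thread-based shape looks like.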
             | mike_hearn wrote:
             | A couple of points to consider.
             | 1. Demanding scalability for inappropriate projects and at
             | any cost is something I've seen too, and on investigation
             | it was usually related to former battle scars. A software
             | system that stops scaling at the wrong time can be horrific
             | for the business. Some of them never recover, the canonical
             | example being MySpace, but I've heard of other examples
             | that were less public. In finance entire multi-year IT
             | projects by huge teams have failed and had to be scrapped
             | because they didn't scale to even current business needs,
             | let alone future needs. Emergency projects to make
             | something "scale" because new customers have been on-
             | boarded, or business requirements changed, are the sort of
             | thing nobody wants to get caught up in. Over time these
             | people graduate into senior management where they become
             | architects who react to those bad experiences by insisting
             | on making scalability a checkbox to tick.
             | Of course there's also trying to make easy projects more
             | challenging, resume-driven development etc too. It's not
             | just that. But that's one way it can happen.
             | 2. Rx type models aren't just about the cost of threads. An
             | abstraction over a stream of events is useful in many
             | contexts, for example, single-threaded GUIs.
               | cheradenine_uk wrote:
               | I think my point is more that you end up having to pay
               | the costs (of Rx-style APIs) whether you need the
               | scalability or not, because the libraries end up going
               | down that route. This has sometimes felt like I'm being
               | forced to do work in order to satisfy the fringe needs of
               | some other project!
               | And sure, if you are living in a single-threaded
               | environment, your choices are somewhat limited. I,
               | personally, dislike front-end programming for exactly
               | that reason - things like RxJS feel hideously
               | overcomplicated to me. My guess is that most, though not
               | all, will much prefer the loom-style threading over
               | async/await given free choice.
               | lostcolony wrote:
               | One additional - as noted, it's been 26 years since
               | Java's founding. Project Loom has been around since at
               | least 2018 and still has no release date. It'll be cool
               | for Java projects whenever it comes out, but I
               | just...have a hard time caring right now. I can't use it
               | for old codebases currently, and new codebases I'm not
               | using one request per Java thread anyway (tbh - when it's
               | my choice I'm not choosing the JVM at all). The space has
               | moved, and continues to move. In no way to say the JVM
               | shouldn't be adopting the good ideas that come along the
               | way, that is one of the benefits of being as conservative
               | and glacial in adoption as it is, but I just...don't get
               | excited about them, or find myself in any position in
               | relation to the JVM (Java specifically, but the
               | fundamentals affect other languages) other than "ugh,
               | this again".
               | chrisseaton wrote:
               | > I'm not using one request per Java thread anyway
               | The point is with Loom you can, and you can stop putting
               | everything into a continuation and go back to straight-
               | line code.
               | lostcolony wrote:
               | >> The point is with Loom you can
               | The point I was making is that Loom isn't released,
               | stable, production ready, supported, etc, and there's
               | still no date when it's supposed to be, so what you can
               | do with Loom in no way affects what I can do with a
               | production codebase, either new or legacy. I'm not sure
               | how you missed that from my post.
               | I'm not defending reactive programming on the JVM. I'm
               | also not defending threads as units of concurrency. I'm
               | saying I can get the benefits of Project Loom -right
               | now-, in production ready languages/libraries, outside of
               | the JVM, and I can't reasonably pick Project Loom if I
               | want something stable and supported by its creators.
               | pron wrote:
               | > and there's still no date when it's supposed to be
               | September 20 (in Preview)
               | > I'm saying I can get the benefits of Project Loom
               | -right now-, in production ready languages/libraries,
               | outside of the JVM
               | Only sort-of. The only languages offering something
               | similar in terms of programming model are Erlang
               | (/Elixir) and Go -- both inspired virtual threads. But
               | Erlang doesn't offer similar performance, and Go doesn't
               | offer similar observability. Neither offers the same
               | popularity.
               | lostcolony wrote:
               | I'm not saying there aren't tradeoffs, just that if I
               | need the benefits of virtual threads...I have other
               | options. I'm all for this landing on the JVM, mainly so
               | that non-Java languages there can take advantage of it
               | rather than the hoops they currently have to jump through
               | to offer a saner concurrency model, but that until it
               | does...don't care. And last I saw this feature is
               | proposed to land in preview in JDK19; not that it would,
               | and...it's still preview. Meaning the soonest we can
               | expect to see this safely available to production code is
               | next year (preview in Java is a bit weird, admittedly.
               | "This is not experimental but we can change any part of
               | it or remove it for future versions depending how things
               | go" was basically my take on it when I looked in the
               | past).
               | Meanwhile, as you say, Erlang/Elixir gives me this model
               | with 35+ years of history behind it (and no
               | libraries/frameworks in use trying to provide me a leaky
               | abstraction of something 'better'), better observability
               | than the JVM, a safer memory model for concurrent code, a
               | better model for reliability, with the main issue being
               | the CPU hit (less of a concern for IO bound workloads,
               | which is where this kind of concurrency is generally
               | impactful anyway). Go has less observability than
               | Java, sure, but a number of other tradeoffs I personally
               | prefer (not least of all because in most of the Java
               | shops I was in, I was the one most familiar with
               | profiling and debugging Java. The tools are there, the
               | experience amongst the average Java developer isn't), and
               | will also be releasing twice between now and next year.
               | Again, I'm not saying virtual threads from Loom aren't
               | cool (in fact, I said they were; the technical
               | achievement of making it a drop in replacement is itself
               | incredible), or that it wouldn't be useful when it
               | releases for those choosing Java, stuck with Java due to
               | legacy reasons, or using a JVM language that is now able
               | to migrate to take advantage of this to remove some of
               | the impedance mismatch between their concurrency model(s)
               | and Java's threading and the resulting caveats. Just that
               | I don't care until it does (because I've been hearing
               | about it for the past 4 years), it still doesn't put it
               | on par with the models other languages have adopted
               | (memory model matters to me quite a bit since I tend to
               | care about correct behavior under load more than raw
               | performance numbers; that said, of course, nothing is
               | preventing people from adopting safer practices
               | there...just like nothing has been in years previous.
               | They just...haven't), nor do I care about the claims
               | people make about it displacing X, Y, or Z. It probably
               | will for new code! Whenever it gets fully supported in
               | production. But there's still all that legacy code
               | written over the past two decades using libraries and
               | frameworks built to work around Java's initial 1:1
               | threading model, and which simply due to calling
               | conventions and architecture (i.e., reactive and etc)
               | would have to be rewritten, which probably won't happen
               | due to the reality of production projects, even if there
               | were clear gains in doing so (which as the great-
               | grandparent mentions, is not nearly so clearcut).
               | namdnay wrote:
               | And hopefully we can bury Reactor Core in the garden and
               | never talk about it again
               | Scarbutt wrote:
               | What has the space moved to?
               | pron wrote:
               | > and still has no release date
               | JEP 425 has been proposed to target JDK 19, out September
               | 20. It will first be a "Preview" feature, which means
               | supported but subject to change, and if all goes well
               | would normally be out of Preview two releases, i.e. one
               | year, after that.
               | > I'm not using one request per Java thread anyway
               | You don't have to, but note that _only_ the thread-per-
               | request model offers you world-class
               | observability/debuggability.
               | > other than "ugh, this again".
               | Ok, although in 2022, the Java platform is still among
               | the most technologically advanced, state-of-the-art
               | software platforms out there. It stands shoulder to
               | shoulder with clang and V8 on compilation, and beats
               | everything else on GC and low-overhead observability
               | (yes, even eBPF).
           | zinxq wrote:
           | I think we're in agreement. Ignoring under the hood - Loom's
           | programming paradigm (from the viewpoint of control flow) is
           | the Threading programming paradigm. (Virtual)Thread-per-
           | connection programming is easier and far more intuitive than
           | asynchronous (i.e. callback-esque) programming.
           | I still attest though - The 5M connections in this example is
           | still a red herring.
           | Can we get to 6M? Can we get to 10M? Is that a question for
           | Loom or Java's asynchronous IO system? No - it's a question
           | for the operating system.
           | Loom and Java NIO can handle probably a billion connections
           | as programmed. Java Threads cannot - although that too is a
           | broken statement. "Linux Threads cannot" is the real
           | statement. You can't have that many for resource reasons.
           | Java Threads are just a thin abstraction on top of that.
           | Linux out of the box can't do 5M connections (last I
           | checked). It takes Linux tuning artistry to get it there.
           | Don't get me wrong - I think Loom is cool. It's attempting to
           | do the same thing Async/Await tried - just better. But it
           | is most definitely not the only way to achieve 5MM
           | connections with Java or anything else. Possibly however,
           | it's the most friendly and intuitive way to do it.
           | *We typically vilify Java Threads for the RAM they consume.
           | Something like 1 MB per thread (tunable). Loom
           | must still use "some" RAM per connection although surely far
           | far less (and of course Linux must use some amount of kernel
           | RAM per connection too).
             | pron wrote:
             | > But it is most definitely not the only way to achieve 5MM
             | connections with Java or anything else. Possibly however,
             | it's the most friendly and intuitive way to do it.
             | It is the only way to achieve that many connections with
             | Java in a way that's debuggable and observable by the
             | platform and its tools, regardless of its intuitiveness or
             | friendliness to human programmers. It's important to
             | understand that this is an objective technical difference,
             | and one of the cornerstones of the project. Computations
             | that are composed in the asynchronous style are invisible
             | to the runtime. Your server could be overloaded with I/O,
             | and yet your profile will show idle thread pools.
             | Virtual threads don't just allow you to write something you
             | could do anyway in some other way. They actually do work
             | that has simply been impossible so far at that scale: they
             | allow the runtime and its tools to understand how your
             | program is composed and observe it at runtime in a
             | meaningful and helpful way.
             | One of the main reasons so many companies turn to Java for
             | their most important server-side applications is that it
             | offers unmatched observability into what the program is
             | doing (at least among other languages/platforms with
             | similar performance). But that ability was missing for
             | high-scale concurrency. Virtual threads add it to the
             | platform.
             | mike_hearn wrote:
             | I don't quite follow your argument.
             | Saying "Linux cannot handle 5M connections with one thread
             | per connection" isn't a reasonable statement because no
             | operating system can do that, they can't even get close.
             | The resource usage of a kernel thread is defined by pretty
             | fundamental limits in operating system architecture,
             | namely, that the kernel doesn't know anything about the
             | software using the thread. Any general purpose kernel will
             | be unable to provision userspace with that many threads
             | without consuming infeasible quantities of RAM.
             | The reason JVM virtual threads can do this is because the
             | JVM has deep control and understanding of the stack and the
             | heap (it compiled all the code). The reason Loom
             | scalability gets worse if you call into native code is that
             | then you're back to not controlling the stack.
             | Getting to 10M is therefore very much a question for the
             | JVM as well as the operating system. It'll be heavily
             | affected by GC performance with huge heaps, which luckily
             | modern G1 excels at, it'll be affected by the performance
             | of the JVM's userspace schedulers (ForkJoinPool etc), it'll
             | be affected by the JVM's internal book-keeping logic and
             | many other things. It stresses every level of the stack.
       | pron wrote:
       | For more information about virtual threads see
       | https://openjdk.java.net/jeps/425 (planned to preview in JDK 19,
       | out this September).
       | What's remarkable about this experiment is that it uses simple
       | 26-year-old (Java 1.0) networking APIs.
       | midislack wrote:
       | I see a lot of these making the FP of HN. But it's very difficult
       | to be impressed, or unimpressed because it's all about hardware.
       | How much hardware is everybody throwing at all of this? 5M
       | persistent connections on a Pi with mere GigE? Pretty frickin'
       | amazing. 5M persistent connections on a Threadripper with 128
       | cores and a dozen trunked 4 port 10GE NICs? Yaaaaawwwnnn snooze.
       | We need a standardized computer for benchmarking these types of
       | claims. I propose the RasPi 4 4GB model. Everybody can find one,
       | all the hardware's soldered on so no cheating is really possible,
       | etc. Then we can really shoot for efficiency.
         | shadowpho wrote:
         | Raspberry pi 4 performance changes wildly based on cooling.
         | Bare die vs heatsink vs heatsink + fan will give you wildly
         | different results.
           | midislack wrote:
           | Same is true with any computer these days. So let's go no
           | heat sink, Pi 4 4GB anyway.
       | KingOfCoders wrote:
       | Something to learn for everybody, the article is mainly about
       | Linux tuning.
         | jeroenhd wrote:
         | The Linux tuning part seems to have been inspired by these blog
         | posts from 14 years ago:
         | https://www.metabrew.com/article/a-million-user-comet-applic...
         | It's almost a little disappointing that beefy modern servers
         | only manage a x5 scale improvement, though that could be due to
         | the differences in runtime behaviour between Erlang and the
         | JVM.
       | wiradikusuma wrote:
       | The experiment is about Java app, but the tweaks are at the O/S
       | level. Does it mean any app (Java/not, Loom/not) can achieve
       | target given correct tweak?
       | Also, why are these not default for the O/S? What are we
       | compromising by setting those values?
         | mike_hearn wrote:
         | No, it doesn't. The reason the tweaks are at the OS level is
         | because, apparently, Loom-enabled JVMs already scale up to that
         | level without needing any tuning. But if you try that in C++
         | you're going to die very quickly.
           | pjmlp wrote:
           | With C++ co-routines and a runtime like HPX, not really.
           | However there are other reasons why a C++ applications
           | connected to the internet might indeed die faster than a Java
           | one.
           | gpderetta wrote:
           | There have been userspace thread libraries for c++ for
           | decades.
             | yosefk wrote:
             | Sure, I wrote some myself. Q is what libraries you can use
             | on top of the userspace thread package that are aware of
             | the userspace threads rather than just using OS APIs and
             | thus eg blocking the current OS thread.
               | gpderetta wrote:
               | There are .so interposition tricks that can be used for
               | that.
               | I think Pth used to do that for example.
               | yosefk wrote:
               | Could you elaborate?
         | toast0 wrote:
         | Both your operating system and your application environment
         | need to be up to the task. I'd expect most
         | operating systems to be up to the task; although it might need
         | settings set. Some of the settings are things that are
         | statically allocated in non-swappable memory and you don't want
         | to waste memory on being able to have 5M sockets open if you
         | never go over 10k. Often you'll want to reduce socket buffers
         | from defaults, which will reduce throughput per socket, but
         | target throughput per socket is likely low or you wouldn't want
         | to cram so many connections per client. You may need to
         | increase the size of the connection table and the hash used for
         | it as well; again, it wastes non-swappable ram to have it too
         | big if you won't use it.
         | For application level, it's going to depend on how you handle
         | concurrency. This post is interesting, because it's a benchmark
         | of a different way to do it in Java. You could probably do 5M
         | connections in regular Java through some explicit event loop
         | structure; but with the Loom preview, you can do it connection
         | per Thread. You would be unlikely to do it with connection per
         | Thread without Loom, since Linux threads are very unlikely to
         | scale so high (but I'd be happy to read a report showing 5M
         | Linux threads)
         | jiggawatts wrote:
         | There's always trade-offs. It would be very rare for any server
         | to reach even 100K concurrent connections, let alone 5M.
         | Optimising for that would be optimising for the 0.000001% case
         | at the expense of the common case.
         | Some back of the envelope maths:
         | https://www.wolframalpha.com/input?i=100+Gbps+%2F+5+million
         | If the server had a 100 Gbps Ethernet NIC, this would leave
         | just 20 kbps for each TCP connection.
         | I could imagine some IoT scenarios where this _might_ be a
         | useful thing, but outside of that? I doubt there's anyone that
         | wants 20 kbps throughput in this day and age...
         | It's a good stress test however to squeeze out inefficiencies,
         | super-linear scaling issues, etc...
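         | The back-of-the-envelope figure above can be reproduced with a
         | few lines of Java. This is only a sketch of the arithmetic; the
         | class and method names are made up for illustration, and it
         | assumes the link is shared evenly with no protocol overhead.

```java
public class PerConnectionBandwidth {
    /** Even share of the link per connection, in bits per second. */
    static long bitsPerSecondPerConnection(long linkBitsPerSecond, long connections) {
        return linkBitsPerSecond / connections;
    }

    public static void main(String[] args) {
        long link = 100_000_000_000L;  // 100 Gbps NIC, as in the comment
        long connections = 5_000_000L; // 5M persistent connections
        long share = bitsPerSecondPerConnection(link, connections);
        System.out.println(share / 1000 + " kbps per connection"); // prints "20 kbps per connection"
    }
}
```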
           | jeroenhd wrote:
           | 20kbps should be sufficient for things like chat apps if you
           | have the CPU power to actually process chat messages like
           | that. Modern apps also require attachments and those will
           | require more bandwidth, but for the core messaging
           | infrastructure without backfilling a message history I think
           | 20kbps should be sufficient. Chat apps are bursty, after all,
           | leaving you with more than just the average connection speed
           | in practice.
             | henrydark wrote:
             | I have a memory of some chat site, maybe discord, sending
             | attachments to a different server, thus exchanging the
             | bandwidth problem with extra system complexity
               | jeroenhd wrote:
               | That's how I'd solve the problem. The added complexity
               | isn't even that high, give the application an endpoint to
               | push an attachment into a distributed object store of
               | your choice, submit a message with a reference to the
               | object and persist it the moment the chat message was
               | sent. This could be done with mere bytes for the message
               | itself and some very dumb anycast-to-s3 services in
               | different data centers.
               | I'm sure I'm skipping over tons of complexity here (HTTP
               | keepalives binding clients to a single attachment host
               | for example) because I'm no chat app developer, but the
               | theoretical complexity is still relatively low.
           | Koffiepoeder wrote:
           | Open, idle websockets can be a use case for a large amount of
           | tcp connections with a small data footprint.
             | jeffbee wrote:
             | Also IMAP has this unfortunate property.
       | wiseowise wrote:
       | And how is that any different from Kotlin coroutines if you still
       | need to call Thread.startVirtualThread?
         | pjmlp wrote:
         | Native VM support instead of an additional library faking it, and
         | filling .class files with needless boilerplate.
         | ferdowsi wrote:
         | Kotlin coroutines are colored and infect your whole codebase.
         | Virtual threads do not.
         | pron wrote:
         | 1. These are actual threads from the Java runtime's
         | perspective. You can step through them and profile them with
         | existing debuggers and profilers. They maintain stacktraces and
         | ThreadLocals just like platform threads.
         | 2. There is no need for a split world of APIs, some designed
         | for threads and others for coroutines (so-called "function
         | colouring"). Existing APIs, third-party libraries, and programs
         | -- even those dating back to Java 1.0 (just as this experiment
         | does with Java 1.0's java.net.ServerSocket) -- just work on
         | millions of virtual threads.
         | Normally, you wouldn't even call Thread.startVirtualThread(),
         | but just replace your platform-thread-pool-based
         | ExecutorService with an ExecutorService that spawns a new
         | virtual thread for each task
         | (Executors.newVirtualThreadPerTaskExecutor()). For more
         | details, see the JEP: https://openjdk.java.net/jeps/425
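         | A minimal sketch of that executor swap, assuming a JDK where
         | virtual threads are available without preview flags (JDK 21 or
         | later). The class name, method name and task count are
         | hypothetical, and Thread.sleep() merely stands in for blocking
         | I/O such as a socket read.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadPerTask {
    /** Runs the given number of blocking tasks, one virtual thread each; returns how many finished. */
    static int runBlockingTasks(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of try-with-resources waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // placeholder for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(10_000) + " tasks completed");
    }
}
```

         | The task body is ordinary blocking code; only the executor
         | factory differs from a platform-thread-pool version.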
       | imranhou wrote:
       | It looks closer to goroutines, which to me begs the
       | question - where are the channels that I could use to communicate
       | between these virtual threads?
         | sdfgdfgbsdfg wrote:
         | In a library. Loom is more about adapting the JVM itself for
         | continuations and virtual threads than adding to userspace.
         | [deleted]
         | adra wrote:
         | Go's channels are simplistically a mutex in front of a queue.
         | Java has many existing objects that can do the same; it's just
         | not the idiomatic choice. Since green threads should wake up
         | from Object.notify(), any threads blocking on the monitor
         | should wake/consume. I'm curious how a green-thread
         | ConcurrentDequeue would stand up to Go's channels in
         | scalability/performance.
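         | As a rough illustration of the "queue as channel" idea, here is
         | a sketch using java.util.concurrent.ArrayBlockingQueue. It uses
         | plain platform threads so it runs on any JDK; the class name
         | and the DONE sentinel (standing in for closing a channel) are
         | assumptions made for this example, not part of Loom or Go.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueAsChannel {
    private static final int DONE = -1; // hypothetical "channel closed" marker

    /** Sends 1..n through a bounded queue and returns the sum seen by the receiver. */
    static long sumThroughQueue(int n) {
        // Bounded capacity, loosely comparable to a buffered Go channel.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    channel.put(i); // blocks while the queue is full
                }
                channel.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            // take() blocks while the queue is empty
            for (int v = channel.take(); v != DONE; v = channel.take()) {
                sum += v;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumThroughQueue(100)); // prints 5050
    }
}
```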
           | Matthias247 wrote:
           | You are right. But Go channels also come with the superpower
           | of "select", which allows waiting for multiple objects to
           | become ready and atomic execution of actions. I don't think
           | this part can be retrofitted on top of simple BlockingQueues.
             | sdfgdfgbsdfg wrote:
             | pron talks about this on
             | https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part2....
       | christophilus wrote:
       | Loom looks like it's nicely solved the function coloring problem.
       | This plus Graal makes me excited to pick up Clojure again.
       | invalidname wrote:
       | This is pretty fantastic!
       | I'm very excited about the possibilities of Loom. Would love to
       | have a more realistic sample with Spring Boot that would
       | demonstrate the real world scale. I saw a few but nothing
       | remotely as ambitious as that.
         | isbvhodnvemrwvn wrote:
         | Spring Boot overhead would likely make that infeasible.
           | RhodesianHunter wrote:
           | Spring Boot overhead is largely in startup time. It really
           | doesn't have much overhead thereafter.
           | It's largely a collection of the same libraries you would use
           | anyways glued together with a custom di system.
           | invalidname wrote:
           | I'm not saying 5M. I just want to see to what scale it would
           | get without threading issues. Spring Boot isn't THAT heavy.
       | nelsonic wrote:
       | Reminds me of
       | https://phoenixframework.org/blog/the-road-to-2-million-webs...
       | Would love to see this extended to more Languages/Frameworks.
         | mike_hearn wrote:
         | In theory once Graal adds support for it, any Graal/Truffle-
         | compatible language can benefit.
         | IMHO it's only JVM+Graal that can bring this to other
         | languages. Loom relies very heavily on some fairly unique
         | aspects of the Java ecosystem (Go has these things too though).
         | One is that lots of important bits of code are implemented in
         | pure Java, like the IO and SSL stacks. Most languages rely
         | heavily on FFI to C libraries. That's especially true of
         | dynamic scripting languages but is also true of things like
         | Rust. The Java world has more of a culture of writing their own
         | implementations of things.
         | For the Loom approach to work you need:
         | a. Very tight and difficult integration between the compiler,
         | threading subsystem and garbage collector.
         | b. The compiler/runtime to control all code being used. The
         | moment you cross the FFI into code generated by another
         | compiler (i.e. a native library) you have to pin the thread and
         | the scalability degrades or is lost completely.
         | But! Graal has a trick up its sleeve. It can JIT compile lots
         | of languages, and those languages can call into each other
         | without a classical FFI. Instead the compiler sees both call
         | site and destination site, and can inline them together to
         | optimize as one. Moreover those languages include binary
         | languages like LLVM bitcode and WASM. In turn that means that
         | e.g. Python calling into a C extension can still work, because
         | the C extension will be compiled to LLVM bitcode and then the
         | JVM will take over from there. So there's one compiler for the
         | entire process, even when mixing code from multiple languages.
         | That's what Loom needs.
         | At least in theory. Perhaps pron will contradict me here
         | because I have a feeling Loom also needs the invariant that
         | there are no pointers into the stack. True for most languages
         | but not once C gets involved. I don't know to what extent you
         | could "fix" C programs at the compiler level to respect that
         | invariant, even if you have LLVM bitcode. But at least the one-
         | compiler aspect is not getting in the way.
           | kaba0 wrote:
           | With Truffle you have to map your language's semantics to
           | java ones. I am unfortunately out of my depth on the details,
           | but my guess would be that LLVM operates here with this in
           | mind in a completely safe way (I guess pointers to the stack
           | are not safe) so presumably it should work for these as well.
             | mike_hearn wrote:
             | Not exactly, no. That's the whole point of Truffle and why
             | it's such a big leap forward. You do _not_ map your
             | language's semantics to Java semantics. You can implement
             | them on top of the JVM but bypassing Java bytecode. Your
             | language doesn't even have to be garbage collected, and
             | LLVM bitcode isn't (unless you use the enterprise version
             | which adds support for automatically converting C/C++ to
             | memory safe GCd code!).
             | So - C code running on the JVM via Sulong keeps C/C++
             | semantics. That probably means you can build pointers into
             | the stack, and then I don't know what Loom would do. Right
             | now they aren't integrated so I guess that's a research
             | question.
         | bkolobara wrote:
         | With lunatic [0] we are trying to bring this to all languages
         | that compile to WebAssembly. A few days ago I wrote about our
         | journey of bringing it to Rust:
         | https://lunatic.solutions/blog/writing-rust-the-elixir-way-1...
         | [0]: https://github.com/lunatic-solutions/lunatic
       | TYMorningCoffee wrote:
       | I was only able to get to 840,000 open connections with my
       | experiment. My machine only has 8GB of memory.
       | https://josephmate.github.io/2022-04-14-max-connections/
       | Is there any way for the TCP connections to share memory in
       | kernel space? My experiment only uses two 8 byte buffers in
       | userspace.
         | toast0 wrote:
         | Does Linux actually allocate buffers for each socket or does it
         | just link to sk_buff's (which I understand are similar to
         | FreeBSD's mbuf's) and then limit how much storage can be
         | linked? FreeBSD has a limit on the total ram used for mbufs as
         | well, not sure about Linux.
         | Otoh, FreeBSD's maximum FD limit is set as a factor of total
         | memory pages (edit: looked it up, it's in
         | sys/kern/subr_param.c, the limit is one FD per four pages,
         | unless you edit kernel source) and you've got 2M pages with 8GB
         | ram, so you would be limited to 512k FDs total, and if you're
         | running the client on the same machine as server, that's 256k
         | connections. But 8G is not much for a server, and some phones
         | have more than that... so it's not super limiting.
         | When you're really not doing much with the connections,
         | userland TCP, as suggested in a sibling comment, could help you
         | squeeze in more connections, but if you're going to actually do
         | work, you probably need more RAM.
         | Btw, as a former WhatsApp server engineer, WhatsApp listens on
         | three ports; 80, 443, and 5222. Not that that makes a
         | significant difference in the content.
         | mh- wrote:
         | no*, and as you've discovered, the skbufs allocated by the
         | kernel will often be the limiting factor for a highly
         | concurrent socket server on linux.
         | * I don't know if someone has created some experimental
         | implementation somewhere. It would require a significant
         | overhaul of the TCP implementation in the kernel.
         | edit: check out this sibling thread about userland TCP. I think
         | this is a more interesting/likely direction to explore in.
         | https://news.ycombinator.com/item?id=31215569
       | 10000truths wrote:
       | A bit of a digression, but I'd love to see how much further one
       | could go with a memory-optimized userland TCP stack, and storing
       | the send and receive buffers on disk.
       | A TCP connection state machine consists of a few variables to
       | keep track of sequence numbers and congestion control parameters
       | (no more than 100-200 bytes total), plus the space for
       | send/receive buffers.
       | A 4 TB SSD would fit ~125 million 16-KB buffer pairs, and 125
       | million 256-byte structs would take up only 32 GB of memory. In
       | theory, handling 100 million simultaneous connections on a single
       | machine is totally doable. Of course, the per-connection
       | throughput would be complete doodoo even with the best NICs, but
       | it would still be a monumental yet achievable milestone.
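       | The capacity estimate above can be checked with a short
       | calculation. This sketch assumes a decimal 4 TB disk, 16 KiB per
       | buffer and 256 bytes of state per connection; the class and
       | method names are invented for illustration.

```java
public class ConnectionCapacity {
    static final long KIB = 1024L;

    /** Number of send/receive buffer pairs that fit on a disk of the given size. */
    static long bufferPairs(long diskBytes, long bufferBytes) {
        return diskBytes / (2 * bufferBytes);
    }

    /** RAM needed for the per-connection state structs, in bytes. */
    static long stateBytes(long connections, long structBytes) {
        return connections * structBytes;
    }

    public static void main(String[] args) {
        long disk = 4_000_000_000_000L; // 4 TB SSD (decimal terabytes)
        // About 122 million pairs, in line with the "~125 million" estimate.
        System.out.println(bufferPairs(disk, 16 * KIB) + " buffer pairs");
        // 125 million structs of 256 bytes each: 32,000,000,000 bytes (32 GB).
        System.out.println(stateBytes(125_000_000L, 256) + " bytes of state");
    }
}
```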
         | mike_hearn wrote:
         | Presumably at 100M simultaneous connections the machine CPU
         | would be saturated with setting up and closing them, without
         | getting much actual work done. TCP connections seem too fragile
         | to make it worth trying to keep them open for really long
         | periods.
         | It's interesting to think about though, I agree. What are the
         | next scaling bottlenecks now (for JVM compatible languages)
         | threading is nearly solved?
         | There are some obvious ones. Others in the thread have pointed
         | out network bandwidth. Some use cases don't need much bandwidth
         | but do need intense routability of data between connections,
         | like chat apps, and it seems ideal for those. Still, you're
         | going to face other problems:
         | 1. If that process is restarted for any reason that's a _lot_
         | of clients that get disrupted. JVMs are quite good at hot-
         | reloading code on the fly, so it's not inherently the case
         | that this is problematic because you could make restarts very
         | rare. But it's still a problem.
         | 2. Your CPU may be sufficient for the steady state but on
         | restart the clients will all try to reconnect at once. Adding
         | jitter doesn't really solve the issue, as users will still have
         | to wait. Handling 5M connections is great unless it takes a
         | long time to reach that level of connectivity and you are
         | depending on it.
         | 3. TCP is rarely used alone now, it usually comes with SSL.
         | Doing SSL handshakes is more expensive than setting up a TCP
         | connection (probably!). Do you need to use something like QUIC
         | instead? Or can you offload that to the NIC making this a non-
         | issue? I don't know. BTW the Java SSL stack is written in Java
         | itself so it's fully Loom compatible.
           | natdempk wrote:
           | It depends on what you do, but I think GC/memory pressure can
           | become an issue rather quickly with the default programming
           | models Java leads you towards. I end up seeing this a lot in
           | somewhat high throughput services/workers I own where
           | fetching a lot of data to handle requests and discarding it
           | afterwards leads to a lot of GC time. Curious if anyone has
           | any sage advice on this front.
           | toast0 wrote:
           | You're totally spot on that connection establishment is much
           | more challenging than steady state; with TLS or just TCP.
           | I don't think QUIC helps with that at all. Afaik, QUIC is all
           | userland, so you'd skip kernel processing, but that doesn't
           | really make establishment cheaper. And TCP+TLS establishes
           | the connection before doing crypto, so that saves effort on
           | spoofing (otoh, it increases the round trips, so pick your
           | tradeoffs).
           | One nice thing about TCP though is it's trivial to determine
           | if packets are establishing or connected; you can easily drop
           | incoming SYNs when CPU is saturated to put back pressure on
           | clients. That will work well enough when crypto setup is the
           | issue as well. Operating systems will essentially do this for
           | you if you get behind on accepting on your listen sockets.
           | (Edit) syncookies help somewhat if your system gets
           | overwhelmed and can't keep state for all of the
           | half-established connections, although not without tradeoffs.
           | In the before times, accelerator cards for TLS handshakes
           | were common (or at least available), but I think current NIC
           | acceleration is mainly the bulk ciphering which IMHO is more
           | useful for sending files than sending small data that I'd
           | expect in a large connection count machine. With file
           | sending, having the CPU do bulk ciphers is a RAM bottleneck:
           | the CPU needs to read the data, cipher it, and write to RAM
           | then tell the NIC to send it; if the NIC can do the bulk
           | cipher that's a read and write omitted. If it's chat data,
           | the CPU probably was already processing it, so a few cycles
           | with AES instructions to cipher it before sending it to send
           | buffers is not very expensive.
           | charcircuit wrote:
           | I think you meant to say TLS. Not SSL.
           | adra wrote:
           | I'm pretty sure the exercise was to show the absolute
           | extremes that could be achieved in a toy application and
           | possibly how easy one could achieve some level of IO blocking
           | scaling that has been harder than most other tasks in java of
           | late. More and more, heap allocations are cheaper, often with
           | sub-milli collector locks, and CPU scaling has more to do with
           | what you're doing than with the platform; Java has
           | enough tools to make your application fast.
           | For extremely IO wait bound workloads though, there was
           | always a LOT of hoops to jump through to make performance
           | strong since OS threads always have a notable stack memory
           | footprint that just doesn't scale well when you could have
           | thousands of OS threads waiting around just taking up RAM.
         | toast0 wrote:
         | It's easy to just get 4TB of ram if that's what you need; I
         | haven't scoped out what you can shove into a cheap off the
         | shelf server these days, but I'd guess around 16TB before you
         | need to get fancy servers (Edit: maybe 8TB is more realistic
         | after looking at SuperMicro's 'Ultra' servers). I think you'd
         | need a very specialized application for 100M connections per
         | server to make sense, but if you've got one, that sounds like a
         | fun challenge; my email is in my profile.
         | Moving 100M connections for maintenance will be a giant pain
         | though. You would want to spend a good amount of time on a test
         | suite so you can have confidence in the new deploys when you
         | make them. Also, the client side of testing will probably be
         | harder to scale than the server side... but you can do things
         | like run 1000 test clients with 100k outgoing connections each
         | to help with that.
       | Nullabillity wrote:
       | Loom is missing the point.
       | Time has shown that bare threads are not a viable high-level API
       | for managing concurrency. As it turns out, we humans don't think
       | in terms of locks and condvars but "to do X, I first need to know
       | Y". That maps perfectly onto futures(/promises). And once you
       | have those, you don't need all the extra complexity and hacks
       | that green threads (/"colourless async") bring in.
       | I'd take a system that combined the API of futures with the
       | performance of OS threads over the opposite combination, any day
       | of the week. But as it turns out, we don't have to choose. We can
       | have the performance of futures with the API of futures.
       | Or we can waste person-years chasing mirages, I guess. I just
       | hope I won't get stuck having to use the end product of this.
         | IshKebab wrote:
         | Threads have essentially the same API as Futures - normally you
         | have some kind of join handle and you can join a set of threads
         | (the equivalent of awaiting a set of futures).
         | Threads don't require locks and condvars. You can use channels
         | and scoped joins etc. if you want.
         | Give me some async code and I'll show you an easier threaded
         | version.
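A minimal sketch of the comparison made above, in Python rather than Java. The task function `square` and the helper `worker` are hypothetical names chosen for illustration. It runs the same four tasks once with plain threads (start, then join the set) and once with futures (submit, then collect each result).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical stand-in for any unit of work."""
    return n * n


# Plain threads: start each one, then join the whole set.
results = [None] * 4


def worker(i):
    # Each thread writes only to its own slot, so no lock is needed.
    results[i] = square(i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # plays the role of awaiting one future
thread_results = list(results)

# Futures: submit each task, then collect each result.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, i) for i in range(4)]
    future_results = [f.result() for f in futures]
```

One difference worth noting: Python's `Thread.join()` does not return a value, so the threaded half passes results through a shared list, while `Future.result()` returns the value directly.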
         | bpicolo wrote:
         | The goroutine model in go is plenty conceptually simple for
         | concurrency. Correct me if I'm wrong, but loom seems similar in
         | that sense?
         | I don't find myself missing out on futures in Go.
         | pron wrote:
         | I think you're mixing specific synchronisation/communication
         | mechanisms with the basic concept of a thread, which is simply
         | the sequential composition of instructions _that is known and
         | observable by the runtime_. If you like the future /promise
         | API, that will work even better with threads, because then the
         | sequence is a reified concept known to the runtime and all its
         | tools. You'll be able to step through the sequence of
         | operations with a debugger; the profiler will know to associate
         | operations with their context. What API you choose to compose
         | your operations, whether you prefer message passing with no
         | shared state, shared state with locks, or a combination of the
         | two -- that's all orthogonal to threads. All they are is a
         | sequential unit of instructions that may run concurrently to
         | other such units, _and is traceable and observable by the
         | platform and its tools_.
           | Nullabillity wrote:
           | You can implement futures by just running each future as a
           | thread, but it doesn't really give you much. It's a lot more
           | complex to write a preemptive thread scheduler + delegating
           | future scheduler than to just write a future scheduler in the
           | first place.
           | Especially when that future scheduler already exists and
           | works, and the preemptive one is a multi-year research
           | project away.
             | pron wrote:
             | It gives you a lot (aside from the ability to use existing
             | libraries and APIs): observability and debuggability.
             | Supporting tooling has been one of the most important
             | aspects of this project, because even those who were
             | willing to write asynchronous code, and even the few who
             | actually enjoyed it, constantly complained -- and rightly
             | so -- that they cannot easily observe, debug and profile
             | such programs. When it comes to "serious" applications,
             | observability is one of the most important aspects and
             | requirements of a system.
             | Instead of introducing a new kind of sequential code unit
             | through all layers of tooling -- which would have been a
             | huge project anyway, we abstracted the existing thread
             | concept.
         | rvcdbn wrote:
         | Maybe threads don't work for your thinking style but your claim
         | that this is generally true is baseless and pretty well refuted
         | by languages like Go or Erlang that feature stackful
         | threads/processes as a critical part of their best-in-class
         | concurrency stories.
           | Nullabillity wrote:
           | Erlang sidesteps the problem by avoiding mutable shared
           | state, in this context they're threads/processes in name
           | only.
           | Go is just yet another implementation of green threads that
           | is slightly less broken than prior implementations, because
           | it had the benefit of being implemented on day 1 (so the
           | whole ecosystem is green thread-aware). It's certainly
           | nowhere near "best-in-class".
             | toast0 wrote:
             | Shared mutable state is hard to work with, but Java threads
             | and Java promises both give you access to it. In either
             | case, you'd need discipline to avoid patterns which reduce
             | concurrency.
             | From the article, it seems that Loom (in preview) enables
             | the threaded model for Java to scale. IMHO, this is great
             | because you can write simple straightforward code in a
             | threaded model. You can certainly write complex code in a
             | threaded model too. Maybe there's an argument that promises
             | can be simple and straightforward too, but my experience
             | with them hasn't been very straightforward.
             | chrisseaton wrote:
             | > Erlang sidesteps the problem by avoiding mutable shared
             | state
             | Erlang is maximal shared mutable state!
             | Processes are mutable state and they're shared between
             | other processes.
         | groestl wrote:
         | If I look at a thread, I see futures all over the place.
         | They're just implicit, and the OS takes care of
         | concurrency/preemption. Sure, that means that you need
         | concurrency primitives if you access shared resources, but only
         | in the trivial case you can get away without shared state in
         | the promise/future scenario as well (i.e. glue code that ties
         | together the hard stuff). Downside is your code gets convoluted
         | and your stacktraces suck.
       | torginus wrote:
       | While impressive, I don't really see it as something practical -
       | I think scaling across processes/VMs is a much more realistic
       | approach.
       | notorandit wrote:
       | With a maximum of 64k TCP connections per single server IP, you
       | need 77 different IPs on the server side. This is a fact.
         | imperio59 wrote:
         | Pretty sure you can bump that up in the kernel to hold more
         | active connections per server than 64k...
         | jauer wrote:
         | How do you figure?
         | Clients can connect to the server on the same server port, so
         | connection limit is more like 64k*2 for every Client IP-Server
         | IP pair.
           | akvadrako wrote:
           | Actually every client IP+port / server IP+port pair. Linux
           | uses 60999 - 32768 for ephemeral ports so can support 28e3^2
           | = 784 million connections per IP pair.
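The arithmetic above can be checked with a short sketch. The only input is the ephemeral port range quoted in the comment; the variable names are illustrative. The estimate assumes both the client port and the server port vary over that range. If the server listens on a single fixed port, the count per IP pair is just the number of client ports.

```python
# Ephemeral port range quoted in the comment above (inclusive bounds).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999

# Number of usable ports in the range.
ports = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1

# Both ends vary: every (client port, server port) combination is distinct.
both_ports_vary = ports ** 2

# Server port fixed: only the client port distinguishes connections.
server_port_fixed = ports

print(ports, both_ports_vary, server_port_fixed)
```

The exact figure is 28,232 ports, so the squared value is about 797 million. The 784 million in the comment comes from rounding to 28e3 before squaring.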
             | mypalmike wrote:
             | Except your service is almost certainly listening on one
             | non-ephemeral port.
             | But having "only" tens of thousands of connections per
             | client is rarely a problem in practice, apart from some
             | load testing scenarios (such as the experiment here, where
             | they opened a number of ports so they could test a large
             | number of connections with a single client machine).
               | charcircuit wrote:
               | 1 IP can correspond to multiple different clients.
         | peq wrote:
         | Isn't this limit per client ip, server ip, and server port?
         | (https://stackoverflow.com/a/2332756/303637)
         | alanfranz wrote:
         | "You need 77 ips" to do what? May be a fact or not, depending
         | on what you're doing.
         | If you suppose just one open server port, you'll probably need
         | 77 client ips to do this test to get unique socket pairs.
         | But it's a client problem, not a server one.
         | ivanr wrote:
         | I imagine that's the limit per client IP address [for a single
         | server port], no? The Linux kernel can use multiple pieces of
         | information to track connections: client IP address, client
         | port, server IP address, server port.
         | Cloudflare has some interesting blog posts on this topic:
         | - https://blog.cloudflare.com/how-we-built-spectrum/
         | - https://blog.cloudflare.com/how-to-stop-running-out-of-
         | ephem...
         | NovemberWhiskey wrote:
         | What?
         | Having run production services that had over 250,000 sockets
         | connecting to a single server port, I'm calling "nope" on that.
         | Are you thinking of the ephemeral port limit? That's on the
         | client side; not the server side. Each TCP socket pair is a
         | four-tuple of [server IP, server port, client IP, client port];
         | the uniqueness comes from the client IP/port part in the server
         | case.
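The four-tuple point can be observed on a loopback interface. This is a rough sketch, assuming local sockets are available. The helper name `open_connections` is made up for illustration. It opens several client connections to one listening port and records the server-side view of each connection.

```python
import socket


def open_connections(n):
    """Connect n clients to one listening port; return the server-side 4-tuples."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(n)
    server_addr = server.getsockname()
    clients, accepted, tuples = [], [], []
    try:
        for _ in range(n):
            clients.append(socket.create_connection(server_addr))
            conn, peer = server.accept()
            accepted.append(conn)
            # (server IP, server port) + (client IP, client port)
            tuples.append(conn.getsockname() + peer)
    finally:
        for s in clients + accepted + [server]:
            s.close()
    return tuples


tuples = open_connections(5)
server_ends = {t[:2] for t in tuples}   # one shared server IP/port
client_ports = {t[3] for t in tuples}   # one distinct port per client
```

Every accepted socket reports the same server IP and port. Only the client port differs, and that is enough to keep the connections distinct.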
         | jeroenhd wrote:
         | You don't really need 77 IP addresses (the 64k limit for TCP is
         | per client IP, per source port, per server IP) but even if you
         | did, your average IPv6 server will have a few billion
         | available. Every client can connect to a server IP of their own
         | if you ignore the practical limits of the network acceleration
         | and driver stack. If you're somehow dealing with this scale, I
         | doubt you'll be stuck with pure legacy IP addressing.
         | The real problem with such a setup is that you're not left with
         | a whole lot of bandwidth per connection, even if you ignore
         | things like packet loss and retransmits mucking up the
         | connections. Most VPS servers have a 1 Gbps connection; with 5
         | million clients that leaves 200 bits (25 bytes) per second of
         | concurrent bandwidth for TCP signaling and data to flow through.
         | You'll need a ridiculous network card for a single server to deal
         | with such a load, in the terabits per second range.
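Working the division through: a 1 Gbps link shared evenly by 5 million connections gives each one 200 bits per second, which is 25 bytes per second. The sketch below does that calculation. The link speed and client count are the figures from the comment; the even split and the absence of protocol overhead are simplifying assumptions.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # 1 Gbps, as in the comment above
CONNECTIONS = 5_000_000               # 5 million clients

# Even share per connection, ignoring headers, ACKs and retransmits.
bits_per_connection = LINK_BITS_PER_SECOND // CONNECTIONS
bytes_per_connection = bits_per_connection // 8

print(bits_per_connection, "bits/s =", bytes_per_connection, "bytes/s")
```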
       (page generated 2022-04-30 23:00 UTC)