[HN Gopher] Async message-oriented architectures compared to syn...
       ___________________________________________________________________
        
       Async message-oriented architectures compared to synchronous REST-
       based systems
        
       Author : stolsvik
       Score  : 102 points
       Date   : 2023-02-12 18:48 UTC (4 hours ago)
        
 (HTM) web link (mats3.io)
 (TXT) w3m dump (mats3.io)
        
       | mdaniel wrote:
       | Making your own license is the new JavaScript framework, I guess:
       | https://github.com/centiservice/mats3/blob/v0.19.4-2023-02-1...
        
         | Pet_Ant wrote:
         | Not open source.
         | 
         | > Noncompete
         | 
         | > Any purpose is a permitted purpose, except for providing to
         | others any product that competes with the software.
        
           | stolsvik wrote:
            | I do not claim that it is _Open Source_ either.
           | 
           | From the front page: "Free to use, source on github.
           | Noncompete licensed - PolyForm Perimeter."
           | 
           | Feel free to comment on this: Is this a complete deal breaker
           | for all potential users?
        
             | mixedCase wrote:
             | Can't speak for all potential users, but the license is in
             | fact a complete deal-breaker for me and any client I've
             | worked with given the FOSS tools available in the
             | ecosystem.
             | 
              | But then, there's also the "Java-only" aspect, which is a
              | complete deal-breaker for any client I've worked with doing
              | {micro,}services.
             | 
             | Then there's the "what the hell does this actually do"
             | deal-breaker when trying to explain it to some decision
             | makers, and the "we already have queues and K8s to solve
             | all of those issues" deal-breaker when explaining it to
             | most fellow SWEs/SREs.
        
               | stolsvik wrote:
               | Hahaha, that's rough! :-)
               | 
                | I'll tell you one thing: "What the hell does this
                | actually do?!" is extremely spot on! I am almost amazed
                | at how hard it is to explain this library. It really does
                | provide value, but it is evidently exceptionally hard to
                | explain.
               | 
               | I first and foremost believe that this is due to the
               | massive prevalence of sync REST/RPC style coding, and
                | that messaging is only brought up as a solution when you
               | get massive influxes of e.g. inbound reports - where you
               | actually want the _queue_ aspect of a message broker. Not
               | the async-ness.
               | 
               | I've tried to lay this out multiple times, e.g. here:
               | https://mats3.io/docs/message-oriented-rpc/, and in the
               | link for this post itself.
        
             | lazyasciiart wrote:
             | That comment is quoting from the Polyform license. If it
             | doesn't represent your position, you may have made a bad
             | choice in license.
        
               | stolsvik wrote:
               | I was referring to the "not open source". I edited my
               | comment to be more specific.
        
             | jgilias wrote:
             | Your license would not be a dealbreaker for me in an SME
             | commercial setting. AGPL would be a dealbreaker.
        
               | stolsvik wrote:
               | Thanks a bunch! Seriously. And I agree that AGPL is
               | pretty harsh - I have a feeling that this is typically
               | used in a "try before you buy" situation, where there is
               | a commercial license on the side.
        
             | tinco wrote:
             | Well yeah of course, it's a direct contradiction of the
             | rising tide lifts all boats principle. Do you think
             | Kubernetes would have any traction at all if it had a
             | clause that it couldn't be used on AWS?
             | 
             | If it can't be adopted by the industry as a whole, then it
             | can't be considered an industry standard. It wouldn't fly
              | at my organization anyway, without even looking up what
              | PolyForm is.
        
               | stavros wrote:
               | As far as I can see, this doesn't say it can't be used on
                | AWS, it only says Amazon can't launch its own service
                | that uses this software to compete with the software
                | itself. It's too short to really tell what "compete"
                | entails, though.
        
               | stolsvik wrote:
               | You are correct, this is meant as an AWS/GCP/Azure
               | preventor. ElasticSearch situation. That is, AFAIU, the
               | intention of the license I adopted. The "examples" part
               | spells it pretty directly out, as I also try to do here:
               | https://centiservice.com/license/
               | 
               | You may definitely use it anywhere you like.
        
               | stavros wrote:
               | I'm really in favor of something like that. AWS using
               | your own FOSS software to choke your revenue stream is a
               | blight on FOSS, so good for you for using that license.
        
               | stolsvik wrote:
               | Thank you!
        
               | mejutoco wrote:
                | I see your point, and could not help wondering whether
               | ElasticSearch would have more revenue if AWS could not
               | offer it directly.
        
               | mixedCase wrote:
               | Do you believe the fat ugly monster that is ElasticSearch
               | would've had anywhere near its current adoption rates if
               | it had a non-OSI license from the start?
               | 
               | It would've been completely overshadowed by some other
               | Lucene-based wrapper or maybe some even better
               | alternative would've come along earlier.
        
               | indymike wrote:
               | I built on Elastic early on over Solr and several others
               | because it was open source and seemed to be better. I
               | would have selected a different Lucene wrapper if I had
               | known where Elastic was going.
        
               | mejutoco wrote:
               | Algolia does pretty well I believe. I could be wrong.
        
             | jagged-chisel wrote:
             | It's muddy/unknown enough that no one in a commercial
             | enterprise can entertain shipping a service using your
             | project.
        
             | marginalia_nu wrote:
             | Honestly I'm pretty annoyed by the "how dare you give the
             | source away under other terms than the ones I would
             | prefer"-type reactions that crop up from time to time. It's
             | an incredibly entitled attitude and is not a good look for
             | the open source community in general.
             | 
             | Like by all means, share code with GPL or Apache or MIT or
             | whatever, but don't get mad when someone selects another
             | license, including non-free ones with weird
             | incompatibilities.
        
               | jagged-chisel wrote:
               | Those kinds of complaints are indeed entitled. At the
               | same time, there's no problem pointing out that fewer
               | people and organizations can select a dependency with an
               | unconventional, unknown license.
               | 
               | You're welcome to license your projects however you see
               | fit. But when you get to a point that no one is using
               | your stuff, you have to be ready to hear "it's the
               | license."
        
               | lazyasciiart wrote:
               | Your comment is "how dare you complain about licensing".
               | What you are responding to is "huh, weird license, won't
               | use, that's a shame".
        
             | indymike wrote:
             | Yes, it is a deal breaker for me.
        
             | delusional wrote:
             | Well computing is all about redefining problems in terms of
             | other atoms. A messaging service is really just a series of
             | ALU operations and memory writes, which is then a series of
             | nand gates.
             | 
             | It seems incredibly muddy to me what "competing" would mean
             | in that sense. If I make something with this it could be
             | argued that my system built on top of MATS is just
             | immaterial configuration that was intended to be done by
             | the user. That the authors intention was for the end user
             | to use MATS themselves, and that I'm therefore in
             | competition with the product.
             | 
             | A non programming example would be hammers and houses. You
             | could imagine that if I build you a house, you'd be less
             | likely to need to buy a hammer (to build your own) making
             | my house competition for the hammer.
             | 
             | I wouldn't touch this at all.
        
         | stolsvik wrote:
         | Not entirely my own:
         | https://polyformproject.org/licenses/perimeter/1.0.0/
         | 
         | https://centiservice.com/license/
        
       | [deleted]
        
       | mrkeen wrote:
       | > Transactionality: Each endpoint has either processed a message,
       | done its work (possibly including changing something in a
       | database), and sent a message, or none of it.
       | 
       | Sounds too good to be true. Would love to hear more.
        
         | stolsvik wrote:
          | Well, okay, you're right! It is _nearly_ true, though. :) I've
         | written a bit about it here: https://mats3.io/using-
         | mats/transactions-and-redelivery/
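          | 
          | The gist: the message receive and the message send happen in
          | one broker transaction, with the database work in its own. A
          | minimal sketch of the underlying pattern with plain JMS (not
          | Mats's actual API; doDatabaseWork is a stand-in):
          | 
          |     import javax.jms.*;
          | 
          |     Session session = connection.createSession(
          |             true, Session.SESSION_TRANSACTED);
          |     MessageConsumer consumer = session.createConsumer(
          |             session.createQueue("ServiceQueue"));
          |     Message incoming = consumer.receive();
          |     try {
          |         doDatabaseWork(incoming); // separate DB transaction
          |         MessageProducer producer = session.createProducer(
          |                 session.createQueue("NextQueue"));
          |         producer.send(session.createTextMessage("result"));
          |         // receive + send become visible atomically:
          |         session.commit();
          |     } catch (Exception e) {
          |         session.rollback(); // message will be redelivered
          |     }
          | 
          | The "nearly": the DB commit is a separate resource, so there
          | is a small window where the DB has committed but the broker
          | transaction rolls back - hence redeliveries must be handled
          | idempotently.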
        
       | revskill wrote:
        | In our production apps, all network issues are resolved by a
        | simple rate-limiter.
        
       | guhcampos wrote:
       | Oh yes, another "this thing I sell is an actual silver bullet"
       | post.
       | 
       | Message busses are great. RPC is too. There are use cases for
       | both. Saying one is "better" than the other is silly, and in this
       | case, a shame.
       | 
       | There are loads of message passing libraries out there, based on
        | all kinds of backends, from RabbitMQ to NATS to Redis to Kafka.
       | This does not innovate over anything, it's just shameless
       | marketing.
        
         | stolsvik wrote:
         | This is unfair. I made Mats so that I could use messaging in a
         | simpler form. Nothing else.
         | 
          | Mats is an API that can be implemented on top of any _queue-
          | based_ message broker - which excludes Kafka. But it definitely
          | includes ActiveMQ (which is what we use), Artemis and hence
          | Red Hat's AMQ (which the tests run against), and RabbitMQ
          | (whose JMS implementation is too limited to be used directly,
          | but I do hope to implement Mats on top of it at some point).
          | Probably also NATS. Probably also Apache Pulsar, which I just
          | recently realized has a JMS client.
         | 
          | You could even implement it on top of ZeroMQ, or on top of any
          | database - particularly Postgres, since it has those "queue
          | extensions" NOTIFY and SKIP LOCKED.
          | 
          | edit: I actually have a feature-issue exploring such an
          | implementation: https://github.com/centiservice/mats3/issues/15
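          | 
          | The core of such a Postgres-backed queue is small. A hedged
          | sketch of the consumer side (table and names hypothetical),
          | using JDBC:
          | 
          |     // hypothetical table:
          |     //   queue(id BIGSERIAL, qname TEXT, payload BYTEA)
          |     con.setAutoCommit(false);
          |     String sql = "SELECT id, payload FROM queue"
          |         + " WHERE qname = ? ORDER BY id"
          |         + " LIMIT 1 FOR UPDATE SKIP LOCKED";
          |     try (PreparedStatement ps = con.prepareStatement(sql)) {
          |       ps.setString(1, "OrderService.placeOrder");
          |       try (ResultSet rs = ps.executeQuery()) {
          |         if (rs.next()) {
          |           process(rs.getBytes("payload")); // stand-in
          |           try (PreparedStatement del = con.prepareStatement(
          |               "DELETE FROM queue WHERE id = ?")) {
          |             del.setLong(1, rs.getLong("id"));
          |             del.executeUpdate();
          |           }
          |           con.commit(); // crash before commit => row stays,
          |                         // gets picked up by another consumer
          |         }
          |       }
          |     }
          | 
          | SKIP LOCKED lets competing consumers grab different rows, and
          | LISTEN/NOTIFY can wake them instead of polling.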
        
           | latchkey wrote:
            | > _ActiveMQ_
           | 
           | I hope ActiveMQ Artemis is better than the 'classic' version
            | and that is what you're using. The last time I used it,
            | probably a decade ago now, there were so many issues with it
           | that it was a complete train wreck at scale. I would be very
           | hesitant to pick that one up again.
        
       | jeffbee wrote:
       | If you pretend that your message bus has zero producer impedance
       | and costs nothing then this analysis makes great sense. If you
       | have ever operated or paid for this type of scheme in the real
       | world then you will have some doubts.
        
         | stolsvik wrote:
         | I guess you'd say the same about cloud functions and lambdas,
         | then? To which I agree.
         | 
         | Paying per message would require the message cost to be pretty
         | small. Might want to evaluate setting up a broker yourself if
         | the cost starts getting high.
        
       | robertlagrant wrote:
       | Having done a reasonable amount of messaging code in my time, I
       | would say the final form of this sort of thing might look more
       | like Cadence[0] than anything like this.
       | 
       | [0] https://github.com/uber/cadence
        
         | stolsvik wrote:
         | Cadence is a workflow management system. As is Temporal, Apache
         | Beam, Airbnb Airflow, Netflix Conductor, Spotify Luigi, and
         | even things like Github Actions, Google Cloud Workflows, Azure
         | Service Fabric, AWS SWF, Power Automate.
         | 
          | A primary difference is that those are _external systems_,
          | where you define the flows inside that system - the system then
          | "calls out" to get pieces of the flow done.
         | 
         | Mats is an "internal" system: You code your flows inside the
         | service. It is meant to directly replace synchronously calling
         | out to REST services, instead enabling async messaging but with
         | the added bonus of being _as simple as_ using REST services.
         | 
         | But yes, I see the point.
        
           | MuffinFlavored wrote:
           | Is GitHub Actions really similar enough to Temporal/Cadence
           | to be included in the list?
        
             | stolsvik wrote:
             | Hmm. Maybe not. But they sure have much in common: You
             | define a set of things that should be done, triggered by
              | something - either a schedule, an event (oftentimes a
              | repository event, but it doesn't have to be), or another
              | Github action.
        
       | eBombzor wrote:
        | Why is this better than Kafka?
        
         | stolsvik wrote:
         | As far as I understand, Kafka is positioning itself to be the
         | leading _Event Sourcing_ solution.
         | 
         | I view event sourcing to be fundamentally different from
         | message passing. For a long time I tried to love event
          | sourcing, but I see way too many problems with it. The primary
          | problem I see is that you then end up with a massive source of
          | events, which any service can subscribe to as they see fit. How
          | is this different from having one gigantic spaghetti database?
          | Also, event migrations over time become a problem.
          | 
          | RPC and messaging feel much more clearly separated to me: I
         | own the Accounts, and you own the Orders. We explicitly
         | communicate when we need to.
         | 
         | I see benefits on both sides, but have firmly landed on _not_
         | event sourcing.
        
       | hbrn wrote:
        | 1. Anything that is connected to a user interface should be
        | synchronous by default.
        | 
        | 2. You can't predict which parts of your system will be connected
        | to a user interface.
        | 
        | 3. Here's the worst part: _async messaging is viral_. A service
        | that depends on an async service becomes async too.
       | 
        | You should be very cautious introducing async messaging to your
       | systems. The only parts that should be allowed to be async are
       | the ones that can afford to fail.
       | 
        | I spend a good amount of time trying to work around these dumb
       | enterprise patterns when building products on top of async APIs.
       | You are literally forced to build inferior products just because
       | someone thought that async messaging is so great. It's great for
       | everybody, _except the final user_.
       | 
       | Async processing is not a virtue, it's a necessity for high
       | load/high throughput systems.
       | 
       | The reason SOA failed many years ago is precisely the async
       | message bus.
        
         | stolsvik wrote:
         | We clearly do not agree.
         | 
         | Wrt. sync processing when using Mats:
         | https://mats3.io/docs/sync-async-bridge/
         | 
         | But my better solution is instead to pull the async-ness all
         | the way out to the client: https://matssocket.io/
         | 
         | Also, I have another take on the SOA failure, mentioned here:
         | https://mats3.io/about/
         | 
         | It was definitely not because of async, at least as I remember
         | it.
        
           | SpaghettiX wrote:
           | I appreciate some events can be asynchronous for clients, for
           | example: actions taken by other users, or events generated by
           | the system. However, I do think implementation details (using
           | async in the server) should be encapsulated from clients:
           | when users save a new document, it's much easier for the
           | client to receive a useful albeit delayed response, rather
           | than "event submitted", wait for the result on a stream. Of
           | course, other relevant clients may need to hear about that
           | event too. The service architecture should not affect / make-
           | life-harder for clients.
           | 
            | Therefore I think I disagree with both parent and grandparent
           | comments. Use each when they make sense, not "synchronous by
           | default" (grandparent comment, though I do think there are
           | good points made), or "asynchronous based on service
           | architecture" (parent comment).
           | 
           | > But my better solution is to pull the async-ness all the
           | way out to the client: https://matssocket.io/
           | 
           | Is that a solution that you use? I took a look at matssocket
           | https://www.npmjs.com/package/matssocket, it currently has 2
           | weekly downloads. :thinking:.
        
             | stolsvik wrote:
             | To make a point out of it: This is not _event based_ in the
             | event sourcing way of thinking. It is using messages. You
             | put a message on a queue, someone else picks it up. Mats
              | implements a request/reply paradigm on top ("messaging
             | with a call stack").
             | 
             | In the interactive, synchronous situation, you do not "wait
             | for an event" per se. You wait for a specific reply. When
             | using the MatsFuturizer (https://mats3.io/docs/sync-async-
             | bridge/), it is _extremely_ close to how you would have
              | used an HttpClient or somesuch.
             | 
             | MatsSocket: The Dart/Flutter implementation is used in a
             | production mobile app. For the Norwegian market only,
             | though.
             | 
             | The JS implementation is used in an internal solution.
             | 
              | A bit more usage would have been really nice, yes. It
             | is actually pretty nice, IMHO! ;-)
        
         | toast0 wrote:
         | > Async processing is not a virtue, it's a necessity for high
         | load/high throughput systems.
         | 
         | > 1. Anything that is connected to user interface should be
         | synchronous by default.
         | 
          | If everything in the UI is synchronous, you prevent users from
          | achieving high throughput. Sometimes that's fine, but sometimes
         | it's not.
         | 
         | It's simple to wait for a response to a request sent via
         | asynchronous messaging. It's not simple to split a synchronous
         | API into send and receive parts. However, REST is HTTP and
         | there's lots of async HTTP libraries out there.
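          | 
          | The "wait for a response" half really is a handful of lines -
          | a rough sketch (names illustrative), assuming a reply queue
          | and a correlation ID:
          | 
          |     import java.util.UUID;
          |     import java.util.concurrent.*;
          | 
          |     // completed by the reply-queue consumer thread
          |     ConcurrentHashMap<String, CompletableFuture<String>>
          |         pending = new ConcurrentHashMap<>();
          | 
          |     String corrId = UUID.randomUUID().toString();
          |     CompletableFuture<String> f = new CompletableFuture<>();
          |     pending.put(corrId, f);
          |     // stand-in for the actual broker client:
          |     sendAsync("ServiceQueue", corrId, "ReplyQueue", body);
          |     String reply = f.get(30, TimeUnit.SECONDS); // sync again
          | 
          |     // meanwhile, the reply consumer does, per message:
          |     //   pending.remove(msg.corrId).complete(msg.body);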
        
       | samsquire wrote:
       | Thanks for this.
       | 
        | I love the idea of breaking up a flow into a separately
        | scheduled but still linear message flow.
       | 
       | I wrote about a similar idea in ideas2
       | 
       | https://github.com/samsquire/ideas2#84-communication-code-sl...
       | 
       | The idea is that I enrich my code with comments and a transpiler
       | schedules different parts of the code to different machines and
       | inserts communication between blocks.
       | 
        | I read about how the Zookeeper algorithm handles
        | transactionality and robustness to messages being dropped, which
        | is interesting reading.
       | 
       | https://zookeeper.apache.org/doc/r3.4.13/zookeeperInternals....
       | 
       | How does Mats compare?
       | 
       | LMAX disruptor has a pattern where you split up each side of an
        | IO request into two events, to avoid blocking in a handler. So
       | you would always insert a new event to handle an IO response.
        
       | derefr wrote:
       | > Back-pressure (e.g. slowing down the entry-points) can easily
       | be introduced if queues becomes too large.
       | 
       | ...which presumably includes load-shedding to stop misbehaving
       | components from overloading the queues; at which point, unless
       | you want clients to just lose track of the things they wanted
       | done when they get a "we're too busy to handle this right now"
       | response, you've essentially circled back around to clients
       | having to use a client with REST-like "synchronous/blocking
       | requests with retry/backpressure" semantics -- just where the
       | requests that are being synchronously-blocked on are "register
       | this as a work-item and give me an ID to check on its status"
       | rather than "do this entire job and tell me the result."
       | 
       | And if you're doing that, why force the client to think in terms
       | of async messaging at all? Just let them do REST, and hide the
       | queue under the API layer of the receiver.
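        | 
        | That "register a work-item" shape is the classic 202-with-
        | status-endpoint pattern - sketched here with made-up handlers:
        | 
        |     import java.util.UUID;
        |     import java.util.concurrent.*;
        | 
        |     class JobApi {
        |         ExecutorService workers =
        |             Executors.newFixedThreadPool(8);
        |         ConcurrentHashMap<String, String> status =
        |             new ConcurrentHashMap<>();
        | 
        |         // POST /jobs -> 202 Accepted + id
        |         String submit(String payload) {
        |             String id = UUID.randomUUID().toString();
        |             status.put(id, "QUEUED"); // queue hidden behind API
        |             workers.submit(() -> {
        |                 status.put(id, "RUNNING");
        |                 doTheWork(payload); // stand-in
        |                 status.put(id, "DONE");
        |             });
        |             return id;
        |         }
        | 
        |         // GET /jobs/{id} -> current status
        |         String poll(String id) {
        |             return status.getOrDefault(id, "UNKNOWN");
        |         }
        |     }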
        
         | Supermancho wrote:
         | > you've essentially circled back around to clients having to
         | use a client with REST-like "synchronous/blocking requests with
         | retry/backpressure" semantics
         | 
         | Yes, they both do the same thing. That's not even the starting
         | point of the discussion. The implementation from HTTP to a
         | message queue (mailbox system) is the discussion point.
         | 
         | Having the caller (who needs work done) wait to be informed
         | when the work is done (or not done) is less deterministic than
         | telling the callee how long before the work doesn't matter
         | anymore. The callee gives back a transaction ID/is provided a
         | callerID or is unavailable, and the caller knows (very quickly)
         | it's not going to get done or knows where to look for the work
         | (or abandon it). Either way, it allows for optimization on both
         | sides.
        
         | tass wrote:
         | This is where I always end up. You can have queues which give
          | you certain benefits, but there's a lot of stuff to be built on
         | top to make it as operationally simple as http.
        
           | stolsvik wrote:
           | I will argue that this simplicity is exactly what Mats
           | provides. At least that is the intention.
        
             | revskill wrote:
              | I don't see any code on the webpage to explain things.
              | Simplicity means you can explain complex things with simple
              | code.
              | 
              | Because English is ambiguous and subjective. Just use code?
        
               | stolsvik wrote:
               | There is code here: https://mats3.io/docs/message-
               | oriented-rpc/ .. and here: https://mats3.io/docs/mats-
               | flow-initiation/ .. and here: https://mats3.io/docs/sync-
               | async-bridge/ .. and here:
               | https://mats3.io/docs/springconfig/ .. and here:
               | https://mats3.io/background/what-is-mats/ .. and here:
               | https://mats3.io/using-mats/endpoints-and-initiations/
               | 
               | .. and on the github page here:
               | https://github.com/centiservice/mats3/blob/main/README.md
               | 
               | .. and you are advised to explore the code here:
               | https://mats3.io/docs/explore/
        
               | cerved wrote:
               | yes but there's sadly no code in what you posted
        
         | charrondev wrote:
          | The system I'm currently on is moving a lot of work into
          | queues. Some operations, like "change the criteria of this
          | rank", could take anywhere between 5 seconds (if the number of
          | users to evaluate against the criteria is small) and 10+ hours
          | if we need to re-evaluate the rules against 10m+ users.
          | 
          | In this case we write our jobs as generators that can be
          | paused, serialized and picked up again later. We give the job 5
          | seconds synchronously; then, if it passes that time, we queue
          | the job and let the client know a job has been registered.
          | 
          | The user's account holds the IDs of the jobs as well as some
          | basic information about the tasks they have queued. There is a
          | REST endpoint to return the current status of the jobs and
          | information about them (what are they doing, what's their
          | progress, how much work remains).
         | 
         | The client will negotiate a web socket connection with a
         | different service to be notified whenever progress is made on
         | the job and the client can then check the endpoint for the
         | latest status.
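          | 
          | Roughly this shape, in sketch form (simplified, names made
          | up):
          | 
          |     interface ResumableJob {
          |         // run until done or budget exhausted; true = done
          |         boolean runFor(java.time.Duration budget);
          |         byte[] serializeState(); // persisted for later
          |     }
          | 
          |     boolean done = job.runFor(Duration.ofSeconds(5));
          |     if (done) {
          |         respondWithResult(job);      // stand-in
          |     } else {
          |         queue.enqueue(jobId, job.serializeState());
          |         respondJobRegistered(jobId); // stand-in
          |     }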
        
           | latchkey wrote:
           | That 5 seconds is going to bite you.
           | 
           | There is going to be some sort of stall in the future that
           | causes all of your jobs to hit that 5 seconds and everything
           | is going to start to back up and cause other problems up the
           | line that are really hard to test for in advance.
           | 
           | You're better off designing a system that doesn't rely on
           | some arbitrary number of seconds (why not 4 or 6 seconds?) to
           | begin with.
        
             | naasking wrote:
             | Yes, non-determinism is the bane of distributed systems. It
             | should be minimized whenever possible.
        
         | naasking wrote:
         | > And if you're doing that, why force the client to think in
         | terms of async messaging at all? Just let them do REST, and
         | hide the queue under the API layer of the receiver.
         | 
         | Yes, exactly. And on top of that, async messaging implicitly
         | introduces DoS vulnerabilities exactly because of the buffering
         | required. At least with sync messaging exposing a queue in the
          | API layer, you opt into this vulnerability.
        
           | stolsvik wrote:
           | As mentioned here: https://mats3.io/background/system-of-
           | services/
           | 
           | .. Mats is meant to be an inter-service communication
           | solution.
           | 
           | It is explicitly _not_ meant to be your front-facing
            | endpoints. If you are DoS'ed, it would be from your own
           | services. Of course, that might still happen, but then things
           | would not have been much better if you used sync comms.
           | 
           | It is true that you can bridge from sync to the async world
           | of Mats using the MatsFuturizer (https://mats3.io/docs/sync-
           | async-bridge/), but then you still have your e.g. Servlet
           | Container as the front-facing entity.
           | 
           | (Also check out https://matssocket.io/, though)
        
         | stolsvik wrote:
         | Well, yes - there is nothing with Mats that you cannot do with
         | any other communication form, if you code it up. When you say
         | "register this as a work-item and give me an ID to check on its
         | status", you've implemented a queue, right?
         | 
         | The intention is that Mats gives you an easy way to perform
         | async message-oriented communications. Somewhat of a bonus, you
         | can also use it for synchronous tasks, using the MatsFuturizer,
         | or MatsSocket. A queue can handle transient peaks of load much
         | better than direct synchronous code. It is also quite simple to
         | scale out. But if you do get into problems of getting too much
         | traffic for the system to process, you will have to handle that
         | - and Mats does not currently have any magic for performing
         | e.g. load shedding, so you're on your own. (I have several
         | thoughts on this. E.g. monitor the queue sizes, and deny any
         | further initiations if the queues are too large).
         | 
          | Wrt. synchronous comms, Mats does provide a nice feature, where
          | you can mark a Mats Flow as "interactive", meaning that some
          | human is waiting for the result. This results in the flow
          | getting priority on every stage it passes through - so that if
          | it competes with internal, more batchy processes, it will cut
          | the line.
        
           | derefr wrote:
           | > A queue can handle transient peaks of load much better than
           | direct synchronous code.
           | 
           | Whether a workload is being managed upon creation using a
            | work queue within the backend has nothing to do with the
           | semantics of the communications protocol used to talk about
           | the state of said workload. You can arbitrarily combine these
           | -- for example, DBMSes have the unusual combination of having
           | a stateful connection-oriented protocol for scheduling
           | blocking workloads, but also having the ability to introspect
           | the state of those ongoing workloads with queries on other
           | connections.
           | 
           | My point is that clients in a distributed system can
           | literally never do "fire and forget" messaging _anyway_ --
            | which is the supposed advantage of an "asynchronous message-
           | oriented communications" protocol over a REST-like one. Any
           | client built to do "fire and forget" messaging, when used at
           | scale, always, always ends up needing some sort of outbox-
           | queue abstraction, where the outbox controller is internally
           | doing synchronous blocking retries of RPC calls to get an
           | acknowledgement that a message got safely pushed into the
           | queue and can be locally forgotten.
           | 
           | And that "outbox" is a _leaky abstraction_ , because in
           | trying to expose "fire and forget" semantics to its caller,
           | it has no way of imposing backpressure on its caller. So the
           | client's outbox overflows. Every time.
           | 
           | This is why Google famously switched every internal protocol
            | they use _away_ from using message queues/busses with
           | asynchronous "fire and forget" messaging, _toward_
           | synchronous blocking RPC calls between services. With an
           | explicitly-synchronous workload-submission protocol (which
           | may as well just be over a request-oriented protocol like
           | HTTP, as gRPC is), all operational errors and backpressure
           | get bubbled back up from the workload-submission client
           | library to its caller, where the caller can then have logic
           | to decide the business-logic-level response that is most
           | appropriate, for each particular fault, in each particular
           | calling context.
           | 
           | Message queues are the quintessential "smart pipe", trying to
           | make the network handle all problems itself, so that the
           | nodes (clients and backends) connected via such a network can
           | be naive to some operational concerns. But this will never
           | truly solve the problems it sets out to solve, as the _policy
           | knowledge_ to properly drive the decision-making for the
           | _mechanism_ that handles operational exigencies in message-
            | handling, isn't available "within the network"; it lives
           | only at the edges, in the client and backend application code
           | of each service. Those exigencies -- those failures and edge-
           | case states -- must be pushed out to the client or backend,
           | so that policy can be applied. And if you're doing that, you
           | may as well move the mechanism to enforce the policy there,
           | too. At which point you're back to a dumb pipe, with smart
           | nodes.
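            | 
            | (The usual shape of that outbox, sketched with JDBC and
            | hypothetical names - the insert rides the business
            | transaction, and the relay is exactly the synchronous
            | retrying RPC described above, just relocated:
            | 
            |     // in the business transaction:
            |     con.setAutoCommit(false);
            |     updateBusinessTables(con); // stand-in
            |     try (PreparedStatement ps = con.prepareStatement(
            |             "INSERT INTO outbox (payload) VALUES (?)")) {
            |         ps.setBytes(1, message);
            |         ps.executeUpdate();
            |     }
            |     con.commit(); // state change + message are atomic
            | 
            |     // relay, in a separate thread/process:
            |     //   SELECT rows from outbox, blocking-send each to
            |     //   the broker, DELETE only after the broker acks.
            | 
            | ...and that relay can still only retry or stall; it cannot
            | push back on the code that already committed.)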
        
             | jgilias wrote:
             | Is there something I can read about Google switching to
             | sync RPC? Like a blog post or something like that?
             | 
             | Thanks!
        
             | stolsvik wrote:
             | "Not everybody is Google"
             | 
              | These concepts have worked surprisingly well for us for
              | nearly a decade. We're not Google-sized, but this
              | architecture should work well for a few more orders of
              | magnitude of traffic.
             | 
             | Also, you can mix and match. If you have some parts of your
             | system with absolutely massive traffic, then don't use this
              | there.
             | 
             | Note that we very seldom use "fire and forget" (aka
             | "send(..)"). We use the request-replyTo paradigm much more.
              | Which is basically the core premise of Mats, as an
             | abstraction over pure "forward-only" messaging.
        
               | derefr wrote:
                | > Note that we very seldom use "fire and forget" (aka
                | "send(..)"). We use the request-replyTo paradigm much
                | more. Which is basically the core premise of Mats, as an
                | abstraction over pure "forward-only" messaging.
               | 
               | That doesn't help one bit. You're still firing-and-
               | forgetting the request itself. The reply (presumably with
               | a timeout) ensures that the client doesn't sit around
                | forever waiting for a lost message; but it does nothing
                | to prevent badly-written request logic from overloading
                | your backend (or overloading the queue, or "bunging up"
                | the queue such that it'll be ~forever before your backend
               | finishes handling the request spike and gets back to
               | processing normal workloads.)
               | 
               | > If you have some parts of your system with absolutely
               | massive traffic, then don't use this there, then.
               | 
               | I'm not talking about massive _intended_ traffic. These
               | problems come from _failures in the architecture of the
               | system to inherently bound requests to the current scale
               | of the system_ (where autoscaling changes the  "current
               | scale of the system" before such limits kick in.)
               | 
               | So, for example, there might be an endpoint in your
               | system that allows the caller to trigger logic that does
               | O(MN) work (the controller for that endpoint calls
               | service X O(M) times, and then for each response from X,
               | calls service Y O(N) times); where it's fully expected
               | that this endpoint takes 60+ seconds to return a
               | response. The endpoint was designed to serve the need of
               | some existing internal team, who calls it for reporting
               | once per day, with a batch-size N=2. But, unexpectedly, a
               | new team, building a new component, with a new use-case
               | for the same endpoint, writes logic that begins calling
               | the endpoint once every 20 seconds, with a batch-size of
               | 20. Now the queues for the services X and Y called by
               | this endpoint are filling faster than they're emptying.
               | 
               | No DDoS is happening; the requests are quite small, and
               | in networking terms, quite sparse. Everything is working
               | as intended -- and yet it'll all fall over, because
               | you've chosen yourself into a protocol where there's no
               | _inherent, by-default_ mechanism for  "the backend is
               | overloaded" to apply backpressure to make _new requests
               | from the frontend_ stop coming (as it would in a
               | synchronous RPC protocol, where 1. you can 't submit a
               | request on an open socket when it's in the "waiting for
               | reply" state; and 2. you can't get a new open socket if
               | the backend isn't calling accept(2)); and you didn't
               | think that this endpoint would be one that gets called
               | much, so you didn't bother to think about explicitly
               | implementing such a mechanism.
        
               | stolsvik wrote:
               | Relying on the e.g. Servlet Container not being able to
                | handle requests seems rather bad to me. That is very
                | rough error handling.
               | 
               | We seem to have come to the exact opposite conclusions
               | wrt. this. Your explanations are entirely in line with
               | mine, but I found this "messy" error handling to be
               | exactly what I wanted to avoid.
               | 
               | There is one particular point where we might not be in
                | line: I made Mats first and foremost _not_ for the
                | synchronous situation, where there is a user waiting.
                | This is the "bonus" part, where you can actually do that
                | with the MatsFuturizer, or the MatsSocket.
               | 
               | I first and foremost made it for internal, batch-like
               | processes like "we got a new price (NAV) for this fund,
               | we now need to settle these 5000 waiting orders". In that
               | case, the work is bounded, and an error situation with
               | not-enough-threads would be extremely messy. Queues
                | solve this 100%.
               | 
               | I've written some about my thinking on the About page:
               | https://mats3.io/about/
        
               | derefr wrote:
               | > Relying on the e.g. Servlet Container not being able to
               | handle requests seems rather bad to me. That is a very
               | rough error handling.
               | 
               | It's one of those situations where the simplest "what you
               | get by accident with a single-threaded non-evented
               | server" solution, and the most fancy-and-complex
               | solution, actually look alike from a client's
               | perspective.
               | 
               | What you actually want is that each of your backends
               | monitors its own resource usage, and flags itself as
               | unhealthy in its readiness-check endpoint when it's
               | approaching its known per-backend maximum resource
               | capacity along any particular dimension -- threads,
               | memory usage, DB pool checked-out connections, etc.
               | (Which can be measured quite predictably, because you're
               | very likely running these backends in containers or VMs
               | that enforce bounds on these resources, and then scaling
               | the resulting predictable-consumption workload-runners
               | horizontally.) This readiness-check failure then causes
               | the backend to be removed from consideration as an
               | upstream for your load-balancer / routing target for your
               | k8s Service / etc; but existing connected flows continue
               | to flow, gradually draining the resource consumption on
               | that backend, until it's low enough that the backend
               | begins reporting itself as healthy again.
               | 
               | Meanwhile, if the load-balancer gets a request and finds
               | that it currently has _no_ ready upstreams it can route
                | to (because they're all unhealthy, because they're all
               | at capacity) -- then it responds with a 503. Just as if
               | all those upstreams had crashed.
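                | 
                | Schematically (thresholds and names invented):
                | 
                |     // GET /ready: 200 while under capacity, else
                |     // 503; the LB stops routing new work here
                |     // until this backend drains below the line.
                |     boolean ready =
                |         inFlightRequests() < MAX_IN_FLIGHT
                |         && pool.getNumActive() < MAX_DB_CONNS;
                |     response.setStatus(ready ? 200 : 503);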
               | 
               | > Your explanations are entirely in line with mine, but I
               | found this "messy" error handling to be exactly what I
               | wanted to avoid.
               | 
               | Well, yes, but that's my point made above: this error
               | handling is "messy" precisely because it's an _encoding
                | of user intent_. It's irreducible complexity, because
               | it's something where you want to make the decision of
               | what to do differently in each case -- e.g. a call from A
               | to X might consider the X response critical (and so
               | failures should be backoff-retried, and if retries
               | exceeded, the whole job failed and rescheduled for
               | later); while a call from B to X might consider the X
               | response only a nice-to-have optimization over
               | calculating the same data itself, and so it can try once,
               | give up, and keep going.
               | 
                | > I made Mats first and foremost not for the synchronous
               | situation, where there is a user waiting.
               | 
               | I said nothing about users-as-in-humans. We're presumably
               | both talking about a Service-Oriented Architecture here;
               | perhaps even a microservice-oriented architecture. The
               | "users" of Service X, above, are Service A and Service B.
               | There's a Service X client library, that both Service A
               | and Service B import, and make calls to Service X
               | through. But these are still, necessarily, _synchronous_
               | requests, since the further computations of Services A
               | and B are _dependent on_ the response from Service X.
               | 
               | Sure, you can queue the requests to Services A and B as
                | long as you like; but _once they're running_, they're
               | going to sit around waiting on the response from Service
               | X (because they have nothing better to be doing while the
               | Service X response-promise resolves.) Whether or not the
               | Service X request is synchronous or asynchronous doesn't
               | matter to them; they have a synchronous (though not
               | timely) _need_ for the data, within their own
               | asynchronous execution.
               | 
               | Is this not the common pattern you see for inter-service
               | requests within your own architecture? If not, then what
               | is?
               | 
               | If what you're really talking about here is forward-only
               | propagation of values -- i.e. never needing _a response_
               | (timely or not) from most of the messages you send in the
                | first place -- then you're not really talking about a
               | messaging protocol. You're talking about a dataflow
               | programming model, and/or a distributed CQRS/ES event
               | store -- both of which can and often are implemented on
               | top of message queues to great effect, and neither of
               | which purport to be sensible to use to build RPC request-
               | response code on top of.
        
               | stolsvik wrote:
               | To your latter part: This is exactly the point: Using
               | messages and queues makes the flows take whatever time it
                | takes. Settling of the mentioned orders is not time
                | critical - well, at least not in the way a request from a
                | user sitting on his phone logging in to see his holdings
                | is time critical. So whether it takes 1 second or 1 hour
                | doesn't matter all that much.
                | 
                | The big point is that _none_ of the flows will fail. They
                | will all pass through _as fast as possible_, literally,
               | and will never experience any failure mode resulting from
               | randomly exhausted resources. You do not need to make any
               | precautions for this - backpressure, failure handling,
               | retries - as it is inherent in how a messaging-based
               | system works.
               | 
               | Also, if a user logs into the system, and one of the
                | login-flows needs the same service as the settling flows,
               | then that flow will "cut the line" since they are marked
               | "interactive".
        
               | bcrosby95 wrote:
                | RPC works at both non-Google and Google scale. This is
               | one of the times where, IMHO, you can skip the middle
               | section. Novices resort to RPC, Google resorts to RPC,
               | and in the mid tier you have something where messaging
               | can step in.
               | 
               | Why not skip it? Use RPC like a novice. If it becomes
               | problematic, start putting in compensating measures.
        
         | peoplefromibiza wrote:
         | > And if you're doing that, why force the client to think in
         | terms of async messaging at all? Just let them do REST, and
         | hide the queue under the API layer of the receiver.
         | 
         | because REST is stupid
         | 
         | REST is request -> response, single connection, one direction
          | only, which is a very limited way to model messaging.
         | 
         | There is more than one communication mode and bidirectional
         | messaging is a thing.
         | 
         | REST also offers no control whatsoever over the communication
         | channel, so you are stuck with the configuration set on the
         | server side
         | 
         | which might or might not be correct for your use case
         | 
         | See RSocket for an example of a message driven protocol which
         | solves most of the shortcomings of REST
         | 
         | on the bright side REST is also stupid simple
         | 
          | which is why it's so widely deployed, it doesn't require thinking
         | 
         | > response, you've essentially circled back around to clients
         | having to use a client with REST-like
         | 
         | no, because you did not block there waiting for the timeout
         | which defaults to 30 seconds for HTTP
         | 
         | and even if you abandon on the client side, the server will
         | still process the request, there's no way to abort it once it's
         | been started.
        
           | naasking wrote:
           | > REST is request -> response, single connection, one
           | direction only, which is a very limited way to model
           | messaging
           | 
           | You seem to be implying that being limited is a bad thing.
           | Constraints are important to keep problems tractable.
        
             | klabb3 wrote:
              | In my experience the number of serialized (network-
              | blocking) calls needed under the request-reply paradigm
             | always grows over time as the application gets larger.
             | 
             | At least this limitation can cause massive complexity once
             | perf optimizations are needed. I think that's important to
             | factor in when we're talking about the issues with large
             | and resilient systems in either paradigm.
             | 
             | Personally I like message passing because it's more true to
             | the underlying protocol (TCP or UDP) and actually interops
             | quite well (all things considered) with request-reply
             | systems - it just requires two separate messages and a
             | request id which is standard practice in request-response
             | anyway. The inverse is not true though: we have like 10
              | different janky solutions in the last decade for sending
              | server-initiated messages to clients.
        
       | rkangel wrote:
       | If I understand this right, this is basically the Erlang/Elixir
       | OTP programming model, but across microservices rather than
       | across a single (potentially distributed) VM. To be clear - that
       | is a _good_ thing.
       | 
       | One of the core concepts of OTP (effectively the Erlang standard
       | library) is the GenServer. A GenServer processes incoming
       | messages, mutates state if appropriate and sends responses. The
       | OTP machinery means that this "send a message and wait for a
       | response" is just a straight function call with return value to
       | the caller. OTP takes care of all the edge cases (like when the
        | process at the other end goes away halfway through). This means
       | that your code is just a straight series of synchronous function
       | calls, which may be sending messages underneath to do things or
       | get data, but you don't have to care. It's a lovely system to
       | work in, and makes complicated systems feel simple.
       | 
       | The elements communicating are in Erlang terminology 'processes'
        | - but not OS processes; they are instead lightweight, userspace-
        | scheduled things - very lightweight to create. Erlang has built-
        | in distribution that allows you to connect multiple running
       | machines, and then the same message passing works across network
       | boundaries. You're still limited to the BEAM VM though. This is
       | the 'full' microservice version of that.
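        | 
        | For the non-Erlangers: the shape of GenServer.call is "send a
        | message, block on the reply", wrapped to look like a plain
        | function. A rough Java analogue (in-process, names made up):
        | 
        |     import java.util.concurrent.*;
        | 
        |     record Msg(String request,
        |                CompletableFuture<String> replyTo) {}
        | 
        |     // the "process": a mailbox drained by a single thread,
        |     // which owns the state and completes replyTo
        |     BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();
        | 
        |     // what OTP generates for you - looks synchronous:
        |     String call(String request) throws Exception {
        |         CompletableFuture<String> reply =
        |             new CompletableFuture<>();
        |         mailbox.put(new Msg(request, reply));
        |         // GenServer.call's default timeout is 5 seconds
        |         return reply.get(5, TimeUnit.SECONDS);
        |     }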
        
       | lp4vn wrote:
       | I think this article is kind of misleading.
       | 
       | You use messaging for asynchronous communication and REST for
       | synchronous communication. The article makes me believe that
       | using REST for synchronous communication is a kind of deprecated
        | alternative compared to message passing.
        
       | zmmmmm wrote:
        | This reminds me more of Apache Camel[0] than of the other things
        | it's being compared to.
       | 
       | > The process initiator puts a message on a queue, and another
       | processor picks that up (probably on a different service, on a
       | different host, and in different code base) - does some
       | processing, and puts its (intermediate) result on another queue
       | 
       | This is almost exactly the definition of message routing (ie:
       | Camel).
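        | 
        | That quoted flow is a one-line route in Camel's Java DSL (the
        | processor body being a stand-in):
        | 
        |     import org.apache.camel.builder.RouteBuilder;
        | 
        |     public class FlowRoute extends RouteBuilder {
        |         @Override
        |         public void configure() {
        |             from("jms:queue:step1")        // pick message up
        |                 .process(ex -> doWork(ex)) // stand-in work
        |                 .to("jms:queue:step2");    // intermediate
        |                                            // result onward
        |         }
        |     }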
       | 
       | I'm a bit doubtful about the pitch because the solution is
       | presented as enabling you to maintain synchronous style
       | programming while achieving benefits of async processing. This
       | just isn't true, these are fundamental tradeoffs. If you need a
       | synchronous answer back then no amount of queuing, routing,
       | prioritisation, etc etc will save you when the fundamental
       | resource providing that is unavailable, and the ultimate outcome
       | that your synchronous client now hangs indefinitely waiting for a
       | reply message instead of erroring hard and fast is not desirable
       | at all. If you go into this ad hoc, and build in a leaky
        | abstraction that asynchronous things are actually synchronous
       | and vice versa, before you know it you are going to have unstable
       | behaviour or even worse, deadlocks all over your system and the
       | worst part - the true state of the system is now hidden in which
       | messages are pending in transient message queues everywhere.
       | 
       | What really matters here is to fundamentally design things from
       | the start with patterns that allow you to be very explicit about
       | what needs to be synchronous vs async (building on principles of
       | idempotency, immutability, coherence, to maximise the cases where
       | async is the answer).
       | 
        | The notion of Apache Camel is to make all these decisions first-
        | class elements of your framework and then to extract out the
       | routing layer as a dedicated construct. The fact it generalises
       | beyond message queues (treating literally anything that can
       | provide a piece of data as a message provider) is a bonus.
       | 
       | [0] https://camel.apache.org/
        
         | hummus_bae wrote:
         | > The ultimate outcome that your synchronous client now hangs
         | indefinitely waiting for a reply message instead of erroring
         | hard and fast is not desirable at all.
         | 
          | Async frameworks don't eliminate the possibility of long-
          | running processes that continue to process long after
          | responding to a request - this is still possible with specific
          | libraries/frameworks; they'll only take away the synchronous
          | interface and provide an asynchronous one instead.
         | 
         | It is also important to note that error handling will be
         | different between these 2 paradigms and it's important,
         | whatever the most suitable one is, to acknowledge this since it
         | forces us (developers) to handle the potential errors
         | differently depending on the approach we choose.
        
           | zmmmmm wrote:
           | I think you're stating exactly my point?
           | 
            | The pitch of MATS is that it lets:
           | 
           | > developers code message-based endpoints that themselves may
           | "invoke" other such endpoints, in a manner that closely
           | resembles the familiar synchronous "straight down" linear
           | code style
           | 
           | In other words, they want to encourage you to feel like you
           | are coding a synchronous workflow while actually coding an
           | asynchronous one. You are pointing out that error handling
            | needs to be different between these paradigms and you are
           | correct, but that is only the start of it. A framework that
           | papers over the differences is at very high risk of just
           | creating a massive number of leaky abstractions that don't
           | show up in the happy scenario but come back and bite you
           | heavily when things go wrong.
           | 
           | (I'm saying this as a long time user of Camel which models
           | this exact concept heavily and also experiences many of these
           | issues)
        
             | stolsvik wrote:
             | Hmm. I want to distance this library pretty far from Camel!
             | 
             | Wrt. "papering over": Not really. I make it "feel like"
             | you're coding straight down, sequential, linear, _as if_
             | you're coding synchronously.
             | 
             | But if you look at the examples, e.g. at the very start of
             | the Walkthrough:
             | https://mats3.io/docs/message-oriented-rpc/, you'll
             | understand that you are actually coding completely
             | message-driven: Each stage is a completely separate little
             | "server", picking up messages from one queue, and most
             | often putting a new message onto another queue.
             | 
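             | To be concrete, a staged Mats endpoint looks roughly
             | like this - a from-memory sketch of the Mats3 API,
             | with made-up endpoint ids and DTOs, so check the
             | Walkthrough for exact signatures:
             | 
             |   // Stage 0 requests another endpoint; the last
             |   // stage receives its reply and replies to our
             |   // caller. Each stage is a separate consumer on
             |   // its own queue, and the state object travels
             |   // with the messages ("state on the wire").
             |   var ep = matsFactory.staged("OrderService.place",
             |       ReplyDto.class, StateDto.class);
             |   ep.stage(OrderDto.class, (ctx, state, msg) -> {
             |       state.orderId = msg.orderId;
             |       ctx.request("StockService.reserve",
             |           new ReserveDto(msg.itemId));
             |   });
             |   ep.lastStage(ReserveReplyDto.class,
             |       (ctx, state, msg) ->
             |           new ReplyDto(state.orderId, msg.reserved));
             | 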
             | It is true that the error handling is very different.
             | Don't code for errors! You cannot throw an exception back
             | to the caller. You can however make "error return" style
             | DTOs, but otherwise, if you have an actual error, it'll
             | "pop out" of the _Mats Fabric_ and end up on a DLQ. This
             | is nice! It is not just a WARN or ERROR log-line in some
             | log that no one will see until way later, if ever: it
             | immediately demands your attention.
             | 
             | I wrote quite a long answer to something similar in a
             | Reddit thread a month ago:
             | https://www.reddit.com/r/programming/comments/1059jpv/messag...
        
               | zmmmmm wrote:
               | > Hmm. I want to distance this library pretty far from
               | Camel!
               | 
               | I'm curious what makes you distinguish it heavily from
               | Camel? From everything you say it sounds to me like you
               | are building Camel - or at least the routing part of it
               | :-)
        
       | ngrilly wrote:
       | How is it different from NATS.io, which solves most of the
       | problems listed? (Except the transactional aspect, but I'm not
       | convinced it's a good thing to have the same tool do
       | everything.)
        
         | stolsvik wrote:
         | I see references to NATS multiple times, but I fail to see
         | how it solves what Mats aims to solve.
         | 
         | Mats could be implemented on top of NATS, i.e. use NATS as a
         | backend, instead of JMS. (We use ActiveMQ as the broker)
        
       | adamckay wrote:
       | The article's notes about async messaging architectures being
       | superior to REST-based systems seem rather disingenuous, in my
       | opinion, as it seemingly only considers the most basic REST API
       | deployed on a single node as the alternative.
       | 
       | For example:
       | 
       | > High Availability: For each queue, you can have listeners on
       | several service instances on different physical servers, so that
       | if one service instance or one server goes down, the others are
       | still handling messages.
       | 
       | This is negated in a REST-based system with the use of an API
       | gateway / simple load balancer and multiple upstream nodes.
       | 
       | > Location Transparency [Elastic systems need to be adaptive and
       | continuously react to changes in demand, they need to gracefully
       | and efficiently increase and decrease scale.]: Service Location
       | Discovery is avoided, as messages only targets the logical queue
       | name, without needing information about which nodes are currently
       | consuming from that queue.
       | 
       | Fair enough, service discovery is another challenge, but it's
       | not hugely complex with modern API gateways, and arguably no
       | more complex than running and maintaining a message queue with
       | associated workers. You've also got a risk in a distributed
       | messaging system used by multiple teams that one service
       | publishes messages into a queue that has been deprecated and has
       | no consumers listening anymore.
       | 
       | > Scalability / Elasticity: It is easy to increase the number of
       | nodes (or listeners per node) for a queue, thereby increasing
       | throughput, without any clients needing reconfiguration. This can
       | be done runtime, thus you get elasticity where the cluster grows
       | or shrinks based on the load, e.g. by checking the size of
       | queues.
       | 
       | Same as HA, solved with a load balancer.
       | 
       | > Transactionality: Each endpoint has either processed a message,
       | done its work (possibly including changing something in a
       | database), and sent a message, or none of it.
       | 
       | > Resiliency / Fault Tolerance: If a node goes down mid-way in
       | processing, the transactional aspect kicks in and rolls back the
       | processing, and another node picks up. Due to the automatic
       | retry-mechanism you get in a message based system, you also get
       | fault tolerance: If you get a temporary failure (database is
       | restarted, network is reconfigured), or you get a transient error
       | (e.g. a concurrency situation in the database), both the database
       | change and the message reception is rolled back, and the message
       | broker will retry the message.
       | 
       | These seem to be arguing the same point, and perhaps this is
       | solved in the Mats library, but as a general advantage of async
       | message queues over synchronous REST calls, the message broker
       | retrying the message, or messages never being lost, isn't a
       | given - these guarantees are difficult to get entirely right in
       | both architectures.
       | 
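       | For instance, making a REST retry safe typically means the
       | client supplies an idempotency key and the server deduplicates
       | on it - a sketch with a made-up endpoint:
       | 
       |   import java.net.URI;
       |   import java.net.http.HttpRequest;
       |   import java.util.UUID;
       | 
       |   // Generate the key once per logical operation and reuse
       |   // it on every retry, so the server can detect and drop
       |   // duplicate submissions.
       |   HttpRequest req = HttpRequest.newBuilder(
       |           URI.create("https://api.example.com/orders"))
       |       .header("Idempotency-Key",
       |           UUID.randomUUID().toString())
       |       .POST(HttpRequest.BodyPublishers.ofString(orderJson))
       |       .build();
       | 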
       | > Monitoring: All messages pass by the Message Broker, and can be
       | logged and recorded, and made statistics on, to whatever degree
       | one wants.
       | 
       | >Debugging: The messages between different parts typically share
       | a common format (e.g. strings and JSON), and can be inspected
       | centrally on the Message Broker.
       | 
       | Centralising via an API gateway can also offer these.
        
         | stolsvik wrote:
         | Well. My point is that messaging _inherently_ has all these
         | features, without needing any other tooling.
         | 
         | The combination of transactionality and retrying is hard to
         | achieve with REST, don't you think? It is actually pretty
         | mesmerizing how our system handles screwups like a database
         | going down, or some nodes crashing, or pretty much any
         | failure: The flows might stall for a few moments, but once
         | things are back in place, all the flows just complete as if
         | nothing happened. I shudder when thinking of how we would
         | have handled such failures if we used sync processing.
         | 
         | The one big deal is the concept of "state is on the wire": The
         | process/flow "lives in the message" - not as a transient
         | memory-bound concept on the stack of a thread.
        
       | ay wrote:
       | Makes me think of https://grugbrain.dev/:
       | 
       | Microservices
       | 
       | grug wonder why big brain take hardest problem, factoring system
       | correctly, and introduce network call too
       | 
       | seem very confusing to grug
        
       | eterps wrote:
       | "better than RPC" would be a more accurate title.
        
         | stolsvik wrote:
         | Well, I actually call Mats "Message-Oriented Async RPC".
        
       | eikenberry wrote:
       | Reminds me of the old Protocol vs. API debate.
       | 
       | http://wiki.c2.com/?ApiVsProtocol=
        
         | stolsvik wrote:
         | Isn't that more of the difference between JMS and AMQP?
        
       | weatherlight wrote:
       | _chuckles in erlang_
        
         | drkrab wrote:
         | Yep
        
       | weatherlight wrote:
       | "Virding's First Rule of Programming:
       | 
       | Any sufficiently complicated concurrent program in another
       | language contains an ad hoc informally-specified bug-ridden slow
       | implementation of half of Erlang." -- Robert Virding
        
       | nitwit005 wrote:
       | > Messaging naturally provides high availability, scalability,
       | location transparency, prioritization, stage transactionality,
       | fault tolerance, great monitoring, simple error handling, and
       | efficient and flexible resource management.
       | 
       | What is "stage transactionality"? If I do a Google search for it,
       | I just find this page.
        
         | stolsvik wrote:
         | Hehe, okay. It was meant to mean "each stage is processed in
         | a transaction". Kinda hard to boil that down into a list
         | item. But my wording evidently didn't make anything clearer!
         | 
         | If you read a few more pages, then it should hopefully become
         | clearer. This page is specifically talking about it:
         | https://mats3.io/using-mats/transactions-and-redelivery/ - but
         | as it is one of the primary points of why I made Mats, it is
         | mentioned in multiple places, e.g. here:
         | https://mats3.io/background/what-is-mats/
         | 
         | This is not Mats-specific - it is directly using functionality
         | provided by the message broker, via JMS.
        
       ___________________________________________________________________
       (page generated 2023-02-12 23:00 UTC)