[HN Gopher] Async message-oriented architectures compared to syn... ___________________________________________________________________ Async message-oriented architectures compared to synchronous REST-based systems Author : stolsvik Score : 102 points Date : 2023-02-12 18:48 UTC (4 hours ago) (HTM) web link (mats3.io) (TXT) w3m dump (mats3.io) | mdaniel wrote: | Making your own license is the new JavaScript framework, I guess: | https://github.com/centiservice/mats3/blob/v0.19.4-2023-02-1... | Pet_Ant wrote: | Not open source. | | > Noncompete | | > Any purpose is a permitted purpose, except for providing to | others any product that competes with the software. | stolsvik wrote: | I do not say that it is _Open Source_ either. | | From the front page: "Free to use, source on github. | Noncompete licensed - PolyForm Perimeter." | | Feel free to comment on this: Is this a complete deal-breaker | for all potential users? | mixedCase wrote: | Can't speak for all potential users, but the license is in | fact a complete deal-breaker for me and any client I've | worked with, given the FOSS tools available in the | ecosystem. | | But then, there's also the "Java-only" aspect, which is a complete | deal-breaker with any client I've worked with doing | {micro,}services. | | Then there's the "what the hell does this actually do" | deal-breaker when trying to explain it to some decision | makers, and the "we already have queues and K8s to solve | all of those issues" deal-breaker when explaining it to | most fellow SWEs/SREs. | stolsvik wrote: | Hahaha, that's rough! :-) | | I'll tell you one thing: "What the hell does this | actually do?!" is extremely spot on! I am close to amazed | at how hard it is to explain this library. It really does | provide value, but it is evidently exceptionally hard to | explain. 
| | I first and foremost believe that this is due to the | massive prevalence of sync REST/RPC style coding, and | that messaging is only pulled up as a solution when you | get massive influxes of e.g. inbound reports - where you | actually want the _queue_ aspect of a message broker. Not | the async-ness. | | I've tried to lay this out multiple times, e.g. here: | https://mats3.io/docs/message-oriented-rpc/, and in the | link for this post itself. | lazyasciiart wrote: | That comment is quoting from the Polyform license. If it | doesn't represent your position, you may have made a bad | choice of license. | stolsvik wrote: | I was referring to the "not open source". I edited my | comment to be more specific. | jgilias wrote: | Your license would not be a dealbreaker for me in an SME | commercial setting. AGPL would be a dealbreaker. | stolsvik wrote: | Thanks a bunch! Seriously. And I agree that AGPL is | pretty harsh - I have a feeling that it is typically | used in a "try before you buy" situation, where there is | a commercial license on the side. | tinco wrote: | Well yeah, of course - it's a direct contradiction of the | rising-tide-lifts-all-boats principle. Do you think | Kubernetes would have any traction at all if it had a | clause that it couldn't be used on AWS? | | If it can't be adopted by the industry as a whole, then it | can't be considered an industry standard. It wouldn't fly | at my organization anyway, without even looking up what | PolyForm is. | stavros wrote: | As far as I can see, this doesn't say it can't be used on | AWS, it only says Amazon can't launch its own service | that uses this software to compete with it. It's too | short to really tell what "compete" entails, though. | stolsvik wrote: | You are correct, this is meant as an AWS/GCP/Azure | preventer. The ElasticSearch situation. That is, AFAIU, the | intention of the license I adopted. 
The "examples" part | spells it out pretty directly, as I also try to do here: | https://centiservice.com/license/ | | You may definitely use it anywhere you like. | stavros wrote: | I'm really in favor of something like that. AWS using | your own FOSS software to choke your revenue stream is a | blight on FOSS, so good for you for using that license. | stolsvik wrote: | Thank you! | mejutoco wrote: | I see your point, and could not help wondering whether | ElasticSearch would have more revenue if AWS could not | offer it directly. | mixedCase wrote: | Do you believe the fat ugly monster that is ElasticSearch | would've had anywhere near its current adoption rates if | it had a non-OSI license from the start? | | It would've been completely overshadowed by some other | Lucene-based wrapper, or maybe an even better | alternative would've come along earlier. | indymike wrote: | I built on Elastic early on, over Solr and several others, | because it was open source and seemed to be better. I | would have selected a different Lucene wrapper if I had | known where Elastic was going. | mejutoco wrote: | Algolia does pretty well, I believe. I could be wrong. | jagged-chisel wrote: | It's muddy/unknown enough that no one in a commercial | enterprise can entertain shipping a service using your | project. | marginalia_nu wrote: | Honestly, I'm pretty annoyed by the "how dare you give the | source away under other terms than the ones I would | prefer"-type reactions that crop up from time to time. It's | an incredibly entitled attitude and is not a good look for | the open source community in general. | | Like, by all means, share code with GPL or Apache or MIT or | whatever, but don't get mad when someone selects another | license, including non-free ones with weird | incompatibilities. | jagged-chisel wrote: | Those kinds of complaints are indeed entitled. 
At the | same time, there's no problem pointing out that fewer | people and organizations can select a dependency with an | unconventional, unknown license. | | You're welcome to license your projects however you see | fit. But when you get to the point that no one is using | your stuff, you have to be ready to hear "it's the | license." | lazyasciiart wrote: | Your comment is "how dare you complain about licensing". | What you are responding to is "huh, weird license, won't | use, that's a shame". | indymike wrote: | Yes, it is a deal-breaker for me. | delusional wrote: | Well, computing is all about redefining problems in terms of | other atoms. A messaging service is really just a series of | ALU operations and memory writes, which is then a series of | NAND gates. | | It seems incredibly muddy to me what "competing" would mean | in that sense. If I make something with this, it could be | argued that my system built on top of MATS is just | immaterial configuration that was intended to be done by | the user. That the author's intention was for the end user | to use MATS themselves, and that I'm therefore in | competition with the product. | | A non-programming example would be hammers and houses. You | could imagine that if I build you a house, you'd be less | likely to need to buy a hammer (to build your own), making | my house competition for the hammer. | | I wouldn't touch this at all. | stolsvik wrote: | Not entirely my own: | https://polyformproject.org/licenses/perimeter/1.0.0/ | | https://centiservice.com/license/ | [deleted] | mrkeen wrote: | > Transactionality: Each endpoint has either processed a message, | done its work (possibly including changing something in a | database), and sent a message, or none of it. | | Sounds too good to be true. Would love to hear more. | stolsvik wrote: | Well, okay, you're right! It is _nearly_ true, though. 
:) I've | written a bit about it here: | https://mats3.io/using-mats/transactions-and-redelivery/ | revskill wrote: | In our production apps, all network issues are resolved by a simple | rate-limiter. | guhcampos wrote: | Oh yes, another "this thing I sell is an actual silver bullet" | post. | | Message buses are great. RPC is too. There are use cases for | both. Saying one is "better" than the other is silly, and in this | case, a shame. | | There are loads of message passing libraries out there, based on | all kinds of backends, from RabbitMQ, to NATS, to Redis, to Kafka. | This does not innovate over anything, it's just shameless | marketing. | stolsvik wrote: | This is unfair. I made Mats so that I could use messaging in a | simpler form. Nothing else. | | Mats is an API that can be implemented on top of any _queue-based_ | message broker - which excludes Kafka. But it definitely | includes ApacheMQ (which is what we use), Artemis and hence | RedHat's MQ (which the tests run against), and RabbitMQ (whose JMS | implementation is too limited to be used directly, but I do | hope to implement Mats on top of it at some point). Probably | also NATS. Probably also Apache Pulsar, which I just recently | realized has a JMS client. | | You could even implement it on top of ZeroMQ, or on top of any | database - particularly Postgres, since it has those "queue | extensions" NOTIFY and SKIP LOCKED. | | edit: I actually have a feature issue exploring such an | implementation: https://github.com/centiservice/mats3/issues/15 | latchkey wrote: | > _ApacheMQ_ | | I hope ActiveMQ Artemis is better than the 'classic' version | and that is what you're using. The last time I used it, | probably a decade ago now, there were so many issues with it | that it was a complete train wreck at scale. I would be very | hesitant to pick that one up again. 
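The "implementable on top of any queue-based broker" claim above rests on needing only put/take queue semantics from the underlying transport. A minimal in-process Java sketch of that idea, with an in-memory BlockingQueue standing in for the broker (the names here are illustrative, not the Mats3 API; a real implementation would sit on JMS, or on a database table drained with SELECT ... FOR UPDATE SKIP LOCKED):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: an "endpoint" is just a processor fed from a queue.
// Swap LinkedBlockingQueue for any broker with put/take semantics.
public class QueueBackedEndpoint {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final Function<String, String> processor;

    public QueueBackedEndpoint(Function<String, String> processor) {
        this.processor = processor;
    }

    // "Broker" delivery: a message lands on the endpoint's queue.
    public void deliver(String message) {
        inbox.add(message);
    }

    // One unit of endpoint work: take the next message and process it.
    public String processNext() {
        try {
            return processor.apply(inbox.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        QueueBackedEndpoint endpoint = new QueueBackedEndpoint(s -> s.toUpperCase());
        endpoint.deliver("settle order 42");
        System.out.println(endpoint.processNext()); // SETTLE ORDER 42
    }
}
```

The point of the sketch is only that the programming model (deliver, take, process) is broker-agnostic; transactionality and redelivery are what the real broker adds.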
| jeffbee wrote: | If you pretend that your message bus has zero producer impedance | and costs nothing, then this analysis makes great sense. If you | have ever operated or paid for this type of scheme in the real | world, then you will have some doubts. | stolsvik wrote: | I guess you'd say the same about cloud functions and lambdas, | then? To which I agree. | | Paying per message would require the message cost to be pretty | small. Might want to evaluate setting up a broker yourself if | the cost starts getting high. | robertlagrant wrote: | Having done a reasonable amount of messaging code in my time, I | would say the final form of this sort of thing might look more | like Cadence[0] than anything like this. | | [0] https://github.com/uber/cadence | stolsvik wrote: | Cadence is a workflow management system. As are Temporal, Apache | Beam, Airbnb's Airflow, Netflix's Conductor, Spotify's Luigi, and | even things like Github Actions, Google Cloud Workflows, Azure | Service Fabric, AWS SWF, and Power Automate. | | A primary difference is that those are _external systems_, | where you define the flows inside that system - the system then | "calling out" to get pieces of the flow done. | | Mats is an "internal" system: you code your flows inside the | service. It is meant to directly replace synchronously calling | out to REST services, instead enabling async messaging, but with | the added bonus of being _as simple as_ using REST services. | | But yes, I see the point. | MuffinFlavored wrote: | Is GitHub Actions really similar enough to Temporal/Cadence | to be included in the list? | stolsvik wrote: | Hmm. Maybe not. But they sure have much in common: you | define a set of things that should be done, triggered by | something - either a schedule, an event (oftentimes a | repository event, but it doesn't have to be), or from another | Github action. | eBombzor wrote: | Why is this better than Kafka? 
| stolsvik wrote: | As far as I understand, Kafka is positioning itself to be the | leading _Event Sourcing_ solution. | | I view event sourcing as fundamentally different from | message passing. For a long time I tried to love event | sourcing, but I see way too many problems with it. The primary | problem I see is that you then end up with a massive source of | events, which any service can subscribe to as they see fit. How | is this different from having one gigantic spaghetti database? | Also, event migrations over time. | | RPC and messaging feel much more clearly separated to me: I | own the Accounts, and you own the Orders. We explicitly | communicate when we need to. | | I see benefits on both sides, but have firmly landed on _not_ | event sourcing. | hbrn wrote: | 1. Anything that is connected to a user interface should be | synchronous by default. | | 2. You can't predict which parts of your system will be connected | to a user interface. | | 3. Here's the worst part: _async messaging is viral_. A service | that depends on an async service becomes async too. | | You should be very cautious introducing async messaging to your | systems. The only parts that should be allowed to be async are | the ones that can afford to fail. | | I spend a good amount of time trying to work around these dumb | enterprise patterns when building products on top of async APIs. | You are literally forced to build inferior products just because | someone thought that async messaging is so great. It's great for | everybody, _except the end user_. | | Async processing is not a virtue, it's a necessity for high- | load/high-throughput systems. | | The reason SOA failed many years ago is precisely the async | message bus. | stolsvik wrote: | We clearly do not agree. | | Wrt. 
sync processing when using Mats: | https://mats3.io/docs/sync-async-bridge/ | | But my better solution is instead to pull the async-ness all | the way out to the client: https://matssocket.io/ | | Also, I have another take on the SOA failure, mentioned here: | https://mats3.io/about/ | | It was definitely not because of async, at least as I remember | it. | SpaghettiX wrote: | I appreciate that some events can be asynchronous for clients, for | example: actions taken by other users, or events generated by | the system. However, I do think implementation details (using | async in the server) should be encapsulated from clients: | when users save a new document, it's much easier for the | client to receive a useful albeit delayed response, rather | than an "event submitted" response followed by waiting for the | result on a stream. Of course, other relevant clients may need | to hear about that event too. The service architecture should | not affect / make life harder for clients. | | Therefore I think I disagree with both the parent and grandparent | comments. Use each when it makes sense, not "synchronous by | default" (grandparent comment, though I do think there are | good points made), or "asynchronous based on service | architecture" (parent comment). | | > But my better solution is to pull the async-ness all the | way out to the client: https://matssocket.io/ | | Is that a solution that you use? I took a look at matssocket: | https://www.npmjs.com/package/matssocket - it currently has 2 | weekly downloads. :thinking:. | stolsvik wrote: | To make a point out of it: this is not _event based_ in the | event-sourcing way of thinking. It is using messages. You | put a message on a queue, someone else picks it up. Mats | implements a request/reply paradigm on top ("messaging | with a call stack"). | | In the interactive, synchronous situation, you do not "wait | for an event" per se. You wait for a specific reply. 
When | using the MatsFuturizer (https://mats3.io/docs/sync-async-bridge/), it is _extremely_ | close to how you would have used an HttpClient or somesuch. | | MatsSocket: the Dart/Flutter implementation is used in a | production mobile app. For the Norwegian market only, | though. | | The JS implementation is used in an internal solution. | | Would have been really nice with a bit more usage, yes. It | is actually pretty nice, IMHO! ;-) | toast0 wrote: | > Async processing is not a virtue, it's a necessity for high- | load/high-throughput systems. | | > 1. Anything that is connected to a user interface should be | synchronous by default. | | If everything UI is synchronous, you prevent users from | achieving high throughput. Sometimes that's fine, but sometimes | it's not. | | It's simple to wait for a response to a request sent via | asynchronous messaging. It's not simple to split a synchronous | API into send and receive parts. However, REST is HTTP, and | there are lots of async HTTP libraries out there. | samsquire wrote: | Thanks for this. | | I love the idea of breaking up a flow into separately scheduled | but still linear message flows. | | I wrote about a similar idea in ideas2: | | https://github.com/samsquire/ideas2#84-communication-code-sl... | | The idea is that I enrich my code with comments, and a transpiler | schedules different parts of the code to different machines and | inserts communication between the blocks. | | I read about how Zookeeper's algorithm provides transactionality | and robustness to dropped messages, which is interesting | reading. | | https://zookeeper.apache.org/doc/r3.4.13/zookeeperInternals.... | | How does Mats compare? | | The LMAX disruptor has a pattern where you split up each side of an | IO request into two events, to avoid blocking in a handler. So | you would always insert a new event to handle an IO response. | derefr wrote: | > Back-pressure (e.g. 
slowing down the entry-points) can easily | be introduced if queues become too large. | | ...which presumably includes load-shedding to stop misbehaving | components from overloading the queues; at which point, unless | you want clients to just lose track of the things they wanted | done when they get a "we're too busy to handle this right now" | response, you've essentially circled back around to clients | having to use a client with REST-like "synchronous/blocking | requests with retry/backpressure" semantics -- just where the | requests that are being synchronously blocked on are "register | this as a work-item and give me an ID to check on its status" | rather than "do this entire job and tell me the result." | | And if you're doing that, why force the client to think in terms | of async messaging at all? Just let them do REST, and hide the | queue under the API layer of the receiver. | Supermancho wrote: | > you've essentially circled back around to clients having to | use a client with REST-like "synchronous/blocking requests with | retry/backpressure" semantics | | Yes, they both do the same thing. That's not even the starting | point of the discussion. The implementation, from HTTP to a | message queue (mailbox system), is the discussion point. | | Having the caller (who needs work done) wait to be informed | when the work is done (or not done) is less deterministic than | telling the callee how long before the work doesn't matter | anymore. The callee gives back a transaction ID / is provided a | caller ID, or is unavailable, and the caller knows (very quickly) | it's not going to get done, or knows where to look for the work | (or can abandon it). Either way, it allows for optimization on both | sides. | tass wrote: | This is where I always end up. You can have queues, which give | you certain benefits, but there's a lot of stuff to be built on | top to make it as operationally simple as HTTP. 
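The back-pressure idea quoted at the top of this subthread - slow or refuse the entry points when queues grow - can be sketched with a bounded queue, where a full queue is itself the "too busy, back off" signal. A hypothetical in-process sketch, not any real broker API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the entry point tries a non-blocking enqueue.
// A bounded queue makes the backlog limit explicit, and offer() turns
// "queue full" into an immediate load-shedding decision for the caller
// (e.g. return 503 / retry with backoff) instead of unbounded buildup.
public class BackpressureSketch {

    // Returns true if the work item was accepted, false if the caller
    // should back off and retry later.
    static boolean submit(BlockingQueue<String> workQueue, String workItem) {
        return workQueue.offer(workItem);
    }

    public static void main(String[] args) {
        BlockingQueue<String> workQueue = new ArrayBlockingQueue<>(2); // capacity 2
        System.out.println(submit(workQueue, "job-1")); // true: accepted
        System.out.println(submit(workQueue, "job-2")); // true: accepted
        System.out.println(submit(workQueue, "job-3")); // false: full, shed load
    }
}
```

This is the "register this as a work-item" pattern in miniature: the rejection surfaces at submission time, where the caller still has context to decide what to do about it.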
| stolsvik wrote: | I will argue that this simplicity is exactly what Mats | provides. At least, that is the intention. | revskill wrote: | I don't see the code on the webpage to explain things. | Simplicity means you can explain complex things with simple | code. | | Because English is ambiguous and subjective. Just use code? | stolsvik wrote: | There is code here: https://mats3.io/docs/message-oriented-rpc/ | .. and here: https://mats3.io/docs/mats-flow-initiation/ | .. and here: https://mats3.io/docs/sync-async-bridge/ | .. and here: | https://mats3.io/docs/springconfig/ | .. and here: | https://mats3.io/background/what-is-mats/ | .. and here: | https://mats3.io/using-mats/endpoints-and-initiations/ | | .. and on the github page here: | https://github.com/centiservice/mats3/blob/main/README.md | | .. and you are advised to explore the code here: | https://mats3.io/docs/explore/ | cerved wrote: | yes, but there's sadly no code in what you posted | charrondev wrote: | The system I'm on is currently moving a lot of work | into queues. Some operations, like "change the criteria of this | rank", could take anywhere between 5 seconds (if the number of | users of the criteria to evaluate is small) and 10+ hours if we | need to re-evaluate the rules against 10m+ users. | | In this case we write our jobs as generators that can be | paused, serialized, and picked up again later. We give the job 5 | seconds synchronously; then, if it passes that time, we queue the | job and let the client know a job has been registered. | | The user's account holds the IDs of the jobs, as well as some | basic information about the tasks they have queued. There | is a REST endpoint to return the current status of the jobs and | information about them (what are they doing, what's their | progress, how much work remains). 
| | The client will negotiate a WebSocket connection with a | different service to be notified whenever progress is made on | the job, and the client can then check the endpoint for the | latest status. | latchkey wrote: | That 5 seconds is going to bite you. | | There is going to be some sort of stall in the future that | causes all of your jobs to hit that 5 seconds, and everything | is going to start to back up and cause other problems up the | line that are really hard to test for in advance. | | You're better off designing a system that doesn't rely on | some arbitrary number of seconds (why not 4 or 6 seconds?) to | begin with. | naasking wrote: | Yes, non-determinism is the bane of distributed systems. It | should be minimized whenever possible. | naasking wrote: | > And if you're doing that, why force the client to think in | terms of async messaging at all? Just let them do REST, and | hide the queue under the API layer of the receiver. | | Yes, exactly. And on top of that, async messaging implicitly | introduces DoS vulnerabilities exactly because of the buffering | required. At least with sync messaging exposing a queue in the | API layer, you opt into this vulnerability. | stolsvik wrote: | As mentioned here: | https://mats3.io/background/system-of-services/ | | .. Mats is meant to be an inter-service communication | solution. | | It is explicitly _not_ meant to be your front-facing | endpoints. If you are DoS'ed, it would be from your own | services. Of course, that might still happen, but then things | would not have been much better if you had used sync comms. | | It is true that you can bridge from sync to the async world | of Mats using the MatsFuturizer (https://mats3.io/docs/sync-async-bridge/), but then you still have your e.g. Servlet | Container as the front-facing entity. 
| | (Also check out https://matssocket.io/, though) | stolsvik wrote: | Well, yes - there is nothing with Mats that you cannot do with | any other communication form, if you code it up. When you say | "register this as a work-item and give me an ID to check on its | status", you've implemented a queue, right? | | The intention is that Mats gives you an easy way to perform | async message-oriented communications. As somewhat of a bonus, you | can also use it for synchronous tasks, using the MatsFuturizer | or MatsSocket. A queue can handle transient peaks of load much | better than direct synchronous code. It is also quite simple to | scale out. But if you do get into problems with too much | traffic for the system to process, you will have to handle that | - and Mats does not currently have any magic for performing | e.g. load shedding, so you're on your own. (I have several | thoughts on this, e.g. monitoring the queue sizes and denying any | further initiations if the queues are too large.) | | Wrt. synchronous comms, Mats does provide a nice feature, where | you can mark a Mats Flow as "interactive", meaning that some | human is waiting for the result. This results in the flow | getting priority on every stage it passes through - so that if | it competes with internal, more batchy processes, it will cut | the line. | derefr wrote: | > A queue can handle transient peaks of load much better than | direct synchronous code. | | Whether a workload is being managed upon creation using a | work queue within the backend has nothing to do with the | semantics of the communications protocol used to talk about | the state of said workload. You can arbitrarily combine these | -- for example, DBMSes have the unusual combination of having | a stateful connection-oriented protocol for scheduling | blocking workloads, but also having the ability to introspect | the state of those ongoing workloads with queries on other | connections. 
| | My point is that clients in a distributed system can | literally never do "fire and forget" messaging _anyway_ -- | which is the supposed advantage of an "asynchronous | message-oriented communications" protocol over a REST-like one. Any | client built to do "fire and forget" messaging, when used at | scale, always, always ends up needing some sort of | outbox-queue abstraction, where the outbox controller is internally | doing synchronous blocking retries of RPC calls to get an | acknowledgement that a message got safely pushed into the | queue and can be locally forgotten. | | And that "outbox" is a _leaky abstraction_, because in | trying to expose "fire and forget" semantics to its caller, | it has no way of imposing backpressure on its caller. So the | client's outbox overflows. Every time. | | This is why Google famously switched every internal protocol | they use _away_ from message queues/buses with | asynchronous "fire and forget" messaging, _toward_ | synchronous blocking RPC calls between services. With an | explicitly synchronous workload-submission protocol (which | may as well just be over a request-oriented protocol like | HTTP, as gRPC is), all operational errors and backpressure | get bubbled back up from the workload-submission client | library to its caller, where the caller can then have logic | to decide the business-logic-level response that is most | appropriate, for each particular fault, in each particular | calling context. | | Message queues are the quintessential "smart pipe", trying to | make the network handle all problems itself, so that the | nodes (clients and backends) connected via such a network can | be naive to some operational concerns. 
But this will never | truly solve the problems it sets out to solve, as the _policy | knowledge_ needed to properly drive the decision-making for the | _mechanism_ that handles operational exigencies in | message-handling isn't available "within the network"; it lives | only at the edges, in the client and backend application code | of each service. Those exigencies -- those failures and | edge-case states -- must be pushed out to the client or backend, | so that policy can be applied. And if you're doing that, you | may as well move the mechanism to enforce the policy there, | too. At which point you're back to a dumb pipe, with smart | nodes. | jgilias wrote: | Is there something I can read about Google switching to | sync RPC? Like a blog post or something like that? | | Thanks! | stolsvik wrote: | "Not everybody is Google." | | These concepts have worked surprisingly well for us for | nearly a decade. We're not Google-sized, but this | architecture should work well for a few more orders of | magnitude of traffic. | | Also, you can mix and match. If you have some parts of your | system with absolutely massive traffic, then don't use this | there. | | Note that we very seldom use "fire and forget" (aka | "send(..)"). We use the request-replyTo paradigm much more, | which is basically the core premise of Mats, as an | abstraction over pure "forward-only" messaging. | derefr wrote: | > Note that we very seldom use "fire and forget" (aka | "send(..)"). We use the request-replyTo paradigm much more, | which is basically the core premise of Mats, as an | abstraction over pure "forward-only" messaging. | | That doesn't help one bit. You're still firing-and-forgetting | the request itself. 
The reply (presumably with | a timeout) ensures that the client doesn't sit around | forever waiting for a lost message; but it does nothing to | prevent badly-written request logic from overloading your | backend (or overloading the queue, or "bunging up" the | queue such that it'll be ~forever before your backend | finishes handling the request spike and gets back to | processing normal workloads.) | | > If you have some parts of your system with absolutely | massive traffic, then don't use this there. | | I'm not talking about massive _intended_ traffic. These | problems come from _failures in the architecture of the | system to inherently bound requests to the current scale | of the system_ (where autoscaling changes the "current | scale of the system" before such limits kick in.) | | So, for example, there might be an endpoint in your | system that allows the caller to trigger logic that does | O(MN) work (the controller for that endpoint calls | service X O(M) times, and then for each response from X, | calls service Y O(N) times), where it's fully expected | that this endpoint takes 60+ seconds to return a | response. The endpoint was designed to serve the need of | some existing internal team, who call it for reporting | once per day, with a batch size of N=2. But, unexpectedly, a | new team, building a new component, with a new use-case | for the same endpoint, writes logic that begins calling | the endpoint once every 20 seconds, with a batch size of | 20. Now the queues for the services X and Y called by | this endpoint are filling faster than they're emptying. | | No DDoS is happening; the requests are quite small, and | in networking terms, quite sparse. 
Everything is working | as intended -- and yet it'll all fall over, because | you've locked yourself into a protocol where there's no | _inherent, by-default_ mechanism for "the backend is | overloaded" to apply backpressure to make _new requests | from the frontend_ stop coming (as it would in a | synchronous RPC protocol, where 1. you can't submit a | request on an open socket when it's in the "waiting for | reply" state; and 2. you can't get a new open socket if | the backend isn't calling accept(2)); and you didn't | think that this endpoint would be one that gets called | much, so you didn't bother to think about explicitly | implementing such a mechanism. | stolsvik wrote: | Relying on e.g. the Servlet Container not being able to | handle requests seems rather bad to me. That is very | rough error handling. | | We seem to have come to the exact opposite conclusions | wrt. this. Your explanations are entirely in line with | mine, but I found this "messy" error handling to be | exactly what I wanted to avoid. | | There is one particular point where we might not be in | line: I made Mats first and foremost _not_ for the | synchronous situation, where there is a user waiting. This | is the "bonus" part, where you can actually do that, with | the MatsFuturizer or the MatsSocket. | | I first and foremost made it for internal, batch-like | processes like "we got a new price (NAV) for this fund, | we now need to settle these 5000 waiting orders". In that | case, the work is bounded, and an error situation with | not-enough-threads would be extremely messy. Queues | solve this 100%. | | I've written some about my thinking on the About page: | https://mats3.io/about/ | derefr wrote: | > Relying on e.g. the Servlet Container not being able to | handle requests seems rather bad to me. That is very | rough error handling. 
| | It's one of those situations where the simplest "what you | get by accident with a single-threaded non-evented | server" solution and the most fancy-and-complex | solution actually look alike from a client's | perspective. | | What you actually want is that each of your backends | monitors its own resource usage, and flags itself as | unhealthy in its readiness-check endpoint when it's | approaching its known per-backend maximum resource | capacity along any particular dimension -- threads, | memory usage, DB-pool checked-out connections, etc. | (Which can be measured quite predictably, because you're | very likely running these backends in containers or VMs | that enforce bounds on these resources, and then scaling | the resulting predictable-consumption workload-runners | horizontally.) This readiness-check failure then causes | the backend to be removed from consideration as an | upstream for your load-balancer / routing target for your | k8s Service / etc; but existing connected flows continue | to flow, gradually draining the resource consumption on | that backend, until it's low enough that the backend | begins reporting itself as healthy again. | | Meanwhile, if the load-balancer gets a request and finds | that it currently has _no_ ready upstreams it can route | to (because they're all unhealthy, because they're all | at capacity) -- then it responds with a 503. Just as if | all those upstreams had crashed. | | > Your explanations are entirely in line with mine, but I | found this "messy" error handling to be exactly what I | wanted to avoid. | | Well, yes, but that's my point made above: this error | handling is "messy" precisely because it's an _encoding | of user intent_. It's irreducible complexity, because | it's something where you want to make the decision of | what to do differently in each case -- e.g. 
a call from A | to X might consider the X response critical (and so | failures should be backoff-retried, and if retries are | exceeded, the whole job failed and is rescheduled for | later); while a call from B to X might consider the X | response only a nice-to-have optimization over | calculating the same data itself, and so it can try once, | give up, and keep going. | | > I made Mats first and foremost not for the synchronous | situation, where there is a user waiting. | | I said nothing about users-as-in-humans. We're presumably | both talking about a Service-Oriented Architecture here; | perhaps even a microservice-oriented architecture. The | "users" of Service X, above, are Service A and Service B. | There's a Service X client library, that both Service A | and Service B import, and make calls to Service X | through. But these are still, necessarily, _synchronous_ | requests, since the further computations of Services A | and B are _dependent on_ the response from Service X. | | Sure, you can queue the requests to Services A and B as | long as you like; but _once they're running_, they're | going to sit around waiting on the response from Service | X (because they have nothing better to be doing while the | Service X response-promise resolves.) Whether or not the | Service X request is synchronous or asynchronous doesn't | matter to them; they have a synchronous (though not | timely) _need_ for the data, within their own | asynchronous execution. | | Is this not the common pattern you see for inter-service | requests within your own architecture? If not, then what | is? | | If what you're really talking about here is forward-only | propagation of values -- i.e. never needing _a response_ | (timely or not) from most of the messages you send in the | first place -- then you're not really talking about a | messaging protocol. 
You're talking about a dataflow | programming model, and/or a distributed CQRS/ES event | store -- both of which can be, and often are, implemented on | top of message queues to great effect, and neither of | which purport to be sensible to use to build RPC request- | response code on top of. | stolsvik wrote: | To your latter part: This is exactly the point: Using | messages and queues makes the flows take whatever time it | takes. Settling of the mentioned orders is not time | critical - well, at least not in the way a request from a | user sitting on his phone logging in to see his holdings | is time critical. So whether it takes 1 second or 1 hour | doesn't matter all that much. | | The big point is that _none_ of the flows will fail. They | will all pass through _as fast as possible_, literally, | and will never experience any failure mode resulting from | randomly exhausted resources. You do not need to take any | precautions for this - backpressure, failure handling, | retries - as it is inherent in how a messaging-based | system works. | | Also, if a user logs into the system, and one of the | login-flows needs the same service as the settling flows, | then that flow will "cut the line" since they are marked | "interactive". | bcrosby95 wrote: | RPC works at both non-Google and Google scale. This is | one of the times where, IMHO, you can skip the middle | section. Novices resort to RPC, Google resorts to RPC, | and in the mid-tier you have something where messaging | can step in. | | Why not skip it? Use RPC like a novice. If it becomes | problematic, start putting in compensating measures. | peoplefromibiza wrote: | > And if you're doing that, why force the client to think in | terms of async messaging at all? Just let them do REST, and | hide the queue under the API layer of the receiver. | | because REST is stupid | | REST is request -> response, single connection, one direction | only, which is a very limited way to model messaging. 
| | There is more than one communication mode, and bidirectional | messaging is a thing. | | REST also offers no control whatsoever over the communication | channel, so you are stuck with the configuration set on the | server side, | | which might or might not be correct for your use case. | | See RSocket for an example of a message-driven protocol which | solves most of the shortcomings of REST. | | On the bright side, REST is also stupid simple, | | which is why it is so widely deployed: it doesn't require thinking. | | > response, you've essentially circled back around to clients | having to use a client with REST-like | | no, because you did not block there waiting for the timeout, | which defaults to 30 seconds for HTTP, | | and even if you abandon on the client side, the server will | still process the request; there's no way to abort it once it's | been started. | naasking wrote: | > REST is request -> response, single connection, one | direction only, which is a very limited way to model | messaging | | You seem to be implying that being limited is a bad thing. | Constraints are important to keep problems tractable. | klabb3 wrote: | In my experience, the number of serialized (network-blocking) | calls needed under the request-reply paradigm | always grows over time as the application gets larger. | | At the least, this limitation can cause massive complexity once | perf optimizations are needed. I think that's important to | factor in when we're talking about the issues with large | and resilient systems in either paradigm. | | Personally I like message passing because it's more true to | the underlying protocol (TCP or UDP) and actually interops | quite well (all things considered) with request-reply | systems - it just requires two separate messages and a | request id, which is standard practice in request-response | anyway. The inverse is not true though: we have had like 10 | different janky solutions in the last decade for sending | server-initiated messages to clients. 
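klabb3's interop point above - request/reply built from two one-way messages plus a request id - can be sketched in a few lines. The following is a minimal, self-contained illustration using in-memory queues; all class and method names are hypothetical, and this is not the API of Mats, NATS, or any broker discussed in the thread.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Request/reply on top of one-way messaging: two independent queues
// plus a correlation id to match replies back to waiting callers.
public class CorrelatedMessaging {
    record Message(String correlationId, String body) {}

    static final BlockingQueue<Message> requests = new LinkedBlockingQueue<>();
    static final BlockingQueue<Message> replies = new LinkedBlockingQueue<>();
    static final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // "Client" side: fire a one-way message, hand back a future keyed by id.
    static CompletableFuture<String> request(String body) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future);
        requests.add(new Message(id, body));
        return future;
    }

    public static void main(String[] args) throws Exception {
        // "Server" side: consumes a request, sends an independent reply message.
        Thread server = new Thread(() -> {
            try {
                Message req = requests.take();
                replies.add(new Message(req.correlationId(), "echo: " + req.body()));
            } catch (InterruptedException ignored) { }
        });
        // Reply listener: matches replies to pending futures by correlation id.
        Thread listener = new Thread(() -> {
            try {
                Message rep = replies.take();
                pending.remove(rep.correlationId()).complete(rep.body());
            } catch (InterruptedException ignored) { }
        });
        server.start();
        listener.start();
        System.out.println(request("hello").get(5, TimeUnit.SECONDS));
    }
}
```

Conceptually this is the same matching that JMS's JMSCorrelationID header supports; the reverse direction (server-initiated messages over request/response transports) is what klabb3 notes has no equally standard answer.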
| rkangel wrote: | If I understand this right, this is basically the Erlang/Elixir | OTP programming model, but across microservices rather than | across a single (potentially distributed) VM. To be clear - that | is a _good_ thing. | | One of the core concepts of OTP (effectively the Erlang standard | library) is the GenServer. A GenServer processes incoming | messages, mutates state if appropriate and sends responses. The | OTP machinery means that this "send a message and wait for a | response" is just a straight function call with return value to | the caller. OTP takes care of all the edge cases (like when the | process at the other end goes away half way through). This means | that your code is just a straight series of synchronous function | calls, which may be sending messages underneath to do things or | get data, but you don't have to care. It's a lovely system to | work in, and makes complicated systems feel simple. | | The elements communicating are in Erlang terminology 'processes' | - but not OS processes; they are instead lightweight, userspace- | scheduled things - very lightweight to create. Erlang has built-in | distribution that allows you to connect multiple running | machines, and then the same message passing works across network | boundaries. You're still limited to the BEAM VM though. This is | the 'full' microservice version of that. | lp4vn wrote: | I think this article is kind of misleading. | | You use messaging for asynchronous communication and REST for | synchronous communication. The article makes me believe that | using REST for synchronous communication is a kind of deprecated | alternative to message passing. | zmmmmm wrote: | This reminds me more of Apache Camel[0] than other things it's | being compared to. 
| | > The process initiator puts a message on a queue, and another | processor picks that up (probably on a different service, on a | different host, and in a different code base) - does some | processing, and puts its (intermediate) result on another queue | | This is almost exactly the definition of message routing (i.e. | Camel). | | I'm a bit doubtful about the pitch, because the solution is | presented as enabling you to maintain synchronous-style | programming while achieving the benefits of async processing. This | just isn't true; these are fundamental tradeoffs. If you need a | synchronous answer back then no amount of queuing, routing, | prioritisation, etc. will save you when the fundamental | resource providing that is unavailable, and the ultimate outcome | that your synchronous client now hangs indefinitely waiting for a | reply message instead of erroring hard and fast is not desirable | at all. If you go into this ad hoc, and build in a leaky | abstraction that asynchronous things are actually synchronous | and vice versa, before you know it you are going to have unstable | behaviour or, even worse, deadlocks all over your system, and the | worst part - the true state of the system is now hidden in which | messages are pending in transient message queues everywhere. | | What really matters here is to fundamentally design things from | the start with patterns that allow you to be very explicit about | what needs to be synchronous vs async (building on principles of | idempotency, immutability, coherence, to maximise the cases where | async is the answer). | | The notion of Apache Camel is to make all these decisions | first-class elements of your framework and then to extract out the | routing layer as a dedicated construct. The fact it generalises | beyond message queues (treating literally anything that can | provide a piece of data as a message provider) is a bonus. 
| | [0] https://camel.apache.org/ | hummus_bae wrote: | > The ultimate outcome that your synchronous client now hangs | indefinitely waiting for a reply message instead of erroring | hard and fast is not desirable at all. | | Async frameworks don't eliminate the possibility of long-running | processes that continue to process long after responding to a | request - this is still possible with specific | libraries/frameworks; they'll only take away the synchronous | interface and provide an asynchronous one instead. | | It is also important to note that error handling will be | different between these two paradigms, and it's important, | whichever the most suitable one is, to acknowledge this since it | forces us (developers) to handle the potential errors | differently depending on the approach we choose. | zmmmmm wrote: | I think you're stating exactly my point? | | The pitch of MATS is that it lets: | | > developers code message-based endpoints that themselves may | "invoke" other such endpoints, in a manner that closely | resembles the familiar synchronous "straight down" linear | code style | | In other words, they want to encourage you to feel like you | are coding a synchronous workflow while actually coding an | asynchronous one. You are pointing out that error handling | needs to be different b/w these paradigms, and you are | correct, but that is only the start of it. A framework that | papers over the differences is at very high risk of just | creating a massive number of leaky abstractions that don't | show up in the happy scenario but come back and bite you | heavily when things go wrong. | | (I'm saying this as a long-time user of Camel, which models | this exact concept heavily and also experiences many of these | issues) | stolsvik wrote: | Hmm. I want to distance this library pretty far from Camel! | | Wrt. "papering over": Not really. I make it "feel like" | you're coding straight down, sequential, linear, _as if_ | you're coding synchronously. 
| | But if you look at the examples, e.g. at the very start of | the Walkthrough: https://mats3.io/docs/message-oriented-rpc/, | you'll understand that you are actually coding | completely message-driven: Each stage is a completely | separate little "server", picking up messages from one | queue, and most often putting a new message onto another | queue. | | It is true that the error handling is very different. Don't | code errors! You cannot throw an exception back to the | caller. You can however make "error return" style DTOs, | but otherwise, if you have an actual error, it'll "pop out" | of the _Mats Fabric_ and end up on a DLQ. This is nice! It | is not just a WARN or ERROR log-line in some log that | no-one will see until way later, if ever: It immediately | demands your attention. | | I wrote quite a long answer to something similar in a | Reddit thread a month ago: | https://www.reddit.com/r/programming/comments/1059jpv/messag... | zmmmmm wrote: | > Hmm. I want to distance this library pretty far from | Camel! | | I'm curious what makes you distinguish it heavily from | Camel? From everything you say, it sounds to me like you | are building Camel - or at least the routing part of it | :-) | ngrilly wrote: | How is it different from NATS.io, which solves most of the problems | listed? (except the transactional aspect, but I'm not convinced | it's a good thing to have the same tool do everything) | stolsvik wrote: | I see references to NATS multiple times, but I fail to see how | it solves what Mats aims to solve. | | Mats could be implemented on top of NATS, i.e. use NATS as a | backend, instead of JMS. (We use ActiveMQ as the broker.) | adamckay wrote: | The article's notes about async messaging architectures being | superior to REST-based systems seem rather disingenuous, in my | opinion, as it's seemingly only considering the most basic REST | API deployed on a single node as the alternative. 
| | For example: | | > High Availability: For each queue, you can have listeners on | several service instances on different physical servers, so that | if one service instance or one server goes down, the others are | still handling messages. | | This is negated in a REST-based system with the use of an API | gateway / simple load balancer and multiple upstream nodes. | | > Location Transparency [Elastic systems need to be adaptive and | continuously react to changes in demand, they need to gracefully | and efficiently increase and decrease scale.]: Service Location | Discovery is avoided, as messages only target the logical queue | name, without needing information about which nodes are currently | consuming from that queue. | | Fair enough, service discovery is another challenge, but it's not | hugely complex with modern API gateways, and arguably no more | complex than running and maintaining a message queue with | associated workers. You've also got a risk in a distributed | messaging system used by multiple teams that one service | publishes messages into a queue that has been deprecated and has | no consumers listening anymore. | | > Scalability / Elasticity: It is easy to increase the number of | nodes (or listeners per node) for a queue, thereby increasing | throughput, without any clients needing reconfiguration. This can | be done at runtime, thus you get elasticity where the cluster grows | or shrinks based on the load, e.g. by checking the size of | queues. | | Same as HA, solved with a load balancer. | | > Transactionality: Each endpoint has either processed a message, | done its work (possibly including changing something in a | database), and sent a message, or none of it. | | > Resiliency / Fault Tolerance: If a node goes down mid-way in | processing, the transactional aspect kicks in and rolls back the | processing, and another node picks up. 
Due to the automatic | retry-mechanism you get in a message-based system, you also get | fault tolerance: If you get a temporary failure (database is | restarted, network is reconfigured), or you get a transient error | (e.g. a concurrency situation in the database), both the database | change and the message reception are rolled back, and the message | broker will retry the message. | | These seem to be arguing the same point, and perhaps this is | solved in the Mats library, but as a general advantage of async | message queues over synchronous REST calls, the message broker | retrying the message, or messages not being lost, isn't a given - | they're difficult to get entirely right in both architectures. | | > Monitoring: All messages pass by the Message Broker, and can be | logged and recorded, and made statistics on, to whatever degree | one wants. | | > Debugging: The messages between different parts typically share | a common format (e.g. strings and JSON), and can be inspected | centrally on the Message Broker. | | Centralising via an API gateway can also offer these. | stolsvik wrote: | Well. My point is that messaging _inherently_ has all these | features, without needing any other tooling. | | The combination of transactionality and retrying is hard to | achieve with REST, don't you think? It is actually pretty | mesmerizing how our system handles screwups like a database | going down, or some nodes crashing, or pretty much any failure: | The flows might stall for a few moments, but once things are | back in place, all the flows just complete as if nothing | happened. I shudder when thinking of how we would have handled | such failures if we used sync processing. | | The one big deal is the concept of "state is on the wire": The | process/flow "lives in the message" - not as a transient | memory-bound concept on the stack of a thread. 
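The rollback-and-redelivery semantics discussed above can be simulated without a broker. The sketch below only illustrates the behaviour: on a transient failure, the message goes back on the queue and is processed again. In a real deployment this would be a transacted JMS session, where message receive, database work, and outgoing sends commit or roll back together; the queue contents and the simulated failure here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulation of "receive + work + send, or none of it": a failed
// processing attempt rolls the message back onto the queue, and the
// next delivery attempt succeeds. Mimics broker redelivery semantics.
public class RedeliverySketch {
    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("settle-order-42");
        int attempt = 0;
        while (!queue.isEmpty()) {
            String msg = queue.poll();   // receive, inside the "transaction"
            attempt++;
            try {
                if (attempt == 1) {
                    // simulated transient error, e.g. a database deadlock
                    throw new IllegalStateException("db deadlock");
                }
                // commit: work done, message consumed, replies would be sent here
                System.out.println("processed " + msg + " on attempt " + attempt);
            } catch (Exception e) {
                queue.addFirst(msg);     // rollback: broker redelivers the message
            }
        }
    }
}
```

Brokers typically bound this loop with a redelivery limit, after which the message lands on a dead-letter queue, which is the DLQ behaviour stolsvik describes elsewhere in the thread.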
| ay wrote: | Makes me think of https://grugbrain.dev/: | | Microservices | | grug wonder why big brain take hardest problem, factoring system | correctly, and introduce network call too | | seem very confusing to grug | eterps wrote: | "better than RPC" would be a more accurate title. | stolsvik wrote: | Well, I actually call Mats "Message-Oriented Async RPC". | eikenberry wrote: | Reminds me of the old Protocol vs. API debate. | | http://wiki.c2.com/?ApiVsProtocol= | stolsvik wrote: | Isn't that more of the difference between JMS and AMQP? | weatherlight wrote: | _chuckles in erlang_ | drkrab wrote: | Yep | weatherlight wrote: | "Virding's First Rule of Programming: | | Any sufficiently complicated concurrent program in another | language contains an ad hoc informally-specified bug-ridden slow | implementation of half of Erlang." -- Robert Virding | nitwit005 wrote: | > Messaging naturally provides high availability, scalability, | location transparency, prioritization, stage transactionality, | fault tolerance, great monitoring, simple error handling, and | efficient and flexible resource management. | | What is "stage transactionality"? If I do a Google search for it, | I just find this page. | stolsvik wrote: | Hehe, okay. It was meant to mean "Each stage is processed in a | transaction". Kinda hard to get down into a list. But my | wording evidently didn't make anything clearer! | | If you read a few more pages, then it should hopefully become | clearer. This page is specifically talking about it: | https://mats3.io/using-mats/transactions-and-redelivery/ - but | as it is one of the primary points of why I made Mats, it is | mentioned multiple places, e.g. here: | https://mats3.io/background/what-is-mats/ | | This is not Mats-specific - it is directly using functionality | provided by the message broker, via JMS. ___________________________________________________________________ (page generated 2023-02-12 23:00 UTC)