[HN Gopher] RabbitMQ vs. Kafka - An Architect's Dilemma (Part 1)
___________________________________________________________________
Author : gslin
Score  : 78 points
Date   : 2023-09-19 18:54 UTC (4 hours ago)

(HTM) web link (eranstiller.com)
(TXT) w3m dump (eranstiller.com)

| dividedbyzero wrote:
| NATS (https://nats.io/) is another option, though I'm not sure if it's still considered a viable Kafka replacement.
|
| fuzzy2 wrote:
| NATS is something else, but it's awesome. It has awesome throughput and latency out of the box (without JetStream), while using few resources.
|
| I'd recommend considering it, especially as an alternative to RabbitMQ.
|
| speedgoose wrote:
| I only tested NATS using JetStream and I struggled with the throughput in Python. I probably used it wrong. But your comment may imply that JetStream is slow.
|
| to11mtm wrote:
| I think sometimes the client bindings are/were in need of improvement.
|
| As an example, the C# API was originally very 'go-like' and written to .NET Framework, and didn't take advantage of a lot of newer features... to the point a 3rd party client was able to get somewhere between 3-4x the throughput. This is now finally being rectified with a new C# client, however it wouldn't surprise me if other languages have similar pains.
|
| I haven't tested JetStream but my general understanding is that you do have to be mindful of the different options for publishing; especially in the case of JetStream, a synchronous publish call can be relatively time consuming; it's better to async publish and (ab)use the future for flow control as needed.
|
| KaiserPro wrote:
| we used it for some low latency stuff in python. it was about 10ms to enqueue at worst. However we were using raw NATS, and had a clear SLA that meant that the queue was allowed to be offline or messages lost, so long as we notified the right services.
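For illustration, the "async publish and (ab)use the future for flow control" idea that to11mtm mentions above might look roughly like the sketch below. It is only a sketch under stated assumptions: `fake_publish` is a hypothetical stand-in for a client call that completes when the broker acknowledges the message, and `publish_all` and `max_in_flight` are made-up names. None of this is the NATS client API.

```python
import asyncio


async def fake_publish(msg: str, acked: list) -> None:
    """Hypothetical stand-in for an async publish that resolves on broker ack."""
    await asyncio.sleep(0.001)  # simulated network round trip
    acked.append(msg)


async def publish_all(messages, max_in_flight: int = 64):
    """Publish without awaiting each ack, but cap the number of unacked messages."""
    acked: list = []
    in_flight: set = set()
    peak = 0
    for msg in messages:
        # Flow control: when the window is full, wait for at least one ack.
        if len(in_flight) >= max_in_flight:
            done, in_flight = await asyncio.wait(
                in_flight, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                task.result()  # re-raise any publish error
        in_flight.add(asyncio.create_task(fake_publish(msg, acked)))
        peak = max(peak, len(in_flight))
    # Drain whatever is still unacknowledged before returning.
    if in_flight:
        done, _ = await asyncio.wait(in_flight)
        for task in done:
            task.result()
    return acked, peak


if __name__ == "__main__":
    acked, peak = asyncio.run(publish_all([f"m{i}" for i in range(1000)], 32))
    print(len(acked), "acked; peak in flight:", peak)
```

The design choice is the usual trade: awaiting every ack serializes on round-trip latency, while an unbounded fire-and-forget loop can flood the client. A bounded window sits between the two.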
|
| klabb3 wrote:
| It's true FOSS, and the server is a standalone Go binary that's so small it can even be embedded. Lots of language bindings for clients. Has persistence, durability, and nicely aligns into a raft-like cluster in a DC without a separate orchestrator.
|
| I'm a big fan - never understood why it's not at the top of the list in these tech reviews.
|
| nhumrich wrote:
| RabbitMQ is FOSS and has lots of language bindings. It has persistence, durability, and doesn't require a separate orchestrator.
|
| vorpalhex wrote:
| Different set of promises. NATS is great but has a different tradeoff bargain from Rabbit or Kafka.
|
| KaiserPro wrote:
| If I'm not buying a message bus in as a service, then NATS is great as a pub/sub and/or message passing system.
|
| it is simple to configure, has good documentation, and excellent integration into most languages. It guarantees uptime, and that's about it. It clusters really well, so you can swap out instances, or scale in/out as you need.
|
| [deleted]
|
| bicijay wrote:
| Message ordering is an illusion. Unless you track/store the messages on the client and are willing to deal with stuck queues due to failures in one "poisoned" message.
|
| klabb3 wrote:
| There are different kinds of order. Yes, there's no total order in a distributed system, but you can have certain partial order guarantees. It's nice if something is added before it's updated, for instance.
|
| rafaelturk wrote:
| Couldn't agree more, messages should be completely agnostic from one another. If you have a decent event-driven architecture, you don't need kafka, and you can be happy with Redis or RabbitMQ.
|
| tannhaeuser wrote:
| I think this is an excellent article.
| The only thing I'd add is that RabbitMQ is an implementation of AMQP (optionally v1.0) as a standardized broker service protocol, so it is designed to be interchangeable with other extant implementations such as Apache ActiveMQ and Qpid, whereas Kafka is one-of-a-kind software. Beyond that RabbitMQ has standardized client libs and frameworks in Java land if that matters to you - it did matter in the original context of message queue middlewares and SOA from where AMQP originated and where enterprise messaging sees major use. OTOH Kafka, with caveats, is in principle more "web scale" - though that is far from a free ride.
|
| purpleblue wrote:
| If someone is asking if they should decide between RabbitMQ vs Kafka, they should 100% use RabbitMQ. It means they have no idea what they're dealing with, the architectural differences, and the investment that the company needs in order to use Kafka.
|
| So use RabbitMQ.
|
| postalrat wrote:
| Redis isn't an option?
|
| rafaelturk wrote:
| Yes, I don't know why it's not mentioned in the article, maybe because it's more barebones.
|
| noitpmeder wrote:
| I'd also like someone with experience to contrast redis for these use cases.
|
| FridgeSeal wrote:
| I'm personally a fan of Kafka. I think the design of persisting the messages, and tracking offsets for progress instead of message acknowledgments is a much cleaner and more versatile design.
|
| You can get all the same advantages of message acknowledgments, but now you can also replay queues, let different applications use the messages (handy for cross cutting event/notification systems) and you get better scaling properties, which doesn't hurt at the small scale, and provides further scaling when you need it.
|
| whalesalad wrote:
| > You can get all the same advantages of message acknowledgments, but now you can also replay queues
|
| with rmq you can reject/nack a message and have it put back on the queue.
| rmq is not well suited for long term historical retention inside queues a la kafka's logs but it is possible to do.
|
| > let different applications use the messages (handy for cross cutting event/notification systems)
|
| rmq also does a publish once and fanout to multiple queues to support this. data is replicated so that could be a deal breaker, but it is possible.
|
| how often have you had to diagnose a stuck consumer or some other kind of offset glitch where a consumer is unable to resume where it left off?
|
| not knocking kafka here but I do think it is a tool you should reach for when you need to solve a very hyper focused problem, while rabbit is a tool more suited to most cases where queuing is required. kafka is a code smell in a lot of organizations from my experience - most do not need it.
|
| raducu wrote:
| > kafka is a code smell in a lot of organizations from my experience - most do not need it.
|
| Kafka is really nice if you don't care that much about latency during peak load and you don't have absurd processing times for messages.
|
| raducu wrote:
| > You can get all the same advantages of message acknowledgments.
|
| Maybe 95% of cases, but not all.
|
| Long message processing time really kills kafka in a way it doesn't kill RabbitMQ. Combine it with inherent read parallelism being limited to the number of partitions. Add in high variability of message rates and bingo, that's like 90% of the issues I've had with kafka over the years.
|
| KaiserPro wrote:
| > now you can also replay queues
|
| yeahnah, that leads to people treating queues like databases (I'm looking at you new york times, you know what you did wrong)
|
| it's either a queue, or a pubsub, either way it's ephemeral. Once it's gone, it should stay gone. that's what databases, object stores or filesystems are for.
|
| Kafka is a beast, has lots of bells and whistles and grinds to a halt when you look at it funny.
| Yes, it can scale, but also it can just sulk.
|
| rabbit has its own set of problems, and frankly I'd probably not choose either anymore.
|
| officialchicken wrote:
| > (I'm looking at you new york times, you know what you did wrong)
|
| You're going to have to be a tiny bit more specific here. NYT is THE factory of wrongness for sure. In every dimension. Are we talking "yellow cake" wrong, or somewhere else on the severity of f'up scale...
|
| supermatt wrote:
| The same MQ patterns as mentioned in the article (exactly once, consumer groups) can also be done in kafka, contrary to what the article suggests.
|
| robertlagrant wrote:
| > one is a message broker, and the other is a distributed streaming platform
|
| I think this is an odd way of putting it. One is smart messaging; dumb clients. The other is dumb messaging; smart clients. It turns out the latter (i.e. Kafka) scales wonderfully so you can send more data, but you add complexity to your clients, who can no longer just pluck messages off a queue to process, or have messages retry upon the first 3 failures, as they could with RabbitMQ.
|
| Having said that, Kafka lets you keep all your data, so you don't have to worry about losing messages to unexpected interactions between RabbitMQ rules. But having said _that_, now you have to store all your data.
|
| supermatt wrote:
| > One is smart messaging; dumb clients. The other is dumb messaging; smart clients.
|
| All the smartness of the messaging can be implemented in the smart clients. Then you can expose that as a smart messaging api to dumb clients.
|
| The most obvious example is kafka streams which exposes a "simple" api rather than dealing directly with kafka, but obviously you could create a less featureful wrapper than that.
|
| waynesonfire wrote:
| And reimplement rabbitmq? Great idea. Let's do it in rust too.
|
| [deleted]
|
| NovemberWhiskey wrote:
| > _All the smartness of the messaging can be implemented in the smart clients._
|
| How do you do, for example, a queue with priorities client side without it being insanity? That's a relatively basic AMQP thing. Or managing the number of redeliveries for a message that's being repeatedly rejected.
|
| You can absolutely try to build some of this with a look-aside shared data store that all clients have to depend on in order to emulate having the capability in the broker, but you just introduced another common point of failure in addition to the messaging infrastructure. Life is too short for this.
|
| supermatt wrote:
| I totally agree that you can't do a lot of AMQP stuff. As you noted, you can build some of it by managing state via transactional producers, etc - but you definitely can't do everything. The biggest gripe for me is actually dynamic "queue" creation, patterns for topics, etc. So I use an MQ for an MQ ;)
|
| I'm just saying you can "dumb down" the client side on kafka by creating an abstraction layer (or using one of the many higher level libs that already do that).
|
| eddythompson80 wrote:
| I can't help but think that this just gives you the worst of both worlds. You are now on the hook managing that non-standard "smart" wrapper which will quickly just become the status quo for the project. Anyone wanting to change how it works needs to understand exactly how "smart" you made it and all the side effects that will come with making a change there.
|
| I pushed against knative in our company particularly for that reason. Like we wanna use kafka because [Insert kafka sales pitch], but we don't want our developers to utilize any of the kafka features. We're just gonna define the kafka client in some yaml format and have our clients handle an http request per message. It didn't make sense to me.
|
| supermatt wrote:
| That's kind of like saying don't use any software libraries because they all use the standard lib indirectly so you may as well just use that?
|
| It's just an abstraction layer to make things less effort.
|
| eddythompson80 wrote:
| yeah, don't wrap all calls to a standard lib in another homegrown or non-standard single-digit user lib that makes changes in all sorts of subtle ways. There are plenty of C++ projects that make their own or wrap stdlib and they are always a big wtf.
|
| It's one thing to have an abstraction for kafka in your code, it's another to wrap the client in a smart client that reimplements something like rabbitmq, and much worse a smart service.
|
| supermatt wrote:
| > don't wrap all calls to a standard lib
|
| I'm not saying to expose the same primitives - what would be the point in that? I am saying that EVERY lib you use will be using the standard lib or some abstraction of it to perform its own utility.
|
| > It's one thing to have an abstraction for kafka in your code, it's another to wrap the client in a smart client, and much worse a smart service.
|
| That abstraction is exactly what I am talking about. Why write 50 lines of boilerplate multiple times throughout your code when you can wrap that up in a single function call and expose THAT as your client. You know that's exactly what you will end up doing on any non-trivial project. Or you could use a lib that already does that - such as the "official" kafka streams lib.
|
| robertlagrant wrote:
| This would be my instinct too.
|
| raducu wrote:
| > who can't just now pluck messages off a queue to process
|
| The problem is you cannot mark individual messages as read; for a given consumer & partition you can only update the offset for the partition.
|
| If processing a certain message takes very long, all other messages in that partition will have to wait.
|
| Also, with kafka, the max read concurrency is equal to the number of partitions, for something like RabbitMQ it is much higher; but you do get nice message ordering for any given partition in kafka which you do not get in RabbitMQ (afaik); you also get some really nice data locality with kafka because unless the consumers get the partitions re-assigned, all messages for the same key are served on the same physical consumer.
|
| math wrote:
| Worth noting that Kafka is getting queues: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
|
| ryanjshaw wrote:
| > The problem is you cannot mark individual messages as read, for a given consumer&partition you can only update the offset for a partition.
|
| Hence "smart clients". If you MUST process every message at least once, you will anyway be tracking messages individually on the client (e.g. a DB or file system plus logic for idempotent message processing) and thus disable auto-offset commits back to the cluster for your consumer.
|
| RabbitMQ says "let me track this for you", Kafka says "you already need to track this so why duplicate the data in the cluster and complicate the protocol".
|
| If you don't have reliable persistent storage available and insist on using the Kafka cluster to track offsets, you can track processed offsets in memory and whenever your lowest processed offset moves forward, you have your consumer commit that offset manually as part of its message loop.
|
| If your service restarts, your downstream commands need to be idempotent of course because you will reconsume messages you may have previously processed, but this would be the case regardless of Kafka or RabbitMQ unless you're using distributed transactions (yuck).
|
| > If a certain message processing takes very long, all other messages in that partition will have to wait.
|
| You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
|
| lmm wrote:
| > You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
|
| And there are libraries that will manage all this for you e.g. https://github.com/line/decaton
|
| rafaelturk wrote:
| Nice post! RabbitMQ is a battle-tested, exceptionally fast, low-resource app, capable of handling millions of transactions/second. RabbitMQ will handle the vast majority of use cases. I'm puzzled why startups, or even banks, often use Kafka solely because of the hype. Kafka on the other hand requires massive CPUs and memory, often requiring its own K8S cluster just to be alive.
|
| rafaelturk wrote:
| If you have a clean event-driven architecture, i.e. messages are completely agnostic and decoupled from one another, you don't need Kafka.
|
| bhouston wrote:
| If you use Confluent Kafka, the billing is pretty high. About 4 years ago it was much cheaper, but then they completely revamped the pricing to something ridiculous. I found that switching to Google Pub/Sub, at least if it meets your needs, is cheaper.
|
| esafak wrote:
| I see they offer Kafka's exactly-once delivery: https://cloud.google.com/blog/products/data-analytics/cloud-...
|
| AndyPa32 wrote:
| Yes, I can confirm that. Confluent is the most expensive part of our current infrastructure.
|
| bhouston wrote:
| Can you switch away from it? Or do you need its advanced features?
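The low-watermark bookkeeping that ryanjshaw describes earlier in the thread (process a partition's messages in parallel, commit only when the lowest unprocessed offset moves forward) can be sketched as a small in-memory structure. This is a minimal sketch, not anyone's actual implementation: `LowWatermarkTracker` and `mark_done` are hypothetical names, no Kafka client is involved, and the returned value follows the Kafka convention that the committed offset is the next offset to consume.

```python
from typing import Optional, Set


class LowWatermarkTracker:
    """Tracks out-of-order completions for ONE partition (hypothetical helper)."""

    def __init__(self, start_offset: int) -> None:
        # Lowest offset not yet known to be processed.
        self.next_to_commit = start_offset
        # Offsets finished out of order, above the watermark.
        self.done: Set[int] = set()

    def mark_done(self, offset: int) -> Optional[int]:
        """Record a processed offset.

        Returns the new offset to commit if the watermark advanced,
        otherwise None (nothing new is safe to commit yet).
        """
        if offset < self.next_to_commit:
            return None  # already covered, e.g. a redelivered message
        self.done.add(offset)
        advanced = False
        # Slide the watermark over every contiguous finished offset.
        while self.next_to_commit in self.done:
            self.done.remove(self.next_to_commit)
            self.next_to_commit += 1
            advanced = True
        return self.next_to_commit if advanced else None


if __name__ == "__main__":
    tracker = LowWatermarkTracker(start_offset=0)
    for finished in (2, 1, 0, 4, 3):
        print(finished, "->", tracker.mark_done(finished))
```

In a real consumer the non-None return value would be handed to the client's manual commit call from the poll loop. After a crash, everything above the last committed offset is reprocessed, which is why the thread insists on idempotent handlers.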
|
| BWStearns wrote:
| My comment is mostly about part 2 of this post, but wrt message ordering being a kafka "win" I'd raise the point that in the actual use case of "a consumer fails in some way to process the message" you can still end up with out of order processing of the consumer's input since you might want to dump them into a DLQ or something. The fact that the message isn't reappended to the topic by default for processing is kind of an academic point, no?
|
| Unrelatedly, I've been looking at Pulsar lately. Does anyone with experience of Pulsar and either RMQ or Kafka want to throw out some opinions from having tried both?
|
| whalesalad wrote:
| One is a tomato, the other is an orange. From a distance they might look alike but they really are two completely different tools. This is a pretty solid explanation of the differences with good illustrations.
|
| Rabbit can do everything Kafka does - and much more - in a more configurable manner. Kafka is highly optimized for essentially one use case and does that well. Nothing in life is free, there are trade-offs everywhere. I am not privy to which one is theoretically faster - but once you reach that question methinks the particular workload is the deciding factor.
|
| JohnMakin wrote:
| I am not an expert in either and have only worked with Kafka. At a past job I had to write a connector job to parse and sanitize some extremely dirty, unstructured data and pass it along somewhere else. RabbitMQ supports this? What is the one use case of kafka? I think you have it backwards.
|
| whalesalad wrote:
| > parse and sanitize some extremely dirty, unstructured data and pass it along somewhere else
|
| can you be more specific? that to me sounds like hello world for either of these tools. "sanitize data" is an application level concern that neither rabbit nor kafka would handle. as far as "pass along somewhere else" again both tools can do.
|
| JohnMakin wrote:
| It was a Sink Connector.
| I don't know what it was or wasn't supposed to do but I was asked to do it, as is often the case in tech. I could have done any number of transformations in that process though, which I'm not sure rabbitmq supports.
|
| whalesalad wrote:
| It sounds to me like you aren't really even sure what you built. I have operated both rabbit and kafka at scale, I definitely do not have it backwards :)
|
| JohnMakin wrote:
| No, I'm not, because it was years ago, and I'm asking for clarification because what was said immediately sounded wrong to me (I've managed a lot of rabbitmq deployments) and you've not really given one other than an appeal to authority. guess I have my answer. Can't find anything that suggests rabbitmq natively supports anything like sink connectors. thanks.
|
| NovemberWhiskey wrote:
| > _I am not an expert in either and have only worked with Kafka._
|
| > _I've managed a lot of rabbitmq deployments_
|
| ... ?
|
| JohnMakin wrote:
| You do not need to be an expert in something's internal workings to manage/monitor a deployment. Surely this does not need to be explained further.
|
| lnenad wrote:
| A classic exchange on the internet.
|
| FINDarkside wrote:
| [flagged]
|
| supermatt wrote:
| > Can't find anything that suggests rabbitmq natively supports anything like sink connectors
|
| Kafka doesn't natively support them either. That would be Kafka Connect. I guess you could use it as an MQ, but it wouldn't be a very good one. It's more used as a data integration platform. If you want more MQ-like functionality OOTB on top of Kafka you would want to use something like Kafka Streams instead.
|
| JohnMakin wrote:
| Thanks for this clarification, this is what I was after.
|
| KaiserPro wrote:
| Rabbit is an arse to scale past one broker. It was possible, but a pain, that might have changed.
|
| Kafka is just a pain, full stop.
|
| icedchai wrote:
| At a previous company, about 10 years ago, we had roughly 10 RabbitMQ instances (brokers), all isolated. The system was essentially partitioned by queue server. We had a directory-ish service that would associate clients with their assigned node. It worked well, except if a client got too large we might have to move them to another queue.
|
| nhumrich wrote:
| The official rabbitmq controller for kubernetes is a breeze. Scales wonderfully with almost no effort.
___________________________________________________________________
(page generated 2023-09-19 23:00 UTC)