[HN Gopher] MQTT vs. Kafka: An IoT Advocate's Perspective
       ___________________________________________________________________
        
       MQTT vs. Kafka: An IoT Advocate's Perspective
        
       Author : teleforce
       Score  : 90 points
       Date   : 2023-03-14 18:36 UTC (4 hours ago)
        
 (HTM) web link (www.influxdata.com)
 (TXT) w3m dump (www.influxdata.com)
        
       | speedgoose wrote:
       | I hear from everyone using Kafka in production that it is hell
       | unless you use Confluent.
       | 
       | I gave a try to NATS JetStream but I haven't been convinced by
       | the performance of the Python client, nor the JavaScript one. I
       | don't have extreme data, I just need decent performance.
       | 
       | I'm thinking about giving a try to RabbitMQ streams. I have been
       | very happy with RabbitMQ, the MQTT plugin isn't fully working
       | (the big one is that retained messages are not sent to wildcard
       | subscribers), but it should work with AMQP.
        
         | outworlder wrote:
         | > I'm thinking about giving a try to RabbitMQ
         | 
         | We went the opposite route. Kafka has been much better. Up to a
         | certain volume, both solved the problem. When RabbitMQ required
         | too much tuning, a decision was made to go to Kafka, and it's
         | been stellar.
         | 
         | Both are pretty good, but understand that there are too many
         | variables involved and you can't really escape production hell
         | indefinitely, regardless of what you pick. What changes is when
         | you are going to see the flames, and what is going to spark
         | them.
        
         | EdwardDiego wrote:
         | Welp, here's a dissenting opinion - it's not.
         | 
         | I've run self-managed, sorta managed (MSK), fully managed
         | (Confluent Cloud), and somewhat managed (Strimzi).
         | 
         | It is complex, yes, but it solves a very complicated problem.
         | The issue tends to arise when people use it when simpler
         | alternatives exist for their problem.
        
         | amath wrote:
         | Have you tried Pulsar or Redpanda? Both seem mature enough and
         | provide decent performance to probably meet your needs. What I
         | hear is that Redpanda is a lot easier to manage than Kafka.
        
           | speedgoose wrote:
           | I haven't taken the time but I should. Redpanda is written in
           | C++ and I tend to prefer safer programming languages, redis
           | being the exception.
        
           | serial_dev wrote:
           | What's the license for Redpanda? I couldn't find anything but
           | maybe it's because I'm on mobile.
        
         | mikedelago wrote:
         | Kafka (along with zookeeper) really isn't that bad to self-
         | host.
         | 
         | Ime, it's easy if the org has a half decent
         | infrastructure/configuration as code setup.
        
         | grepLeigh wrote:
         | I've had an excellent experience using the Rust NATS client!
         | 
         | I pump time series data through NATS running on a Raspberry Pi,
         | which is part of a 3D printer monitoring and event/automation
         | system. I also use NATS as an MQTT broker, for compatibility
         | with other software in the 3D printer ecosystem.
         | 
         | FWIW I also have lots of experience running large Kafka and
         | RabbitMQ fleets. The choice between these technologies depends
         | on what you're optimizing for.
        
       | mrkeen wrote:
       | I jumped onto https://mqtt.org/ to try to answer my usual use-
       | case question about non-Kafka messaging, which is: "Do the
       | messages get saved anywhere so you can come back and read them
       | later?" Still not entirely sure about it.
       | 
       | But I did see: "This is why MQTT has 3 defined quality of
       | service levels: 0 - at most once, 1 - at least once, 2 - exactly
       | once"
       | 
       | I'm a big fan of advertising the impossible on the front page.
        
         | alecthomas wrote:
         | > I'm a big fan of advertising the impossible on the front
         | page.
         | 
         | Do you mean like Confluent do?
         | https://www.confluent.io/blog/exactly-once-semantics-are-pos...
        
           | ryanjshaw wrote:
           | That's the Kafka Streams API. "Exactly-once semantics" has a
           | very specific meaning in the context of that particular API,
           | which the article could probably do a better job of
           | clarifying upfront. (Otherwise it is an excellent overview of
           | the problem and solution provided by the Streams API.)
        
           | codeflo wrote:
           | The impossibility of "exactly once" is a theorem, not an
           | opinion.
           | 
           | Knowing that, the article you linked is funny. It begins with
           | maybe a thousand filler words complaining that all of this is
           | "poorly understood", and it's not "impossible", just "very
           | hard". Then it gets to the meat: Yeah well, you know, it's
           | not quite "exactly once delivery", just "exactly once
           | semantics", and to achieve that, messages need to be
           | idempotent so that duplicates don't matter.
           | 
           | We all know that. It's called "at least once".
        
             | simplotek wrote:
             | > The impossibility of "exactly once" is a theorem, not an
             | opinion.
             | 
             | It's quite likely that your definition of what "exactly
             | once" means differs from the one followed by MQTT. As this
             | issue was documented years ago, I doubt this is a relevant
             | argument to have, unless we want to feel smart by
             | criticising others.
             | 
             | https://www.eejournal.com/2015/05/28/is-exactly-once-
             | deliver...
        
         | alexisread wrote:
         | With mqtt it depends on the broker, eg. Emitter.io can save
         | them for a week etc. Offset for a client is usually stored on
         | the broker, so if a client reconnects, all of the messages it
         | has missed are forwarded to it.
         | 
         | As mentioned in other answers, service levels have a defined
         | meaning, which is different to the absolute theoretical
         | meaning, and really is to do with the message acks.
         | 
         | Really, as the article mentions, kafka and mqtt are for
         | different purposes, with some overlap. Kafka is all about the
         | log, whereas mqtt is about uncertain connections. A better
         | comparison which I've yet to see, is comparing mqtt to nats.
         | 
         | Lastly, kafka is much easier to administer using redpanda,
         | which doesn't have zookeeper, combines the registry and kafka
         | connect (see WASM runners) with the runtime, and has a very
         | nice console for debugging.
         | 
         | Similarly, Emitter.io does a great job with clustering for
         | mqtt.
         | 
         | I'd like to see an open source kafka-mqtt bridge that worked in
         | both directions as they all seem to go mqtt->kafka only.
        
           | imglorp wrote:
           | nats +1
        
         | jon-wood wrote:
         | MQTT isn't designed as a persistent log, but can fulfil some of
         | what you might want to use one for.
         | 
         | Each message has a couple of flags, the first being Quality of
         | Service, which as you quoted above determines delivery
         | guarantees. 0 is fire and forget, with potential loss of
         | messages. 1 will queue messages for delivery to offline clients
         | that are subscribed to a topic (within reason, all brokers set
         | limits on that), and 2 is often described as "exactly once",
         | but is in fact just a more involved dance to acknowledge
         | messages.
         | 
         | The other flag is a Retain flag, which instructs the broker to
         | associate that message with the topic it was sent to, and send
         | it on to any newly subscribing clients when they subscribe.
         | This is good for use cases like remote device configuration -
         | you can send it to a topic, setting the retain flag, and then
         | when a device comes online it'll immediately receive new
         | configuration.
         | 
         | MQTT is great as a message queue for remote devices, mostly
         | because it's so lightweight anything with an IP stack can
         | integrate with it, but I'm not sure why anyone would attempt to
         | make it a piece of core infrastructure.
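
The Retain flag described above can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions: `ToyBroker` and `topic_matches` are hypothetical names, not part of any MQTT library, and details such as clearing a retained message with an empty payload or the special handling of `$`-prefixed topics are left out.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    '+' matches exactly one level; '#' matches all remaining levels.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard swallows the rest
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker, not a real MQTT server."""

    def __init__(self):
        self.retained = {}                      # topic -> last retained payload
        self.subscriptions = defaultdict(list)  # client id -> topic filters
        self.inboxes = defaultdict(list)        # client id -> (topic, payload)

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload      # remember it for late joiners
        for client, filters in self.subscriptions.items():
            if any(topic_matches(f, topic) for f in filters):
                self.inboxes[client].append((topic, payload))

    def subscribe(self, client, topic_filter):
        self.subscriptions[client].append(topic_filter)
        # Replay retained messages to the new subscriber, wildcards included.
        for topic, payload in sorted(self.retained.items()):
            if topic_matches(topic_filter, topic):
                self.inboxes[client].append((topic, payload))


broker = ToyBroker()
broker.publish("devices/42/config", "interval=30", retain=True)
broker.publish("devices/42/telemetry", "21.5")      # not retained, nobody listening
broker.subscribe("device-42", "devices/+/config")   # device comes online later
print(broker.inboxes["device-42"])
```

The device subscribes after both messages were published, yet it still receives the retained configuration and nothing else.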
        
         | avereveard wrote:
         | > advertising the impossible
         | 
         | eh if you read the finer print it's just a deduplication id
         | appended to every message. blog doesn't go into detail on what
         | happens when two clients push a message with the same id, or
         | what happens if there is more than one failure (i.e. client fails to
         | detect a service outage and during the service outage the
         | message is consumed by the broker but persisting fails) but in
         | general the usage of a at least once + a deduplication id is
         | not something revolutionary.
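
The "at least once plus a deduplication id" approach mentioned above might look roughly like the sketch below. `DedupingConsumer` is a hypothetical name, and the set of seen ids is kept in memory only; a real system would have to persist and expire it, which is where the failure cases raised in the comment come in.

```python
class DedupingConsumer:
    """Hypothetical consumer that drops redeliveries of the same message id."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # in-memory only: an assumption of this sketch

    def on_message(self, message_id, payload):
        """Process a delivery; return False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.handler(payload)
        # Recorded after handling: a crash in between means the message is
        # handled again on redelivery, so the handler still has to tolerate it.
        self.seen_ids.add(message_id)
        return True


balance = {"total": 0}


def credit(amount):
    balance["total"] += amount


consumer = DedupingConsumer(credit)
# The broker redelivers msg-1 because the first ack was lost.
deliveries = [("msg-1", 10), ("msg-2", 5), ("msg-1", 10)]
for message_id, amount in deliveries:
    consumer.on_message(message_id, amount)

print(balance["total"])  # 15
```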
        
         | jonquark wrote:
         | MQTT.org can't answer that as it's a web page for a
         | protocol. I've worked on platforms that do have a historian
         | feature but it will vary from broker to broker.
         | 
         | (disclosure, I work on Eclipse Amlen and it does not - but
         | people often rig it up to a subscriber that funnels (some/all)
         | messages into databases)
        
       | gz5 wrote:
       | Good article (along with parts 2 and 3). Are there key
       | differences in secure networking constructs (TLS, mTLS, VPN,
       | whitelisted IPs, open ports, etc.) in the options described:
       | 
       | + inbound to Kafka clusters and Kafka Connect?
       | 
       | + inbound to Mosquitto MQTT broker?
       | 
       | + inbound to Telegraf?
       | 
       | + inbound to influxDB?
        
       | yawniek wrote:
       | many people seem to not have clarity on what a distributed log is
       | and in which architecture it's useful and in which not. if you are
       | abusing a distributed log as a message queue, you are most of the
       | time creating a mess.
        
         | ed25519FUUU wrote:
         | The "abuse" you describe is pretty much how Kafka is used
         | everywhere I've worked.
        
       | Jemaclus wrote:
       | This article appears to be comparing MQTT and Kafka + Schema
       | Registry. Using Schema Registry is not required to use Kafka, so
       | OP overcomplicated their own setup for this comparison. There's
       | no argument that Schema Registry is valuable, but it's not
       | something that MQTT seems to provide out of the box, so the
       | comparison seems flawed.
       | 
       | I'd be interested in a comparison that is actually apples-to-
       | apples instead of introducing complexity with Schema Registry.
        
         | rad_gruchalski wrote:
         | Comparing mqtt to kafka is already apples vs oranges. Adding
         | the schema registry to this is like throwing a pitaya into the
         | mix.
         | 
         | Edit: after reading the article a couple of times it's clear
         | this isn't a comparison. "Vs" in the title is the problem. The
         | first impression would have been better if the title was
         | something like "Kafka and MQTT".
         | 
         | To be honest, as kafka and mqtt often reside next to each
         | other, they complement each other. A better use case would have
         | been an app combining both: show when should mqtt hand over to
         | kafka.
        
         | ryanjshaw wrote:
         | The author is also incorrect about message keys - these are
         | optional and you would only use them for strict ordering when
         | using > 1 partition, or log compaction where "latest is
         | greatest" is good enough.
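
Both uses of message keys named above can be illustrated with a small sketch. The function names are hypothetical, and CRC32 is only a stand-in hash: Kafka's default partitioner for keyed messages uses murmur2, so the actual partition numbers would differ.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always land on the same partition.

    CRC32 is a stand-in here, not the hash Kafka itself uses.
    """
    return zlib.crc32(key) % num_partitions


def compact(log):
    """Keep only the latest record per key, preserving the surviving order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset
    ]


# Same key, same partition: this is what gives per-key ordering with > 1 partition.
assert partition_for(b"sensor-a", 4) == partition_for(b"sensor-a", 4)

log = [("sensor-a", 1), ("sensor-b", 7), ("sensor-a", 2)]
print(compact(log))  # [('sensor-b', 7), ('sensor-a', 2)]
```

Without a key, neither guarantee is needed and records can be spread across partitions freely, which is why keys are optional.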
        
       | justinclift wrote:
       | An important question not mentioned in this article - and which
       | may not have been known by the author - is how much (Dev)Ops
       | burden does each of these add?
       | 
       | In the places I've worked that use Kafka, it's 100% always a
       | source of issues and operational headaches.
       | 
       | That's in fairly high throughput environments though, no idea if
       | it "just works" flawlessly in easy going ones.
        
         | sigwinch28 wrote:
         | I wonder... how many issues was Kafka "soaking up" by dealing
         | with concerns that applications and services didn't have to
         | even consider?
         | 
         | As in, I wonder how much application developer burden would be
         | present if using MQTT instead.
        
           | justinclift wrote:
           | It's an interesting question. No idea how to go about
           | quantifying it though.
        
             | sigwinch28 wrote:
             | Fair.
        
         | outworlder wrote:
         | > In the places I've worked that use Kafka, it's 100% always a
         | source of issues and operational headaches.
         | 
         | Compared to what?
         | 
         | I have the opposite experience. For example, ingesting large
         | amounts of log data. Kafka could handle an order of magnitude
         | more events compared to Elasticsearch. Even if the data
         | ultimately ended up in ES, being able to ingest with Kafka
         | improved things considerably. We ended up getting an out of the
         | box solution that does just that (Humio, now known as
         | LogScale).
         | 
         | Similar experience when replacing RabbitMQ with Kafka. Neither
         | "just works" and there are always growing pains in high
         | throughput applications, but that comes with the territory.
         | 
         | Is Kafka the source of headaches, or is it Zookeeper? Usually
         | it's Zookeeper for me (although, again, Zookeeper has difficult
         | problems to solve, which is why software packages use ZK in the
         | first place).
        
         | ryanjshaw wrote:
         | What issues did you run into?
         | 
         | From a technology perspective it's been rock solid for years in
         | my experience.
         | 
         | Where issues crept in it was always due to people not
         | understanding the architecture and patterns you need to use
         | e.g. anti-patterns like splitting batches into multiple
         | messages, "everything must be stored in Kafka" thinking, not
         | understanding how offset commits work, not understanding when
         | to use keys or the effects of partitioning, resetting offsets
         | on a live topic, aggressive retention policies etc.
        
           | TheSoftwareGuy wrote:
           | Do you know of any places to learn those things? Kafka seems
           | pretty interesting to me
        
           | taywrobel wrote:
           | One issue I've encountered is over-partitioning to handle a
           | spike in traffic.
           | 
           | I.e. an event occurs which causes an order of magnitude more
           | messages than usual to be produced for a couple of hours, and
           | because ingest and processing flows are out of whack, a
           | backlog forms. Management wants things back in sync ASAP, and
           | so green lights increasing the partition count on the topic,
           | usually doubling it.
           | 
           | In an event driven architecture that is fairly well tuned for
           | normal traffic this can have the same downstream effect, and
           | those topics up their partition counts as well in response.
           | 
           | Once anomalous traffic subsides, teams go to turn down the
           | now over-partitioned topics only to learn that that was a one
           | way operation and now they're stuck with that many
           | partitions, and the associated cost overhead.
           | 
           | Also if I see another team try to implement "retries" or
           | delayed processing on messages by doing some weird multi-
           | topic trickery I'm going to lose my mind. Kafka is a message
           | queue, not a job queue, and not nearly enough engineers seem
           | to grok that.
        
         | drowsspa wrote:
         | Where I work we have an on-premises Hadoop cluster and Kafka is
         | its only stable component that works without constant
         | headaches.
        
         | Scubabear68 wrote:
         | For shops light on DevOps-fu, Confluent hosted Kafka is popular
         | for just this reason.
        
         | FridgeSeal wrote:
         | If you're on AWS I've had zero issues with their managed Kafka
         | offering (MSK). I'm sure they did lots behind the scenes, but
         | it was really one of our most rock-solid pieces of
         | infrastructure.
         | 
         | If I had a need for Kafka in my current role, I'd probably give
         | Confluent and Redpanda offerings a shot.
        
       | hkt wrote:
       | A better comparison with Kafka is redis streams. Similar
       | semantics, a fraction of the operational overhead.
        
         | FridgeSeal wrote:
         | Provided your surrounding tools plug into Redis streams.
         | 
         | Oh and provided you don't need the ordering and parallelism
         | guarantees of Kafka's partitions.
         | 
         | Oh and provided you don't need the same level of durability and
         | fault tolerance, so yeah, exactly the same.
        
           | outworlder wrote:
           | Exactly. Tradeoffs exist everywhere, in both directions.
        
       | twawaaay wrote:
       | One missing criterion is client complexity. MQTT is built to work
       | well with very little resources on the client. Kafka, on the
       | other hand, requires you to do things you just don't want on a
       | small embedded device -- like opening multiple connections to
       | multiple hosts. Kafka is also just a transport for messages while
       | MQTT is a much larger part of the stack and takes care of
       | transporting individual values. Which means you need less other
       | code on your super restricted device.
       | 
       | That said, I don't understand all the complaining directed at
       | Kafka in this thread. Kafka is a fantastic tool that provides
       | unique properties and guarantees. As a tech lead/architect I love
       | to have a good selection of tools for different situations. Kafka
       | is a very reliable tool that fills an important role when
       | creating distributed systems and is particularly nice because it
       | is easy to reason about. The negative opinions I heard in the
       | past are typically from people who try to use it for something
       | that it is not well suited for (like efficient transfer of large
       | volumes of data) or because they misunderstood how to use its
       | guarantees to construct larger systems.
       | 
       | At one place I met a team who was completely lost with their
       | overloaded Kafka instance and requested to get external help to
       | "further scale and tune" it.
       | 
       | I just changed the code on the producer and the consumer so that
       | large payloads are published as files to S3 rather than pushed
       | through Kafka. The producer instead sends a simple message to
       | Kafka with the metadata and location of the payload in S3, and
       | the client then downloads it from the bucket. They were happy
       | puppies in no time.
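
The fix described above can be sketched as follows. Everything here is an assumption for illustration: `ToyObjectStore` stands in for an S3 bucket, a plain list stands in for a Kafka topic, and the pointer fields are made up.

```python
import json
import uuid


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def produce_large(store, topic, payload: bytes, content_type: str) -> None:
    """Upload the payload, then publish only a small pointer message."""
    key = f"payloads/{uuid.uuid4()}"
    store.put(key, payload)
    pointer = {"object_key": key, "size": len(payload), "content_type": content_type}
    topic.append(json.dumps(pointer))  # the list stands in for a Kafka topic


def consume_large(store, topic):
    """Read the next pointer message and fetch the payload it refers to."""
    pointer = json.loads(topic.pop(0))
    return pointer, store.get(pointer["object_key"])


store = ToyObjectStore()
topic = []
produce_large(store, topic, b"x" * 1_000_000, "application/octet-stream")
print(len(topic[0]), "bytes went through the topic")
pointer, data = consume_large(store, topic)
print(pointer["size"], "bytes were fetched from the store")
```

The message on the topic stays small no matter how large the payload grows, which is the point of the change.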
        
       | skrtskrt wrote:
       | there's a lot of "Kafka causes so many issues!" comments here.
       | 
       | I think it gets a bad rap because it gets introduced to orgs
       | without the org having the requisite level of understanding. If
       | your whole org is just on like a standard OLTP/OLAP setup, then
       | suddenly there's a Kafka queue, there's going to be a serious
       | learning curve and bumps along the way.
       | 
       | If you're incorrectly putting async event brokers as the
       | datastore where you should be putting a synchronous DB and then
       | streaming from the DB to kafka with an outbox pattern, you're
       | going to have a bad time.
       | 
       | If you're not modeling your queue depth and throughput you're
       | going to have a bad time.
       | 
       | If you're not modeling your concurrency scenarios and
       | synchronization, you're going to have a bad time.
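
The outbox pattern mentioned above can be sketched with SQLite standing in for the synchronous DB. The table names, the event shape, and the `publish` callback (which would wrap a Kafka producer in practice) are all assumptions of this example.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, item):
    """Write the order and its event in one transaction."""
    with conn:  # commits on success, rolls back both inserts on error
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        event = json.dumps({"type": "order_placed", "order_id": order_id})
        conn.execute("INSERT INTO outbox (event) VALUES (?)", (event,))


def relay_outbox(conn, publish):
    """Publish pending outbox rows in order and mark them as sent."""
    rows = conn.execute(
        "SELECT id, event FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event in rows:
        publish(event)  # may repeat after a crash, so delivery is at-least-once
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, 1, "widget")

sent = []  # stands in for the Kafka topic
print(relay_outbox(conn, sent.append), "event(s) relayed")
print(sent)
```

The database stays the source of truth, and the broker only ever sees events for rows that actually committed.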
        
       | adev_ wrote:
       | The article is pretty biased by comparing the complexity of a
       | schema-free scenario (MQTT) to Kafka with a schema.
       | 
       | However, his point still stands: most of the usage of Kafka I
       | have seen in production is the result of a random
       | Architect/Techlead who tried to follow the hype train on event
       | sourcing, and is a recipe for disaster.
       | 
       | And in 90% of the cases, that could have been replaced by a
       | trivial lightweight Mosquitto (MQTT) server for 10% of the
       | operating cost.
       | 
       | Kafka is a monster of complexity notoriously hard to operate
       | (Hello ZooKeeper) and to understand properly (Hello ordering,
       | persistency and partitions).
       | 
       | If all you need is a simple stupid publish/subscribe broker with
       | topics/auth management, do yourself a favour and stay away from
       | it.
        
         | NovemberWhiskey wrote:
         | I see Kafka deployed for things which have perhaps a few
         | thousand messages per day. It's like "did you accidentally mis-
         | specify by six orders of magnitude here?"
        
         | gvtek0 wrote:
         | >However, his point still stands: most of the usage of Kafka I
         | have seen in production is the result of a random
         | Architect/Techlead who tried to follow the hype train
         | 
         | Don't look now but this is how people end up with k8s as well.
         | "We need Kubernetes because we need containers." Google et al
         | convinced people it's the only way to run containers in prod.
        
         | septune wrote:
         | Forget MQTT, Redis as a PUB/SUB will do 99% of the job most of
         | the time.
        
           | jck wrote:
           | Redis 7 supports sharded pubsub.
           | 
           | Also, redis streams are excellent and perform really well for
           | Kafka lite type use cases.
        
             | Gwypaas wrote:
             | Until you need backpressure, then you are left with awful
             | out-of-band hacks and hope for the best.
        
         | EdwardDiego wrote:
         | > notoriously hard to operate (Hello ZooKeeper)
         | 
         | Eh? ZooKeeper is rock solid.
        
         | MuffinFlavored wrote:
         | > However, his point still stands: most of the usage of Kafka
         | I have seen in production is the result of a random
         | Architect/Techlead who tried to follow the hype train on event
         | sourcing, and is a recipe for disaster.
         | 
         | While calling this out on a message board comment section is
         | going to be well-received, asking "do we need this" while
         | working at the company with said architect/tech lead is not
         | well-received.
         | 
         | How many of us get paid to work jobs where we're basically told
         | "shut up, this is what we're doing/using, go with it"?
        
           | rad_gruchalski wrote:
           | And go for mqtt instead? Not a smart choice. Look, it's going
           | to work for a few hundred, few thousand topics. But as soon
           | as you need resilience, replication, or you outgrow that one
           | broker... good luck. Mqtt is awful to scale horizontally.
        
             | jonquark wrote:
             | This comment seems backwards to me. If you're funnelling
             | incoming messages into hundreds of topics (or fewer), Kafka
             | is a great "fat pipe". If you need millions (or tens of
             | millions) of topics for IoT devices, MQTT is much more
             | designed for that use case.
             | 
             | Disclosure: I'm biased - I've worked on the MQTT spec and
             | I'm the lead for Eclipse Amlen
        
             | justinclift wrote:
             | Maybe something AMQP related instead?
        
               | rad_gruchalski wrote:
               | Those are two different technologies. Amqp is all about
               | routing and queues. Kafka is a distributed log, it is not
               | a queue. There's a significant difference between those
               | two.
               | 
               | Kafka: every consumer for a partition within a consumer
               | group will see a message at least once. A queue: it's
               | possible that a partition has multiple consumers and only
               | one consumer sees a particular message.
               | 
               | Kafka is a relatively small to medium number of large
               | volume topics. Topics can be larger than a machine due to
               | partitioning. Strict ordering per partition based on
               | message arrival time. Queues (RabbitMQ or anything amqp)
               | are a relatively large number of small to medium volume
               | topics. Ordering is an option, a topic must fit within a
               | machine.
               | 
               | Those are orthogonal concepts. They can live next to each
               | other. My first choice is always Kafka because:
               | persistence, replication, scalability. Works fine as a
               | single broker, can always scale horizontally, zookeeper
               | is not that scary, especially with a good operator.
               | 
               | If you think that Kafka is difficult to run, wait until
               | you need replication in RabbitMQ. Good luck with leader
               | election in your application layer. No fun.
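
The log-versus-queue distinction drawn above can be made concrete with two toy classes. Both are hypothetical simplifications: the log has a single partition and commits offsets automatically on every poll, which real Kafka consumers need not do.

```python
from collections import deque


class ToyLog:
    """Append-only log: records stay put, each consumer group tracks an offset."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.records[start:]
        self.offsets[group] = len(self.records)  # auto-commit, for simplicity
        return batch


class ToyQueue:
    """Work queue: a message is gone once one consumer has taken it."""

    def __init__(self):
        self.messages = deque()

    def put(self, message):
        self.messages.append(message)

    def get(self):
        return self.messages.popleft() if self.messages else None


log = ToyLog()
queue = ToyQueue()
for event in ["a", "b"]:
    log.append(event)
    queue.put(event)

# Two independent groups each read the full log.
print(log.poll("billing"), log.poll("audit"))  # ['a', 'b'] ['a', 'b']
# Two competing workers split the queue between them.
print(queue.get(), queue.get(), queue.get())   # a b None
```

Replaying the log for a group is just a matter of moving its offset back, whereas a consumed queue message has to be re-sent.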
        
               | MuffinFlavored wrote:
               | > Kafka is a distributed log
               | 
               | When should you use Kafka instead of storing rows in SQL
               | with a timestamp so you can replay them/fetch them if
               | needed?
               | 
               | Why do you need a sharded Kafka cluster?
               | 
               | Most businesses are going to have Redis, SQL, and
               | probably RabbitMQ.
               | 
               | Where/why add Kafka to that stack?
        
               | manv1 wrote:
               | Kafka was designed for places with a huge data/log volume
               | (GB/sec or more).
               | 
               | From what I understand it's generally The Thing To Use
               | when you have not very many (ie: thousands) high-volume
               | sources that have good links. You push stuff into the
               | logs and you can take your time chewing through the log.
               | 
               | MQTT was designed for shitloads (hundreds of thousands+)
               | of small devices on bad links connecting to the
               | mothership.
               | 
               | Neither of them really are a message queue. I mean, they
               | queue messages, but they have different goals than, say,
               | ActiveMQ or RabbitMQ (which both can be used as a backend
               | for MQTT).
               | 
               | Using Kafka for a message queue is overkill. It's more
               | likely that MQTT fits your bill.
        
               | justinclift wrote:
               | Interesting. What are your thoughts on NSQ?
               | 
               | https://github.com/nsqio/nsq
               | 
               | Was looking at it earlier today, but haven't ever tried
               | it out.
        
         | lmm wrote:
         | Kafka no longer requires zookeeper. If you need true master-
         | master high availability from a datastore - which anyone who
         | bothers with a load balancer for their application should
         | demand, what's the point in running your application in a HA
         | configuration if your datastore is a single point of failure -
         | then to the best of my knowledge Kafka is still the least bad
         | option available. It's not the easiest thing to operate, but
         | I'll take it over Galera or Greenplum any day.
        
         | foolfoolz wrote:
         | > implying the operational costs of a server are captured in
         | its per hour sticker price
         | 
         | managed kafka has been around a while
        
         | victor106 wrote:
         | >Kafka is a monster of complexity notoriously hard to operate
         | (Hello ZooKeeper) and to understand properly (Hello ordering,
         | persistency and partitions).
         | 
         | 100% this.
         | 
         | Even using managed Kafka is a pain for most use cases. We
         | replaced managed Kafka with a simple postgresql db using skip
         | locked as a queue mechanism and the dev team's productivity
         | tripled and our total cost of ownership decreased dramatically.
         | 
         | Don't think twice, think 10 times if you really need Kafka
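
A "skip locked" queue in PostgreSQL might look roughly like the sketch below. The `jobs` table and its `status` and `payload` columns are assumptions, not the commenter's schema; `claim_next_job` accepts any DB-API cursor on a PostgreSQL connection, and the caller is expected to commit once the job has been handled.

```python
# Hypothetical schema: jobs(id, status, payload). FOR UPDATE SKIP LOCKED makes
# concurrent workers pass over rows another transaction has already locked,
# so two workers never claim the same job and neither blocks the other.
CLAIM_NEXT_JOB = """
WITH next_job AS (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'running'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;
"""


def claim_next_job(cursor):
    """Claim one pending job; return its (id, payload) row, or None if idle."""
    cursor.execute(CLAIM_NEXT_JOB)
    return cursor.fetchone()
```

Each worker calls `claim_next_job` in its own transaction; an empty result simply means there is nothing unclaimed to do right now.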
        
           | EdwardDiego wrote:
           | Yeah, if you're able to replace Kafka with PG then your stack
           | didn't need Kafka, so that's a good call.
        
         | itpragmatik wrote:
         | AWS SNS + SQS Fanout works pretty well and is not too complicated.
        
           | querulous wrote:
           | orders of magnitude more expensive than kafka tho. not
           | feasible at scale
        
       ___________________________________________________________________
       (page generated 2023-03-14 23:01 UTC)