[HN Gopher] Apache Pulsar is an open-source distributed pub-sub ... ___________________________________________________________________ Apache Pulsar is an open-source distributed pub-sub messaging system Author : LinuxBender Score : 360 points Date : 2020-01-02 15:40 UTC (7 hours ago) (HTM) web link (pulsar.apache.org) (TXT) w3m dump (pulsar.apache.org) | firstposterone wrote: | Splunk just acquired streamlio and most of the core devs got | sucked up. While pulsar is a great product - are you not | concerned that these guys are getting paid $$ bank to do | something else now? | matteomerli wrote: | spoiler: we're still working on Pulsar | firstposterone wrote: | That's good news! I guess it would be very helpful to | formally address this concern. Is there something that has | been written / published to that effect? | bsaul wrote: | Sidenote question : | | Are we heading toward a split between apache/java/zookeeper | stacks and go/etcd on the other ? I've seen an issue related to | that question on pulsar, and this got me investigating the | distributed KV part of the stack. | | It seems by looking at some benchmark that etcd is much more | performant than zookeeper, and that to some people, operating two | stacks seems like an operation maintenance cost a bit too high. | Is that a valid concern ? | | Also, i've seen that kafka is working on removing the dependency | to zookeeper, is pulsar going to take the same road ? | geodel wrote: | This sound about right. Apart from maybe original Apache HTTP | server most of the Apache projects are in Java. | | Looking at codebase of Pulsar it looks like typical Apache | style sprawling Java project with more than thousand | directories, many thousand files and more than hundred | dependencies. As comparison NATS which is in Go has few hundred | files, less than hundred directories and about a dozen or so | dependencies. 
| tyri_kai_psomi wrote: | NATS is an amazing project, I just wanted to take the | opportunity to highlight it for those first hearing about it | in this comment. It's so brilliantly simple, yet changed the | way I design distributed systems. I handle almost anything in | regards to the standard messaging guarantees that a Kafka- | like system offers at the endpoints now. As a result, systems | are much simpler, and diagnosability of bugs or edge cases | are much more straightforward. | _frkl wrote: | This sounds interesting, what exactly do you mean by | 'endpoint' in this scenario? I looked into a few | alternatives before settling for pulsar, and disregarded | nats because it didn't seem to support message persistence. | I didn't look into it too deeply though, maybe i should | have. How do you guarantee no message is lost with NATS? | geodel wrote: | Did you check this? | | https://docs.nats.io/nats-streaming-concepts/intro | | "..Message/event persistence - NATS Streaming offers | configurable message persistence: in-memory, flat files | or database. The storage subsystem uses a public | interface that allows contributors to develop their own | custom implementations." | | and | | "At-least-once-delivery - NATS Streaming offers message | acknowledgements between publisher and server (for | publish operations) and between subscriber and server (to | confirm message delivery). Messages are persisted by the | server in memory or secondary storage (or other external | storage) and will be redelivered to eligible subscribing | clients as needed." | _frkl wrote: | No, i missed that. I think ive seen 'nats streaming', but | didn't realize that it is its own distinct thing. All | this makes more sense now to me, thanks! | tylertreat wrote: | Also check out Liftbridge (https://liftbridge.io), which | is a Kafka-like API on top of NATS. | | Disclaimer: I'm the author and former core contributor of | NATS and NATS Streaming. 
| tyri_kai_psomi wrote: | I think of an endpoint as something at | either end of the communication channel (NATS in this | case) where it is effectively terminal. Usually this is | where the application logic lies. Derek Collison | (creator of NATS) brings this up in many of his talks | about NATS, but I think the source of his thinking might | come from "End-to-End Arguments in System Design" by | Saltzer, Reed, & Clark. | | The core of it is this point: | | "Functions placed at low levels of a system may be | redundant or of little value when compared with the cost | of providing them at that low level." | | That is, in order to get that message redundancy, exactly-once | delivery, or message persistence, you pay a high | cost, and you may be better off delegating to the | endpoints. | | This blog provides a good overview | | https://blog.acolyer.org/2014/11/14/end-to-end-arguments-in-... | | Here is the original paper | | http://web.mit.edu/Saltzer/www/publications/endtoend/endtoen... | _frkl wrote: | Thanks, much appreciated! | bsaul wrote: | Is it me or does NATS look like it's aimed at an actor- | based style of distributed system? | JensRantil wrote: | It's not necessarily aiming at that. | vorpalhex wrote: | NATS is amazing but note that it makes different promises | than Pulsar. NATS doesn't offer true durability (in | exchange for amazing performance and great simplicity) | whereas Pulsar and similar are meant to survive certain | partition or failure situations and not lose data. | | It's not one or the other, they're just different tools. | [deleted] | jjjensen90 wrote: | There is nats-streaming-server as well, which offers true | durability (via file or SQL store) and a streaming model | very similar to Kafka and Pulsar. It can also run as a | Raft cluster or in fault tolerance mode. It still has | very good performance and is very simple to deploy and | operate (I use it for event sourcing for real time IoT | data at my day job). 
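The end-to-end argument quoted above (push delivery guarantees out to the endpoints instead of the transport) can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: `IdempotentEndpoint` and `deliver_at_least_once` are made-up names, no NATS or Pulsar API is used, and the "transport" is just a loop that redelivers some messages. The endpoint gets effectively-once processing by remembering message IDs.

```python
class IdempotentEndpoint:
    """Hypothetical endpoint that applies each message ID at most once."""

    def __init__(self):
        self.seen_ids = set()  # IDs already applied
        self.total = 0         # application state built from the messages

    def handle(self, msg_id, amount):
        """Apply the message unless it is a redelivery. Returns True if applied."""
        if msg_id in self.seen_ids:
            return False  # duplicate: ignore, the transport may redeliver freely
        self.seen_ids.add(msg_id)
        self.total += amount
        return True


def deliver_at_least_once(endpoint, messages, duplicate_every=2):
    """Simulate a transport that redelivers every `duplicate_every`-th message."""
    applied = 0
    for i, (msg_id, amount) in enumerate(messages):
        applied += endpoint.handle(msg_id, amount)
        if i % duplicate_every == 0:
            # Redelivery of the same message, as an at-least-once transport may do.
            applied += endpoint.handle(msg_id, amount)
    return applied


endpoint = IdempotentEndpoint()
applied = deliver_at_least_once(endpoint, [("a", 1), ("b", 2), ("c", 3)])
print(applied, endpoint.total)
```

In a real system the set of seen IDs would itself need durable storage, which is part of the cost trade-off the thread is debating.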
| vips7L wrote: | Its more than one project in the repo. | nvarsj wrote: | It's an interesting observation. | | I think that the modern approach to distributed systems is | moving towards golang style microservices and lightweight / | simple system design with RPC communication, reconcile type | loops for state reconciliation, and backing CP databases. I | think this is the influence of k8s (and maybe google's approach | to distributed systems). | | I will almost certainly get downvoted for this (as I always | seem to when I criticize the JVM), but Apache/JVM style | architecture feels REALLY long in the tooth to me. I think you | are committing to an outdated and very expensive approach to | building software if you use anything running on the JVM, | especially Apache based anything. Cassandra is a great example | of this - out of the box it's a terribly performing database | that is extremely expensive to run and tune. Throw enough | resources and time at it and you can get it to acceptable | scalability - but running on the JVM which is a huge memory hog | will always make it expensive to run (and even then, you will | always get terrible latency distributions with the JVM's awful | GC). | | If I was building a business I would run far far away from any | JVM based solution. The only thing it has going for it is | momentum. If you need to hire 100s of engineers off the street | for a large project, then a JVM based stack is about your only | option unfortunately. | vips7L wrote: | For what its worth I have a java microservice running on 13mb | of ram. | lenkite wrote: | Nowadays Java based micro-services can compile to native | thanks to Graal native-image and frameworks like: | https://quarkus.io/. | squarecog wrote: | I understand your architecture criticism, and think it has | merit, but I'm not sure why Apache gets dragged into that. | Apache Airflow is in Python. Apache Arrow is in C. CouchDB is | Erlang. 
| | There's a ton of projects the Apache Foundation hosts that fit | your description but it's a mistake, I think, to confuse | individual projects with Apache in general. Bad enough that | people confuse the license with the foundation. | james-mcelwain wrote: | > the JVM's awful GC | | This just makes it seem like you are trolling. JVM devs have | done more to advance the state of the art in this area than any other | language. The problem is that most JVM apps just produce too | much garbage, not necessarily that the algo itself is awful. | | Either way, there's no such thing as an optimal GC algorithm, | just different trade-offs depending on your use case. Not | everyone cares about latency. | pm90 wrote: | It depends. We have a ton of Java apps running atop | Kubernetes. All of them use zk, but every team operates their | own mini zk cluster deployed on k8s. It's worked fine except | for certain hard-to-debug problems that hit a few teams | occasionally. | | I guess my point is that k8s lets you shift the operational | burden to dev teams if they need it. If you have a centralized | operations team running a giant, common zk/etcd, yeah this | would be additional operational burden. | matteomerli wrote: | > is pulsar going to take the same road ? | | Yes, it's in the works | FBISurveillance wrote: | Related Kafka KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A... | | It doesn't look like it's going to be ready anytime soon | though. | wilhow wrote: | Just to confirm, Pulsar has on its roadmap to remove its | dependency on ZooKeeper. Is that correct? | matteomerli wrote: | That's correct, we're moving to have a pluggable metadata | store and coordination service. | whycombin8 wrote: | Any plans for BookKeeper? | atombender wrote: | I can't wait for projects to ditch ZooKeeper. Apache | BookKeeper, which Apache Pulsar uses for its state, already | supports Etcd as a consensus store (though I believe this is | still alpha? beta? quality). 
Pulsar is also working on | supporting Etcd. | PaulHoule wrote: | Zookeeper sux. If you are in the Java world you can often roll | something better out using Hazelcast. | cbnotfromthere wrote: | Can it be used to build something like a Whatsapp-like chat | system? | | If yes, why? If no, why? | eerrt wrote: | How does this compare to Redis Pub-Sub or RabbitMQ? | Joeri wrote: | It's closer to redis streams, except like kafka you can scale | topics beyond the limits of a single server because they can be | distributed. You couldn't run the twitter firehose over redis | streams, but you can run it over pulsar or kafka, given enough | hardware. | xrd wrote: | Or Firebase? | Eikon wrote: | Or Kafka ? | manigandham wrote: | Read the other comment thread: | https://news.ycombinator.com/item?id=21936523 | sz4kerto wrote: | Kafka is primarily designed for streaming, pulsar is both | for streaming and queueing. | | Firebase is a completely different animal. | qaq wrote: | Scales much better the storage layer is separate from | brokers so you can scale things independently. | [deleted] | sz4kerto wrote: | Very different. Pulsar is primarily a Kafka competitor. | | - it is much more performant than RabbitMQ - it's a commit log | as well, not just a pub-sub system, ie. it is a good candidate | as the storage backend for event sourcing - it supports | geodistributed and tiered storage (eg. some data on NVMe | drives, some on a coldline storage) - it's persistent, not in- | memory (primarily) | | .. and so on. | TuringNYC wrote: | I went to https://pulsar.apache.org but didnt find a "Why | Pulsar and not Kafka" -- is there an answer to that, or is | this another Kafka competitor with the same strengths and not | a specific differentiator? | cbartholomew wrote: | Here is a two-part blog post I wrote on why Pulsar and not | Kafka: https://kafkaesque.io/5-more-reasons-to-choose- | apache-pulsar... | richdougherty wrote: | Thanks! 
And Part 1 seems like a good place to start: | https://kafkaesque.io/7-reasons-we-choose-apache-pulsar- | over... | EGreg wrote: | What about ZeroMQ? | | Why use RabbitMQ and Kafka if you can use ZeroMQ? Meaning, | isn't it far more performant and distributed? | | Maybe I am missing something here. | Joeri wrote: | Message queues and message logs do different things. The | idea of the log is that subscribers can show up after the | log is written and read or reread it from the beginning. In | an event sourced architecture you use the log as the source | of truth and all consumers can replay the log against a | local store to reconstruct a view of the system's state. | You also can use a log for pubsub, but if that's all you | need one of the MQ solutions is usually a better fit. | sz4kerto wrote: | > Why use RabbitMQ and Kafka if you can use ZeroMQ? | | They are totally different, you're comparing apples with | oranges. | | ZeroMQ gives you basic, very fast tooling to communicate | between distributed processes. ZeroMQ does not provide | tooling for e.g. maintaining a strictly ordered, multi- | terabyte event log. And so on. | EGreg wrote: | Yes but isn't this a bit like comparing git / bitkeeper | vs subversion / perforce? | | Basically, one is decentralized and you can set up a | massively parallel architecture, with eg each topic or | subthread having its own pubsub. | | The other is a monolithic centralized pubsub | architecture. | | You could argue that git in large institutional projects | converges to a monolithic repo so at that point it's less | efficient even than svn. | | But for most use cases, ZeroMQ would allow far more | flexible distributed systems topologies and solutions. | No? | | Edit: HN and Google are both awesome: | https://news.ycombinator.com/item?id=9634925 | frogger23123 wrote: | > You could argue that git in large institutional | projects converges to a monolithic repo so at that point | it's less efficient even than svn. | | Not true. 
Facebook and Google do not use Git. Microsoft | does not use vanilla Git for their monorepo. They created | this extension to make it scalable | https://en.wikipedia.org/wiki/Virtual_File_System_for_Git | dickjocke wrote: | Why is one Apache project competing with another one? | spelunker wrote: | Plenty of competing Apache projects exist, Pulsar and Kafka | aren't unique in that regard. | | I don't think Apache cares if it's maintaining similar | projects. | qaq wrote: | This scales to multi-datacenter deployments well. It has strong | multi-tenancy support; if memory serves, Yahoo is running a | single cluster for all of their properties. | tyingq wrote: | It's a persistent store, so that would be different from Redis | Pub-Sub. Compared to RabbitMQ, Pulsar seems to favor strong | ordering and protection from message loss. | | This blog post offers some more info and links to other posts | comparing Pulsar to RabbitMQ and Kafka: https://jack-vanlightly.com/blog/2018/10/2/understanding-how... | [deleted] | samzer wrote: | For a moment I thought Bajaj and TVS came together. | [deleted] | lonesword wrote: | Captain here. | | TVS and Bajaj are major motorbike manufacturers in India, and | TVS had a model named "Apache" and Bajaj had a model named | "Pulsar". | | Flies away | takeda wrote: | It's not obvious if you're not from India, so thank you for | the explanation :) | opendomain wrote: | Thank you for the explanation. | addisonj wrote: | I just finished rolling out Pulsar to 8 AWS regions with geo- | replication. Message rates are currently at about 50k msgs/sec | but we are still in the process of migrating many more applications. We | run on top of Kubernetes (EKS). | | It took about 5 months for our implementation, with a chunk of | that work mostly about figuring out how to integrate our internal | auth as well as using HashiCorp Vault as a clean automated way | to get auth tokens for an AWS IAM role. 
| | Overall, we are very pleased and the rest of the engineering org | is very excited about it and planning to migrate most of our SQS | and Kinesis apps. | | Ask me anything in thread and will try and answer questions. At | some point we will do a blog post on our experience. | bubbleRefuge wrote: | How does python stream processing work. Are the modules running | in the JVM ? Jython ? | matteomerli wrote: | The user code it's all running in a native CPython | interpreter. | ignoramous wrote: | On behalf of everyone here, _thanks a lot_ for answering every | single question being asked. Highly appreciate it. | | I have questions myself: | | 1. Did it reduce (TCO) costs or increase it versus using | Kinesis and SQS/SNS? | | 1a. Interestingly, there's no global-replication with those AWS | services. Why did you require global-replication with the move | to Apache Pulsar? | | 2. Since you mention _internal auth_ : Weren't Cognito / KMS / | Secrets Manager up to the job? Given these are integrated out- | of-the-box with EC2? | | 3. Was it ever under-consideration to roll out pub/sub on top | of Aurora for Postgres with Global Replication? | https://layerci.com/blog/postgres-is-the-answer/ | | Thanks again. | addisonj wrote: | 1. On a short time horizon, not as sure, back of the napkin, | it took ~12 dev months (5 months with 2.5 people average on | it). However, our cost per 1000 msgs/sec is _much_ lower | (like 1 /4 the cost of Kinesis) so we fully expect that | investment to pay off over time assuming that adoption by the | rest of the org continues and we don't find a ton of issues. | | 1a. You are correct we didn't require geo-replication for | existing use cases, however, initially, we saw geo- | replication as an easy way to improve DR and we have an | internal requirement for a DR zone in another region. 
Now | that we have done the work, we are starting to see multiple | places where we can simplify some things with geo- | replication, so we think long term the feature will be really | valuable. | | 2. We split up auth into two main components: auth of users | (where we use Okta) and auth of services. For Okta, we just | wrote a small webapp that users can log into via Okta and | generate credentials. For apps/services, we already had | HashiCorp Vault in place and wanted to just piggyback off our | existing form of identity (IAM roles). Essentially, a user | just associates an IAM role with a Pulsar role and we | generate and drop off credentials into a per-role unique | shared location in Vault that any IAM role can access (across | multiple AWS accounts) | | 3. Once again, geo-replication wasn't really a hard | requirement initially but more of something that we really | like now that we have it. I think the biggest reason why not | Postgres is that we have combined message rates (not | everything is migrated yet) on the order of 300k msgs/sec | across a few dozen services. Pulsar is designed to scale | horizontally and also has really great organizational | primitives as well as an ecosystem of tools. While I think | you could maybe figure that out with some PG solution, having | something purpose-built really can pay big dividends when | you are trying to make a solution that can easily integrate | into a complex ecosystem of many teams and many different | apps/use cases | ignoramous wrote: | Agreed. | | One more: | | For replication across regions, do you peer VPCs via | Transit Gateways or some such, or do it over the public | Internet? I ask because a lot of folks complain about | exorbitant AWS bandwidth charges for cross-AZ and cross- | region communication (esp over the Internet versus over | AWS' backbone): At 300k msgs/sec, the bandwidth costs might | add up quickly? 
| | Consequently, maintaining a multi-region, multi-AZ VPC | peering might have been complicated without Transit | Gateway, so I'm curious how the network side of things held | up for you. | addisonj wrote: | In this case, we use just straight VPC peering with a | full mesh of all our regions. We may eventually migrate | to being built on our VPN based mesh (we do that in other | places) | | Bandwidth is certainly a concern and that is one of the | nice bits about Pulsar is not everything is replicated. | You mark a namespace by adding additional clusters it | should replicate to. We don't expect to replicate | everything, just the things teams care about. | | When we did this, Transit Gateway was just within the | same region. At re:invent they announce the cross region | transit gateway which we will look at moving to as well, | but for now, it is just a full mesh of VPC peers, which | for 8 regions isn't bad... but certainly gets worse with | each new region we need to add. | | For exposing the service into other VPCs in the same | region we use private-link endpoints as to avoid needing | to do even more peering. | mavdi wrote: | Not questioning your judgement but interested to know about the | factors moving you away from Kinesis. | ckdarby wrote: | Kinesis is very expensive in the long run. There's almost | always an intersection point on AWS where you need to | consider moving away from AWS services/"managed services" and | bring it in house. 
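The remark above that a full mesh of VPC peers "isn't bad" for 8 regions "but certainly gets worse with each new region" follows from simple counting: a full mesh over n regions needs n(n-1)/2 peering connections. A minimal sketch of that arithmetic, with placeholder region names (`region-1` and so on are hypothetical, not the regions used in the thread):

```python
from itertools import combinations


def peering_count(n_regions):
    """Number of peering connections in a full mesh of n regions."""
    return n_regions * (n_regions - 1) // 2


def full_mesh_peerings(regions):
    """Enumerate every unordered pair of regions that must be peered."""
    return list(combinations(sorted(regions), 2))


regions = [f"region-{i}" for i in range(1, 9)]  # 8 placeholder regions
pairs = full_mesh_peerings(regions)
print(len(pairs))                           # peerings for 8 regions
print(peering_count(9) - peering_count(8))  # extra peerings for a 9th region
```

Each added region needs one new peering per existing region, so the total grows quadratically while the marginal cost grows linearly.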
| addisonj wrote: | Biggest pain points with Kinesis: | | - ordering is really hard, you don't get guaranteed ordering | unless you write one message at a time or take on a lot of | complexity on writes (see https://brandur.org/kinesis-order) | and the shards are simply too small for many of our ordered | use cases | | - cost, we just don't send some data right now because it | would just be too much relative to the utility of the data | (we would need like 250 shards) | | - retention, long term, we want to store data in Pulsar with | up to unlimited retention so we can rebuild views. There is | still some complexity there (like getting parallel access to | segments in a topic for batch processing) but it is much | further along than any other option | | - client APIs for consumers. We are a polyglot shop and really | the only language where consuming Kinesis isn't terrible is | Java (and other JVM languages). For every other language, we | use Lambda, and while Lambda is great it is still a distinct | deploy and management process from the rest of the app. Being | able to deploy a simple consumer just as part of the app is | really nice | rocky1138 wrote: | Why did you choose Pulsar? | addisonj wrote: | The main driver for Pulsar is that we have a number of | different messaging use cases, some more "pub/sub" like and | some that are more "log" like. Pulsar really does unify those | two worlds while also being a ton more flexible than any | hosted options. | | For example, Kinesis is really limiting with its limited | retention, and it makes it very difficult to do any real | ordering at scale due to the really tiny size of each shard. | | Similarly, SQS does pub/sub well, but we keep finding that we | do need to use the data beyond the initial delivery. | Instead of having multiple systems where we store that data | we have one. 
| | As for why we didn't go with Kafka, the biggest single reason | is that Pulsar is easier operationally, with no need to | rebalance, and also with the awesome feature that is tiered | storage via offloading that allows us to _actually_ do topics | that have unlimited retention. Perhaps more important for | adoption, though, is that pub/sub is much easier with Pulsar | and the API is just much easier to reason about for | developers than all the complexity of consumer groups, etc. | There are a ton of other nice things like being able to have | topics be so cheap that we can have hundreds of thousands | and all of the built-in multi-tenancy features, geo- | replication, flexible ACL system, Pulsar Functions and Pulsar | IO and many other things that really have us excited about | all the capabilities | dominotw wrote: | > able to have topics be so cheap | | For GDPR a lot of us have to do exportable 'user activity'. | Can you in theory have a topic/user (we had like 50 | million users) and publish any user activity to that topic? | zapdrive wrote: | We are developing a social app with features such as messaging, | notifications etc. We decided to use Yedis [0] (Yugabyte Redis) | which is a distributed Redis with persistence backed by | RocksDB. Yugabyte supports multiple datacentre distribution. | Yedis's pub/sub is distributed as well. We are already running | a Yugabyte cluster for data storage in Cassandra. So we didn't | have to do anything extra to get our distributed pub/sub up and | running. Would you recommend using Pulsar instead? | | 0: https://docs.yugabyte.com/latest/yedis/ | tschellenbach wrote: | Normally don't plug my own work, but this is super related. | Did you ever check out Stream? https://getstream.io/ We power | chat and feeds for >500 million end users. Tech is Go, | RocksDB & Raft based. | zapdrive wrote: | Yes, we did indeed consider Stream, but figured we could | save some money by deploying and running our own system. 
We | are very hopeful to quickly get a couple million users in a | short time from our launch and that would have run up our | costs with Stream quickly. | addisonj wrote: | I am not familiar enough with either Yedis or your use case | to make a recommendation, but I can say that Pulsar has a | great set of features, particularly if you need long term | retention, that make it very attractive. I have also been | impressed with the community and development pace. | | Being that the project is a top-level Apache project and also | has some adoption by quite a few different companies and a | number of corporate sponsors, the future of the project is | a pretty safe bet. | unethical_ban wrote: | I have no idea what "pub-sub" is used for outside of its | academic definitions, and I have no idea how HashiCorp Vault | works - Don't you need a secret/password in cleartext at some | point, for a given service or definition? | | You don't have to answer my questions, I am just shouting into | the void. I'm glad it works for y'all. | SlowRobotAhead wrote: | Pub/sub messaging is super common. | | Most IoT devices that aren't running HTTP stacks are using | MQTT. | DevKoala wrote: | Was NATS a consideration for your use cases? At work, we are | currently standardizing on NATS as our messaging system, and I | would like to know if there is a valid comparison. | liquidgecka wrote: | NATS is not a replacement for Pulsar or RabbitMQ. It is a | message passing system designed to pass lots of messages | live, however if nobody is there to receive them they are | lost and gone forever. There is a streaming layer but that is | closer to Kafka and still does not provide the typical | message model with an ack/nack API. | | I have used NATS in several different ways but since it can | be lossy it's never been considered as a replacement for a | pub/sub message queue on my end. We used it for a chat message | layer and that worked pretty well. 
| | As for its message passing layer that can be interesting but | you end up writing all the retry and failure logic anyway so | it's usually just better to use an existing message layer that | handles all of that for you anyway without all the funky | abstractions. | | Again, it's interesting but nowhere near close to being a | RabbitMQ or Pulsar replacement if reliability is a goal. | addisonj wrote: | I think the other reply captures most of it for core NATS, | but we also looked at NATS Streaming a bit, but it seems to | be pretty immature (though promising) and doesn't check all | the boxes around integrations into the streaming ecosystem | like Pulsar does (Pulsar Functions, Pulsar IO). | | I am interested to see where NATS goes but for where we are | today Pulsar was a much more obvious choice. | bubbleRefuge wrote: | Isn't using Kubernetes kind of an anti-pattern due to failover | and rebalancing logic clashing? If Kubernetes is killing and | re-starting nodes and the cluster's brokers are detecting dead | brokers and rebalancing partitions as a result, it seems | counterproductive. | addisonj wrote: | One of the main benefits of Pulsar is that, because | state is split between brokers and BookKeeper and BookKeeper | doesn't need to be rebalanced (due to its segment-based | architecture where you choose new bookies with each new | segment), we really don't have to worry about rebalancing | (in general, not just in case of failover) of storage. It is | true that topics map to a single broker, but generally, | Pulsar has _really_ good limits on memory so we don't see | nodes getting killed by limits and we only really see re- | scheduling for real issues. | | While there certainly are some aspects you need to be aware | of, generally, Pulsar is much more "cloud native" and maps | quite well to k8s primitives. | skube wrote: | Using kubernetes is _always_ an anti-pattern. 
| rhizome wrote: | To the degree that that conflict exists in their | implementation I would think that it's possible to account | for all of that. | staticassertion wrote: | What is the SQS-based system you are migrating from? | | I'm currently building a data processing system that is backed | by S3 -> SQS based events, for persistent message passing. | addisonj wrote: | We have a number of systems, some use SQS, some use Kinesis. | Part of the draw of Pulsar is having one piece of tech that | we can unify everything over and offer more baseline | features, like infinite retention via storage offloading or | Pulsar IO connectors that standardize common operations. We | aren't really targeting one use case, instead, we looked for | the system that offered a broad set of features that other | developers in the company want and is operationally doable | with just a few people. | willvarfar wrote: | What's your plan on disaster recovery? Do your workers track | their own cursors, and if so, how does that work across | regions? | addisonj wrote: | In Pulsar, offsets are tracked by the service as part of the | bookkeeper data (unless you use the reader API which is only | really needed for advanced use cases like Flink), that means | we just need to do DR for bookkeepers, which I touch on in | another response but the tl:dr; is that we have a 3x | replication factor as well as EBS snapshots | ypcx wrote: | Alrighty, a few questions: | | - what k8s definitions do you use, e.g. do you use the official | Helm Chart, or have you written your .yaml's from scratch? | | - have you practiced disaster recovery scenarios in the context | of k8s? Can you describe them briefly? | | - how do you upgrade/redeploy the Pulsar k8s components, i.e. 
| does this cause the Bookies to trigger a cluster rebalance, or | does it trigger the Autorecovery | | - for the Bookies, do you use AWS EBS volumes with the EKS or | just local instance storage (that is, if you use persistent | topics) | | - do you use the Proxy pod's EKS k8s pod IPs as exposed on the | AWS network, or do you use a NodePort type of service for the | Proxy components (using the EKS node IPs) | | - have you been bitten by the recent EKS k8s network plugin bug | (loss of pod connectivity), and/or how do you maintain your EKS | cluster | | - do you run your EKS nodes in a multi-AZ setting? | addisonj wrote: | For the k8s definitions, we started with the Helm chart, | rendered the template, and then moved it into Kustomize, as | that is our tool of choice ATM. IDK if I would recommend that | approach for everyone (we expect we might move to Helm v3 at | some point) but it was a good choice for us. | | We have practiced some disaster recovery, but it isn't 100% | exhaustive (is it ever?), however it is also aided by how | Pulsar is designed. We have killed bookie nodes as well as | lost all our state in ZooKeeper. The first is pretty easily | handled by the replication factor of BookKeeper data and for | ZooKeeper we do an extra backup step and just dump the state to | S3 and can restore it. What we haven't tested in practice but | know how to do in theory is to restore a k8s stateful set | from EBS volume snapshots. However, we see that as a real | edge case. In Pulsar, we offload our data to S3 after a few | hours, so we only need to worry about potentially losing a | few hours of data in BK, as the ZooKeeper state is very easy | to just snapshot and restore from S3. 
In other words, we are | still working on getting more and more confident with data | and don't yet recommend teams use it for mission critical | non-recoverable data, but there are a ton of use cases for | it now and we can continue to improve on the DR front | | We have done multiple upgrades and deploy all the time. | Because bookkeeper nodes are in a stateful set and we | don't do automated rollouts, we have a manual process to | replace the BK nodes. However, they don't trigger a | re-balance as it closes gracefully and then re-attaches the EBS | volume from the stateful set | | We use EBS volumes: a piops volume for the journal | and a larger slower volume for the ledger store. This is one | of the great parts of bookkeeper design: the two disk | pools are separate so we just need a small chunk of really | fast storage and then the journaled data is copied over to | the ledger volume by a background process. We figure for | really high write throughput we could use instance storage | for the journal volume and EBS for ledger, but that would | have some complications on recovery but still easier than | having to rebuild the whole ledger data. | | We use the pulsar proxy and expose it via a k8s service with | the AWS specific NLB annotations. | | We haven't had any issues with the k8s plugin and haven't | really had any issues with EKS version upgrades. We just add | new nodes when we migrate the kubelets | | Yes, we have automation (via terraform) to allow us to add | many different pools of compute and we use labels and taints | to get specific apps mapped to specific pools of compute. 
For | Pulsar, we run all the components multi-AZ | addisonj wrote: | Oh forgot one aspect about DR: for critical data, we can | easily turn on geo-replication (with a single API call) and | have that data now in another region purely for DR purposes | (or for cross region use cases) | GordonS wrote: | Really interested why you chose Pulsar over RabbitMQ and | others? | addisonj wrote: | I have used (and deployed) rabbitmq in the past and really | love it for pub/sub, but for our needs, we keep needing | retention, particularly long retention that we process with | Flink for computing views. Having one system to do both is | great for us. | GordonS wrote: | Sorry, I'd missed that Pulsar was a streaming log system | (like kafka), _as well as_ a pub /sub system. HN title | misled me :) | [deleted] | 3fe9a03ccd14ca5 wrote: | The first question I have is why? SQS seems like such a simple | thing to keep hosted. | jjeaff wrote: | They said they are currently doing 50k messages a second and | they aren't even done migrating everything over. | | 50k messages a second would cost you around $50k a month for | AWS sqs, (math could be wrong, didn't double check). | | Plus, with sqs, you get what they have. No customizations. | deanCommie wrote: | I sincerely doubt they are sustaining 50k msgs/second. | Likely that's the MAXIMUM throughput. | | No way they would actually hit that sustained throughput | for the entire month. | | Even the other justifications about wanting to reference | messages after delivery do not to me justify migrating off | SQS/Kinesis, especially not at cost of 5 months development | effort. | rumanator wrote: | Even if you were off by an order of magnitude, that | expenditure level is not justified to run a message broker | service. | addisonj wrote: | Biggest thing is that we find ourselves needing to retain | this data for more than initial delivery and also for use | cases where we want to use the data more like a log and need | ordering guarantees. 
It isn't just our current SQS use cases, | it is being able to have one tech that does SQS like stuff | and Kinesis like stuff in one place | tybit wrote: | When we ran into a similar use case we went with writing to | S3 and using S3 SNS notifications and consumers could | subscribe their SQS queues to that. | | Having said that, assuming Pulsar has a similar feature to | Kafkas log compaction I can definitely see the appeal! | | Also the fact that SQS Fifo doesn't integrate with this | setup is super annoying. | | Edit: Log compaction not key compaction | js4ever wrote: | "high-level APIs for Java, C++, Python and GO", no love for | Node.js? :( | maxtollenaar wrote: | You can also use the websocket proxy: | https://pulsar.apache.org/docs/en/next/client-libraries-webs... | tyingq wrote: | It exists: https://pulsar.apache.org/docs/en/next/client- | libraries-node... | c0brac0bra wrote: | However it currently lacks the ability to listen for messages | and run an event handler when one comes in: | https://github.com/apache/pulsar-client-node/pull/56 | | You have to manually call ".receive()" to attempt to receive | a message. | gperinazzo wrote: | Using `.receive()` will occupy a worker thread from node | until it returns. Having multiple consumers waiting on | receive will clog up the worker threadpool, preventing | anything that uses it from running. If you want to use the | consumer right now, I would suggest always using a timeout | on the receive call, and waiting between timed-out calls to | receive. This is extremely important if you have multiple | consumers. | buboard wrote: | Why is apache developing all those servers that are only useful | to a handful of companies that are rich enough to build them | themselves? How about building something that individuals can | use, like, i dunno, apache server itself? 
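Editor's sketch, relating to gperinazzo's advice above: always bound the receive call with a timeout, and pause between timed-out calls. The shape of that loop is shown here in Python rather than Node.js. `FakeConsumer`, `poll`, and all parameter names are hypothetical stand-ins invented for illustration; this is not the Pulsar client API, and the timing values are arbitrary.

```python
import queue
import time


class FakeConsumer:
    """Hypothetical stand-in for a client consumer; not the Pulsar API."""

    def __init__(self, messages):
        self._q = queue.Queue()
        for m in messages:
            self._q.put(m)

    def receive(self, timeout_s):
        # Block for at most timeout_s seconds instead of waiting forever.
        try:
            return self._q.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError("no message within timeout") from None


def poll(consumer, handler, receive_timeout_s=0.01, idle_wait_s=0.01,
         max_idle_polls=3):
    """Receive with a timeout; wait between timed-out calls.

    Stops after max_idle_polls consecutive timeouts so the demo terminates;
    a real service would typically loop until shutdown.
    Returns the number of messages handled.
    """
    idle = 0
    handled = 0
    while idle < max_idle_polls:
        try:
            msg = consumer.receive(receive_timeout_s)
        except TimeoutError:
            idle += 1
            time.sleep(idle_wait_s)  # give other work a chance to run
            continue
        idle = 0
        handler(msg)
        handled += 1
    return handled
```

With several consumers, each one would run this loop, so no single receive call holds a worker indefinitely.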
| Eikon wrote: | If you did bother to read the linked page, you would have | understood that it's a yahoo project handed over to Apache for | management like many of Apache's projects. | buboard wrote: | yeah I'm talking more generally about their full list here: | https://www.apache.org/ | mindw0rk wrote: | There are a lot of projects that were handed to Apache to | manage. Kafka for example was initially created by | LinkedIn. So yeah, you are right, big corps are actually | creating those tools, and in addition to this, giving it | away as open source to public. | eronwright wrote: | Best part is that the tools are put into production | before being open-sourced. In other words, they actually | work. | JackRabbitSlim wrote: | Another over engineered Lego block for quicker dev and even less | thought on design, upkeep or overhead. | | Now if you excuse me I need to go take my quad-core, petaflop | processing power and multiple gigabytes of RAM to read email from | a javascript infested, multi-byte to single byte encoded webpage | hosted across half a dozen server instances scattered across the | planet. | | CS is damned, and this is hell. | maximente wrote: | not really, people legitimately need this for truly big data. | you can't reliably process 7 trillion events per day with a | completely C/unix CLI stack. | ivalm wrote: | I would bet that vast, vast majority of production kafka | deployments do not see 7 trillion events per day. I bet | many/most do not even see 7 billion events per day. | manigandham wrote: | This is one of the best designed pub/sub messaging systems | available, but you don't have to use it if you don't want to. | neeleshs wrote: | Please do elaborate | _frkl wrote: | Care to elaborate? 
I just started using the standalone version | of pulsar for a project, it looked better designed than Kafka, | resources usage looks quite acceptable so far, but i dont have | much experience with any solutions in this space, so im not | sure which problems im going to run into. Any suggestions for | good tech/strategies for a streaming-type setup like this? Or | what to do alternatively? Should i look into rolling my own? | breckcs wrote: | Being able to scale the durable-storage layer independently has a | lot of advantages. More thoughts here: | https://twitter.com/breckcs/status/1203736751681896449. | throwawaysea wrote: | A lot is said or referenced in this conversation about why people | chose Pulsar over Kafka. I'm not an expert in this area but are | there use cases where Kafka is still better? | bovermyer wrote: | How does this compare with NATS? | sqreept wrote: | NATS is a simpler PUB/SUB system that delivers in the UNIX | spirit of small composable parts. Apache Pulsar or Apache Kafka | deliver the banana, the ape holding it and the rest of the | jungle. | tylertreat wrote: | Check out Liftbridge (https://liftbridge.io) as a way to add | these capabilities to NATS. | manigandham wrote: | NATS is ephemeral pub/sub only. There is no persistence or | replay, but focuses on high performance and messaging patterns | like request/reply. | | Kafka and Pulsar persist every message and different consumers | can replay the stream from their own positions. Pulsar also | supports ephemeral pub/sub like NATS with a lot more advanced | features. | | NATS does have the NATS Streaming project for persistence and | replay but it has scalability issues. They're working on a new | project called Jetstream to replace this in the future. | barbarbar wrote: | How is it compared to kafka? | cbartholomew wrote: | You might want to check out this blog post I wrote comparing | Kafka to Pulsar: https://kafkaesque.io/5-more-reasons-to- | choose-apache-pulsar... 
| cbartholomew wrote: | If you have an O'Reilly subscription, you can also check out | this detailed report comparing Pulsar and Kafka: | https://learning.oreilly.com/library/view/apache-pulsar- | vers... | manigandham wrote: | Separates storage from brokers for better scaling and | performance. Millions of topics without a problem and built-in | tenant/namespace/topic hierarchy. Kubernetes-native. Per- | message acknowledgement instead of just an offset. Ephemeral | pub/sub or persistent data. Built-in functions/lambda platform. | Long-term/tiered storage into S3/object storage. Geo- | replication across clusters. | eerrt wrote: | Some latency benchmarks: https://kafkaesque.io/performance- | comparison-between-apache-... | SkyRocknRoll wrote: | Most of the flaws of Kafka are carefully studied and fixed in | Apache pulsar. I have written a blog about why we went ahead | with pulsar https://medium.com/@yuvarajl/why-nutanix-beam-went- | ahead-wit... | progval wrote: | > when consumers are lagging behind, producer throughput | falls off a cliff because lagging consumers introduce random | reads | | I am confused by this. The format of Kafka's log files is | designed to allow reading and sending to clients directly | using sendfile, in sequential reads of batches of messages. | http://kafka.apache.org/documentation/#maximizingefficiency | geeio wrote: | Kafka works best when the data it is returning to consumers | is in the page cache. | | When consumers fall behind, they start to request data that | might not be in the page cache, causing things to slow | down. | manigandham wrote: | Kafka brokers handle connections to consumers and data | storage. This creates contention as the primaries for each | partition have to service the traffic and handle IO. | Consumers that aren't tailing the stream will cause | slowdowns because Kafka has to seek to that offset from | files which aren't cached in RAM. 
| | Pulsar separates storage into a different layer (powered by | Apache Bookkeeper) which allows consumers to read directly | from multiple nodes. There's much more IO throughput | available to handle consumers picking up anywhere in the | stream. | mbostleman wrote: | This might be entirely off topic, but I'm having issues using | RabbitMQ whereby durability suffers because messages are sent to | remote hosts thus exposing them to both the network and remote | host availability. On a previous platform I used an MSMQ based | system which didn't have this problem since it uses a local store | and forward service. So all sends are to localhost and are not | affected by the network or the receiver availability. The MSMQ | system was my first and only experience with messaging up to now, | so I was surprised that any system would not work that way. How | is this dealt with in other systems? Is it just a feature that | exists or not and you just decide if it's important? And maybe | just to shoe horn it to be on topic, does Pulsar use a local | service? | manigandham wrote: | That's an inherent issue with distributed solutions and is | impossible to solve. The only way to deal with it is using | various techniques like acknowledgements, retries, local | storage, idempotency, etc. MSMQ handles that stuff behind the | scenes but the problem itself will always exist if there's a | network boundary. | | These other systems are designed to be remote with a network | interface. You can use the client drivers to handle | acknowledgements/retries/local-buffering in your own app or use | something like Logstash [1], FluentD [2], or Vector [3] for | message forwarding if you want a local agent to send to. You | might have to wire up several connectors since none of them | forward directly to Pulsar today. | | Also RabbitMQ is absolute crap. There are better options for | every scenario so I advise using something else like Redis, | NATS, Kafka, or Pulsar. | | 1. 
https://www.elastic.co/products/logstash | | 2. https://www.fluentd.org/ | | 3. https://vector.dev/ | bauerd wrote: | You're free to have queue and workers run on the same machine, | just bind to loopback. As soon as you deal with more than one | machine, which is required in HA scenarios, you deal with a | networked (distributed) system. I might not have understood | your question correctly though ... | | Edit: Maybe you're looking for acks/confirms? | https://www.rabbitmq.com/confirms.html | mbostleman wrote: | I have many machines, each of which have one or many | applications that send messages. And I have one machine with | an instance of Rabbit to which all messages are sent. If the | network is down or the Rabbit machine is down, the messages | are gone along with their data. | | Clustering the Rabbit machine helps one particular failure | scenario, but it's not a solution to the problem. | zackmorris wrote: | This looks promising. Is there such thing as a generalized SQL | query engine that runs over any key-value store that provides | certain minimal core operations? | | For example, say you have a KV Store with basic mathematical Set | operations like GET, SET, UNION, INTERSECT, EXCEPT, etc. The | Engine would parse the SQL and then call the low-level KV Store | Set operations, returning the result or updating KV pairs. This | explains how Join relates to Set operations: | | https://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-i... | | Another thing I'd like is if KV stores exposed a general purpose | functional programming language (maybe a LISP or a minimal stack- | based language like PostScript) for running the same SQL Set | operations without ugly syntax. I don't know the exact name for | this. But if we had that, then we could build our own distributed | databases, similar to Firebase but with a SQL interface as well, | from KV stores like Pulsar. 
I'm thinking something similar to | RethinkDB but with a more distinct/open separation of layers. | | The hard part would be around transactions and row locking. A | slightly related question is if anyone has ever made a lock-free | KV store with Set operations using something like atomic | compare-and-swap (CAS) operations. There might be a way to leave requests | "open" until the CAS has been fully committed. Not sure if this | applies to ledger/log based databases since the transaction might | already be deterministic as long as the servers have exact copies | of the same query log. | | Edit: I wrote this thinking of something like Redis, but maybe | Pulsar is only the message component and not a store. So the | layering might look like: [Pulsar][KV Store (like Redis)][minimal | Set operations][SQL query engine]. | atombender wrote: | One of the challenges with layering SQL on top of a KV store is | query performance. | | The most obvious way to model a secondary index on top of a | pure KV store is to map indexed values to keys. For example, | given the (rowID, name) tuples (123, "Bob"), (345, "Jane"), | (234, "Zack"), you can store these as keys: | name:Bob:123 name:Jane:345 name:Zack:234 | | At this point you don't need or even want values, so this is | effectively a sorted set. | | Now you can easily find the rowID of Jane by doing a key scan | for "name:Jane:", which should be efficient in a KV store that | supports key range scans. You can do prefix searches this way | ("name:Jane" finds all keys starting with "Jane"), as well as | ordinal constraints ("age > 32"), which requires that the age | value is encoded in a fixed-width, order-preserving form, something like: | age:\x00\x00\x00\x20:123 | | To perform a union ("name = 'Bob' OR name = 'Jane'"), you | simply do multiple range scans, performing a merge sort-ish | union operation as you go. 
To perform an intersection ("name = | 'Bob' AND age > 10"), you find the starting point for all the | terms and use that as the key range, then do the merge sort. | | This is what TiDB and FoundationDB's record layers do, which | both have a strict separation between the stateless database | layer and the stateful KV layer. | | The performance bottleneck will be the network layer. Your | range scan operations will be streaming a lot of data from the | KV store to the SQL layer, and potentially you'll be reading a | lot of data that is discarded by higher-level query layers. | This is why TiKV has "co-processor" logic in the KV store that | knows how to do things like filter; when TiDB plans your query, | it pushes some query operators down to TiKV itself for | performance. | | Unfortunately, this is not possible with FoundationDB. This is | why FoundationDB's authors recommend you co-locate FDB with | your application on the same machine. But since FDB key ranges | are distributed, there's no way to actually bring the query | code close to the data (as far as I know!). | | I'm sure you could do something similar with Redis and Lua | scripting, i.e. building query operators as Lua scripts that | worked on sorted sets. I wouldn't trust Redis as a primary data | store, but it can be a fast secondary index. | PaulHoule wrote: | e.g. how about a complex event processing engine? Something | like that will do a lot of the above, but the inference | database stays manageable since old data will fall out of the | windows. | eronwright wrote: | Take a look at the Apache Flink CEP library, which operates | over unbounded streams: | https://ci.apache.org/projects/flink/flink-docs- | stable/dev/l... | TuringNYC wrote: | I usually don't associate CEP with this, but it makes sense. | Are they meant to operate at this level (rather than at a | higher level of abstraction)? | | Which ones would you recommend looking at? 
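Editor's sketch of the secondary-index scheme atombender describes above, using a toy in-memory sorted key set in place of a real KV store. Everything here is an assumption made for illustration: the `SortedKV` class, the binary key layout (a 4-byte big-endian row ID rather than the decimal text in the comment), the helper names, and the ages, which do not appear in the thread. The intersection uses a plain set rather than a streaming merge, to keep the example short.

```python
import bisect
import heapq
import struct


class SortedKV:
    """Toy stand-in for a KV store with ordered key range scans."""

    def __init__(self):
        self._keys = []  # always kept sorted

    def put(self, key: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan(self, start: bytes, end: bytes):
        """Return keys k with start <= k < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]


def u32(n: int) -> bytes:
    # Fixed-width big-endian, so byte order matches numeric order.
    return struct.pack(">I", n)


def prefix_end(prefix: bytes) -> bytes:
    # Smallest key greater than every key with this prefix
    # (assumes the last byte is below 0xff).
    return prefix[:-1] + bytes([prefix[-1] + 1])


def row_id_of(key: bytes) -> int:
    return struct.unpack(">I", key[-4:])[0]


def rows_with_name(kv, name):
    prefix = b"name:" + name.encode() + b":"
    return [row_id_of(k) for k in kv.scan(prefix, prefix_end(prefix))]


def rows_with_age_over(kv, age):
    start = b"age:" + u32(age + 1)
    return [row_id_of(k) for k in kv.scan(start, prefix_end(b"age:"))]


def union(*sorted_id_lists):
    # Merge already-sorted row ID lists, dropping duplicates.
    out = []
    for rid in heapq.merge(*sorted_id_lists):
        if not out or out[-1] != rid:
            out.append(rid)
    return out


kv = SortedKV()
for row_id, name, age in [(123, "Bob", 40), (345, "Jane", 29), (234, "Zack", 35)]:
    kv.put(b"name:" + name.encode() + b":" + u32(row_id))
    kv.put(b"age:" + u32(age) + b":" + u32(row_id))

bob_or_jane = union(rows_with_name(kv, "Bob"), rows_with_name(kv, "Jane"))
bob_and_over_10 = set(rows_with_name(kv, "Bob")) & set(rows_with_age_over(kv, 10))
```

Each equality scan returns row IDs in ascending order, which is what lets the union be a simple merge. The age scan returns them ordered by age first, so intersecting against it needs either a set (as here) or a sort.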
| manigandham wrote: | Spark [1], Presto [2], and Drill [3] can all do that with | connectors to different data sources and varying support for | advanced SQL. | | Pulsar has support for Presto: | https://pulsar.apache.org/docs/en/sql-overview/ | | Pulsar isn't a KV store though, it's a distributed | log/messaging system that supports a "key" for each message | that can be used when scanning or compacting a stream. GET and | SET aren't individual operations but rather scans through a | stream or publishing a new message. | | If you just want to have a SQL interface to KV stores or | messaging systems that support a message key then Apache | Calcite [4] can be used as a query parser and planner. There | are examples of it being used for Kafka [5]. | | 1. https://spark.apache.org/ | | 2. https://prestodb.io/ | | 3. https://drill.apache.org/ | | 4. https://calcite.apache.org/ | | 5. https://github.com/rayokota/kareldb | coryvirokmobile wrote: | Regarding the generic sql engine - it looks like this is what | Apache Calcite was designed for. | | https://calcite.apache.org/ | reggieband wrote: | I'm still on the fence with these distributed log/queue hybrids. | From a theoretical perspective it seems these are excellent. I | just have this nagging suspicion that there is some even-worse | problem architectures based on these systems will harbor. This | kind of ambivalence is something I find myself having to battle | more and more in my career as I age. Most of the time the hype | around new design/development patterns leads to a worse | situation. Very rarely it leads to a significant improvement. I | dislike that my first impression looking at a system like this is | risk aversion. | zomglings wrote: | Your risk aversion seems justified. It seems reasonable to | estimate that very few teams are in the position of needing the | kind of scale/scalability that something like Apache Pulsar | offers. 
They are much more likely to be in either a state where | they will not put Pulsar through its paces or where they | already have a solution in place that serves their | scale/scalability needs. | | When a team you are on starts discussing switching over to a | technology like Pulsar because of its amazing benefits, unless | your pants are on fire, it is much more likely than not that | you do not stand to gain much from the benefits that such | software brings but you are accepting the maintenance burden | that it represents. | mpmpmpmp wrote: | I totally get where you're coming from as I feel like that a lot | too. But the fact that you are thinking about risk and business | value over working on cool new tech should be a positive for | the ventures you are a part of. | alfalfasprout wrote: | I keep seeing new message queue solutions pop up over the years | and it's just been my impression at least that this is one area | where silicon valley really is way behind the trading industry. | | Reliable pub/sub that supports message rates over 100k/sec (even | up to the millions) has been available for a while now and with a | great deal of efficiency (e.g. the Aeron project). The incredible | amount of effort to support complex partitions, extreme fault | tolerance (instead of more clever recovery logic), etc. add a lot | of overhead. To the point of talking about "low latency" overhead | in the order of 5ms instead of microseconds or even nanoseconds | as is expected in trading. | | Worse, many startups try to adopt these technologies where their | message rates are minuscule. To give you some context, even two | beefy machines with an older message queue solution like ZeroMQ | can tolerate throughput in excess of what most companies produce. | | This is not to discredit the authors of Pulsar or Kafka at all... | but it's just a concerning trend where easy to use horizontally | scalable message queues are being deployed everywhere. 
Similar to | how everyone was running hadoop a few years back even when the | data fit in memory. | tylertreat wrote: | ZeroMQ is not a message queue, it's a networking library. | gfodor wrote: | Worth noting that Kafka is not a queue, but an append-only log. | mickster99 wrote: | Is there a reason you went with Pulsar over Kafka? How is the | pulsar community? Where are you turning when you have support | issues? | mattboyle wrote: | We tried to adopt this but found the documentation very lacking | and a severe lack of quality client libraries for our language of | choice (Go). The "official" one had race conditions in the code as | well as "todo" for key pieces littered throughout. There is | another from comcast which is abandoned. We had a serious | discussion about picking up ownership of the library or writing | our own but as a small start up we didn't feel we could do it and | still develop the product. I'll continue to keep an eye on pulsar | but for now Kafka is the clear go to imo. It's well documented, | great SaaS offerings (confluent) and tons of books and training | courses for it. | cbartholomew wrote: | We provide a SaaS offering of Apache Pulsar in AWS, Azure, and | GCP: https://kafkaesque.io/ | jjeaff wrote: | Cool name. That's one of those company names that almost | seems like someone thought it would make a good company name | first and thought it was so fitting, they should build a | company around it. | cbartholomew wrote: | Thanks! | mattboyle wrote: | I didn't find this when looking, thanks will take a deeper | look. | ckdarby wrote: | > found the documentation very lacking | | Really? It is one of the few open source projects that we've | felt has had modern documentation. How long ago was this? | | > As a small startup | | You'll spend more time & money on the OpEx cost with Kafka than | picking up the client library for Pulsar. | mattboyle wrote: | It was about 6 months ago. 
| | I completely disagree with the opex of picking up kafka vs | developing a whole client library. Please could you try and | explain how you came to this conclusion? | ckdarby wrote: | > Please could you try and explain how you came to this | conclusion? | | 1. Stateless brokers | | With Kafka any time a broker goes down you need to be aware | of the kafka broker id. Yes, this can be fixed by creating | your entire infrastructure as code and keeping track of | state. | | This is something of great OpEx. I've seen few people | successfully automate this, Netflix is one of the few. The | rest just use manual process with tooling to get around, | pager, Kafka tooling to spawn replacement node with the | looked up broker id, etc. | | 2. Kafka MirrorMaker | | Granted I have not used v2 that recently came out in ~2.6 | but dear gosh v1 was so bad that Uber wrote their own | replacement from the ground up called uReplicator. The | amount of time wasted on replication broken across regions | is disgusting. | | 3. Optimization & Scaling | | Kafka bundles compute & storage. There's (maybe on a | upcoming KIP) no way that I know of splitting this. This | means you'll waste time on Ops side deciding on tradeoffs | between your broker throughput and your broker space. | | Worse yet time & money will be wasted here. I'd just rather | hire more people than waste time on silly things like this. | This is where I justify taking on the expense of client | libs. | | 4. Segments vs Partitions | | The major time wasters are where you end up in a situation | with the cluster utterly getting destroyed. It will happen, | it isn't a question of if but a question of when or the | company goes belly up and nobody cares. | | It's 3 AM, the producer is getting back pressure, you get a | page and now have to deal with adding on write capacity to | avoid a hot spot. 
Don't forget you can't just simply do a | rebalance in Kafka or you'll break the contract with | every developer who has developed under the golden rule of, | "Your partition order will always be the same". | | You'll successfully pay the cost of upgrading the entire | cluster and then spending 3 days coming up with a solution | to rebalance without making all your devs riot against you | when you break that golden contract. | | RIP Kafka | | Having spent a couple of years dealing with Kafka I'm sorry | to burst people's bubbles but it is dead. Even Confluent | doesn't have a good enough story these days to not switch | to Pulsar, they're going to sell you on the same consulting | bs, "We're more mature", "We've got better tooling.", | "Better support"... | | Yes, of course, it has been in the open source community 5 | years longer and the company has been also around longer | for that time. Kafka is dead, long live Pulsar. | bubbleRefuge wrote: | I think what is dead is confluent cloud b/c Amazon MSK | and Azure HDInsight will be close to feature parity at | much less cost. | ckdarby wrote: | Damn, I got lazy on my reply & just hoped nobody went | further, but well played on digging deeper. | | 5. Kafka is silly expensive | | Pulsar supports message ack with subscription groups. The | worst case with Pulsar is you're storing the entire | retention period. | | Let's say you have a 4 day retention window, to cover an | outage happening on Friday and not having to deal with it | until Monday. This is pretty typical with what I see in | the Kafka world for small-mid size companies who don't | want to pay the 1.5x OT on call. | | So, with Pulsar you're at worst storing the 4 days of | data but at best you're only storing the messages within | the lag period of all consumer groups acknowledging the | message. | | Now, without getting too deep into Pulsar's feature set | even that is a lie because Pulsar has tiered storage as a | first class citizen. 
The messages after the four days | could be shipped off to S3 if we wanted or even within 1 day | depending on our use case and this is all built into | Pulsar, no OpEx tooling required. Even accessing the | messages from S3 through Pulsar is abstracted, there's no | tooling required to pull them back in if you wanted. | | Now with Kafka our worst case is simply 4 days of | retention data. This can get very expensive as compute & | storage are tied together, it means scaling up all the | brokers (even though we don't need the throughput) for | the storage increase. Now, yes MSK basically abstracts | all this from you but you're paying for it. | | 6. AWS Managed Services are not equal citizens to EC2 | standalone | | Managed services right now don't fall under the new | Saving Plan: https://aws.amazon.com/blogs/aws/new- | savings-plans-for-aws-c... | | This will cost you a 30-60% discount on your entire Kafka | bill. | | 7. Excel Life | | If I look at the numbers for what I'm doing it would have | cost ~$4M for Kafka vs ~$1M for Pulsar. | cpx86 wrote: | > You'll spend more time & money on the OpEx cost with Kafka | than picking up the client library for Pulsar. | | Could you elaborate why this would be the case? | gen220 wrote: | Not the OP, but I think they were exaggerating a bit. In | practice, operating kafka is a major PITA, because it means | you have to | | (1) choose a "flavor" wrapper (confluent seems to be a | popular one), because the base project isn't easy to | develop against | | (2) write your own wrappers of those wrappers, to keep your | developers from shooting themselves in the foot with wacky | defaults | | (3) suffer the immense pain that is authenticating topic | write/reads, if that's even possible??? | | (4) stand up zookeeper... and probably lose some data along | the way. | | (5) suffer zookeeper outages due to buggy code in kafka/zk | (I've experienced lost production data due to unpredictable | bugs in kafka/zk, but obviously YMMV). 
| | Based on my naive assessment, the kafka/zookeeper ecosystem | is maybe 10x as complicated as the problem it's solving, | and that shows up in the OpEx. I personally doubt that | Pulsar is _that_ much better, but it might be. | ckdarby wrote: | These are also valid. I wrote the reply explaining some | of the OpEx here: | https://news.ycombinator.com/item?id=21938463 | EdwardDiego wrote: | What do you mean by 1 and 2? I'm guessing you're | referring to the kafka-clients API? The defaults for | producer and consumer conf are quite sensible these days. | matteomerli wrote: | We're close to releasing a new "officially supported" native Go | client library: https://github.com/apache/pulsar-client-go | jgraettinger1 wrote: | If you're a Go shop, Gazette is worth a look | (https://gazette.dev). | eatonphil wrote: | Most of the comments are just pro-Pulsar but what's the | architectural trade-off? (Non-architectural trade-off is that | Pulsar is a new system to learn for folks familiar with | maintaining and using Kafka.) | manigandham wrote: | Pulsar is better designed than Kafka in every way with the main | trade-off being more moving pieces. That's why the recommended | deployment is Kubernetes which can manage all that complexity | for you. | | Pulsar also lacks in the size of the community and ecosystem | where Kafka has much more available. | qeternity wrote: | Someone linked some benchmarks here (on mobile and can't find) | that showed a single node Kafka outperforms but as soon as you | start scaling Pulsar pulls ahead. I'm not familiar enough with | the nitty gritty to comment beyond that. | srameshc wrote: | This is great !! What would be the easiest way to run a 3 node | cluster ? | ckdarby wrote: | The standalone mode will let you get started as a developer. | You grab tar.gz, uncompress, run standalone.sh. | | There are helm charts for running an actual cluster. 
| [deleted] ___________________________________________________________________ (page generated 2020-01-02 23:00 UTC)