[HN Gopher] Apache Pulsar is an open-source distributed pub-sub ...
       ___________________________________________________________________
        
       Apache Pulsar is an open-source distributed pub-sub messaging
       system
        
       Author : LinuxBender
       Score  : 360 points
       Date   : 2020-01-02 15:40 UTC (7 hours ago)
        
 (HTM) web link (pulsar.apache.org)
 (TXT) w3m dump (pulsar.apache.org)
        
       | firstposterone wrote:
       | Splunk just acquired streamlio and most of the core devs got
       | sucked up. While pulsar is a great product - are you not
       | concerned that these guys are getting paid $$ bank to do
       | something else now?
        
         | matteomerli wrote:
         | spoiler: we're still working on Pulsar
        
           | firstposterone wrote:
           | That's good news! I guess it would be very helpful to
           | formally address this concern. Is there something that has
           | been written / published to that effect?
        
       | bsaul wrote:
       | Sidenote question :
       | 
        | Are we heading toward a split between Apache/Java/ZooKeeper
        | stacks on one side and Go/etcd on the other? I've seen an issue
        | related to that question on Pulsar, and it got me investigating
        | the distributed KV part of the stack.
        | 
        | Looking at some benchmarks, it seems etcd is much more performant
        | than ZooKeeper, and to some people, operating two stacks seems
        | like too high an operational maintenance cost. Is that a valid
        | concern?
        | 
        | Also, I've seen that Kafka is working on removing its dependency
        | on ZooKeeper; is Pulsar going to take the same road?
        
         | geodel wrote:
          | This sounds about right. Apart from maybe the original Apache
          | HTTP server, most Apache projects are in Java.
          | 
          | Looking at the codebase of Pulsar, it looks like a typical
          | Apache-style sprawling Java project, with more than a thousand
          | directories, many thousands of files, and more than a hundred
          | dependencies. By comparison, NATS, which is in Go, has a few
          | hundred files, fewer than a hundred directories, and about a
          | dozen dependencies.
        
           | tyri_kai_psomi wrote:
           | NATS is an amazing project, I just wanted to take the
           | opportunity to highlight it for those first hearing about it
            | in this comment. It's so brilliantly simple, yet it changed
            | the way I design distributed systems. I now handle almost
            | everything with regard to the standard messaging guarantees
            | that a Kafka-like system offers at the endpoints. As a
            | result, systems are much simpler, and diagnosing bugs or
            | edge cases is much more straightforward.
        
             | _frkl wrote:
             | This sounds interesting, what exactly do you mean by
             | 'endpoint' in this scenario? I looked into a few
             | alternatives before settling for pulsar, and disregarded
             | nats because it didn't seem to support message persistence.
             | I didn't look into it too deeply though, maybe i should
             | have. How do you guarantee no message is lost with NATS?
        
               | geodel wrote:
               | Did you check this?
               | 
               | https://docs.nats.io/nats-streaming-concepts/intro
               | 
               | "..Message/event persistence - NATS Streaming offers
               | configurable message persistence: in-memory, flat files
               | or database. The storage subsystem uses a public
               | interface that allows contributors to develop their own
               | custom implementations."
               | 
               | and
               | 
               | "At-least-once-delivery - NATS Streaming offers message
               | acknowledgements between publisher and server (for
               | publish operations) and between subscriber and server (to
               | confirm message delivery). Messages are persisted by the
               | server in memory or secondary storage (or other external
               | storage) and will be redelivered to eligible subscribing
               | clients as needed."
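The at-least-once contract quoted above can be sketched in a few lines of Python. This is a toy simulation of the idea (persist until acked, redeliver anything unacked), not the NATS Streaming API; all names here are illustrative.

```python
# Toy at-least-once delivery: the server keeps each message until the
# subscriber acknowledges it, and redelivers anything still unacked.
class ToyStreamingServer:
    def __init__(self):
        self.pending = {}   # msg_id -> payload, kept until acked
        self.next_id = 0

    def publish(self, payload):
        self.next_id += 1
        self.pending[self.next_id] = payload
        return self.next_id

    def deliver(self, subscriber):
        # Redeliver every unacked message; the subscriber may see
        # duplicates, which is exactly the at-least-once contract.
        for msg_id, payload in list(self.pending.items()):
            if subscriber(msg_id, payload):   # True means "ack"
                del self.pending[msg_id]

server = ToyStreamingServer()
server.publish("order-created")
server.publish("order-paid")

received = []
calls = {"n": 0}

def subscriber(msg_id, payload):
    calls["n"] += 1
    if calls["n"] == 1:
        return False          # simulate a crash before the ack is sent
    received.append(payload)
    return True

server.deliver(subscriber)    # first message goes unacked
server.deliver(subscriber)    # first message is redelivered
```

Note that the subscriber can observe the same message twice; deduplication, if needed, is the subscriber's job.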
        
               | _frkl wrote:
                | No, I missed that. I think I've seen 'NATS Streaming',
                | but didn't realize that it is its own distinct thing.
                | All this makes more sense to me now, thanks!
        
               | tylertreat wrote:
               | Also check out Liftbridge (https://liftbridge.io), which
               | is a Kafka-like API on top of NATS.
               | 
               | Disclaimer: I'm the author and former core contributor of
               | NATS and NATS Streaming.
        
               | tyri_kai_psomi wrote:
                | I think of an endpoint as something at either end of the
                | communication channel (NATS in this case) where it is
                | effectively terminal. Usually this is where the
                | application logic lives. Derek Collison (creator of
                | NATS) brings this up in many of his talks about NATS,
                | but I think the source of his thinking might come from
                | "End-to-End Arguments in System Design" by Saltzer,
                | Reed, & Clark.
               | 
               | The core of it is this point:
               | 
               | "Functions placed at low levels of a system may be
               | redundant or of little value when compared with the cost
               | of providing them at that low level."
               | 
                | That is, in order to get message redundancy, exactly-
                | once delivery, or message persistence, you pay a high
                | cost, and you may be better off delegating to the
                | endpoints.
               | 
               | This blog provides a good overview
               | 
               | https://blog.acolyer.org/2014/11/14/end-to-end-arguments-
               | in-...
               | 
               | Here is the original paper
               | 
               | http://web.mit.edu/Saltzer/www/publications/endtoend/endt
               | oen...
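One way to read the end-to-end argument in code: rather than paying for exactly-once in the transport, the consuming endpoint deduplicates by message id. This is an illustrative sketch of that pattern, not any particular library's API.

```python
# End-to-end argument applied to messaging: the transport only promises
# at-least-once, and the endpoint makes processing effectively
# exactly-once by remembering which message ids it has already handled.
def make_idempotent(handler):
    seen = set()
    def endpoint(msg_id, payload):
        if msg_id in seen:
            return            # duplicate delivery: ignore
        seen.add(msg_id)
        handler(payload)
    return endpoint

processed = []
endpoint = make_idempotent(processed.append)

# The transport redelivers message 1, but the endpoint processes it once.
for msg_id, payload in [(1, "a"), (2, "b"), (1, "a"), (3, "c")]:
    endpoint(msg_id, payload)
```

In a real system the `seen` set would need to be bounded or persisted, which is exactly the cost the paper says you should weigh against putting the guarantee in the transport.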
        
               | _frkl wrote:
               | Thanks, much appreciated!
        
             | bsaul wrote:
              | Is it me, or does NATS look like it's aimed at an actor-
              | based style of distributed system?
        
               | JensRantil wrote:
                | It's not necessarily aiming at that.
        
             | vorpalhex wrote:
             | NATS is amazing but note that it makes different promises
             | than Pulsar. NATS doesn't offer true durability (in
             | exchange for amazing performance and great simplicity)
             | whereas Pulsar and similar are meant to survive certain
             | partition or failure situations and not lose data.
             | 
             | It's not one or the other, they're just different tools.
        
               | [deleted]
        
               | jjjensen90 wrote:
               | There is nats-streaming-server as well which offers true
               | durability (via file or SQL store) and a streaming model
               | very similar to Kafka and Pulsar. It can also run as a
               | raft cluster or in fault tolerance mode. It still has
               | very good performance and is very simple to deploy and
               | operate (I use it for event sourcing for real time IoT
               | data at my day job).
        
           | vips7L wrote:
            | It's more than one project in the repo.
        
         | nvarsj wrote:
         | It's an interesting observation.
         | 
         | I think that the modern approach to distributed systems is
         | moving towards golang style microservices and lightweight /
         | simple system design with RPC communication, reconcile type
         | loops for state reconciliation, and backing CP databases. I
         | think this is the influence of k8s (and maybe google's approach
         | to distributed systems).
         | 
         | I will almost certainly get downvoted for this (as I always
         | seem to when I criticize the JVM), but Apache/JVM style
         | architecture feels REALLY long in the tooth to me. I think you
         | are committing to an outdated and very expensive approach to
         | building software if you use anything running on the JVM,
         | especially Apache based anything. Cassandra is a great example
         | of this - out of the box it's a terribly performing database
         | that is extremely expensive to run and tune. Throw enough
         | resources and time at it and you can get it to acceptable
         | scalability - but running on the JVM which is a huge memory hog
         | will always make it expensive to run (and even then, you will
         | always get terrible latency distributions with the JVM's awful
         | GC).
         | 
         | If I was building a business I would run far far away from any
         | JVM based solution. The only thing it has going for it is
         | momentum. If you need to hire 100s of engineers off the street
         | for a large project, then a JVM based stack is about your only
         | option unfortunately.
        
           | vips7L wrote:
            | For what it's worth, I have a Java microservice running on
            | 13 MB of RAM.
        
           | lenkite wrote:
           | Nowadays Java based micro-services can compile to native
           | thanks to Graal native-image and frameworks like:
           | https://quarkus.io/.
        
           | squarecog wrote:
           | I understand your architecture criticism, and think it has
           | merit, but I'm not sure why Apache gets dragged into that.
            | Apache Airflow is in Python. Apache Arrow is in C++.
            | CouchDB is in Erlang.
           | 
           | There's a ton of projects Apache Foundation hosts that fit
           | your description but it's a mistake, I think, to confuse
           | individual projects with Apache in general. Bad enough that
           | people confuse the license with the foundation.
        
           | james-mcelwain wrote:
           | > the JVM's awful GC
           | 
            | This just makes it seem like you are trolling. JVM devs
            | have done more to advance the state of the art in this area
            | than any other language community. The problem is that most
            | JVM apps just produce too much garbage, not necessarily that
            | the algo itself is awful.
           | 
           | Either way, there's no such thing as an optimal GC algorithm,
           | just different trade-offs depending on your use case. Not
           | everyone cares about latency.
        
         | pm90 wrote:
         | It depends. We have a ton of Java apps running atop of
         | kubernetes. All of them use zk, but every team operates their
         | own mini zk cluster deployed on k8s. It's worked fine except
         | for certain hard to debug problems that hit a few teams
         | occasionally.
         | 
          | I guess my point is that k8s lets you shift the operational
          | burden to dev teams if they need it. If you have a centralized
          | operations team running a giant, common zk/etcd, yeah, this
          | would be additional operational burden.
        
         | matteomerli wrote:
         | > is pulsar going to take the same road ?
         | 
         | Yes, it's in the works
        
           | FBISurveillance wrote:
           | Related Kafka KIP: https://cwiki.apache.org/confluence/displa
           | y/KAFKA/KIP-500%3A...
           | 
           | It doesn't look like it's going to be ready anytime soon
           | though.
        
           | wilhow wrote:
            | Just to confirm, Pulsar has on its roadmap to remove its
            | dependency on ZooKeeper. Is that correct?
        
             | matteomerli wrote:
             | That's correct, we're moving to have a pluggable metadata
             | store and coordination service.
        
           | whycombin8 wrote:
           | Any plans for bookkeeper?
        
         | atombender wrote:
         | I can't wait for projects to ditch ZooKeeper. Apache
         | Bookkeeper, which Apache Pulsar uses for its state, already
         | supports Etcd as a consensus store (though I believe this is
         | still alpha? beta? quality). Pulsar is also working on
         | supporting Etcd.
        
         | PaulHoule wrote:
          | ZooKeeper sux. If you are in the Java world you can often
          | roll out something better using Hazelcast.
        
       | cbnotfromthere wrote:
       | Can it be used to build something like a Whatsapp-like chat
       | system?
       | 
       | If yes, why? If no, why?
        
       | eerrt wrote:
       | How does this compare to Redis Pub-Sub or RabbitMQ?
        
         | Joeri wrote:
         | It's closer to redis streams, except like kafka you can scale
         | topics beyond the limits of a single server because they can be
         | distributed. You couldn't run the twitter firehose over redis
         | streams, but you can run it over pulsar or kafka, given enough
         | hardware.
        
         | xrd wrote:
         | Or Firebase?
        
           | Eikon wrote:
           | Or Kafka ?
        
             | manigandham wrote:
             | Read the other comment thread:
             | https://news.ycombinator.com/item?id=21936523
        
             | sz4kerto wrote:
             | Kafka is primarily designed for streaming, pulsar is both
             | for streaming and queueing.
             | 
             | Firebase is a completely different animal.
        
             | qaq wrote:
              | Scales much better: the storage layer is separate from
              | the brokers, so you can scale things independently.
        
         | [deleted]
        
         | sz4kerto wrote:
         | Very different. Pulsar is primarily a Kafka competitor.
         | 
          | - it is much more performant than RabbitMQ
          | 
          | - it's a commit log as well, not just a pub-sub system, i.e.
          | it is a good candidate as the storage backend for event
          | sourcing
          | 
          | - it supports geo-distributed and tiered storage (e.g. some
          | data on NVMe drives, some on coldline storage)
          | 
          | - it's persistent, not (primarily) in-memory
          | 
          | ... and so on.
        
           | TuringNYC wrote:
            | I went to https://pulsar.apache.org but didn't find a "Why
            | Pulsar and not Kafka" -- is there an answer to that, or is
            | this another Kafka competitor with the same strengths and no
            | specific differentiator?
        
             | cbartholomew wrote:
             | Here is a two-part blog post I wrote on why Pulsar and not
             | Kafka: https://kafkaesque.io/5-more-reasons-to-choose-
             | apache-pulsar...
        
               | richdougherty wrote:
               | Thanks! And Part 1 seems like a good place to start:
               | https://kafkaesque.io/7-reasons-we-choose-apache-pulsar-
               | over...
        
           | EGreg wrote:
           | What about ZeroMQ?
           | 
           | Why use RabbitMQ and Kafka if you can use ZeroMQ? Meaning,
           | isn't it far more performant and distributed?
           | 
           | Maybe I am missing something here.
        
             | Joeri wrote:
             | Message queues and message logs do different things. The
             | idea of the log is that subscribers can show up after the
             | log is written and read or reread it from the beginning. In
             | an event sourced architecture you use the log as the source
             | of truth and all consumers can replay the log against a
             | local store to reconstruct a view of the system's state.
             | You also can use a log for pubsub, but if that's all you
             | need one of the MQ solutions is usually a better fit.
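The queue-vs-log distinction described above can be shown with a minimal sketch (illustrative only, not any product's API): a log keeps every record, each consumer tracks its own offset, and a subscriber that shows up late can still replay from the beginning.

```python
# Toy append-only log. Unlike a queue, reading does not consume records,
# so a late subscriber can rebuild state by replaying from offset 0.
class ToyLog:
    def __init__(self):
        self.records = []                 # append-only

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return self.records[offset:]

log = ToyLog()
for event in ["deposit 100", "withdraw 30", "deposit 5"]:
    log.append(event)

# Event sourcing: a consumer that arrives after all events were written
# replays the full log against a local store to reconstruct state.
balance = 0
for event in log.read_from(0):
    verb, amount = event.split()
    balance += int(amount) if verb == "deposit" else -int(amount)
```

A second consumer could replay the same log independently with its own offset, which is the property a pure message queue does not give you.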
        
             | sz4kerto wrote:
             | > Why use RabbitMQ and Kafka if you can use ZeroMQ?
             | 
             | They are totally different, you're comparing apples with
             | oranges.
             | 
             | ZeroMQ gives you basic, very fast tooling to communicate
             | between distributed processes. ZeroMQ does not provide
             | tooling for e.g. maintaining a strictly ordered, multi-
             | terabyte event log. And so on.
        
               | EGreg wrote:
               | Yes but isn't this a bit like comparing git / bitkeeper
               | vs subversion / perforce?
               | 
               | Basically, one is decentralized and you can set up a
               | massively parallel architecture, with eg each topic or
               | subthread having its own pubsub.
               | 
               | The other is a monolithic centralized pubsub
               | architecture.
               | 
               | You could argue that git in large institutional projects
               | converges to a monolithic repo so at that point it's less
               | efficient even than svn.
               | 
               | But for most use cases, ZeroMQ would allow far more
               | flexible distributed systems topologies and solutions.
               | No?
               | 
               | Edit: HN and Google are both awesome:
               | https://news.ycombinator.com/item?id=9634925
        
               | frogger23123 wrote:
               | > You could argue that git in large institutional
               | projects converges to a monolithic repo so at that point
               | it's less efficient even than svn.
               | 
               | Not true. Facebook and Google do not use Git. Microsoft
               | does not use vanilla Git for their monorepo. They created
               | this extension to make it scalable
               | https://en.wikipedia.org/wiki/Virtual_File_System_for_Git
        
           | dickjocke wrote:
              | Why is one Apache project competing with another one?
        
             | spelunker wrote:
                | Plenty of competing Apache projects exist; Pulsar and
                | Kafka aren't unique in that regard.
             | 
             | I don't think Apache cares if it's maintaining similar
             | projects.
        
         | qaq wrote:
          | This scales to multi-datacenter deployments well. It has
          | strong multi-tenancy support; if memory serves, Yahoo is
          | running a single cluster for all of their properties.
        
         | tyingq wrote:
         | It's a persistent store, so that would be different from Redis
         | Pub-Sub. Compared to RabbitMQ, Pulsar seems to favor strong
         | ordering and protection from message loss.
         | 
          | This blog post offers some more info and links to other posts
          | comparing Pulsar to RabbitMQ and Kafka: https://jack-
          | vanlightly.com/blog/2018/10/2/understanding-how...
        
       | [deleted]
        
       | samzer wrote:
       | For a moment I thought Bajaj and TVS came together.
        
         | [deleted]
        
         | lonesword wrote:
         | Captain here.
         | 
         | TVS and Bajaj are major motorbike manufacturers in India, and
         | TVS had a model named "Apache" and Bajaj had a model named
         | "Pulsar".
         | 
         | Flies away
        
           | takeda wrote:
           | It's not obvious if you're not from India, so thank you for
           | the explanation :)
        
           | opendomain wrote:
           | Thank you for the explanation.
        
       | addisonj wrote:
        | I just finished rolling out Pulsar to 8 AWS regions with geo-
        | replication. Message rates are currently at about 50k msgs/sec,
        | but we are still in the process of migrating many more
        | applications. We run on top of Kubernetes (EKS).
       | 
        | It took about 5 months for our implementation, with a chunk of
        | that work spent figuring out how to integrate our internal
        | auth, as well as using HashiCorp Vault as a clean, automated
        | way to get auth tokens for an AWS IAM role.
       | 
       | Overall, we are very pleased and the rest of the engineering org
       | is very excited about it and planning to migrate most of our SQS
       | and Kinesis apps.
       | 
        | Ask me anything in thread and I will try to answer. At some
        | point we will do a blog post on our experience.
        
         | bubbleRefuge wrote:
          | How does Python stream processing work? Are the modules
          | running in the JVM? Jython?
        
           | matteomerli wrote:
            | The user code is all running in a native CPython
            | interpreter.
        
         | ignoramous wrote:
         | On behalf of everyone here, _thanks a lot_ for answering every
         | single question being asked. Highly appreciate it.
         | 
         | I have questions myself:
         | 
         | 1. Did it reduce (TCO) costs or increase it versus using
         | Kinesis and SQS/SNS?
         | 
         | 1a. Interestingly, there's no global-replication with those AWS
         | services. Why did you require global-replication with the move
         | to Apache Pulsar?
         | 
         | 2. Since you mention _internal auth_ : Weren't Cognito / KMS /
         | Secrets Manager up to the job? Given these are integrated out-
         | of-the-box with EC2?
         | 
         | 3. Was it ever under-consideration to roll out pub/sub on top
         | of Aurora for Postgres with Global Replication?
         | https://layerci.com/blog/postgres-is-the-answer/
         | 
         | Thanks again.
        
           | addisonj wrote:
            | 1. On a short time horizon, not sure; back of the napkin,
            | it took ~12 dev months (5 months with 2.5 people on it on
            | average). However, our cost per 1000 msgs/sec is _much_
            | lower (like 1/4 the cost of Kinesis), so we fully expect
            | that investment to pay off over time, assuming that
            | adoption by the rest of the org continues and we don't find
            | a ton of issues.
           | 
           | 1a. You are correct we didn't require geo-replication for
           | existing use cases, however, initially, we saw geo-
           | replication as an easy way to improve DR and we have an
           | internal requirement for a DR zone in another region. Now
           | that we have done the work, we are starting to see multiple
           | places where we can simplify some things with geo-
           | replication, so we think long term the feature will be really
           | valuable
           | 
           | 2. We split up auth into two main components: auth of users
           | (where we use Okta) and auth of services. For okta, we just
           | wrote a small webapp that users can log into via OKta and
           | generate credentials. For apps/services, we already had
           | hashicorp in place and wanted to just piggyback of our
           | existing form of identity (IAM roles). Essentially, a user
           | just associates an IAM role with a pulsar role and we
           | generate and drop off credentials into a per-role unique
           | shared location in vault that any IAM role can access (across
           | multiple AWS accounts)
           | 
           | 3. Once again, geo-replication wasn't really a hard
           | requirement initially but more of something that we really
           | like now that we have. I think the biggest reason why not
           | postgres is that we have combined message rates (not
           | everything is migrated yet) on the order of 300k msgs/sec
           | across a few dozen services. Pulsar is designed to scale
           | horizontally and also has really great organizational
            | primitives as well as an ecosystem of tools. While I think
            | you could maybe figure that out with some PG solution,
            | having something purpose-built really can pay big dividends
            | when you are trying to make a solution that can easily
            | integrate into a complex ecosystem of many teams and many
            | different apps/use cases.
        
             | ignoramous wrote:
             | Agreed.
             | 
             | One more:
             | 
              | For replication across regions, do you peer VPCs via
              | Transit Gateways or some such, or do it over the public
              | Internet? I ask because a lot of folks complain about
              | exorbitant AWS bandwidth charges for cross-AZ and cross-
              | region communication (esp. over the Internet versus over
              | AWS' backbone): at 300k msgs/sec, the bandwidth costs
              | might add up quickly?
             | 
             | Consequently, maintaining a multi-region, multi-AZ VPC
             | peering might have been complicated without Transit
             | Gateway, so I'm curious how the network side of things held
             | up for you.
        
               | addisonj wrote:
               | In this case, we use just straight VPC peering with a
               | full mesh of all our regions. We may eventually migrate
               | to being built on our VPN based mesh (we do that in other
               | places)
               | 
                | Bandwidth is certainly a concern, and one of the nice
                | bits about Pulsar is that not everything is replicated.
                | You mark a namespace by adding the additional clusters
                | it should replicate to. We don't expect to replicate
                | everything, just the things teams care about.
               | 
                | When we did this, Transit Gateway was just within the
                | same region. At re:Invent they announced the cross-
                | region Transit Gateway, which we will look at moving to
                | as well, but for now it is just a full mesh of VPC
                | peers, which for 8 regions isn't bad... but it
                | certainly gets worse with each new region we need to
                | add.
               | 
                | For exposing the service into other VPCs in the same
                | region, we use PrivateLink endpoints to avoid needing
                | to do even more peering.
        
         | mavdi wrote:
         | Not questioning your judgement but interested to know about the
         | factors moving you away from Kinesis.
        
           | ckdarby wrote:
            | Kinesis is very expensive in the long run. There's almost
            | always an intersection point on AWS where you need to
            | consider moving away from AWS "managed services" and
            | bringing things in house.
        
           | addisonj wrote:
           | Biggest pain points with Kinesis:
           | 
            | - ordering is really hard; you don't get guaranteed
            | ordering unless you write one message at a time or take on
            | a lot of complexity on writes (see
            | https://brandur.org/kinesis-order), and the shards are
            | simply too small for many of our ordered use cases
           | 
           | - cost, we just don't send some data right now because it
           | would just be too much relative to the utility of the data
           | (we would need like 250 shards)
           | 
           | - retention, long term, we want to store data in Pulsar with
           | up to unlimited retention so we can rebuild views. There is
           | still some complexity there (like getting parallel access to
           | segments in a topic for batch processing) but it is much
           | further along than any other options
           | 
            | - client APIs for consumers. We are a polyglot shop, and
            | really the only language where consuming Kinesis isn't
            | terrible is Java (and other JVM languages). For every other
            | language we use Lambda, and while Lambda is great, it is
            | still a distinct deploy and management process from the
            | rest of the app. Being able to deploy a simple consumer
            | just as part of the app is really nice.
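The per-shard ordering constraint behind the first pain point can be illustrated with a toy model (not the Kinesis API; names are made up): records are ordered only within the shard their partition key hashes to, so cross-key ordering is never guaranteed.

```python
# Toy shard router: a record's partition key is hashed to pick a shard,
# and ordering is preserved only among records that land in the same
# shard. This mimics the shape of the constraint, not the real service.
import hashlib

NUM_SHARDS = 4
shards = [[] for _ in range(NUM_SHARDS)]

def put_record(partition_key, data):
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    shards[int(digest, 16) % NUM_SHARDS].append(data)

# All events for one key land in one shard, in write order...
for i in range(3):
    put_record("user-42", f"user-42 event {i}")

# ...but records for different keys may land in different shards, and a
# consumer reading shard-by-shard sees no global order across keys.
target = int(hashlib.md5(b"user-42").hexdigest(), 16) % NUM_SHARDS
```

If a single key's traffic exceeds one shard's throughput, you can't split it without giving up ordering, which is the "shards are too small" complaint above.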
        
         | rocky1138 wrote:
         | Why did you choose Pulsar?
        
           | addisonj wrote:
           | The main driver for Pulsar is that we have a number of
           | different messaging use cases, some more "pub/sub" like and
           | some that are more "log" like. Pulsar really does unify those
           | two worlds while also being a ton more flexible than any
           | hosted options.
           | 
            | For example, Kinesis is really limiting, with its short
            | retention and the difficulty of doing any real ordering at
            | scale due to the really tiny size of each shard.
           | 
            | Similarly, SQS does pub/sub well, but we keep finding that
            | we need to use the data beyond the initial delivery.
            | Instead of having multiple systems where we store that
            | data, we have one.
           | 
            | As for why we didn't go with Kafka, the biggest single
            | reason is that Pulsar is easier operationally, with no need
            | to re-balance, and it has the awesome feature that is
            | tiered storage via offloading, which allows us to
            | _actually_ do topics that have unlimited retention. Perhaps
            | more importantly for adoption, pub/sub is much easier with
            | Pulsar, and the API is just much easier for developers to
            | reason about than all the complexity of consumer groups,
            | etc. There are a ton of other nice things, like topics
            | being so cheap that we can have hundreds of thousands of
            | them, and all of the built-in multi-tenancy features, geo-
            | replication, the flexible ACL system, Pulsar Functions and
            | Pulsar IO, and many other things that really have us
            | excited about all the capabilities.
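The "unifies pub/sub and log" point can be sketched in a toy model, in the spirit of Pulsar's subscription modes (illustrative only, not the Pulsar client API): one topic, where a streaming-style subscription sees every message in order and a queue-style subscription spreads messages across its consumers.

```python
# One topic serving two consumption styles: an "exclusive" subscription
# reads the whole topic in order (log semantics), while a "shared"
# subscription round-robins messages across workers (queue semantics).
import itertools

topic = [f"msg-{i}" for i in range(6)]

# Exclusive / streaming subscription: one consumer, full ordered stream.
exclusive_seen = list(topic)

# Shared / queueing subscription: messages fan out across two workers,
# so each message is processed by exactly one of them.
workers = {"worker-a": [], "worker-b": []}
for msg, worker in zip(topic, itertools.cycle(workers)):
    workers[worker].append(msg)
```

The appeal described in the comment is that both styles attach to the same stored topic, instead of running one system for queues and another for logs.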
        
             | dominotw wrote:
              | > able to have topics be so cheap
              | 
              | For GDPR, a lot of us have to provide exportable 'user
              | activity'. Could you, in theory, have a topic per user
              | (we had ~50 million users) and publish any user activity
              | to that topic?
        
         | zapdrive wrote:
         | We are developing a social app with features such as messaging,
         | notifications etc. We decided to use Yedis [0] (Yugabyte Redis)
         | which is a distributed Redis with persistence backed by
         | RocksDB. Yugabyte supports multiple datacentre distribution.
         | Yedis's pub/sub is distributed as well. We are already running
         | a Yugabyte cluster for data storage in Cassandra. So we didn't
         | have to do anything extra to get our distributed pub/sub up and
         | running. Would you recommend using Pulsar instead?
         | 
         | 0: https://docs.yugabyte.com/latest/yedis/
        
           | tschellenbach wrote:
           | Normally don't plug my own work, but this is super related.
           | Did you ever check out Stream? https://getstream.io/ We power
           | chat and feeds for >500 million end users. Tech is Go,
           | RocksDB & Raft based.
        
             | zapdrive wrote:
             | Yes, we did indeed consider Stream, but figured we could
             | save some money by deploying and running our own system.
             | We're hopeful we'll get a couple million users shortly
             | after launch, and that would have run up our costs with
             | Stream quickly.
        
           | addisonj wrote:
           | I am not familiar enough with either Yedis or your use case
           | to make a recommendation, but I can say that Pulsar has a
           | great set of features, particularly if you need long-term
           | retention, that make it very attractive. I have also been
           | impressed with the community and development pace.
           | 
           | Given that Pulsar is a top-level Apache project with
           | adoption by quite a few different companies and a number of
           | corporate sponsors, the future of the project is a pretty
           | safe bet.
        
         | unethical_ban wrote:
         | I have no idea what "pub-sub" is used for outside of its
         | academic definitions, and I have no idea how Hashicorp Vault
         | works - Don't you need a secret/password in cleartext at some
         | point, for a given service or definition?
         | 
         | You don't have to answer my questions, I am just shouting into
         | the void. I'm glad it works for y'all.
        
           | SlowRobotAhead wrote:
           | Pub/sub messaging is super common.
           | 
           | Most IoT devices that aren't running HTTP stacks are using
           | MQTT.
        
         | DevKoala wrote:
         | Was NATS a consideration for your use cases? At work, we are
         | currently standardizing on NATS as our messaging system, and I
         | would like to know if there is a valid comparison.
        
           | liquidgecka wrote:
           | NATS is not a replacement for Pulsar or RabbitMQ. It is a
           | message-passing system designed to pass lots of messages
           | live; if nobody is there to receive them, they are lost and
           | gone forever. There is a streaming layer, but that is
           | closer to Kafka and still does not provide the typical
           | message model with an ack/nack API.
           | 
           | I have used NATS in several different ways, but since it
           | can be lossy it's never been considered as a replacement
           | for a pub/sub message queue on my end. We used it for a
           | chat message layer and that worked pretty well.
           | 
           | Its message-passing layer can be interesting, but you end
           | up writing all the retry and failure logic yourself, so
           | it's usually better to use an existing message layer that
           | handles all of that for you without the funky abstractions.
           | 
           | Again, it's interesting, but nowhere near close to being a
           | Rabbit or Pulsar replacement if reliability is a goal.
        
           | addisonj wrote:
           | I think the other reply captures most of it for core NATS.
           | We also looked at NATS Streaming a bit, but it seems to be
           | pretty immature (though promising) and doesn't check all
           | the boxes around integrations into the streaming ecosystem
           | like Pulsar does (Pulsar Functions, Pulsar IO).
           | 
           | I am interested to see where NATS goes but for where we are
           | today Pulsar was a much more obvious choice.
        
         | bubbleRefuge wrote:
         | Isn't using Kubernetes kind of an anti-pattern due to failover
         | and rebalancing logic clashing? If Kubernetes is killing and
         | re-starting nodes and the cluster's brokers are detecting dead
         | brokers and rebalancing partitions as a result, it seems
         | counterproductive.
        
           | addisonj wrote:
           | This is one of the main benefits of Pulsar: because state
           | is split between brokers and bookkeeper, and bookkeeper
           | doesn't need re-balancing (due to its segment-based
           | architecture, where you choose new bookies with each new
           | segment), we really don't have to worry about re-balancing
           | storage (in general, not just in the case of failover). It
           | is true that topics map to a single broker, but generally
           | Pulsar has _really_ good limits on memory, so we don't see
           | nodes getting killed by limits and we only really see re-
           | scheduling for real issues.
           | 
           | While there certainly are some aspects you need to be aware
           | of, generally Pulsar is much more "cloud native" and maps
           | quite well to k8s primitives.
        
           | skube wrote:
           | Using kubernetes is _always_ an anti-pattern.
        
           | rhizome wrote:
           | To the degree that conflict exists in their implementation,
           | I would think it's possible to account for it.
        
         | staticassertion wrote:
         | What is the SQS-based system you are migrating from?
         | 
         | I'm currently building a data processing system that is backed
         | by S3 -> SQS based events, for persistent message passing.
        
           | addisonj wrote:
           | We have a number of systems, some use SQS, some use Kinesis.
           | Part of the draw of Pulsar is having one piece of tech that
           | we can unify everything over and offer more baseline
           | features, like infinite retention via storage offloading or
           | Pulsar IO connectors that standardize common operations. We
           | aren't really targeting one use case, instead, we looked for
           | the system that offered a broad set of features that other
           | developers in the company want and is operationally doable
           | with just a few people.
        
         | willvarfar wrote:
         | What's your plan on disaster recovery? Do your workers track
         | their own cursors, and if so, how does that work across
         | regions?
        
           | addisonj wrote:
           | In Pulsar, offsets are tracked by the service as part of
           | the bookkeeper data (unless you use the reader API, which
           | is only really needed for advanced use cases like Flink).
           | That means we just need to do DR for bookkeepers, which I
           | touch on in another response, but the tl;dr is that we have
           | a 3x replication factor as well as EBS snapshots.
        
         | ypcx wrote:
         | Alrighty, a few questions:
         | 
         | - what k8s definitions do you use, e.g. do you use the official
         | Helm Chart, or have you written your .yaml's from scratch?
         | 
         | - have you practiced disaster recovery scenarios in the context
         | of k8s? Can you describe them briefly?
         | 
         | - how do you upgrade/redeploy the Pulsar k8s components?
         | i.e., does this cause the Bookies to trigger a cluster
         | rebalance, or does it trigger the Autorecovery?
         | 
         | - for the Bookies, do you use AWS EBS volumes with EKS or
         | just local instance storage (that is, if you use persistent
         | topics)?
         | 
         | - do you use the Proxy pods' EKS k8s pod IPs as exposed on
         | the AWS network, or do you use a NodePort type of service
         | for the Proxy components (using the EKS node IPs)?
         | 
         | - have you been bitten by the recent EKS k8s network plugin
         | bug (loss of pod connectivity), and/or how do you maintain
         | your EKS cluster?
         | 
         | - do you run your EKS nodes in a multi-AZ setting?
        
           | addisonj wrote:
           | for the k8s definitions, we started with the helm chart,
           | rendered the template, and then moved it into kustomize, as
           | that is our tool of choice ATM, IDK if I would recommend that
           | approach for everyone (we expect we might move to helm v3 at
           | some point) but it was a good choice for us.
           | 
           | We have practiced some disaster recovery, but it isn't 100%
           | exhaustive (is it ever?); it is also aided by how Pulsar is
           | designed. We have killed bookie nodes as well as lost all
           | our state in zookeeper. The first is pretty easily handled
           | by the replication factor of bookkeeper data; for zookeeper
           | we do an extra backup step, dumping the state to s3 so we
           | can restore it. What we haven't tested in practice, but
           | know how to do in theory, is restoring a k8s stateful set
           | from EBS volume snapshots. However, we see that as a real
           | edge case. In Pulsar, we offload our data to s3 after a few
           | hours, so we only need to worry about potentially losing a
           | few hours of data in BK, as the zookeeper state is very
           | easy to snapshot and restore from s3. In other words, we
           | are still working on getting more and more confident with
           | data and don't yet recommend teams use it for mission-
           | critical non-recoverable data, but there are a ton of use
           | cases for it now and we can continue to improve on the DR
           | front.
           | 
           | We have done multiple upgrades and deploy all the time.
           | Because bookkeeper nodes are in a stateful set and we don't
           | do automated rollouts, we have a manual process to replace
           | the BK nodes. However, they don't trigger a re-balance, as
           | each node closes gracefully and then re-attaches the EBS
           | volume from the stateful set.
           | 
           | We use EBS volumes: a provisioned-IOPS (piops) volume for
           | the journal and a larger, slower volume for the ledger
           | store. This is one of the great parts of the bookkeeper
           | design: the two disk pools are separate, so we just need a
           | small chunk of really fast storage, and the journaled data
           | is copied over to the ledger volume by a background
           | process. We figure for really high write throughput we
           | could use instance storage for the journal volume and EBS
           | for the ledger; that would have some complications on
           | recovery, but would still be easier than having to rebuild
           | the whole ledger data.
           | 
           | We use the pulsar proxy and expose it via a k8s service with
           | the AWS specific NLB annotations.
           | 
           | We haven't had any issues with the k8s plugin and haven't
           | really had any issues with EKS version upgrades. We just add
           | new nodes when we migrate the kubelets
           | 
           | Yes, we have automation (via terraform) to allow us to add
           | many different pools of compute and we use labels and taints
           | to get specific apps mapped to specific pools of compute. For
           | Pulsar, we run all the components multi-AZ
        
           | addisonj wrote:
           | Oh forgot one aspect about DR: for critical data, we can
           | easily turn on geo-replication (with a single API call) and
           | have that data now in another region purely for DR purposes
           | (or for cross region use cases)
        
         | GordonS wrote:
         | Really interested why you chose Pulsar over RabbitMQ and
         | others?
        
           | addisonj wrote:
           | I have used (and deployed) rabbitmq in the past and really
           | love it for pub/sub, but for our needs, we keep needing
           | retention, particularly long retention that we process with
           | Flink for computing views. Having one system to do both is
           | great for us.
        
             | GordonS wrote:
             | Sorry, I'd missed that Pulsar was a streaming log system
             | (like Kafka), _as well as_ a pub/sub system. HN title
             | misled me :)
        
           | [deleted]
        
         | 3fe9a03ccd14ca5 wrote:
         | The first question I have is: why? SQS seems like such a
         | simple thing to keep using as a hosted service.
        
           | jjeaff wrote:
             | They said they are currently doing 50k messages a second
             | and they aren't even done migrating everything over.
             | 
             | 50k messages a second would cost you around $50k a month
             | for AWS SQS (math could be wrong, didn't double-check).
             | 
             | Plus, with SQS, you get what they have. No
             | customizations.
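A quick back-of-the-envelope check of that figure, as a sketch assuming SQS standard pricing of roughly $0.40 per million requests (the published rate around that time; actual pricing varies by region and request batching):

```python
# Rough monthly SQS cost at a sustained 50k messages/second,
# assuming ~$0.40 per million requests (an assumption; see above).
msgs_per_second = 50_000
seconds_per_month = 86_400 * 30
requests_per_month = msgs_per_second * seconds_per_month  # 129.6 billion
monthly_cost = requests_per_month / 1_000_000 * 0.40
print(f"${monthly_cost:,.0f}/month")  # prints "$51,840/month"
```

Note this counts only the publishes; each receive and delete is also a billable SQS request, so the true figure would be higher still.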
        
             | deanCommie wrote:
             | I sincerely doubt they are sustaining 50k msgs/second.
             | Likely that's the MAXIMUM throughput.
             | 
             | No way they would actually hit that sustained throughput
             | for the entire month.
             | 
             | Even the other justifications, like wanting to reference
             | messages after delivery, do not, to me, justify
             | migrating off SQS/Kinesis, especially not at the cost of
             | 5 months of development effort.
        
             | rumanator wrote:
             | Even if you were off by an order of magnitude, that
             | expenditure level is not justified to run a message broker
             | service.
        
           | addisonj wrote:
           | The biggest thing is that we find ourselves needing to
           | retain this data beyond initial delivery, and also for use
           | cases where we want to use the data more like a log and
           | need ordering guarantees. It isn't just our current SQS
           | use cases; it is being able to have one tech that does
           | SQS-like stuff and Kinesis-like stuff in one place.
        
             | tybit wrote:
             | When we ran into a similar use case, we went with
             | writing to S3 and using S3 event notifications via SNS,
             | so consumers could subscribe their SQS queues to that.
             | 
             | Having said that, assuming Pulsar has a feature similar
             | to Kafka's log compaction, I can definitely see the
             | appeal!
             | 
             | Also the fact that SQS Fifo doesn't integrate with this
             | setup is super annoying.
             | 
             | Edit: Log compaction not key compaction
        
       | js4ever wrote:
       | "high-level APIs for Java, C++, Python and GO", no love for
       | Node.js? :(
        
         | maxtollenaar wrote:
         | You can also use the websocket proxy:
         | https://pulsar.apache.org/docs/en/next/client-libraries-webs...
        
         | tyingq wrote:
         | It exists: https://pulsar.apache.org/docs/en/next/client-
         | libraries-node...
        
           | c0brac0bra wrote:
           | However, it currently lacks the ability to listen for
           | messages and run an event handler when one comes in:
           | https://github.com/apache/pulsar-client-node/pull/56
           | 
           | You have to manually call ".receive()" to attempt to receive
           | a message.
        
             | gperinazzo wrote:
             | Using `.receive()` will occupy a worker thread from
             | Node's pool until it returns. Having multiple consumers
             | waiting on receive will clog up the worker threadpool,
             | preventing anything else that uses it from running. If
             | you want to use the consumer right now, I would suggest
             | always using a timeout on the receive call, and waiting
             | between timed-out calls to receive. This is extremely
             | important if you have multiple consumers.
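The receive-with-timeout pattern described above can be sketched as follows (Python for illustration; `StubConsumer` is a hypothetical stand-in so the sketch runs without a broker — real Pulsar clients expose a receive call that accepts a timeout, with library-specific semantics on expiry):

```python
import time

class StubConsumer:
    """Stand-in for a real Pulsar consumer; purely illustrative.
    A real client's timed receive returns or raises when the timeout
    elapses with no message available."""
    def __init__(self, messages):
        self._messages = list(messages)

    def receive(self, timeout_ms):
        # Next message, or None when nothing is available in time.
        return self._messages.pop(0) if self._messages else None

def poll(consumer, handle, idle_sleep_s=0.01, max_idle_polls=3):
    """Timed-receive loop: bounded receive calls plus a short sleep
    between empty polls, so no thread is parked indefinitely."""
    idle = 0
    while idle < max_idle_polls:
        msg = consumer.receive(timeout_ms=100)
        if msg is None:
            idle += 1
            time.sleep(idle_sleep_s)  # back off instead of busy-waiting
        else:
            idle = 0
            handle(msg)

received = []
poll(StubConsumer(["a", "b", "c"]), received.append)
print(received)  # ['a', 'b', 'c']
```

The same shape applies in Node: keep each `.receive()` call bounded by a timeout so a worker thread is only occupied briefly, and yield between empty polls.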
        
       | buboard wrote:
       | Why is apache developing all those servers that are only useful
       | to a handful of companies that are rich enough to build them
       | themselves? How about building something that individuals can
       | use, like, i dunno, apache server itself?
        
         | Eikon wrote:
         | If you had bothered to read the linked page, you would have
         | understood that it's a Yahoo project handed over to Apache
         | for management, like many of Apache's projects.
        
           | buboard wrote:
           | yeah, I'm talking more generally about their full list
           | here: https://www.apache.org/
        
             | mindw0rk wrote:
             | There are a lot of projects that were handed to Apache
             | to manage. Kafka, for example, was initially created by
             | LinkedIn. So yeah, you are right: big corps are actually
             | creating those tools and, on top of that, giving them
             | away to the public as open source.
        
               | eronwright wrote:
               | Best part is that the tools are put into production
               | before being open-sourced. In other words, they actually
               | work.
        
       | JackRabbitSlim wrote:
       | Another over engineered Lego block for quicker dev and even less
       | thought on design, upkeep or overhead.
       | 
       | Now if you excuse me I need to go take my quad-core, petaflop
       | processing power and multiple gigabytes of RAM to read email from
       | a javascript infested, multi-byte to single byte encoded webpage
       | hosted across half a dozen server instances scattered across the
       | planet.
       | 
       | CS is damned, and this is hell.
        
         | maximente wrote:
         | not really, people legitimately need this for truly big
         | data. You can't reliably process 7 trillion events per day
         | with a purely C/Unix CLI stack.
        
           | ivalm wrote:
           | I would bet that the vast, vast majority of production
           | kafka deployments do not see 7 trillion events per day. I
           | bet many/most do not even see 7 billion events per day.
        
         | manigandham wrote:
         | This is one of the best designed pub/sub messaging systems
         | available, but you don't have to use it if you don't want to.
        
         | neeleshs wrote:
         | Please do elaborate
        
         | _frkl wrote:
         | Care to elaborate? I just started using the standalone
         | version of Pulsar for a project. It looked better designed
         | than Kafka, and resource usage looks quite acceptable so
         | far, but I don't have much experience with any solutions in
         | this space, so I'm not sure which problems I'm going to run
         | into. Any suggestions for good tech/strategies for a
         | streaming-type setup like this? Or what to do alternatively?
         | Should I look into rolling my own?
        
       | breckcs wrote:
       | Being able to scale the durable-storage layer independently has a
       | lot of advantages. More thoughts here:
       | https://twitter.com/breckcs/status/1203736751681896449.
        
       | throwawaysea wrote:
       | A lot is said or referenced in this conversation about why people
       | chose Pulsar over Kafka. I'm not an expert in this area but are
       | there use cases where Kafka is still better?
        
       | bovermyer wrote:
       | How does this compare with NATS?
        
         | sqreept wrote:
         | NATS is a simpler PUB/SUB system that delivers in the UNIX
         | spirit of small composable parts. Apache Pulsar or Apache Kafka
         | deliver the banana, the ape holding it and the rest of the
         | jungle.
        
           | tylertreat wrote:
           | Check out Liftbridge (https://liftbridge.io) as a way to add
           | these capabilities to NATS.
        
         | manigandham wrote:
         | NATS is ephemeral pub/sub only. There is no persistence or
         | replay; it focuses on high performance and messaging
         | patterns like request/reply.
         | 
         | Kafka and Pulsar persist every message and different consumers
         | can replay the stream from their own positions. Pulsar also
         | supports ephemeral pub/sub like NATS with a lot more advanced
         | features.
         | 
         | NATS does have the NATS Streaming project for persistence
         | and replay, but it has scalability issues. They're working
         | on a new project called JetStream to replace it in the
         | future.
        
       | barbarbar wrote:
       | How is it compared to kafka?
        
         | cbartholomew wrote:
         | You might want to check out this blog post I wrote comparing
         | Kafka to Pulsar: https://kafkaesque.io/5-more-reasons-to-
         | choose-apache-pulsar...
        
           | cbartholomew wrote:
           | If you have an O'Reilly subscription, you can also check out
           | this detailed report comparing Pulsar and Kafka:
           | https://learning.oreilly.com/library/view/apache-pulsar-
           | vers...
        
         | manigandham wrote:
         | Separates storage from brokers for better scaling and
         | performance. Millions of topics without a problem and built-in
         | tenant/namespace/topic hierarchy. Kubernetes-native. Per-
         | message acknowledgement instead of just an offset. Ephemeral
         | pub/sub or persistent data. Built-in functions/lambda platform.
         | Long-term/tiered storage into S3/object storage. Geo-
         | replication across clusters.
        
         | eerrt wrote:
         | Some latency benchmarks: https://kafkaesque.io/performance-
         | comparison-between-apache-...
        
         | SkyRocknRoll wrote:
         | Most of the flaws of Kafka were carefully studied and fixed
         | in Apache Pulsar. I have written a blog post about why we
         | went ahead with Pulsar:
         | https://medium.com/@yuvarajl/why-nutanix-beam-went-
         | ahead-wit...
        
           | progval wrote:
           | > when consumers are lagging behind, producer throughput
           | falls off a cliff because lagging consumers introduce random
           | reads
           | 
           | I am confused by this. The format of Kafka's log files is
           | designed to allow reading and sending to clients directly
           | using sendfile, in sequential reads of batches of messages.
           | http://kafka.apache.org/documentation/#maximizingefficiency
        
             | geeio wrote:
             | Kafka works best when the data it is returning to consumers
             | is in the page cache.
             | 
             | When consumers fall behind, they start to request data that
             | might not be in the page cache, causing things to slow
             | down.
        
             | manigandham wrote:
             | Kafka brokers handle connections to consumers and data
             | storage. This creates contention as the primaries for each
             | partition have to service the traffic and handle IO.
             | Consumers that aren't tailing the stream will cause
             | slowdowns because Kafka has to seek to that offset from
             | files which aren't cached in RAM.
             | 
             | Pulsar separates storage into a different layer (powered by
             | Apache Bookkeeper) which allows consumers to read directly
             | from multiple nodes. There's much more IO throughput
             | available to handle consumers picking up anywhere in the
             | stream.
        
       | mbostleman wrote:
       | This might be entirely off topic, but I'm having issues with
       | RabbitMQ whereby durability suffers because messages are sent
       | to remote hosts, exposing them to both network failures and
       | remote-host availability. On a previous platform I used an
       | MSMQ-based system which didn't have this problem, since it
       | uses a local store-and-forward service: all sends are to
       | localhost and are not affected by the network or receiver
       | availability. The MSMQ system was my first and only experience
       | with messaging up to now, so I was surprised that any system
       | would not work that way. How is this dealt with in other
       | systems? Is it just a feature that exists or not, and you
       | decide if it's important? And maybe just to shoehorn it on
       | topic: does Pulsar use a local service?
        
         | manigandham wrote:
         | That's an inherent issue with distributed solutions and is
         | impossible to solve. The only way to deal with it is using
         | various techniques like acknowledgements, retries, local
         | storage, idempotency, etc. MSMQ handles that stuff behind the
         | scenes but the problem itself will always exist if there's a
         | network boundary.
         | 
         | These other systems are designed to be remote with a network
         | interface. You can use the client drivers to handle
         | acknowledgements/retries/local-buffering in your own app or use
         | something like Logstash [1], FluentD [2], or Vector [3] for
         | message forwarding if you want a local agent to send to. You
         | might have to wire up several connectors since none of them
         | forward directly to Pulsar today.
         | 
         | Also RabbitMQ is absolute crap. There are better options for
         | every scenario so I advise using something else like Redis,
         | NATS, Kafka, or Pulsar.
         | 
         | 1. https://www.elastic.co/products/logstash
         | 
         | 2. https://www.fluentd.org/
         | 
         | 3. https://vector.dev/
        
         | bauerd wrote:
         | You're free to have queue and workers run on the same machine,
         | just bind to loopback. As soon as you deal with more than one
         | machine, which is required in HA scenarios, you deal with a
         | networked (distributed) system. I might not have understood
         | your question correctly though ...
         | 
         | Edit: Maybe you're looking for acks/confirms?
         | https://www.rabbitmq.com/confirms.html
        
           | mbostleman wrote:
           | I have many machines, each of which has one or more
           | applications that send messages. And I have one machine
           | with an instance of Rabbit to which all messages are sent.
           | If the network is down or the Rabbit machine is down, the
           | messages are gone, along with their data.
           | 
           | Clustering the Rabbit machine helps one particular failure
           | scenario, but it's not a solution to the problem.
        
       | zackmorris wrote:
       | This looks promising. Is there such a thing as a generalized
       | SQL query engine that runs over any key-value store providing
       | certain minimal core operations?
       | 
       | For example, say you have a KV Store with basic mathematical Set
       | operations like GET, SET, UNION, INTERSECT, EXCEPT, etc. The
       | Engine would parse the SQL and then call the low-level KV Store
       | Set operations, returning the result or updating KV pairs. This
       | explains how Join relates to Set operations:
       | 
       | https://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-i...
       | 
       | Another thing I'd like is if KV stores exposed a general purpose
       | functional programming language (maybe a LISP or a minimal stack-
       | based language like PostScript) for running the same SQL Set
       | operations without ugly syntax. I don't know the exact name for
       | this. But if we had that, then we could build our own distributed
       | databases, similar to Firebase but with a SQL interface as well,
       | from KV stores like Pulsar. I'm thinking something similar to
       | RethinkDB but with a more distinct/open separation of layers.
       | 
       | The hard part would be around transactions and row locking. A
       | slightly related question is if anyone has ever made a lock-free
       | KV store with Set operations using something like atomic compare-
       | and-swap (CAS) operations. There might be a way to leave requests
       | "open" until the CAS has been fully committed. Not sure if this
       | applies to ledger/log based databases since the transaction might
       | already be deterministic as long as the servers have exact copies
       | of the same query log.
       | 
       | Edit: I wrote this thinking of something like Redis, but maybe
       | Pulsar is only the message component and not a store. So the
       | layering might look like: [Pulsar][KV Store (like Redis)][minimal
       | Set operations][SQL query engine].
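A minimal sketch of the compare-and-swap primitive mentioned above (a hypothetical `CasStore`; the lock stands in for an atomic hardware CAS, so this only illustrates the versioning protocol, not a genuinely lock-free implementation):

```python
import threading

class CasStore:
    """Minimal versioned KV store with compare-and-swap.
    A sketch of the discussed primitive, not a production design."""
    def __init__(self):
        self._lock = threading.Lock()  # stands in for hardware CAS
        self._data = {}                # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        """Write only if the caller saw the latest version.
        Returns (success, current_version)."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False, version  # lost the race; caller retries
            self._data[key] = (version + 1, new_value)
            return True, version + 1

store = CasStore()
ok, v = store.compare_and_swap("x", 0, "first")
print(ok, v)   # True 1
ok, v = store.compare_and_swap("x", 0, "stale")  # stale version
print(ok, v)   # False 1
```

A request could be left "open" as described by retrying the CAS until the returned version matches what the caller last observed.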
        
         | atombender wrote:
         | One of the challenges with layering SQL on top of a KV store is
         | query performance.
         | 
         | The most obvious way to model a secondary index on top of a
         | pure KV store is to map indexed values to keys. For example,
         | given the (rowID, name) tuples (123, "Bob"), (345, "Jane"),
         | (234, "Zack"), you can store these as keys:
         | 
         |     name:Bob:123
         |     name:Jane:345
         |     name:Zack:234
         | 
         | At this point you don't need or even want values, so this is
         | effectively a sorted set.
         | 
         | Now you can easily find the rowID of Jane by doing a key
         | scan for "name:Jane:", which should be efficient in a KV
         | store that supports key range scans. You can do prefix
         | searches this way ("name:Jane" finds all keys starting with
         | "Jane"), as well as ordinal constraints: "age > 32" requires
         | that the age index be encoded so keys sort numerically,
         | something like:
         | 
         |     age:\x00\x00\x00\x20:123
         | 
         | To perform a union ("name = 'Bob' OR name = 'Jane'"), you
         | simply do multiple range scans, performing a merge-sort-ish
         | union operation as you go. To perform an intersection ("name
         | = 'Bob' AND age > 10"), you find the starting point for all
         | the terms, use that as the key range, then do the merge
         | sort.
         | 
         | This is what TiDB and FoundationDB's record layers do, which
         | both have a strict separation between the stateless database
         | layer and the stateful KV layer.
         | 
         | The performance bottleneck will be the network layer. Your
         | range scan operations will be streaming a lot of data from the
         | KV store to the SQL layer, and potentially you'll be reading a
         | lot of data that is discarded by higher-level query layers.
         | This is why TiKV has "co-processor" logic in the KV store
         | that knows how to do things like filtering; when TiDB plans
         | your query, it pushes some query operators down to TiKV
         | itself for performance.
         | 
         | Unfortunately, this is not possible with FoundationDB. This is
         | why FoundationDB's authors recommend you co-locate FDB with
         | your application on the same machine. But since FDB key ranges
         | are distributed, there's no way to actually bring the query
         | code close to the data (as far as I know!).
         | 
         | I'm sure you could do something similar with Redis and Lua
         | scripting, i.e. building query operators as Lua scripts that
         | work on sorted sets. I wouldn't trust Redis as a primary
         | data store, but it can be a fast secondary index.
        
         | PaulHoule wrote:
         | e.g. how about a complex event processing engine? Something
         | like that will do a lot of the above, but the inference
         | database stays manageable since old data will fall out of the
         | windows.
        
           | eronwright wrote:
           | Take a look at the Apache Flink CEP library, which operates
           | over unbounded streams:
           | https://ci.apache.org/projects/flink/flink-docs-
           | stable/dev/l...
        
           | TuringNYC wrote:
            | I usually don't associate CEP with this, but it makes sense.
           | Are they meant to operate at this level (rather than at a
           | higher level of abstraction)?
           | 
           | Which ones would you recommend looking at?
        
         | manigandham wrote:
         | Spark [1], Presto [2], and Drill [3] can all do that with
         | connectors to different data sources and varying support for
         | advanced SQL.
         | 
         | Pulsar has support for Presto:
         | https://pulsar.apache.org/docs/en/sql-overview/
         | 
         | Pulsar isn't a KV store, though; it's a distributed
         | log/messaging system that supports a "key" for each message
         | that can be used when scanning or compacting a stream. GET and
         | SET aren't individual operations: a read is a scan through the
         | stream, and a write is publishing a new message.
         | 
         | If you just want to have a SQL interface to KV stores or
         | messaging systems that support a message key then Apache
         | Calcite [4] can be used as a query parser and planner. There
         | are examples of it being used for Kafka [5].
         | 
         | 1. https://spark.apache.org/
         | 
         | 2. https://prestodb.io/
         | 
         | 3. https://drill.apache.org/
         | 
         | 4. https://calcite.apache.org/
         | 
         | 5. https://github.com/rayokota/kareldb
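
The point above that GET isn't an individual operation can be sketched as a scan over a compacted stream, where the last message seen for each key wins (the message shapes here are illustrative, not Pulsar's actual API):

```python
# Toy compacted log: a list of (key, value) messages in publish order.
log = [
    ("user-1", b"v1"),
    ("user-2", b"v1"),
    ("user-1", b"v2"),  # a later message for the same key supersedes v1
]

def compacted_view(messages):
    """Scan the stream, keeping only the latest value per key:
    the effect of topic compaction."""
    latest = {}
    for key, value in messages:
        latest[key] = value  # later entries overwrite earlier ones
    return latest

# A "GET" of user-1 is really this scan, yielding the newest value.
view = compacted_view(log)
```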
        
         | coryvirokmobile wrote:
         | Regarding the generic sql engine - it looks like this is what
         | Apache Calcite was designed for.
         | 
         | https://calcite.apache.org/
        
       | reggieband wrote:
       | I'm still on the fence with these distributed log/queue hybrids.
       | From a theoretical perspective it seems these are excellent. I
       | just have this nagging suspicion that architectures based on
       | these systems will harbor some even-worse problem. This
       | kind of ambivalence is something I find myself having to battle
       | more and more in my career as I age. Most of the time the hype
       | around new design/development patterns leads to a worse
       | situation. Very rarely it leads to a significant improvement. I
       | dislike that my first impression looking at a system like this is
       | risk aversion.
        
         | zomglings wrote:
         | Your risk aversion seems justified. It seems reasonable to
         | estimate that very few teams are in the position of needing the
         | kind of scale/scalability that something like Apache Pulsar
         | offers. They are much more likely to be in either a state where
         | they will not put Pulsar through its paces or where they
         | already have a solution in place that serves their
         | scale/scalability needs.
         | 
         | When a team you are on starts discussing switching over to a
         | technology like Pulsar because of its amazing benefits, unless
         | your pants are on fire, it is much more likely than not that
         | you do not stand to gain much from the benefits that such
         | software brings but you are accepting the maintenance burden
         | that it represents.
        
         | mpmpmpmp wrote:
         | I totally get where you're coming from as I feel like that a
         | lot too. But the fact that you are thinking about risk and
         | too. But the fact that you are thinking about risk and business
         | value over working on cool new tech should be a positive for
         | the ventures you are a part of.
        
       | alfalfasprout wrote:
       | I keep seeing new message queue solutions pop up over the years
       | and it's just been my impression at least that this is one area
       | where Silicon Valley really is way behind the trading industry.
       | 
       | Reliable pub/sub that supports message rates over 100k/sec (even
       | up to the millions) has been available for a while now, and with
       | a great deal of efficiency (e.g., the Aeron project). The
       | incredible amount of effort to support complex partitioning,
       | extreme fault tolerance (instead of more clever recovery logic),
       | etc. adds a lot of overhead, to the point of calling overhead on
       | the order of 5 ms "low latency" instead of the microseconds or
       | even nanoseconds expected in trading.
       | 
       | Worse, many startups try to adopt these technologies where their
       | message rates are minuscule. To give you some context, even two
       | beefy machines with an older message queue solution like ZeroMQ
       | can tolerate throughput in excess of what most companies produce.
       | 
       | This is not to discredit the authors of Pulsar or Kafka at all...
       | but it's just a concerning trend where easy to use horizontally
       | scalable message queues are being deployed everywhere. Similar to
       | how everyone was running hadoop a few years back even when the
       | data fit in memory.
        
         | tylertreat wrote:
         | ZeroMQ is not a message queue, it's a networking library.
        
         | gfodor wrote:
         | Worth noting that Kafka is not a queue, but an append-only log.
        
       | mickster99 wrote:
       | Is there a reason you went with Pulsar over Kafka? How is the
       | pulsar community? Where are you turning when you have support
       | issues?
        
       | mattboyle wrote:
       | We tried to adopt this but found the documentation very lacking
       | and a severe lack of quality client libraries for our language of
       | choice (go).the "official" one had race conditions in the code as
       | well as "todo" for key pieces littered throughout. There is
       | another from comcast which is abandoned. We had a serious
       | discussion about picking up ownership of the library or writing
       | our own but as a small start up we didnt feel we could do it and
       | still develop the product. I'll continue to keep an eye on pulsar
       | but for now Kafka is the clear go to imo. It's well documented,
       | great SAS offerings (confluent) and tons of books and training
       | courses for it.
        
         | cbartholomew wrote:
         | We provide a SaaS offering of Apache Pulsar in AWS, Azure, and
         | GCP: https://kafkaesque.io/
        
           | jjeaff wrote:
            | Cool name. That's one of those company names where it
            | almost seems like someone thought of the name first and
            | found it so fitting they built a company around it.
        
             | cbartholomew wrote:
             | Thanks!
        
           | mattboyle wrote:
            | I didn't find this when looking; thanks, will take a deeper
            | look.
        
         | ckdarby wrote:
         | > found the documentation very lacking
         | 
         | Really? It is one of the few open source projects that we've
         | felt has had modern documentation. How long ago was this?
         | 
         | > As a small startup
         | 
         | You'll spend more time & money on the OpEx cost with Kafka than
         | picking up the client library for Pulsar.
        
           | mattboyle wrote:
           | It was about 6 months ago.
           | 
           | I completely disagree with the opex of picking up kafka vs
           | developing a whole client library. Please could you try and
           | explain how you came to this conclusion?
        
             | ckdarby wrote:
             | > Please could you try and explain how you came to this
             | conclusion?
             | 
             | 1. Stateless brokers
             | 
              | With Kafka, any time a broker goes down you need to be
              | aware of the Kafka broker ID. Yes, this can be fixed by
              | creating your entire infrastructure as code and keeping
              | track of state.
              | 
              | This is a significant OpEx cost. I've seen few people
              | successfully automate this; Netflix is one of the few. The
              | rest just use a manual process with tooling to get around
              | it: a pager, Kafka tooling to spawn a replacement node
              | with the looked-up broker ID, etc.
             | 
             | 2. Kafka MirrorMaker
             | 
              | Granted, I have not used v2, which recently came out in
              | ~2.6, but dear gosh v1 was so bad that Uber wrote their
              | own replacement from the ground up, called uReplicator.
              | The amount of time wasted on replication breaking across
              | regions is disgusting.
             | 
             | 3. Optimization & Scaling
             | 
              | Kafka bundles compute & storage. There's no way that I
              | know of to split this (though there may be an upcoming
              | KIP). This means you'll waste time on the ops side
              | deciding on tradeoffs between your broker throughput and
              | your broker space.
             | 
             | Worse yet time & money will be wasted here. I'd just rather
             | hire more people than waste time on silly things like this.
             | This is where I justify taking on the expense of client
             | libs.
             | 
             | 4. Segments vs Partitions
             | 
              | The major time wasters come when you end up in a situation
              | where the cluster gets utterly destroyed. It will happen;
              | it isn't a question of if but of when (unless the company
              | goes belly up first and nobody cares).
             | 
             | It's 3 AM, the producer is getting back pressure, you get a
             | page and now have to deal with adding on write capacity to
              | avoid a hot spot. Don't forget you can't simply do a
              | rebalance in Kafka or you'll break the contract with
             | every developer who has developed under the golden rule of,
             | "Your partition order will always be the same".
             | 
             | You'll successfully pay the cost of upgrading the entire
             | cluster and then spending 3 days coming up with a solution
             | to rebalance without making all your devs riot against you
             | when you break that golden contract.
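
The "golden contract" above exists because key-to-partition assignment is typically hash(key) mod partition count, so changing the partition count reroutes keys. A rough illustration (crc32 here is a stand-in for Kafka's actual murmur2 default partitioner):

```python
from zlib import crc32

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka-style key assignment: same key, same partition,
    # as long as num_partitions never changes.
    return crc32(key.encode()) % num_partitions

keys = [f"user-{i}" for i in range(100)]
before = {k: partition_for(k, 3) for k in keys}
after = {k: partition_for(k, 4) for k in keys}  # cluster grown 3 -> 4

# Keys that now land on a different partition: for these, new messages
# no longer follow old ones in partition order, breaking the contract.
moved = [k for k in keys if before[k] != after[k]]
```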
             | 
             | RIP Kafka
             | 
              | Having spent a couple of years dealing with Kafka, I'm
              | sorry to burst people's bubbles, but it is dead. Even
              | Confluent doesn't have a good enough story these days to
              | keep you from switching to Pulsar; they're going to sell
              | you on the same consulting bs: "We're more mature", "We've
              | got better tooling", "Better support"...
             | 
             | Yes, of course, it has been in the open source community 5
             | years longer and the company has been also around longer
             | for that time. Kafka is dead, long live Pulsar.
        
               | bubbleRefuge wrote:
                | I think what is dead is Confluent Cloud b/c Amazon MSK
               | and Azure HDInsight will be close to feature parity at
               | much less cost.
        
               | ckdarby wrote:
               | Damn, I got lazy on my reply & just hoped nobody went
               | further, but well played on digging deeper.
               | 
               | 5. Kafka is silly expensive
               | 
               | Pulsar supports message ack with subscription groups. The
               | worst case with Pulsar is you're storing the entire
               | retention period.
               | 
                | Let's say you have a 4-day retention window, to cover an
                | outage happening on Friday without having to deal with
                | it until Monday. This is pretty typical of what I see in
                | the Kafka world for small-to-mid-size companies who
                | don't want to pay the 1.5x OT for on-call.
               | 
               | So, with Pulsar you're at worst storing the 4 days of
               | data but at best you're only storing the messages within
               | the lag period of all consumer groups acknowledging the
               | message.
               | 
                | Now, without getting too deep into Pulsar's feature set,
                | even that is a lie, because Pulsar has tiered storage as
                | a first-class citizen. The messages could be shipped off
                | to S3 after the four days if we wanted, or even after 1
                | day depending on our use case, and this is all built
                | into Pulsar, no OpEx tooling required. Even accessing
                | the messages from S3 through Pulsar is abstracted;
                | there's no tooling required to pull them back in if you
                | wanted.
               | 
                | Now with Kafka our worst case is simply 4 days of
               | retention data. This can get very expensive as compute &
               | storage are tied together, it means scaling up all the
               | brokers (even though we don't need the throughput) for
               | the storage increase. Now, yes MSK basically abstracts
               | all this from you but you're paying for it.
               | 
                | 6. AWS managed services are not equal citizens to
                | standalone EC2
               | 
                | Managed services right now don't fall under the new
                | Savings Plans: https://aws.amazon.com/blogs/aws/new-
                | savings-plans-for-aws-c...
                | 
                | This means forgoing a 30-60% discount on your entire
                | Kafka bill.
               | 
               | 7. Excel Life
               | 
               | If I look at the numbers for what I'm doing it would have
                | cost ~$4M for Kafka vs ~$1M for Pulsar.
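
To make point 5 concrete, here is a back-of-envelope sketch of the worst-case broker storage for a 4-day window. The throughput and replication figures are assumptions for illustration, not numbers from the thread:

```python
# Hypothetical cluster: 50 MB/s average produce rate, replication 3.
throughput_mb_s = 50
retention_s = 4 * 24 * 3600   # the 4-day window from the example above
replication = 3

raw_tb = throughput_mb_s * retention_s / 1e6   # MB -> TB
total_tb = raw_tb * replication

# Kafka's worst case keeps all ~52 TB on broker disks, forcing you to
# scale brokers for storage alone; Pulsar's tiered storage could park
# most of it in S3 instead.
```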
        
           | cpx86 wrote:
           | > You'll spend more time & money on the OpEx cost with Kafka
           | than picking up the client library for Pulsar.
           | 
           | Could you elaborate why this would be the case?
        
             | gen220 wrote:
             | Not the OP, but I think they were exaggerating a bit. In
             | practice, operating kafka is a major PITA, because it means
             | you have to
             | 
             | (1) choose a "flavor" wrapper (confluent seems to be a
             | popular one), because the base project isn't easy to
             | develop against
             | 
             | (2) write your own wrappers of those wrappers, to keep your
             | developers from shooting themselves in the foot with wacky
             | defaults
             | 
             | (3) suffer the immense pain that is authenticating topic
             | write/reads, if that's even possible???
             | 
             | (4) stand up zookeeper... and probably lose some data along
             | the way.
             | 
             | (5) suffer zookeeper outages due to buggy code in kafka/zk
             | (I've experienced lost production data due to unpredictable
             | bugs in kafka/zk, but obviously YMMV).
             | 
             | Based on my naive assessment, the kafka/zookeeper ecosystem
             | is maybe 10x as complicated as the problem it's solving,
             | and that shows up in the OpEx. I personally doubt that
             | Pulsar is _that_ much better, but it might be.
        
               | ckdarby wrote:
               | These are also valid. I wrote the reply explaining some
               | of the OpEx here:
               | https://news.ycombinator.com/item?id=21938463
        
               | EdwardDiego wrote:
               | What do you mean by 1 and 2? I'm guessing you're
               | referring to the kafka-clients API? The defaults for
               | producer and consumer conf are quite sensible these days.
        
         | matteomerli wrote:
         | We're close to releasing a new "officially supported" native Go
         | client library: https://github.com/apache/pulsar-client-go
        
         | jgraettinger1 wrote:
         | If you're a Go shop, Gazette is worth a look
         | (https://gazette.dev).
        
       | eatonphil wrote:
       | Most of the comments are just pro-Pulsar but what's the
       | architectural trade-off? (Non-architectural trade-off is that
       | Pulsar is a new system to learn for folks familiar with
       | maintaining and using Kafka.)
        
         | manigandham wrote:
         | Pulsar is better designed than Kafka in every way with the main
         | trade-off being more moving pieces. That's why the recommended
         | deployment is Kubernetes which can manage all that complexity
         | for you.
         | 
         | Pulsar also lacks in the size of the community and ecosystem
         | where Kafka has much more available.
        
         | qeternity wrote:
         | Someone linked some benchmarks here (on mobile and can't find)
         | that showed single-node Kafka outperforming, but as soon as you
         | start scaling, Pulsar pulls ahead. I'm not familiar enough with
         | the nitty gritty to comment beyond that.
        
       | srameshc wrote:
       | This is great !! What would be the easiest way to run a 3 node
       | cluster ?
        
         | ckdarby wrote:
         | The standalone mode will let you get started as a developer:
         | grab the tar.gz, uncompress it, and run standalone.sh.
         | 
         | There are helm charts for running an actual cluster.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2020-01-02 23:00 UTC)