[HN Gopher] Launch HN: Batch (YC S20) - Replays for event-driven...
       ___________________________________________________________________
        
       Launch HN: Batch (YC S20) - Replays for event-driven systems
        
        Hello HN! We are Ustin and Daniel, co-founders of Batch
        (https://batch.sh) - an event replay platform. You can think of us
        as version control for data passing through your messaging systems.
        With Batch, a company is able to go back in time, see what data
        looked like at a certain point and, if it makes sense, replay that
        piece of data back into the company's systems.

        This idea was born out of getting annoyed by what an unwieldy
        black box Kafka is. While many folks use Kafka for streaming,
        just as many Kafka users treat it as a traditional messaging
        system. Historically, these systems have offered very poor
        visibility into what's going on inside them and, at best, a poor
        replay experience. This problem is prevalent across pretty much
        every messaging system, and especially if the messages on the bus
        are serialized, it is almost guaranteed that you will have to
        write custom, one-off scripts when working with these systems.

        This "visibility" pain point is exacerbated tenfold if you are
        working with event-driven architectures and/or event sourcing -
        you must have a way to search and replay events, as you will need
        to rebuild state in order to bring up new data stores and
        services. That may sound straightforward, but it's actually
        really involved. You have to figure out how and where to store
        your events, how to serialize them, search them, play them back,
        and how/when/if to prune, delete or archive them.

        Rather than spending a ton of money on building such a replay
        platform in-house, we decided to build a generic one and
        hopefully save everyone a bunch of time and money. We are 100%
        believers in "buy" (vs "build") - companies should focus on
        building their core product and not waste time on sidequests.
        We've worked on these systems before at our previous gigs and
        decided to put our combined experience into building Batch.

        A friend of mine shared this bit of insight with me (that he
        heard from Dave Cheney, I think?) - "Is this what you want to
        spend your innovation tokens on?" (referring to building
        something in-house) - and the answer is probably... no. So this
        is how we got here!

        In practical terms, we give you a "connector" (in the form of a
        Docker image) that hooks into your messaging system as a consumer
        and begins copying all data that it sees on a topic/exchange to
        Batch. Alternatively, you can pump data into our platform via a
        generic HTTP or gRPC API. Once the messages reach Batch, we index
        them and write them to a long-term store (we use
        https://www.elassandra.io). At that point, you can use either our
        UI or HTTP API to search and replay a subset of the messages to
        an HTTP destination or into another messaging system.

        Right now, our platform is able to ingest data from Kafka,
        RabbitMQ and GCP PubSub, and we've got SQS on the roadmap.
        Really, we're cool with adding support for whatever messaging
        system you need as long as it solves a problem for you.
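
        To make that concrete, the connector is essentially a consumer
        loop that forwards everything it sees. A minimal sketch of the
        idea in Go (using the segmentio/kafka-go client; the collection
        endpoint URL here is a placeholder, not our real API):

            package main

            import (
                "bytes"
                "context"
                "log"
                "net/http"

                "github.com/segmentio/kafka-go"
            )

            func main() {
                // Join the topic as just another consumer group member.
                r := kafka.NewReader(kafka.ReaderConfig{
                    Brokers: []string{"localhost:9092"},
                    GroupID: "batch-connector",
                    Topic:   "orders",
                })
                defer r.Close()

                for {
                    msg, err := r.ReadMessage(context.Background())
                    if err != nil {
                        log.Fatal(err)
                    }
                    // Forward the raw event to the collection endpoint
                    // (placeholder URL).
                    resp, err := http.Post(
                        "https://collector.example.com/v1/events",
                        "application/octet-stream",
                        bytes.NewReader(msg.Value))
                    if err != nil {
                        log.Printf("forward failed: %v", err)
                        continue
                    }
                    resp.Body.Close()
                }
            }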

        One super cool thing is that if you are encoding your events in
        protobuf, we are able to decode them upon arrival on our
        platform, so that we can index them and let you search for data
        within them. In fact, we think this functionality is so cool that
        we really wanted to share it - surely there are other folks that
        need to quickly read/write encoded data to various messaging
        systems. We wrote https://github.com/batchcorp/plumber for that
        purpose. It's like curl for messaging systems and currently
        supports Kafka, RabbitMQ and GCP PubSub. It's a port of an
        internal tool we used when interacting with our own Kafka and
        RabbitMQ instances.
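
        For the curious: decoding arbitrary protobuf without generated
        code boils down to having the message descriptor available (which
        we get from your schemas). A rough sketch of the idea in Go,
        using the standard protobuf reflection packages (illustrative
        only, not our actual pipeline):

            package decode

            import (
                "fmt"

                "google.golang.org/protobuf/encoding/protojson"
                "google.golang.org/protobuf/proto"
                "google.golang.org/protobuf/reflect/protoreflect"
                "google.golang.org/protobuf/types/dynamicpb"
            )

            // ToJSON turns raw protobuf bytes into a JSON document that
            // can be indexed and searched, given the descriptor of the
            // message type.
            func ToJSON(md protoreflect.MessageDescriptor, raw []byte) ([]byte, error) {
                msg := dynamicpb.NewMessage(md) // empty message of the user's type
                if err := proto.Unmarshal(raw, msg); err != nil {
                    return nil, fmt.Errorf("decode: %w", err)
                }
                return protojson.Marshal(msg)
            }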

        In closing, we would love for you to check out https://batch.sh
        and tell us what you think. Our initial thinking is to allow
        folks to pump their data into us for free with 1-3 days of
        retention. If you need more retention, that'll require $ (we're
        leaning towards a usage-based pricing model).

        We envision Batch becoming a foundational component of your
        system architecture, but right now, our #1 goal is to lower the
        barrier to entry for event sourcing, and we think that offering
        "out-of-the-box" replay functionality is the first step towards
        making this happen.

        And if event sourcing is not your cup of tea - you can still get
        us in your stack to gain visibility and peace of mind.

        OK, that's it! Thank you for checking us out!

        ~Dan & Ustin

        P.S. Forgot about our creds:

        I (Dan) spent a large chunk of my career working at data centers
        doing systems integration work. I got exposed to all kinds of
        esoteric things like how to integrate diesel generators into
        CMSs and automate VLAN provisioning for customers. I also learned
        that "move fast and break things" does not apply to data centers,
        haha. After data centers, I went to work for New Relic, followed
        by InVision, DigitalOcean and most recently, Community (which is
        where I met Ustin). I work primarily in Go, consider myself a
        generalist, prefer light beers over IPAs and dabble in metal
        (music) production.

        Ustin is a physicist turned computer scientist who worked towards
        a PhD on distributed storage over lossy networks. He has spent
        most of his career working as a founding engineer at startups
        like Community. He has a lot of experience working in Elixir and
        Go and working on large, complex systems.
        
       Author : dsies
       Score  : 94 points
       Date   : 2020-08-17 15:33 UTC (7 hours ago)
        
       | Monotonic wrote:
       | Does this have support for Rabbit pub/sub? There's a bit of
       | confusing wording on the page that makes it unclear.
        
         | dsies wrote:
         | 100% - we use Rabbit internally for our own systems so it has
         | first-class support.
         | 
         | I think maybe we should just list out the messaging systems we
         | support on the front page, so you don't have to dig through
         | stuff... Good point. Let me know if you've got any other
         | suggestions.
        
       | treis wrote:
        | If I'm writing all my messages to durable storage, why not work
        | off the durable storage? I'm definitely not an expert in this
        | area, so perhaps I'm missing something. My logic is that if
        | you're paying the resource cost to write all your messages, why
        | not pay the resource cost to read/write back there?
        
         | dsies wrote:
         | I think you're asking why don't we just be the hosted
         | kafka/rabbitmq/etc and offer all of this stuff in one place.
         | (let me know if that's wrong).
         | 
         | That's a totally legit point - we've talked about offering it
         | all in-house before but it would require us to split our
         | efforts into two - operating a PaaS (for a bunch of different
         | messaging tech) and running the event collection platform.
         | 
         | Operating the PaaS part would be a full-time effort and there's
         | a lot of competition out there. We've decided to focus on the
         | observability/replay part first (since there is a lot less
         | competition) and then later maybe explore the hosted bus
         | option.
         | 
         | LMK if that's not what you meant :)
        
           | treis wrote:
           | >I think you're asking why don't we just be the hosted
           | kafka/rabbitmq/etc and offer all of this stuff in one place.
           | 
            | The other way around. If I'm not storing my messages today,
            | it's probably because it is too expensive in terms of storage
            | or compute to do so. But, presumably, you can't do that any
            | cheaper than I can. And now we are duplicating the work, so
            | even more resources are being consumed, making it that much
            | more expensive than just doing it myself.
           | 
           | It seems like your service is something I'd want to run
           | pointed towards my Kafka/RabbitMQ/whatever servers. I don't
            | see how duplicating that stream is cost-effective.
        
             | dsies wrote:
              | Ahh, gotcha. If you need event introspection, doing it
              | in-house is extremely likely to be more expensive (and
              | definitely time-consuming) than offloading it.
             | 
              | For example: if you are sending serialized data on your
              | bus, you will need to write something that deserializes it
              | before inserting it into your Elasticsearch cluster - and
              | now you're managing even more infra (messaging systems,
              | decoders, document storage).
             | 
             | There is definitely a price attached to the luxury - but
             | we're betting that it'll be _significantly_ less than doing
             | it yourself.
        
       | kanobo wrote:
       | Congrats, looks useful! Just an opinion, but I think you should
        | skip the cool large animation on your homepage and just start
        | with "Our platform is essential in scaling and maintaining your
        | business." I had no idea what Batch was until I scrolled way
        | below the fold.
        
         | dsies wrote:
         | Yeah... we've heard this before. But the wavy stuff makes me
         | feel so ... _caaaaalm_ :)
        
           | [deleted]
        
           | kanobo wrote:
            | It is nice and calming. By the way, your Twitter link at the
            | bottom of your site is broken (it links to
            | https://batch.sh/www.twitter.com/batchsh).
        
             | uzarubin wrote:
             | Fixed!
        
         | uzarubin wrote:
         | Ustin here.
         | 
         | It's definitely on our roadmap to fix. Thank you for the
         | feedback!
        
       | pdubs1 wrote:
       | Has anyone ever told you that you're "batch it crazy"?
        
         | [deleted]
        
         | dsies wrote:
         | No, but this is now definitely going on a sticker :D
        
       | ponker wrote:
        | I hear so much about Kafka - could someone give the two-sentence
       | description of what it is and who uses it and for what?
        
         | uzarubin wrote:
         | From their website: Kafka is an open-source distributed event
         | streaming platform.
         | 
          | There are many use cases, from website activity tracking and
          | metrics to log aggregation and stream processing. For us, it's
          | a communication layer utilized by our microservices. An event
          | goes into the stream and any service that cares about that
          | data will consume it. In other words, it's like an ultra-
          | resilient, scalable Redis pub/sub with history that runs on
          | the JVM. You can read more about the use cases here:
         | https://kafka.apache.org/uses
         | 
         | edit: Sidenote, Kafka is often waaaaaay overkill - if you need
         | messaging, use something simpler like Rabbit or NATS or Redis
         | and only use Kafka if you know why you need it.
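          | 
          | If it helps make it concrete, the write side is literally just
          | appending a record to a topic - e.g. with the segmentio/kafka-go
          | client in Go (an illustrative sketch, nothing Batch-specific):
          | 
          |     package main
          | 
          |     import (
          |         "context"
          |         "log"
          | 
          |         "github.com/segmentio/kafka-go"
          |     )
          | 
          |     func main() {
          |         w := &kafka.Writer{Addr: kafka.TCP("localhost:9092"),
          |             Topic: "user-events"}
          |         defer w.Close()
          | 
          |         // Any number of services can consume this event
          |         // independently, each with its own consumer group.
          |         err := w.WriteMessages(context.Background(), kafka.Message{
          |             Key:   []byte("user-123"),
          |             Value: []byte(`{"type":"user.logged_in"}`),
          |         })
          |         if err != nil {
          |             log.Fatal(err)
          |         }
          |     }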
        
           | ponker wrote:
           | Thanks. So an event that goes in would be something like,
           | "user logged in," and services that care about that data
            | would be...? Sorry, I'm still having some trouble
            | understanding it.
        
         | [deleted]
        
       | pratio wrote:
        | Congrats on the release. We've made a ragtag solution in-house
        | that is complicated but works on those few unfortunate occasions
        | when we need it. There's a demo on request, but it would be
        | helpful if we could have a better way to test the product. Maybe
        | an endpoint where we stream, say, 10,000 events and see them
        | replayed? What sort of pricing tier are we talking about?
        
         | dsies wrote:
         | Thank you!
         | 
         | Re: ragtag in-house solution that is complicated
         | 
          | ^ That's _exactly_ what we're talking about. These systems get
          | complex pretty quickly and you end up with duct tape in more
          | than a few places.
         | 
          | As for the demo - yeah, our plan is to open up registrations
          | for accounts soon, which will allow you to pump data into us for
         | free with a low retention period.
         | 
         | We've still got some pieces to tighten before we can open the
          | gates fully, but we'll try to make it happen soon (within the next
         | few weeks?). In the meantime, if you want a demo, ping us and
         | we'll make it happen.
        
       | yamrzou wrote:
       | Congrats on the launch!
       | 
       | Two questions:
       | 
       | - If I have some data in Kafka, why would I want to pump it into
       | your platform instead of spawning an Elasticsearch instance and
       | using something like Kafka Connect to write to it and gain
       | visibility?
       | 
       | - If I use Kafka as a permanent data store (with infinite
       | retention), I can easily replay all events with existing clients
       | (or with plumber). What additional functionality does the
       | "replay" feature offer compared to that?
        
         | dsies wrote:
         | Hey there!
         | 
         | > - If I have some data in Kafka, why would I want to pump it
         | into your platform instead of spawning an Elasticsearch
         | instance and using something like Kafka Connect to write to it
         | and gain visibility?
         | 
         | To avoid having to build, own and maintain the infra you just
          | mentioned. As the number of events in your system increases,
          | you will have to scale ES and other pieces of the system as
          | well.
         | 
         | Our point is just that - if you know what's involved in
         | collecting and indexing the events - that is awesome but maybe
         | you shouldn't have to spend time building the infra around that
         | stuff.
         | 
         | > If I use Kafka as a permanent data store (with infinite
         | retention), I can easily replay all events with existing
         | clients (or with plumber). What additional functionality does
         | the "replay" feature offer compared to that?
         | 
          | I think it depends on your definition of "easily replay" - a
          | Kafka replay for a topic that's being consumed by a consumer
          | group would require you to disconnect that consumer group and
          | then run a shell script to move the offsets. You also would not
          | have any way to replay specific messages - your only point of
          | reference would be an offset (and key name, if you use one) -
          | not terribly flexible.
         | 
         | With Batch, you get to drill in and replay the _exact_ messages
         | you want (and avoid having to pump and dump potentially
         | millions of messages your consumer doesn't care about).
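          | 
          | (For reference, the "shell script" route is Kafka's own
          | consumer-group tooling - roughly something like the below - and
          | note that it only lets you pick offsets or timestamps, not
          | message contents:)
          | 
          |     # stop the consumer group first, then:
          |     kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
          |       --group my-consumer-group --topic my-topic \
          |       --reset-offsets --to-datetime 2020-08-01T00:00:00.000 \
          |       --execute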
        
           | yamrzou wrote:
           | Makes sense, thanks for the clarification!
        
         | thiscatis wrote:
         | I hope for the founders this ages like the Dropbox comment.
        
       | shay_ker wrote:
       | Hi Dan/Ustin,
       | 
        | Congrats on the launch. The pain point makes sense to me. I'm
       | just curious - what's the big picture for you all? I imagine it
       | must be larger than just replay.
        
         | uzarubin wrote:
          | Batch is betting that more companies are going to be utilizing
          | event sourcing in order to scale. We want to be a foundational
          | piece of their data infrastructure and support their transition
          | into event sourcing by initially offering replays. We want to
          | be a "one-stop shop" for all event sourcing needs.
        
           | shay_ker wrote:
           | Cool! I don't have much data on how many companies are using
           | events for key workflows, but I do know that many, many
           | companies would _love_ to replay HTTP requests!
        
             | uzarubin wrote:
              | That's awesome! We support HTTP and gRPC collection as
              | well. Let us know what you have in mind.
        
       | worldsoup wrote:
        | I've worked on a very similar product in the past and can affirm
        | that there is definitely enterprise interest in a good solution
        | to event replay for orgs that are already doing event
        | sourcing... I'm curious whether offering out-of-the-box replay
        | will actually lower the bar and drive more orgs to pursue event
        | sourcing. The CLI search functionality is really cool and useful
        | as well.
        
         | dsies wrote:
         | Hey there!
         | 
         | Re: lowering the bar - we hope so. What we've noticed is that
         | the papers that talk about event sourcing mention replays but
         | don't talk at all about the implementation (or give any
         | pointers). We're hoping that if at least that part is done for
         | you, you've got one less thing to worry about.
         | 
         | As for the CLI tool - thanks! We found it super useful
         | ourselves and figured others would too. I like to think of it
         | as a sort of intelligent `netcat` for messaging systems :D
        
       | randtrain34 wrote:
       | Is Pulsar support on the roadmap?
        
         | uzarubin wrote:
          | We are planning to support as many messaging systems as we can,
          | and we will definitely investigate Pulsar. We're going to add
          | it to our feature list and open an issue on plumber to support
          | introspection for Pulsar. Cheers!
        
           | zok3102 wrote:
            | Good to know that Pulsar is on your roadmap. Also, kudos for
            | building user-land tooling around a common pain point for
            | teams doing event processing at scale.
        
             | uzarubin wrote:
              | Thank you! We felt this pain while actively trying to
              | build observability tools in order to debug our messaging
              | systems. We built plumber to standardize some of our
              | internal tools and then decided to open-source it to help
              | others who are feeling the same pain.
        
       | danenania wrote:
       | This looks interesting! A couple questions (that may also apply
       | to event sourcing more generally):
       | 
       | - How do you handle events with side effects (sending emails, for
       | example), and ensuring they aren't triggered on replay when they
       | shouldn't be?
       | 
       | - How do you handle randomness, like uuid generation?
        
         | dsies wrote:
         | > How do you handle events with side effects (sending emails,
         | for example), and ensuring they aren't triggered on replay when
         | they shouldn't be?
         | 
          | Someone else already addressed this, but to paraphrase: your
          | application should be able to deal with duplicate events (and
          | gracefully handle side effects).
         | 
         | > How do you handle randomness, like uuid generation?
         | 
          | Are you referring to ID generation and tagging in events (i.e.
          | aggregate IDs)? If so, that'd be an application responsibility
          | - you'd have to determine how to properly attach IDs.
          | 
          | Hmm. But that does bring up an interesting idea - what if we
          | provided a way to "group" events and generate aggregate IDs on
          | your behalf? Maybe that's what you meant - either way, it's
          | worth exploring.
         | 
          | We currently don't do anything "extra" with regard to grouping
          | events - we tag each individual event, but that's about it.
        
         | kbyatnal wrote:
          | Event systems typically guarantee at-least-once delivery, so
          | your application needs to be able to handle duplicate events
          | via idempotency in any case.
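          | 
          | A common way to get that idempotency is to key side effects on
          | a stable event ID and record which IDs have already been
          | handled. A bare-bones sketch (in-memory here; in practice this
          | would be a unique constraint in a database or similar):
          | 
          |     package main
          | 
          |     import (
          |         "fmt"
          |         "sync"
          |     )
          | 
          |     // Dedup remembers which event IDs have been processed so a
          |     // replayed or duplicate event doesn't trigger its side
          |     // effect twice.
          |     type Dedup struct {
          |         mu   sync.Mutex
          |         seen map[string]bool
          |     }
          | 
          |     func NewDedup() *Dedup {
          |         return &Dedup{seen: make(map[string]bool)}
          |     }
          | 
          |     // Handle runs fn at most once per event ID.
          |     func (d *Dedup) Handle(eventID string, fn func()) {
          |         d.mu.Lock()
          |         already := d.seen[eventID]
          |         d.seen[eventID] = true
          |         d.mu.Unlock()
          |         if already {
          |             return // duplicate or replay: skip the side effect
          |         }
          |         fn()
          |     }
          | 
          |     func main() {
          |         d := NewDedup()
          |         sendEmail := func() { fmt.Println("sending welcome email") }
          |         d.Handle("evt-42", sendEmail) // sends
          |         d.Handle("evt-42", sendEmail) // replay: no-op
          |     }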
        
       | cflyingdutchman wrote:
       | How does bookmarking work/How do I keep track of how far I've
       | read while replaying from Batch? Will you also index by date? It
       | can take a long time to replay a lot of data; do you have any
       | numbers on the read rates you support per topic?
        
         | dsies wrote:
         | Great questions!
         | 
         | > How does bookmarking work/How do I keep track of how far I've
         | read while replaying from Batch?
         | 
          | We do not have any bookmarking functionality built (yet), as we
          | currently expect folks to just tweak their search query. Each
          | event has an ID attached to it that you can query and reference
          | during search.
         | 
         | > Will you also index by date?
         | 
         | We do! Every event has a microsecond timestamp attached to it.
         | 
         | > It can take a long time to replay a lot of data; do you have
         | any numbers on the read rates you support per topic?
         | 
         | We've done some initial replay throughput tests and have been
         | able to reach ~10k/s outbound via HTTP - of course, this is all
         | _highly_ dependent on where you're located. We expect that for
         | folks who need super high throughput, we'll probably need to be
         | closer to them - we fully expect to have to peer with some of
         | our customers and optimize for throughput by doing gRPC and ...
         | batching :)
         | 
         | So far, we've done _most_ of our testing on inbound and we are
          | currently able to sustain ~50k/s (with ~5KB event size). Our
         | inbound is able to scale horizontally and so can go waaaaay
         | beyond 50k/s if needed.
         | 
          | We have _a ton_ of service instrumentation, so we've got good
          | visibility around throughput (and thus should know well in
          | advance when we're starting to hit limits).
        
       | historyremade wrote:
       | Ustin vs Justin. Interesting!
        
       ___________________________________________________________________
       (page generated 2020-08-17 23:00 UTC)