[HN Gopher] ClickHouse Keeper: A ZooKeeper alternative written i...
       ___________________________________________________________________
        
       ClickHouse Keeper: A ZooKeeper alternative written in C++
        
       Author : eatonphil
       Score  : 168 points
       Date   : 2023-09-27 14:14 UTC (8 hours ago)
        
 (HTM) web link (clickhouse.com)
 (TXT) w3m dump (clickhouse.com)
        
       | what-no-tests wrote:
       | I've been looking at RedPanda[0] for a new project.
       | 
       | Any ideas on that?
       | 
       | [0] https://redpanda.com/
        
       | tbragin wrote:
       | Coincidentally, as someone who worked on this blog, I was
       | surprised (and pleased!) to see that we are not the only ones who
       | felt the need to build a Zookeeper alternative.
       | 
       | Looks like folks at StreamNative did as well, with their Oxia
       | project: https://github.com/streamnative/oxia. They were just
       | talking about this yesterday at Confluent Current ("Introducing
       | Oxia: A Scalable Zookeeper Alternative" was the title of their
       | talk). https://streamnative.io/blog/introducing-oxia-scalable-
       | metad...
       | 
       | Seems to be a trend :)
        
         | coding123 wrote:
         | Is the trend mainly due to ZK being written in Java?
        
           | tylerhannan wrote:
           | TBH, I don't think so.
           | 
           | I mean, I have worked on, and been guilty of, tooling-driven
           | development (RiiR anyone?).
           | 
           | But, also, in a comment below Alexey shares many of the
           | reasons other than language. I think Oxia does a good job of
           | sharing their approach in - https://github.com/streamnative/o
           | xia/blob/main/docs/design-g...
           | 
           | (Alexey's comment, FYI,
           | https://news.ycombinator.com/item?id=37677324)
        
           | Xeoncross wrote:
           | It's not the written in Java part, it's the running in the
           | JVM that's the issue. Memory hungry is what I think of when I
           | think Java apps. I'd much rather have a low memory Go or Rust
           | service.
        
             | dathinab wrote:
             | but if you use ZK for what it was designed for (basic
             | cluster role/naming coordination, distributing configs to
             | cluster nodes and similar) then this kind of thing doesn't
             | matter
             | 
             | I mean in a typical use case of it you would
             | 
             | - run it on long running nodes (e.g. not lambda or spot
             | instances)
             | 
             | - run more or less exactly 3 nodes up to quite a cluster
             | size, I guess some use-cases which involve a lot of
             | serverless might need more
             | 
             | - configs tend to not change "that" much nor are they
             | "that" big
             | 
             | what this means is
             | 
             | - java needing time to run-hot (JIT optimize) is not an
             | issue for it
             | 
             | - GC isn't an issue
             | 
             | and if you look at how much memory (RAM) typical minimal
             | cloud nodes have, then in the context of typical config
             | sizes and cluster sizes memory is also not an issue
             | 
             | though I guess, depending on what you want to do, there
             | could be issues if you use it
             | 
             | - for squeezing through analytics or similar
             | 
             | - setups with very very very large constantly changing
             | clusters, e.g. in some serverless context with a ton of
             | ad-hoc spawned instances, maybe using WASM and certain
             | snapshot tricks allowing insanely fast startup times
             | 
             | - you want to bundle it directly into other applications
             | running on the same hardware, and those applications need
             | more memory
             | 
             | but all of these are cases it wasn't designed for, so I
             | wouldn't call it an "alternative" but a ZooKeeper-like
             | service for different use-cases, I guess
        
               | StevePerkins wrote:
               | Long-running server processes are not only "not an issue"
               | for the JVM, they're the main use case in which the JVM
               | is _superior_ to AOT compilation! Same is true for C# and
               | the .NET CLR, by the way.
               | 
               | If you're running a Lambda function in which startup time
               | is extremely important, or an embedded application where
               | size and resources are paramount, or even just a short-
               | lived process where you don't care either way... then AOT
               | makes a lot of sense.
               | 
               | But for long-running server processes, just-in-time
               | compilation almost always results in better performance
               | than AOT compilation that cannot optimize at runtime
               | based on what's actually happening.
               | 
               | HN _should_ be full of people who know better, but these
               | discussions feel like piping information into /dev/null.
               | Web devs, students and hobbyists, and other low-
               | information voters just have it in their heads that AOT
               | is always a superior model, and JIT always an inferior
               | fallback, and there's nothing you can say to break
               | through that. There aren't enough people from the
               | business server side world who spend enough time in
               | online discussion forums to correct the narrative.
        
               | neonsunset wrote:
               | I think there's an important distinction to be made:
               | Theoretically speaking, given that JIT compilers
               | generally care much more about compilation speed than AOT
               | ones, it is not unreasonable to assume that AOT compilers
               | _ought_ to produce more optimized code.
               | 
               | With that said, this is not the case for either the JVM or
               | .NET. Moving from "ought" to "is", both have JIT compilers
               | which produce better optimized code than their AOT
               | counterparts, for a variety of reasons including the R&D
               | effort put into JIT throughout their history and JIT
               | being able to dynamically profile and recompile the code
               | according to its execution characteristics (.NET's Tier 1
               | PGO Optimized and HotSpot JVM's C2).
        
             | adra wrote:
             | I mean java native compilation is a thing. I imagine you'd
             | get the majority of the lift via that route instead of
             | reinventing the wheel, but it's possible there are caveats
             | to that route for these impls. Even java native/golang have
             | GC heaps, but in general ZK is really low GC churn so I
             | don't see that being the impetus to rewrite.
             | 
             | My guess, if anything, is that people always complain about
             | ZK being annoying as a second piece of infra to distribute
             | for simple setups, so this is probably just a prelude to
             | them embedding their keeper into the DB deployment itself,
             | which is the same general strategy Kafka is taking (a
             | verrrry loong time) to roll out.
        
         | kapilvt wrote:
         | looks to be a slightly different design goal on oxia wrt
         | replication and fault tolerance.
         | https://github.com/streamnative/oxia/blob/main/docs/design-g...
        
           | tylerhannan wrote:
           | Def.
           | 
           | I hadn't seen Oxia before but the idea, for their
           | implementation, of making Zookeeper more like Bookkeeper was
           | an interesting one.
           | 
           | Not right for ClickHouse needs but, IMO, a novel approach.
        
       | tylerhannan wrote:
       | Thanks for sharing!
       | 
       | If anyone has any questions, I'll do my best to get them
       | answered.
       | 
       | (Disclaimer: I work at ClickHouse)
        
         | [deleted]
        
         | alchemist1e9 wrote:
         | Is there a python client library you can recommend?
        
           | alesapin wrote:
           | All ZooKeeper libraries are compatible with clickhouse-
           | keeper. The most popular and mature is
           | https://kazoo.readthedocs.io/en/latest/. We use it in our
           | integration tests framework (with clickhouse-keeper) a lot.
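
A minimal sketch of what that compatibility looks like from Python, assuming kazoo is installed and a Keeper (or ZooKeeper) node is reachable. The address `127.0.0.1:9181`, the path `/demo/greeting`, and the helper name `keeper_roundtrip` are illustrative assumptions, not taken from the thread:

```python
def keeper_roundtrip(hosts="127.0.0.1:9181", path="/demo/greeting"):
    """Write a value to a ZooKeeper-compatible server and read it back.

    The default address and path are placeholders; point `hosts` at
    your own clickhouse-keeper (or ZooKeeper) endpoint.
    """
    # Imported here so this module can be loaded without kazoo installed.
    from kazoo.client import KazooClient

    client = KazooClient(hosts=hosts)
    client.start(timeout=10)  # raises if no session is established in time
    try:
        client.ensure_path(path)           # create the node (and parents) if missing
        client.set(path, b"hello keeper")  # overwrite the node's data
        data, stat = client.get(path)      # returns (bytes, ZnodeStat)
        return data, stat.version
    finally:
        client.stop()
        client.close()
```

Calling `keeper_roundtrip()` against a running node should return the bytes that were written plus the node's data version. Nothing in the client code is Keeper-specific, which is the point being made above.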
        
           | zX41ZdbW wrote:
           | The same library that you use for ZooKeeper - kazoo.
           | 
           | Note: our stress tests have found a segmentation fault in
           | Python's kazoo library.
           | 
           | We only wanted to test Keeper, but found every bug around it
           | :) Let me find a link.
        
             | zX41ZdbW wrote:
             | https://github.com/ClickHouse/ClickHouse/issues/45367
        
               | qoega wrote:
               | Did not expect to see an issue I created
        
         | secondcoming wrote:
         | What do you use for network stuff in C++, ASIO?
        
           | alesapin wrote:
           | Yes, for internal RAFT implementation boost.asio is used.
        
         | pdeva1 wrote:
         | 1. can this be used without clickhouse as just a zookeeper
         | replacement?
         | 
         | 2. am i correct in that it's using s3 as disk? so can it be run
         | as stateless pods in k8s?
         | 
         | 3. if it uses s3, how are latency and costs of PUTs affected?
         | does every write result in a PUT call to s3?
        
           | zX41ZdbW wrote:
           | 1. Yes, it can be used with other applications as a ZooKeeper
           | replacement, unless some unusual ZooKeeper features are used
           | (there is no Kerberos integration in Keeper, and it does not
           | support the TTL of persistent nodes) or the application tests
           | for a specific ZooKeeper version.
           | 
           | 2. It could be configured to store snapshots and RAFT logs
           | other than the latest log in S3. It cannot use a stateless
           | Kubernetes pod - the latest log has to be located on the
           | filesystem.
           | 
           | Although I see you can make a multi-region setup with
           | multiple independent Kubernetes clusters and store logs in
           | tmpfs (which is not 100% wrong from a theoretical
           | standpoint), it is too risky to be practical.
           | 
           | 3. Only the snapshots and the previous logs could be on S3,
           | so the PUT requests are done only on log rotation.
        
             | pdeva1 wrote:
             | 2. ok. so can i rebuild a cluster with just state in s3?
             | eg: i create a cluster with local disks and s3 backing.
             | entire cluster gets deleted. if i recreate cluster and
             | point to same s3 bucket, will it restore its state?
        
               | zX41ZdbW wrote:
               | It depends on how the entire cluster gets deleted.
               | 
               | If one out of three nodes disappears, but two out of
               | three nodes are shut down properly and have written the
               | latest snapshot to S3, it will restore correctly.
               | 
               | If two out of three nodes disappear, but one out of
               | three nodes is shut down properly and has written the
               | latest snapshot to S3, and you restore from its snapshot -
               | it is equivalent to split-brain, and you could lose some of
               | the transactions that were acknowledged on the other two
               | nodes.
               | 
               | If all three nodes suddenly disappear, and you restore
               | from some previous snapshot on S3, you will lose the
               | transactions acknowledged after the time of this snapshot
               | - this is equivalent to restoring from a backup.
               | 
               | TLDR - Keeper writes the latest log on the filesystem. It
               | does not continuously write data to S3 (it could be
               | tempting, but if we do, it will give the latency around
               | 100..500 ms, even in the same region, which is comparable
               | to the latency between the most distant AWS regions), and
               | it still requires a quorum, and the support of S3 gives
               | no magic.
               | 
               | The primary motivation for this feature was to reduce the
               | space needed on SSD/EBS disk.
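
The three restore scenarios above follow from majority arithmetic. Here is a rough sketch, assuming a write counts as acknowledged once a majority of nodes holds it. The function names and result strings are made up for illustration and are not part of Keeper:

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def restore_outcome(cluster_size, cleanly_shut_down):
    """Classify a restore by how many nodes contributed up-to-date state.

    A write is acknowledged once a majority holds it, so any group of
    survivors that overlaps every possible majority must contain it.
    """
    if cleanly_shut_down == 0:
        return "backup restore: writes after the snapshot are lost"
    # Survivors overlap every majority iff fewer than a majority are missing.
    missing = cluster_size - cleanly_shut_down
    if missing < quorum(cluster_size):
        return "safe: every acknowledged write is present"
    return "unsafe: acknowledged writes may be missing"


for survivors in (2, 1, 0):
    print(survivors, "->", restore_outcome(3, survivors))
```

With three nodes, two clean shutdowns always overlap the majority that acknowledged any given write, while a single one may not. That is why the single-node restore is described as equivalent to split-brain.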
        
           | alesapin wrote:
           | 1. Absolutely. clickhouse-keeper is distributed as a
           | standalone static binary or .deb package or .rpm package. You
           | can use it without clickhouse as a ZooKeeper replacement.
           | 
           | 2. It's not recommended to use slow storage devices for logs
           | in any coordination system (zookeeper, clickhouse-keeper, etcd
           | and so on). A good setup would be a small, fast SSD/EBS disk
           | for fresh logs, with old logs + snapshots offloaded to S3. In
           | such a setup the number of PUT requests will be tiny and
           | latency will be as good as possible.
        
           | pradeepchhetri wrote:
           | Some time back, I tried using clickhouse-keeper as a zookeeper
           | alternative with a few other systems like kafka, mesos, solr.
           | Wrote some notes here:
           | https://pradeepchhetri.xyz/clickhousekeeper/
        
         | abronan wrote:
         | Thanks for this excellent article! Enjoyed it from start to
         | finish. This gave me a good memory of the work we've done at
         | docker embedding our own replicated and consistent metadata
         | storage using etcd's raft library.
         | 
         | Looking at the initial pull request, is it correct that
         | ClickHouse Keeper is based on eBay's NuRaft library? Or did the
         | ClickHouse team fork and modify this library to accommodate
         | ClickHouse usage and performance needs?
        
           | alesapin wrote:
           | Yes, you are right, ClickHouse Keeper is based on NuRaft. We
           | made a lot of modifications to this library, both for
           | correctness and performance. Almost all of them (need to
           | check) have been contributed back to the upstream eBay/NuRaft
           | library.
        
       | wdb wrote:
       | Nice to see an alternative for Zookeeper that doesn't depend on
       | the Java runtime
       | 
       | I thought stuff was supposed to be rewritten in Rust /s
        
         | tylerhannan wrote:
         | lol.
         | 
         | I was waiting for that somewhere ;)
        
       | Dowwie wrote:
       | Any thoughts here on Fly's Corrosion?
       | https://github.com/superfly/corrosion
        
         | mdaniel wrote:
         | At least two comments spring to mind: this is at least
         | _blogged_ as a drop-in ZK replacement, which for sure is not
         | true of Corrosion, and ClickHouse has Jepsen tests for their
         | distributed KV store, whereas I don't see any reference to
         | such a thing for Corrosion
         | 
         | Maybe neither of those two things matter for one's use case,
         | but it's similar to someone rolling up on this blog post and
         | saying "but what about etcd" -- they're just different, with
         | wholly different operational and consumer concerns
        
       | lambda_garden wrote:
       | Why is "written in C++" part of the headline?
       | 
       | As engineers we focus too much on the implementation details and
       | not the benefits to the user.
       | 
       | How about:
       | 
       | - ZooKeeper alternative with lower latency
       | 
       | - ZooKeeper alternative with lower memory use
       | 
       | - ZooKeeper alternative with predictable overheads
       | 
       | (I don't know if these are true, just suggestions)
        
         | zX41ZdbW wrote:
         | ZooKeeper alternative with:
         | 
         | 1. Snapshots and logs take much less space on disk due to
         | better compression.
         | 
         | 2. No limit on the default packet and node data size (it is 1
         | MB in ZooKeeper)
         | 
         | 3. No zxid overflow issue (it forces restart for every 2 bn
         | transactions in ZooKeeper)
         | 
         | 4. Faster recovery after network partitions due to the use of a
         | different distributed consensus protocol.
         | 
         | 5. It uses less memory for the same volume of data.
         | 
         | 6. It is easier to set up, as it does not require specifying the
         | JVM heap size or a custom GC implementation.
         | 
         | 7. A larger coverage by Jepsen tests. (This could be hard to
         | believe, but true - ZooKeeper is tested by Jepsen, but Keeper
         | takes the existing tests and adds more).
         | 
         | 8. The possibility to store snapshots and previous logs on S3.
         | 
         | C++ isn't a key detail, just a consequence of the fact that the
         | main ClickHouse code base is written in C++.
         | 
         | If you need a distributed consensus system but not necessarily
         | compatible with ZooKeeper, there are plenty of options: Etcd,
         | Consul, FoundationDB...
        
           | adra wrote:
           | 2. is configurable (with specific caveats)
           | 
           | 3. I believe this was solved a very long time ago? Don't
           | epochs roll over automatically now?
           | 
           | 6. is this remotely relevant? You still want limits in cloud
           | deploys, so not sure how this is remotely a consideration
           | given it takes 2 minutes when you first set it up to use best
           | practices settings.
        
         | klysm wrote:
         | Running services on the JVM is terrible and requires many more
         | resources than other platforms
        
           | pjmlp wrote:
           | Depends on how much one cares about memory corruption,
           | developer tooling and library ecosystem.
        
             | klysm wrote:
             | Yes I agree memory safety is an important trade off, but
             | the performance wins can be worth it in some cases.
        
         | nickHN2023 wrote:
         | Disclaimer: I worked on this blog at ClickHouse with the team.
         | 
         | We'll look into them and add them to some of our social
         | promotion over the coming weeks. Will try to find a way to give
         | you credit.
        
         | betaby wrote:
         | Java vs C++ is a very important implementation detail,
         | especially for 'the benefits to the user'. Java is a commercial
         | platform which requires a fee to Oracle if used in enterprise,
         | while a C++ compiled binary does not.
        
           | jsiepkes wrote:
           | That's a _very_ incorrect statement. You can use any OpenJDK
           | (which is GPLv2 with classpath exception) distribution you
           | want to run Apache Zookeeper without having to have any
           | agreement with Oracle or pay any fee. The Oracle JDK is just
           | Oracle's commercial version of their OpenJDK distribution
           | with Oracle support.
           | 
           | You can use the OpenJDK distro shipped in your Linux distro
           | (RedHat, Debian, etc.), you can use Microsoft's OpenJDK
           | distro [1], you can use the Eclipse OpenJDK distro [2], you
           | can use Amazon's OpenJDK distro [3] and there are a whole
           | bunch more.
           | 
           | [1] https://www.microsoft.com/openjdk
           | 
           | [2] https://adoptium.net/
           | 
           | [3] https://aws.amazon.com/corretto/
        
           | wiseowise wrote:
           | What year are you from to say such nonsense?
        
           | pjmlp wrote:
           | The usual FUD, no it doesn't require any fee, use OpenJDK.
           | 
           | Several distributions to choose from.
        
           | icedchai wrote:
           | Or you can just use OpenJDK, right?
        
             | betaby wrote:
             | All ZooKeeper installations I've seen so far in production
             | were on Oracle Java JDK for one or another reason.
        
               | icedchai wrote:
               | Probably because people are stuck in the 2010's. OpenJDK
               | used to have more compatibility issues.
        
         | GauntletWizard wrote:
         | I'd be most interested in a Zookeeper alternative that doesn't
         | have massive bugs in leader election.
        
           | antonio2368 wrote:
           | Just adding on to a nice response from abronan, our internal
           | protocol also does some optimizations when it comes to leader
           | election, e.g. Pre Vote protocol (https://github.com/eBay/NuR
           | aft/blob/master/docs/prevote_prot...)
           | 
           | Also, we apply many different faults in our Jepsen tests
           | which are run 3 times a day and we never had a problem with
           | leader election. I know this doesn't confirm that there is no
           | bug in it but it's pretty reassuring I would say.
        
           | abronan wrote:
           | Since the article states that they're using Raft and not ZAB
           | for the consensus algorithm and leader election, it must be
           | less prone to bugs when it comes to electing a leader: Raft
           | is easier to reason about and the leader election process is
           | more straightforward (Raft minimizes the chance that any two
           | nodes will be candidates at the same time and thus avoids
           | starting multiple concurrent elections).
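
The way Raft keeps two nodes from becoming candidates at the same time is randomized election timeouts. Below is a toy simulation of that idea, not NuRaft's actual implementation. The 150-300 ms range, the 10 ms vote round trip, and all function names are illustrative assumptions:

```python
import random


def pick_election_timeouts(node_ids, low_ms=150, high_ms=300, rng=None):
    """Give every follower an independent random election timeout."""
    if rng is None:
        rng = random.Random()
    return {node: rng.uniform(low_ms, high_ms) for node in node_ids}


def first_candidates(timeouts, vote_round_trip_ms=10.0):
    """Nodes whose timers fire before the earliest candidate can win.

    If only one node is returned, it can collect votes unopposed; more
    than one means a possible split vote and a retry with new timeouts.
    """
    earliest = min(timeouts.values())
    return sorted(
        node for node, t in timeouts.items()
        if t - earliest < vote_round_trip_ms
    )


def split_vote_rate(trials=10_000, seed=0):
    """Fraction of simulated elections with competing candidates."""
    rng = random.Random(seed)
    nodes = ["a", "b", "c"]
    splits = sum(
        len(first_candidates(pick_election_timeouts(nodes, rng=rng))) > 1
        for _ in range(trials)
    )
    return splits / trials
```

In this model a wider timeout range relative to the vote round trip lowers the split-vote rate. A real implementation also has to deal with terms, log freshness checks, and extensions such as Pre-Vote, none of which are modeled here.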
        
         | stonemetal12 wrote:
         | It is to let you know to not use it, since it is a pile of
         | memory-related CVEs just waiting for the joy of discovery.
        
           | zX41ZdbW wrote:
           | That's true - C++ libraries are typically bug-ridden and
           | require exhaustive efforts to clean up.
           | 
           | But the latest bugs found by ClickHouse continuous
           | integration system in the related library were fixed about a
           | year ago:
           | 
           | https://github.com/eBay/NuRaft/pull/373
           | https://github.com/eBay/NuRaft/pull/392
        
         | tbragin wrote:
         | Disclaimer: I worked on this blog with the team at ClickHouse.
         | 
         | I like your suggestions!
         | 
         | Some of the benefits we summarized on this page
         | https://clickhouse.com/clickhouse/keeper include ease of setup
         | and operation, no overflow issues, better compression, faster
         | recovery, (dramatically) less memory used, etc.
         | 
         | There was actually a reason why C++ was important for us at
         | ClickHouse, and it's because C++ is our main code base and
         | managing a Java project as part of it was not natural, but
         | you're right - for standalone use of this alternative, that
         | doesn't matter.
        
           | jsiepkes wrote:
           | How much did memory safety factor in the decision? I mean
           | only last week the IT world was yet again bitten by a major
           | memory safety bug, in libwebp.
        
             | nanolith wrote:
             | Given that their entire code base is written in C++, and
             | switching to a different language would be a significant
             | retooling for the team, I think it's reasonable to assume
             | that it did not. Language choice is rarely made on the
             | grounds of specific features, and is often made on the
             | grounds of ergonomics and team knowledge.
             | 
             | A more revealing question is, "How are you dealing with
             | memory safety in this implementation?" There are ways to
             | improve memory safety in C++ through tooling and idiomatic
             | style. Are these things being used?
        
               | antonio2368 wrote:
               | Keeper is tested in the same way as ClickHouse.
               | 
               | There are Keeper only tests, but we run ClickHouse with
               | Keeper for all of our server tests.
               | 
               | For each test we try to use all useful tools for
               | verifying safety and correctness like sanitizers.
               | 
               | E.g. an interesting tool we introduced in our codebase
               | for thread safety
               | https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#
               | 
               | We found some issues using sanitizers in our codebase and
               | in the NuRaft library itself, which were instantly fixed.
               | 
               | And let's not forget about Jepsen, which revealed some
               | really tricky bugs, though those were more related to
               | correctness.
        
               | nanolith wrote:
               | Thanks. That is useful.
               | 
               | I would suggest looking into CBMC and similar tools as
               | well. Model checking is incredibly useful.
        
               | antonio2368 wrote:
               | Sadly I never put enough effort into trying out such
               | checks, but your excitement about them gives me
               | motivation to properly try them out.
        
               | nanolith wrote:
               | CBMC is subtle and will require some code changes to use
               | effectively.
               | 
               | The real key for using it, in my opinion, is to isolate
               | individual classes and functions. Avoid instrumenting
               | code with recursion and loops, and focus on defining and
               | verifying function contracts, class invariants, and
               | resource / memory lifetimes.
               | 
               | It will require a significant amount of work to mock up
               | standard library and third party library APIs, but the
               | real beauty of CBMC is that once you define the interface
               | contracts for these APIs and libraries, you can verify
               | every use of them.
               | 
               | I used CBMC previously to verify proper usage rules with
               | C / JNI integration. JNI can be one complicated beast,
               | and CBMC handily managed rule checks for its use.
               | 
               | I'm an extremely careful developer who unit tests
               | everything and strives for 99% coverage. CBMC was still
               | able to detect a memory overwrite flaw in a networking
               | library I wrote that was based on undefined behavior due
               | to integer promotion and offset math. This passed the
               | various sanitizers and unit tests I had in place, but
               | CBMC was able to reduce it to an actual crash condition
               | that was potentially exploitable.
               | 
               | I don't think I can over-emphasize the usefulness of this
               | tool.
        
         | [deleted]
        
       | The_Colonel wrote:
       | Does ClickHouse still have some relationship to Yandex?
        
         | einpoklum wrote:
         | It seems like these days, Yandex is just one of multiple stock
         | holders; see:
         | 
         | https://en.wikipedia.org/wiki/ClickHouse
        
           | tylerhannan wrote:
           | "ClickHouse, Inc. is a Delaware company with headquarters in
           | the San Francisco Bay Area. We have no operations in Russia,
           | no Russian investors, and no Russian members of our Board of
           | Directors."
           | 
           | Source - https://clickhouse.com/blog/we-stand-with-ukraine
        
       | cvccvroomvroom wrote:
       | Compatible possibly but unproven in production at scale like ZK.
        
         | tylerhannan wrote:
         | Definitely used in production ;) and at quite some scale.
         | 
         | It runs thousands of clusters, daily, both in CSP hosted
         | offerings (including our own ClickHouse Cloud) and at customers
         | running the OSS release.
         | 
         | Never accept any claims at face value and always test. But, in
         | this case, it is quite battle-hardened (i.e. the Jepsen tests
         | run 3x daily https://github.com/ClickHouse/ClickHouse/tree/mast
         | er/tests/j...)
         | 
         | But yes, ZooKeeper is pretty amazing. We are building on the
         | backs of giants.
         | 
         | I'd also argue that RAFT v. ZAB is an important production-scale
         | conversation. But, as the blog says, ZooKeeper is a better
         | option when you require scalability with a read-heavy workload.
        
       | spullara wrote:
       | Yet another thing I have just used FoundationDB for in the past.
        
         | rad_gruchalski wrote:
         | I'd love to read about that.
        
       | the-alchemist wrote:
       | It's been a struggle getting Clickhouse accepted here in the
       | U.S., despite its technical prowess, even prior to the war in
       | Ukraine.
       | 
       | I know, I read the blog:
       | 
       | > ClickHouse, Inc. is a Delaware company with headquarters in the
       | San Francisco Bay Area. We have no operations in Russia, no
       | Russian investors, and no Russian members of our Board of
       | Directors. We do, however, have an incredibly talented team of
       | Russian software engineers located in Amsterdam, and we could not
       | be more proud to call them colleagues.
       | 
       | The FUD is really hard to overcome. This is coming from someone
       | who advocated for Clickhouse, sent some PRs, and did a minor code
       | audit.
        
       | jzelinskie wrote:
       | It's been a few years since I've checked in with distributed lock
       | services. Why would someone adopt ZooKeeper after etcd gained
       | maturity? I recall seeing benchmarks more than 5 years ago where
       | a naive proxy like zetcd[0] out-performs ZooKeeper itself in many
       | ways and offers more consistent latencies. etcd has gotten lots
       | of battle-testing being Kubernetes' datastore, but I can also see
       | how that has shaped its design in a way that might not fit other
       | projects.
       | 
       | I think there are plenty of other projects (e.g. FoundationDB,
       | Kafka) that also replaced their usage of ZooKeeper as their
       | systems matured. I guess I'm confused why anyone has been picking
       | up new installations of ZooKeeper.
       | 
       | [0]: https://github.com/etcd-io/zetcd
        
         | klysm wrote:
         | The term "distributed lock" is a bit of a mental red flag to
         | me.
        
         | zX41ZdbW wrote:
         | There is no specific reason to start with ZooKeeper, nor with
         | ClickHouse Keeper, if you want to use another distributed
         | consensus system.
         | 
         | But: every such system is slightly different in the data model
         | and the set of available primitives.
         | 
         | It's very hard to build a distributed system correctly, even
         | relying on ZooKeeper/etcd/FoundationDB. For example, when I
         | hear "distributed lock," I know that there is a 90% chance there
         | is a bug (a distributed lock can be safely used if every
         | transaction made under the lock also atomically tests that the
         | lock still holds).
         | 
         | So, if there is an existing system heavily relying on one
         | distributed consensus implementation, it's very hard to switch
         | to another. The main value of ClickHouse Keeper is its
         | compatibility with ZooKeeper - it uses the same data model and
         | wire protocol.
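
The rule stated above (every write made under a lock must atomically test
that the lock still holds) can be sketched with a fencing token. This is a
minimal in-memory illustration, not ZooKeeper or ClickHouse Keeper code: the
`FencedStore` class, its method names, and the token scheme are all
hypothetical, and a real system would express the check-and-write as a single
transaction in the coordination service.

```python
import threading


class StaleLockError(Exception):
    """Raised when a write carries a token that no longer holds the lock."""


class FencedStore:
    """Toy store (hypothetical) that checks the lock token atomically with each write."""

    def __init__(self):
        self._mutex = threading.Lock()  # makes check-and-write a single atomic step
        self._next_token = 0
        self._holder_token = None       # token of the current lock holder
        self._data = {}

    def acquire(self):
        """Hand the lock to a new holder and return its fencing token.

        Assumption: the previous holder's lease is treated as expired, so
        acquisition always succeeds in this toy model.
        """
        with self._mutex:
            self._next_token += 1
            self._holder_token = self._next_token
            return self._holder_token

    def write(self, token, key, value):
        """Apply the write only if `token` still identifies the lock holder."""
        with self._mutex:
            if token != self._holder_token:
                raise StaleLockError(f"token {token} no longer holds the lock")
            self._data[key] = value

    def read(self, key):
        with self._mutex:
            return self._data.get(key)


if __name__ == "__main__":
    store = FencedStore()
    old = store.acquire()
    store.write(old, "leader", "node-a")

    new = store.acquire()  # e.g. node-a paused and its lease expired
    try:
        store.write(old, "leader", "node-a-late")  # stale holder wakes up
    except StaleLockError as exc:
        print("rejected:", exc)

    store.write(new, "leader", "node-b")
    print(store.read("leader"))
```

The point of the sketch is that the validity check and the write happen under
the same atomic step, so a client that merely believes it still holds the lock
cannot overwrite data after losing it.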
        
           | endisneigh wrote:
           | FoundationDB doesn't use locks; because of that it's
           | relatively easy to build distributed systems on top, but the
           | trade-off is the 5-second transaction limit.
        
       | davideberdin wrote:
       | I'm always impressed by the quality of the blog posts coming out
       | from clickhouse.com! Super well written!
        
         | tylerhannan wrote:
         | Thank you! We try ;)
        
           | macintux wrote:
           | Basho blog quality 4ever.
        
             | tylerhannan wrote:
             | It's a deep-cut reference. But it's one I am honoured you
             | made.
             | 
             | :mug:
        
       | klysm wrote:
       | I hate running services in Java but this will have to earn a lot
       | of trust before it's a viable replacement in prod
        
         | tylerhannan wrote:
         | ++ agreed.
         | 
         | ClickHouse Keeper was released as feature complete in December
         | of 2021.
         | 
         | It runs thousands of clusters, daily, both in CSP hosted
         | offerings (including our own ClickHouse Cloud) and at customers
         | running the OSS release.
         | 
         | Never accept any claims at face value and always test. But, in
         | this case, it is quite battle-hardened (i.e. the Jepsen tests
         | run 3x daily
         | https://github.com/ClickHouse/ClickHouse/tree/master/tests/j...).
        
           | klysm wrote:
           | That's a strong endorsement. I wonder if it has ended up
           | strongly tailored to the API surface area utilized by
           | ClickHouse, and whether there are any gaps elsewhere.
        
             | tylerhannan wrote:
             | We hope not, and we try to keep it wire compatible for
             | clients to interact (recently added dynamic reconfig, etc.)
             | 
             | It is definitely opinionated and influenced by our
             | work...but not designed solely for it.
             | 
             | But, also, we continue to improve. Most notably in the work
             | on Multi-group Raft -
             | https://github.com/ClickHouse/ClickHouse/issues/54172
        
       | dathinab wrote:
       | Is it just me or does it look like an "alternative for use-cases
       | ZooKeeper was not intended for"?
       | 
       | E.g. if we quote ZooKeeper:
       | 
       | > ZooKeeper is a centralized service for maintaining
       | configuration information, naming, providing distributed
       | synchronization, and providing group services.
       | 
       | and ClickHouse
       | 
       | > ClickHouse is the fastest and most resource-efficient open-
       | source database for real-time applications and analytics.
       | 
       | Like, these are completely different use cases with just a small
       | overlap.
        
         | advisedwang wrote:
         | And then further down:
         | 
         | > ClickHouse Keeper is a drop-in replacement for ZooKeeper
         | 
         | That opening was about ClickHouse in general, but the article
         | is about one particular application using the database.
        
           | dathinab wrote:
           | drop in replacement doesn't change anything about my
           | question, interfaces are not all what matters
           | 
           | and now knowing more about it I would say the answer to my
           | question is a clear yes, it used ZK for something ZK wasn't
           | at all intended for
           | 
           | which means it makes a lot of sense that they replace it
        
             | macksd wrote:
             | Are you perhaps confusing ClickHouse with ClickHouse Keeper
             | (one of ClickHouse's components)? Sounds to me like
             | ClickHouse is the database, and ClickHouse Keeper is the ZK
             | drop-in replacement. A bit like HBase being a database, and
             | ZooKeeper being a service it is heavily dependent on.
        
           | tylerhannan wrote:
           | Yep.
           | 
           | Generally, ClickHouse Keeper provides the coordination system
           | for data replication and distributed DDL query execution for
           | ClickHouse clusters.
        
       | insanitybit wrote:
       | So could I just point my Kafka at this thing and use it?
        
         | tylerhannan wrote:
         | fundamentally, yep.
         | 
         | https://pradeepchhetri.xyz/clickhousekeeper/ talks about some
         | experiments in exactly that vein.
        
       | pram wrote:
       | Looks nice, I will definitely be trying this out.
       | 
       | Built in s3 storage immediately sold me. I've used something
       | called Exhibitor to manage ZK clusters in the past but it's
       | totally dead. Working with ZK is probably one of my least
       | favorite things to do.
        
         | randomtb wrote:
         | I used Exhibitor in the past too. It was especially useful
         | when a ZooKeeper cluster needed to expand/shrink or move host
         | nodes. ZooKeeper dynamic configuration solved that problem, and
         | it seems to be supported by ClickHouse Keeper as well. Pretty
         | impressive! Would definitely give it a try.
        
           | tylerhannan wrote:
           | It does indeed.
           | 
           | Do note the docs page...
           | 
           | https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-...
           | 
           | In particular, it is necessary to enable the
           | `keeper_server.enable_reconfiguration` flag. It is pretty
           | exhaustive coverage but if there is an important use case
           | missing, let us know!
        
       | kiratp wrote:
       | Written in C++ is not a positive in my book. New things created
       | this decade in unsafe languages (where safe options would have
       | worked fine) should be frowned upon and criticized as bad
       | engineering.
        
         | krvajal wrote:
         | > where safe options would have worked fine
         | 
         | That's a very short-sighted view IMO, software engineering is
         | not just about technology choices.
        
           | kiratp wrote:
           | Software engineering is first and foremost about making good
           | engineering choices. We are proving time and again as an
           | industry that humans CANNOT write mistake-free code.
           | 
           | From just this week:
           | https://news.ycombinator.com/item?id=37600852
           | 
           | Just because someone built a fantastically functional
           | building doesn't mean we can't criticize their choice of
           | foundation. Case in point: Millennium Tower in SF:
           | https://www.nbcbayarea.com/investigations/series/millennium-...
        
             | krvajal wrote:
             | And when making those engineering choices there are
             | different tradeoffs and constraints to be considered. The
             | language to use is one of them. So when a Rust (wild guess)
             | fanboy comes without any background context and makes
             | comments like yours, it is very telling.
             | 
             | You are correct that bug-free software does not exist. But
             | choosing a "memory safe" language does not prevent bugs. A
             | seasoned C++ developer knows how to use memory sanitizers
             | and other tools to guarantee the correctness of their code,
             | compared to an average Rust developer who just trusts the
             | compiler which, guess what, may also have bugs.
        
             | kgeist wrote:
             | ClickHouse is a Yandex project and at Yandex, they
             | historically use C++ for almost everything, I guess it's
             | part of their culture (probably the founders were C++
             | programmers?) Their web services such as Yandex Taxi's
             | backend (Uber's equivalent) are also written in C++ which
             | is unusual for webdev nowadays.
        
               | tylerhannan wrote:
               | s/is a/was a/g
               | 
               | But more interesting, to me, is language adoption and
               | familiarity by region.
               | 
               | I have a bookmarked dev.to article from 2020 that
               | discussed programming language popularity by state -
               | https://dev.to/eduecosystem/what-is-the-most-popular-program...
               | 
               | I'm uncertain if anyone has extrapolated that to more
               | geographic regions. It would be interesting.
        
         | berkle4455 wrote:
         | [flagged]
        
           | krvajal wrote:
           | And very bad at accounting :P
        
       | antonio2368 wrote:
       | As one of the contributors, I'm always happy to see interest and
       | people using it.
       | 
       | Keeper is a really interesting challenge and we're really open to
       | any kind of feedback and thoughts.
       | 
       | If you tried it out and have some feedback for it, I encourage
       | you to create an issue
       | (https://github.com/ClickHouse/ClickHouse), ask on our Slack,
       | ping me directly on Slack... (just don't call me on my phone)
       | 
       | And don't forget that it's completely open-source like ClickHouse
       | so contributors are more than welcome.
        
         | Redsquare wrote:
         | shared mergetree is not open!
        
           | jeremyjh wrote:
           | Can you elaborate? The software is distributed under an
           | Apache 2.0 license.
        
             | krvajal wrote:
             | He means the new SharedMergeTree, but that's ClickHouse
             | specific.
        
           | jeremyjh wrote:
           | So someone who gives away free software must give away all
           | software they write forever?
        
       | xyzelement wrote:
       | I usually scoff at "written in.." part of such announcements,
       | because it is a sign that the author is focused on the input ("I
       | wrote this in X") not the output (value the user gets)
       | 
       | In this case though, the blog outlines specific reasons why this
       | had to be in C++ (interoperability w. their C++ codebase) as well
       | as benefits that are separate from the language.
        
         | The_Colonel wrote:
         | It's a huge turn-off for me as well, because I interpret it as
         | the main value it's supposed to deliver (which for me is 0 for
         | the most part). Not talking about this specific project, just
         | generally.
        
           | hodgesrm wrote:
           | In this case written in C++ is goodness. Also, having it be a
           | variant of ClickHouse server that can run embedded or
           | standalone is quite nice.
        
       ___________________________________________________________________
       (page generated 2023-09-27 23:01 UTC)