[HN Gopher] Implementing a distributed key-value store on top of...
       ___________________________________________________________________
        
       Implementing a distributed key-value store on top of implementing
       Raft in Go
        
       Author : simjue
       Score  : 219 points
       Date   : 2023-05-25 13:26 UTC (9 hours ago)
        
 (HTM) web link (notes.eatonphil.com)
 (TXT) w3m dump (notes.eatonphil.com)
        
       | didip wrote:
       | Big fan of Phil!
       | 
       | A lot of us tinkered like him, but very few came back and write
       | the findings nicely in an easy to read form.
        
         | eatonphil wrote:
         | Glad to hear it! :)
        
       | yi_xuan wrote:
       | Recently, I have been watching the course MIT6.824. This article
       | appeared very timely :)
       | 
       | Here is another raft implementation in Go
       | https://github.com/eliben/raft
        
       | siftrics wrote:
       | This is essentially do-it-yourself etcd.
       | 
       | https://github.com/etcd-io/etcd
        
         | esafak wrote:
         | And tidb
        
           | moderation wrote:
           | TIKV [0] is closer (written in Rust).
           | 
           | 0. https://github.com/tikv/tikv
        
       | bit_flipper wrote:
       | Your note about encoding/gob being inefficient is somewhat
       | accurate for how you're using it, but I want to talk a bit about
       | how you could improve your use.
       | 
       | encoding/gob is intended for streams, not stateless
       | marshals/unmarshals. The first thing that is sent over the stream
       | is the type information the receiver should expect, that's why
       | your payload was so large. After the first type is received,
       | subsequent messages are much smaller. You can see this by
       | extending your example to do multiple writes; each write after
       | the first is only 10 bytes: https://play.golang.com/p/Po_iaXrTUER
       | 
       | You have to plan differently, but you could get large
       | improvements to transmission sizes by changing to append only
       | files and creating the gob encoder once per file. If you find
       | you're creating a gob encoder/decoder very often, that's a
       | telltale sign you're not using it as intended.
        
         | eatonphil wrote:
         | Another thing I glossed over (unintentionally) is that I only
         | started limiting batch size after I switched to the custom
         | encoder. So persist() being a function of length wouldn't be
         | quite true anymore.
         | 
         | However, I still keep seeing encoding/gob high up in the
         | profiler taking a lot of time doing reflection during RPC
         | calls.
         | 
         | So it does still seem like it's not ideal. Though I may still
         | just not be understanding how to use net/rpc correctly either.
        
         | candiddevmike wrote:
         | > encoding/gob is intended for streams, not stateless
         | marshals/unmarshals
         | 
         | I don't understand this within the context of the rest of your
         | comment. I use gob for marshaling stuff to storage all the
         | time, I'm not aware of a better way to do that (serialize data
         | to binary).
        
           | bit_flipper wrote:
           | Sorry, I should have been more precise. encoding/gob is not
           | optimized for situations where you create an encoder or
           | decoder, read/write a single value, then discard that
           | encoder/decoder. As the author noted, payloads for a single
           | call to Encode() are quite large. Additionally, re-
           | instantiating a gob encoder for each call to Encode() is very
           | expensive allocation-wise and benchmarks where this happens
           | will show gob to perform poorly in these scenarios. You can
           | certainly still use gob this way, and if the performance
           | works for you then have at it! But it performs significantly
           | better in situations where you make multiple calls to
           | Encode() with the same encoder.
        
           | throwaway894345 wrote:
           | The parent is saying that gob includes the type information
           | in the serialized payload. This is extraneous information in
           | many cases, and the result is an unnecessarily large payload.
        
       | dotnwat wrote:
       | epic
        
       | cheeseprocedure wrote:
       | I was lucky enough to take David Beazley's "Rafting Trip," a
       | five-day training course that guides each student through
       | building their own Raft implementation from scratch:
       | 
       | https://www.dabeaz.com/raft.html
       | 
       | I'd recommend the course to anyone with development experience
       | working with or near distributed systems. David is a fantastic
       | instructor and facilitator, and the blend of student backgrounds
       | led to some great learning and discussion.
       | 
       | (I have no financial or personal interest here; I just loved the
       | course.)
        
       | geospeck wrote:
       | Thanks for the post!
       | 
       | Another great blog post series about implementig Raft in Go that
       | I found is this one
       | https://eli.thegreenplace.net/2020/implementing-raft-part-0-...
        
         | eatonphil wrote:
         | Yes I definitely referred to his posts and project while
         | working on this. I'm a big fan of his blog!
        
       | Xeoncross wrote:
       | These kinds of posts are my favorite part of HN. The deep dives
       | into LLM's, state machine theory, fourier transforms, locking
       | data structures, and memory allocation that is miles above the
       | basic posts around the internet you get with searching - but not
       | quite a full book yet.
       | 
       | The author spent 7 months tinkering and cared enough to come back
       | and share that with us.
        
         | beoberha wrote:
         | Phil rocks. Highly suggest following him on twitter. He's
         | pretty outspoken about writing. This is a great post of his
         | that I need to reflect on more: https://notes.eatonphil.com/is-
         | it-worth-writing-about.html
        
           | eatonphil wrote:
           | Glad that post resonated! <3
        
         | quaintdev wrote:
         | I wish there was a separate catalog of these posts which I
         | could browse through.
        
       | restlake wrote:
       | Cool post. Wonder if it would have helped to take a look at the
       | MIT distributed systems course on the web and YouTube - one of
       | the projects is exactly this (Go Raft implementation)
       | 
       | https://pdos.csail.mit.edu/6.824/labs/lab-shard.html
        
         | eatonphil wrote:
         | I just don't love learning from videos personally. But I'm sure
         | those are good if you do!
        
       | powerset wrote:
       | I highly recommend MIT open courseware 6.824. Incredibly valuable
       | for learning distributed systems, and one of the lab assignments
       | is implementing raft in Go.
       | http://nil.csail.mit.edu/6.824/2022/schedule.html
       | 
       | There are a ton of fascinating and potentially frustrating edge
       | cases and gotchas to implementing raft correctly. There's no
       | better way to understand it than actually implementing it, and I
       | probably never would have done it myself without these course
       | materials.
        
         | rochak wrote:
         | Unfortunately, I hit a roadblock while implementing the Raft
         | assignment. I knew it was simply beyond my capabilities but
         | would have made through if I had anyone I could reach out to. I
         | second this recommendation, but make sure you know what you are
         | signing up for. This course is as hard as they come.
        
           | valzam wrote:
           | I have found the performance tests very tricky to get to pass
           | without having any input from others. The assignment is
           | really very unforgiving, I would wager the test suite is
           | comparable to how commercial Raft implementations are tested
           | (e.g. https://github.com/hashicorp/raft)
        
       | leetrout wrote:
       | So kinda like "build your own consul"
        
         | eatonphil wrote:
         | I'm pretty sure consul builds on top of hashicorp's Raft
         | implementation so I'd say end result, yes, but that this post
         | goes into the Raft implementation.
         | 
         | And I'm sure consul does plenty my silly key-value state
         | machine doesn't.
         | 
         | I'm not super familiar with it.
        
         | roncesvalles wrote:
         | Reminded me of etcd, which is exactly this - Raft-based key-
         | value store written in Go.
         | 
         | https://github.com/etcd-io/etcd
        
       | fredski42 wrote:
       | Here is nice interactive and visual representation of how raft
       | works: http://thesecretlivesofdata.com/raft/
        
         | thinkharderdev wrote:
         | That's really cool!
        
       | hintymad wrote:
       | A side topic: how can a company find people like the author, who
       | can really articulate how to build a system? I've been
       | interviewing senior engineers for years, and more than 95% of
       | them, if not more, are really box drawers. They throw terms
       | around confidently but fail miserably the moment I ask for
       | specifics. No, I don't ask for details like how Raft works. That
       | level of details probably would fail all the candidates. Just
       | simple things like how you organize the data, how you route the
       | traffic, how you manage metadata, or how your data path keeps
       | metadata in sync. Really basic ones, yet most interviewers fail.
       | Indeed, the more senior they are, the more likely they lose touch
       | of the details -- even those who claim to work on distributed
       | databases.
        
         | pavluha wrote:
         | Try to interview more senior people, who still remember their C
         | course in an University, or even know what is assembler.
         | Currently, most graduates (from my experience) don't know how
         | an integer is stored in memory.
        
           | mydogcanpurr wrote:
           | I don't even know what a good answer to that is. Some
           | integers are stored in registers. Some are stored on the
           | stack. Some are stored on the heap. Or you could be asking
           | how integers are represented, with two's complement being the
           | usual way. Did I pass your ambiguous question? Probably not.
        
         | qznc wrote:
         | You find them via their blogs.
        
         | bombela wrote:
         | Where do you find the jobs where people actually care about
         | engineering? Because real work is mostly careful plumbing
         | without breaking what works, but in interviews you must
         | leetcode. Real systems only work when kept reasonably simple,
         | but system design interviews are all about drawing boxes.
        
         | BossingAround wrote:
         | I recently interviewed with a company, and the interviewer
         | asked me "how do you organize data." I wasn't sure if they
         | wanted to talk about classes, modules, databases, k-v stores,
         | hashing data and routing distributed requests to the same pods.
         | I asked, and they answered "I mean in general, how do you
         | organize data."
         | 
         | After talking for a bit about pretty much all of the above, the
         | interviewer asked "have you used dictionaries?"
         | 
         | The reason I'm telling the story is, if a lot of your
         | candidates fail to answer your questions, the problem might be
         | in the question.
        
           | skippyboxedhero wrote:
           | I have had that exact experience before. It is fine, you
           | realise the company is toasted and you move on (the place
           | that asked me this had just got rinsed by an offshore
           | consulting firm, ecomm site built with a graph database,
           | comedic stuff).
        
           | maximinus_thrax wrote:
           | > After talking for a bit about pretty much all of the above,
           | the interviewer asked "have you used dictionaries?"
           | 
           | Hahaha classic.
           | 
           | I was once in an interview (which I failed) and was asked a
           | problem related to some sort of monotonic queue. I wrote the
           | solution and was going through it, when the interviewer
           | asked: 'what CS concept are you using in this solution'? I
           | didn't really understand what he was asking for and in turn I
           | asked for more clarification like 'Are you referring to the
           | data structures I used? Or the technique (it was some sort of
           | greedy)?' 'no'. After a couple of back-and-forth questions
           | and answers, he finally tells me he was looking for me to say
           | 'a state machine'. Seriously?
        
           | no_wizard wrote:
           | I would have halted for a moment and asked something like:
           | _organize data for what purpose?_ At which point I would
           | expect there to be some amount of clarification. As it sounds
           | to me like they 're talking about organizing some set of data
           | for lookup, as opposed to organizing data around how it flows
           | through an application, for example
        
             | aranw wrote:
             | I often find when you are given questions like that no
             | matter how you probe you get a really defensive interviewer
             | who doesn't want to give away the answer they are
             | expecting. I've been in very similar situations with vague
             | questions and I've tried to probe for more details and just
             | been met with defensiveness with an attitude that says they
             | expect me to know exactly what they mean
        
               | d0mine wrote:
               | It is worse: your attempt to clarify the question may be
               | regarded by some interviewers as a sign you are not
               | senior dev (they think [erroneously in my opinion] that
               | "senior" means that you must figure out fine details
               | yourself).
        
               | maximinus_thrax wrote:
               | One a rare occasion I actually got feedback after the
               | interview (it was for a startup), It was because I asked
               | 'too many questions' it was decided that I'd need way
               | more mentoring than they can afford for the senior
               | position I was applying for.
        
               | no_wizard wrote:
               | This seems odd to me, though its entirely possible my
               | experience thus far hasn't been this. I've been told at
               | the last 3 places I worked that one of the reasons I was
               | hired was how inquisitive I was during the interview
               | stage, asking _lots_ of questions before coming up with
               | solutions
               | 
               | I am most definitely not ruling out what you're saying
               | may be common case and I am just very lucky, but I do
               | want to act as a sort of counter point to see if others
               | can weigh in so we can get more of an industry sense
               | around this
        
               | zimpenfish wrote:
               | Anecdatum: my experience has been the same as others -
               | asking questions about the interview questions generally
               | leads to a bad outcome. In 25 years, I reckon it's less
               | than a handful of times where asking questions of the
               | interviewers has actually got a positive response.
               | 
               | Hell, even when in jobs, asking questions about projects
               | I'm assigned has sometimes got a negative response...
        
           | pavluha wrote:
           | There are plenty of different types of dictionaries,
           | organizing their data differently.
        
         | nitwit005 wrote:
         | You seem confident you wouldn't have failed the author of this
         | blog post, but I wouldn't be so confident.
         | 
         | Interviews are a high pressure situation where people want an
         | immediate answer, without much of any time to think, let alone
         | research options or run an experiment. You get people's
         | absolute worst possible results.
        
         | convolvatron wrote:
         | I have worked in distributed databases. I can't seem to find
         | any host organization that's actually building anything and
         | really cares about those details. I know several other senior
         | systems engineers in the same boat. where are you guys hiding?
        
         | bastardoperator wrote:
         | Maybe you're the problem? Looking at your questions, you're
         | asking for opinions, not answers or solutions. Putting people
         | on the spot is bad form in my opinion. If I have a technical
         | challenge I want you to solve, I'll give you the time for that
         | so I can actually see your work and discuss it with you.
         | 
         | I've seen people that can talk the talk but not walk the walk
         | and vice versa, they can't articulate well, but their approach
         | and work speaks for itself.
        
           | hintymad wrote:
           | Maybe I am. For a senior enough role, my questions are open
           | ended. I'd expect the candidate to drive conversations and
           | examine alternatives. I ask follow-up questions based on what
           | the candidate mentions. Yeah, such may be indeed their
           | opinions, but at least they should be opinions derived from
           | assumptions, constraints, and fundamentals.
        
             | darth_avocado wrote:
             | That's a terrible way to interview. Interviews are high
             | stress, time constrained settings where you're talking to
             | people you don't know. If you want thoughtful answers, you
             | need to be precise in what you're asking. You can't be like
             | "tell me about yourself" and then complain I didn't get
             | from the candidate whether steak is their favorite food.
        
         | skippyboxedhero wrote:
         | Do you understand the difference between writing a blog post
         | about something and being able to talk about it in an
         | interview? The majority of work in software is not being an
         | "expert" on any one thing because the one thing that employers
         | require is changing quite frequently. Rather it is being able
         | to build up experience and carrying knowledge forward to new
         | areas.
         | 
         | Most technical interviews are conducted in a completely inane
         | way where the goal is to memorize pieces of information. If
         | that was how software worked, the best engineer would just be
         | Google. 95% of people who know enough to build this system
         | would likely be unable to answer your questions because, more
         | than likely, they do not currently work in a job that requires
         | them to use this information every day. This does not mean they
         | do not understand it or are unable to build it.
         | 
         | The reason why companies can't find people like this is a
         | combination of not understanding what software development is
         | (and not understanding people, it is really the blind leading
         | the blind but technical people are often as bad as
         | interviewing).
         | 
         | You are asking completely wrong questions. You aren't starting
         | from the right place.
        
           | hintymad wrote:
           | > Rather it is being able to build up experience and carrying
           | knowledge forward to new areas.
           | 
           | Then I'd expect the candidate could articulate her experience
           | and knowledge in some way, no? Otherwise, how would I know
           | the candidate has the built-up expertise? Of course, I assume
           | we can only have interviews. Otherwise, we can have other
           | means, like mini-project, onsite project, or a writeup. Some
           | candidates do like the alternatives more, and some not.
           | 
           | > where the goal is to memorize pieces of information
           | 
           | I disagree. The goal is to see if a candidate does have what
           | the candidate herself claims to know. I find it hard to
           | imagine that a candidate claims to be an expert in a field
           | yet couldn't articulate even one thing in depth for hours if
           | not days, let alone 30 minutes of interview time. Note this
           | is not about any specific details, but the general picture
           | and insights that an expert can convey. This is like a PhD
           | oral defense. The candidate talks about the topic that _the
           | candidate_ is familiar with, and the professors dive in on
           | such topics. I don 't see what's wrong with that.
        
             | skippyboxedhero wrote:
             | Candidates aren't defending a PHd. The interviewer is
             | nowhere near competent enough for this.
             | 
             | > The goal is to see if a candidate does have what the
             | candidate herself claims to know.
             | 
             | The process you are using does not tell you that. You will
             | notice that your explanation is full of things that you
             | expect, not based in an understanding of how reality
             | actually works. And the result is, unsurprisingly, one
             | where you do not understand the output.
        
       | samsquire wrote:
       | Thank you for the write up eatonphil.
       | 
       | I experimentally implemented Raft in Java but I am not very
       | confident that I did it correctly.
       | 
       | I wish there was a way to implement stateful programs that
       | guarantee "forward progress" and are "steady state systems". I
       | think essentially a state machine that cannot be trapped in a
       | state. Debugging the absence of something of forward moving
       | progress or lack of causation is very difficult.
       | 
       | When there's essentially different actors in the system and they
       | can interact with eachother by communicating, they each have a
       | number of states they can get into. There's no guarantee that the
       | system shall converge on a state that forward progress can be
       | made. Maybe TLA+ is the right answer here.
       | 
       | YMMV but I think (my) reasoning over stateful systems is rather
       | difficult, I think there's lots of hidden states that we cannot
       | easily detect or reason about because they're in our blind spots.
       | Especially related to synchronization. I think it's part of what
       | makes multithreading and distributed systems so hard, because
       | every component can be in a different state and if something is
       | not where it is expected to be, the baton doesn't get passed to
       | the correct state. If you check for something too early, you have
       | a race condition.
       | 
       | If we could see in slow motion what was going on, an interaction
       | between different actors, we could work out why something happens
       | the way it does. But usually the logs are too numerous to get to
       | this detail. I think animation can save us, but what does a Raft
       | animation look like?
       | 
       | How often have you seen an endless spinner? It's as if a
       | completion event was raised but didn't get detected and the
       | system is waiting for something that shall never occur. I want
       | this kind of error to be impossible. This is one form of hidden
       | state that prevents progress.
       | 
       | I wrote an eventually consistent mesh protocol in Python and
       | tested it with Jepsen, it is not linearizable because the
       | consistency level is "eventually consistent".
       | 
       | I don't understand how Raft can scale writes or reads across
       | multiple machines due to the round trip time talking to other
       | nodes.
        
         | qznc wrote:
         | I think you should try TLA+. I found it surprisingly easy:
         | https://beza1e1.tuxen.de/tla-plus.html
         | 
         | Still haven't found an opportunity to use it professionally
         | though.
        
         | no_wizard wrote:
         | Sounds like a job for state machines like you can build out
         | with a library like xstate[0] (though I'm sure there are
         | similar libraries in whatever language you choose. Python has
         | one called automat[1])
         | 
         | These exist to formalize state logic (current state, computing
         | state, transitioning state etc), you can even produce diagrams
         | based on their definitions. Advanced libraries like xstate even
         | have Actors are part of the core of the library
         | 
         | [0]: https://stately.ai/docs/xstate
         | 
         | [1]: https://github.com/glyph/Automat
        
         | tbrockman wrote:
         | Don't have anything to comment on the rest, but my
         | understanding with respect to:
         | 
         | > I don't understand how Raft can scale writes or reads across
         | multiple machines due to the round trip time talking to other
         | nodes.
         | 
         | Is that at the point where that becomes a bottleneck you scale
         | to multiple clusters, where the read/write destination cluster
         | is determined by a key and whatever your preferred hashing
         | mechanism is.
        
         | DylanSp wrote:
         | Microsoft has a library/tool called Coyote* that helps with
         | testing distributed systems; you can write
         | tests/specifications, Coyote will systematically explore
         | nondeterminism in your system and check if your tests still
         | pass. If there's a failure, it'll show the sequence of events
         | that led to the failing test.
         | 
         | I started a project to implement Raft with a KV-store on top,
         | similar to the article, meaning to use Coyote to test it; I
         | didn't get that far before losing interest, though. It's
         | reassuring to read that it took Phil several months to write
         | the code in the post, it's good to know that this is a
         | decidedly nontrivial problem.
         | 
         | * https://github.com/microsoft/coyote
        
       ___________________________________________________________________
       (page generated 2023-05-25 23:00 UTC)