hngopher.com

       [HN Gopher] Paxos
       ___________________________________________________________________
        
       Paxos
        
       Author : joeyespo
       Score  : 83 points
       Date   : 2022-01-06 19:14 UTC (3 hours ago)
        
 (HTM) web link (martinfowler.com)
 (TXT) w3m dump (martinfowler.com)
        
       | tombert wrote:
       | I have a bit of a man-crush on Leslie Lamport (and have
       | (unsuccessfully) tried to get TLA+ to be used at every job I've
       | had for the last five years), and I have read through both the
       | original Paxos paper in addition to the Paxos Made Simple paper,
       | and I'm just going to say that I still found it about five times
       | harder to grok than Raft. Honestly the TLA+ specification of
       | Paxos made it click more than the papers for me, but that could
       | just be because of the repeated readings before.
       | 
       | It's a beautiful algorithm, but Raft has the advantage of being a
       | lot simpler.
        
         | uvdn7 wrote:
         | I am also a huge fan of Lamport. Paxos is so concise and of
         | such beauty that once you understand it, all other consensus
         | protocols seem just different ways of phrasing Paxos with
         | additional features (e.g. membership changes, make it into a
         | log, etc.). Paxos is the smallest, most elegant consensus
         | building block that I have seen so far. Raft's protocol covers
         | a much bigger area than just consensus (it gives you a log,
         | supports reconfiguration, etc.). Paxos' entire protocol fits a
         | single slide. Period.
         | 
         | I wrote a few posts about how to understand paxos, and this is
         | my personal favorite - https://blog.the-pans.com/understanding-
         | paxos/.
         | 
         | Lamport did comment on it (via email) saying that understanding
         | Paxos is less important than proving it works (via TLA+).
        
           | MatteoFrigo wrote:
           | Your view that Paxos is an algorithm for read-modify-write is
           | a really deep insight. Once you adopt this point of view,
           | things like Raft are just RMW updates of a log. (I like to
           | think of Paxos as the fault-tolerant version of the load-
           | linked/store-conditional pair of instructions.)
           | 
           | However, there is a difficulty in expressing exactly what
           | RMW-Paxos does, because it may end up "modifying" a value
           | that was never "committed".
           | 
           | For example, assume that I have three acceptors storing a
           | replicated counter. The proposer executes phase-1, increments
           | the value received from the acceptor with the highest ballot,
           | and sends the value to all acceptors in phase-2. Consider now
           | an execution where all acceptors initially hold 0. The
           | proposer manages to store a new value 1 in one acceptor, and
           | then crashes. Thus, 1 was not committed and a reader may
           | conclude that the consensus value is still 0. Now a new
           | proposer arrives, receives (1, 0, 0) from the three
           | acceptors, chooses 1 as the newest value, and writes (2, 2,
           | 2) to all acceptors. Thus "2" is now committed, and the
           | commit of the "2" retroactively commits the "1". Thus, the
           | condition for being committed is no longer "there exists a
           | phase-2 quorum ...", but it is more complicated and depends
           | on future history.
           | 
           | I have verified with TLA+ that this algorithm does indeed
           | implement an atomic read-modify-write operation, but only
           | under a complicated notion of being "committed" that boils
           | down to either having a phase-2 quorum, or one of the
           | successor operations having a phase-2 quorum.
           | 
           | Did you ever figure out a simpler way to say this?
        
             | iambvk wrote:
             | Instead of Read-Modify-Write, I like to think of it as
             | Read/Determine/Write cycle.
             | 
             | A note worthy point is that, Paxos actually expects
             | proposers to _drop_ their value and _carry-forward_
             | someone-else 's value to complete the consensus. This is
             | typically not an intended behavior from client's point of
             | view, but proposers role is to form the consensus, not push
             | their value.
             | 
             | Read phase is basically understanding if there is any value
             | that could've been chosen when F nodes are unavailable.
             | 
             | Determine phase is to choose to carry-forward any value
             | (if-any) from the Read phase or use the new value from the
             | client.
             | 
             | Write phase is to actually ask majority of the acceptors to
             | save the value selected in the Determine phase.
        
               | MatteoFrigo wrote:
               | You are totally correct that in Paxos the proposer is
               | supposed to "carry-forward" the value proposed by
               | somebody else. In this view, which is how everybody
               | understands Paxos, Paxos is an algorithm for consensus.
               | 
               | However, the proposer can also _modify_ the value
               | proposed by somebody else, and it turns out that this is
               | a perfectly valid algorithm for implementing a
               | distributed read-modify-write. This variant does more
               | than just consensus---it basically behaves like a fault-
               | tolerant memory with an atomic update operation (think
               | compare-and-swap or load-linked /store-conditional). In
               | his blog post, GP also mentions that paxos is a read-
               | modify-write transaction, but perhaps I read too much
               | into what he is saying. Either way, this RMW variant is
               | correct, I have verified it exhaustively with TLA+, and
               | it is used in a few production systems that I have
               | implemented.
               | 
               | The difficulty is now to say exactly what this RMW
               | variant does. I was hoping that GP (or anybody else) may
               | have some insights.
        
           | tombert wrote:
           | > Lamport did comment on it (via email) saying that
           | understanding Paxos is less important than proving it works
           | (via TLA+).
           | 
           | I suppose that's not wrong; if I know for a fact that
           | something is going to behave how I expect it too, I don't
           | generally care how it works. I can treat the algorithm as a
           | black box and just go from there.
           | 
           | That said, I dont' completely agree. If I am implementing
           | Paxos in NewLangOfTheMonth, it's likely that a straight one-
           | to-one port of another version will not fully work due to
           | tiny differences in the language semantics. If I don't
           | understand Paxos, then it's unlikely I'll be able to correct
           | any mistakes in the implementation.
        
           | iambvk wrote:
           | I totally agree. Paxos is such a simple solution that it is
           | so amazing.
        
         | loquor wrote:
         | > I have a bit of a man-crush on Leslie Lamport
         | 
         | +1 - I really admire him too. I wonder, what makes people like
         | him and Edsger Dijkstra so charismatic in this sense?
        
           | User23 wrote:
           | Dijkstra was and Lamport is very good at writing clearly and
           | distinctly. Many people with that level of raw brainpower
           | don't make that effort, so the ones that do are easy to
           | admire and be grateful to.
        
           | tombert wrote:
           | I think a lot of it has to do with the fact that he's a very
           | prolific writer and speaker, in addition to contributing to a
           | fairly diverse set of Computer Science fields.
           | 
           | I usually enjoy Lamport's speeches, and I have said before
           | that his book Specifying Systems should be mandatory reading
           | for nearly anyone interested in CS, due to how approachable
           | it is, and yet how deep it goes.
        
         | solidangle wrote:
         | You might be interested in this paper [1] which presents Paxos
         | in the same way as Raft is presented in the Raft paper. In my
         | view it really illustrates that most of the simplicity of Raft
         | comes from its excellent presentation and not necessarily from
         | its algorithmic simplicity.
         | 
         | [1]https://arxiv.org/pdf/2004.05074
        
       | kkwteh wrote:
       | Paxos has a reputation of being hard to understand. I was under
       | the impression that Raft was now preferred for distributed
       | consensus. The creators of the Raft algorithm explicitly had
       | understandability as a goal when designing it and called out
       | Paxos as being hard to understand.
       | 
       | https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf
        
         | solidangle wrote:
         | (Multi-)Paxos and Raft are incredibly similar. I would even
         | argue that Raft is just a variant of Paxos in the same way that
         | e.g. Fast Paxos and EPaxos are variants of Paxos. Most of the
         | Raft algorithm is identical to the Paxos algorithm. The main
         | difference between the two algorithms is how leadership
         | election works. In my view the main reason why Raft is
         | considered to be simpler is because it was presented in a
         | really clear paper. Paxos becomes equally simple when using a
         | similar representation as used in that paper. See this paper:
         | https://arxiv.org/pdf/2004.05074
        
       | daveevad wrote:
       | My grey matter is failing me but does anyone remember a rant
       | about 5-10 years ago about developing software in this era that
       | had some punch line about consensus algorithms that went a little
       | like, "so, there's this guy named Diego?"
       | 
       | edit: found it
       | 
       | https://circleci.com/blog/its-the-future/
        
         | lowbloodsugar wrote:
         | >Why don't I just use Google's thing?
         | 
         | >-You think that's going to be around in 6 months?
         | 
         | bahaha.
        
       | jldugger wrote:
       | > Unmesh Joshi
       | 
       | It feels kinda strange to see an author other than Martin Fowler
       | blogging on martinfowler.com....
        
         | DerpyBaby123 wrote:
         | Maybe so, although he has been featuring other authors on his
         | blog regularly now. He works with them to keep the quality
         | high, check them out.
        
         | Jtsummers wrote:
         | This came up with another martinfowler.com post, and I checked
         | the past couple years. Roughly half the content is Fowler's,
         | and the other half are other writers (from memory, not
         | recounting it now). I don't know the ratio going back prior
         | years (only checked 2021 and 2020) but I've seen lots of posts
         | by others on his site for the past few years.
         | 
         | https://martinfowler.com/articles/patterns-of-distributed-sy...
         | - other content by the same author under title _Patterns of
         | Distributed Systems_.
        
       | throwaway984393 wrote:
       | I think a flow chart would have been easier to understand, or
       | perhaps a video or .gif
        
         | uvdn7 wrote:
         | No, it doesn't. A video might help you _feel_ like you
         | understand it but it doesn't really help you understand it.
         | That's just not how distributed systems work.
        
           | spicybright wrote:
           | Now that's a little silly to say, some people grok more from
           | video, some more from articles with pictures.
        
       | rdtsc wrote:
       | Is there a mistake on the second diagram where prepare(1,e) is
       | sent to Byzantium? Wonder if it should be prepare(1,a), since
       | Athens hasn't learned the (1,e) value which only Ephesus and
       | Delphi know about at that point?
       | 
       | Otherwise I think it's a pretty good description of Paxos. I like
       | that it walks through a good number of failure scenarios and
       | mentions interesting corner cases such as a lack of an explicit
       | commit in the original papers and that reading also needs to run
       | the full Paxos algorithm.
        
         | Jtsummers wrote:
         | It does seem to be a typo, based on the table and text below
         | it.
        
       | arsh wrote:
       | There are huge gaps between the scientific paper and how you
       | actually operate and scale Paxos. This can be seen in Paxos Made
       | Live - An Engineering Perspective by Tushar Chandra et.al.
       | 
       | Similarly, it also has real drawbacks like being slow, or lacking
       | liveness guarantees.
       | 
       | See other alternatives in https://aws.amazon.com/builders-
       | library/leader-election-in-d...
        
       ___________________________________________________________________
       (page generated 2022-01-06 23:00 UTC)