[HN Gopher] Verifying distributed systems with Isabelle/HOL
       ___________________________________________________________________
        
       Verifying distributed systems with Isabelle/HOL
        
       Author : eatonphil
       Score  : 117 points
       Date   : 2022-10-12 15:02 UTC (7 hours ago)
        
 (HTM) web link (lawrencecpaulson.github.io)
 (TXT) w3m dump (lawrencecpaulson.github.io)
        
       | whispersnow wrote:
       | Martin Kleppmann is the master of any complicated system charts!!
        
       | spinningslate wrote:
       | Enjoyed this. The underlying formalism seems based (I think?) on
       | concurrent, independently-executing processes that are stimulated
       | by messages. That has some similarities with the Actor model [0],
       | albeit Actors support concurrent, intra-process message handling
       | whereas this seems sequential at the individual process level. So
       | maybe more like Communicating Sequential Processes [1].
       | 
       | For anyone more knowledgable on the topic: is that right? And is
       | that model common across other proof systems like TLA?
       | 
       | Second (bonus) question: why separate messages from user input?
       | 
       | Thanks.
       | 
       | [0] https://en.wikipedia.org/wiki/Actor_model [1]
       | https://en.wikipedia.org/wiki/Communicating_sequential_proce...
        
         | mjb wrote:
         | > And is that model common across other proof systems like TLA?
         | 
         | Yes, it's pretty common, and is a really nice abstraction.
         | 
         | TLA+ doesn't use that model, although a lot of TLA+ code is
         | implemented that way. TLA+'s model is basically "shared memory"
         | where any step can access or change any state in any way it
         | likes. That's super flexible, but can feel a little too
         | flexible.
         | 
         | P, on the other hand, is based on a model of state machines
         | that communicate with messages. See this example for a clear
         | illustration of how that works:
         | https://p-org.github.io/P/tutorial/twophasecommit/
        
       | mcguire wrote:
       | " _Even though in reality processes may run in parallel, we do
       | not need to model this parallelism since the only communication
       | between processes is by sending and receiving messages, and we
       | can assume that a process finishes processing one event before
       | starting to process the next event. Every parallel execution is
       | therefore equivalent to some linear sequence of execution steps.
       | Other formalisations of distributed systems, such as the TLA+
       | language, also use such a linear sequence of steps._ "
       | 
       | Ha-hah! Proving that you can treat "processing one event" as
       | globally atomic was the funnest thing in grad school.
        
       | mjb wrote:
       | > For this reason, formal proofs of correctness are valuable for
       | distributed algorithms.
       | 
       | I have a massive amount of respect for Martin and his work, but I
       | think that this emphasis is the wrong one (if our goal is to
       | increase the correctness of deployed distributed algorithms).
       | 
       | Instead, I like to think (building off work of folks like Ankush
       | Desai, an AWS colleague and creator of P) that the process of
       | specification, and it's products, are more valuable than the
       | creation of a proof. Model checking - either exhaustive or
       | stochastic - is valuable for checking the properties of
       | specifications, but pushing the last step into full proof is
       | often not worth the additional effort. Instead, I think making
       | proof the focus and goal tends to turn most programmers off the
       | whole topic.
       | 
       | I think we should be saying to folks "If you pick up these tools
       | you'll end up with a crisp specification that'll help you move
       | faster at development time, extremely clear documentation of your
       | protocol, and a deeper understanding of what your protocol does"
       | instead of "if you pick up these tools you'll end up with the
       | mathematical artifact of a proof". This has been a big shift in
       | my own thinking. When I first picked up Promela, Alloy, and
       | TLA+/TLC/TLAPS, my own focus was on proof (and remained so until
       | 2014 ish, when I spent a lot of time talking to Chris Newcombe
       | and Leslie Lamport as we were writing the "Formal Methods at AWS"
       | CACM paper). Over time I've found the journey much more valuable
       | than the artifact, and tools like PlusCal and P which make
       | specification more approachable more valuable than tools with
       | less familiar syntax and semantics.
       | 
       | We shouldn't be surprised that few people can make progress when
       | we ask them to understand both Paxos and Isabelle at the same
       | time! Maybe if that's your focus as a PhD student, then it's OK.
       | 
       | > The real challenge in verifying distributed algorithms is to
       | come up with the right invariant that is both true and also
       | implies the properties you want your algorithm to have.
       | Unfortunately, designing this invariant has to be done manually.
       | 
       | This is true. Invariants are hard. But maybe I'm more optimistic:
       | I think that there are a lot of very real systems with
       | straightforward invariants. Lamport's three invariants for
       | consensus (section 2.1 of Paxos Made Simple), for example:
       | https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
       | Similarly, invariants like Linearizability, and even formal
       | definitions of isolation.
       | 
       | The research I would _love_ to see is work to synthesize
       | invariants from formalisms like Adya 's or Crooks' work on
       | database isolation levels. I think that's a very tractable
       | project, and would be super useful to practitioners.
        
         | comfypotato wrote:
         | Are your focus and Dr. Kleppmann's mutually exclusive? I ask
         | because it seems like the Isabelle formulation emerges from the
         | protocol itself. The invariant is the only added work, yes?
         | 
         | I'm effectively asking if the proof generation could replace
         | the model checking as a way of verifying that you have
         | appropriately specified your protocol. Big information exchange
         | protocols can be confusing!
         | 
         | (Maybe I'm misunderstanding what an invariant is... I'm
         | thinking it's the property proved true by induction (and would
         | go something along the lines of "if and only if the good has
         | been sold, then the merchant has received payment for the
         | good"))
        
       | Nokurn wrote:
       | This is great stuff. I did something similar in a 2020 paper[1]
       | using Coq. I did not find Coq particularly suited for this type
       | of work. Isabelle/HOL looks promising.
       | 
       | One thing to note is that these types of verifications can become
       | difficult when the systems are layered. Even the simplest real
       | world systems are layered, when you factor in the underlying
       | transport for the messages. The TLC paper provided a way for
       | distributed systems to be composed in a way that reduces the
       | complexity of the proofs for each distributed component.
       | 
       | [1] https://dl.acm.org/doi/10.1145/3409005
        
       | allanrbo wrote:
       | For context, Kleppmann is the author of the book Designing Data-
       | Intensive Applications (DDIA). https://dataintensive.net/
        
       | oa335 wrote:
       | I'm interested in learning how to formally express distributed
       | systems but have limited time. Does anyone have any insight into
       | which would be better to learn - TLA+/PlusCal (e.g. Hillel
       | Wayne's courses) or Isabelle/HOL?
        
         | ocschwar wrote:
         | TLA+ definitely. The lowest rung on the ladder is actually
         | reachable.
        
       | javcasas wrote:
       | Martin Kleppmann verifying distributed systems? That's as good as
       | it can get.
        
         | eatonphil wrote:
         | With a proof assistant written in Standard ML. :)
         | 
         | https://isabelle.in.tum.de/
        
           | javcasas wrote:
           | Yeah, a distributed system with proofs of it working under
           | some specific assumptions. This is the future. It's a future
           | hard and complicated because the tools and the expertise are
           | not here yet, but definitely the future.
        
       | preseinger wrote:
       | My perspective from industry is that we're not really suffering
       | for a lack of formal verification, or anything at the
       | specification level. Basically all of the meaningful problems
       | that need help and attention are with the implementations. I've
       | been on both sides, and I can say it's much harder to implement a
       | bug free distributed system for large scale production use than
       | to specify one.
        
       ___________________________________________________________________
       (page generated 2022-10-12 23:00 UTC)