[HN Gopher] Verifying distributed systems with Isabelle/HOL ___________________________________________________________________ Verifying distributed systems with Isabelle/HOL Author : eatonphil Score : 117 points Date : 2022-10-12 15:02 UTC (7 hours ago) (HTM) web link (lawrencecpaulson.github.io) (TXT) w3m dump (lawrencecpaulson.github.io) | whispersnow wrote: | Martin Kleppmann is the master of complicated system charts! | spinningslate wrote: | Enjoyed this. The underlying formalism seems based (I think?) on | concurrent, independently-executing processes that are stimulated | by messages. That has some similarities with the Actor model [0], | albeit Actors support concurrent, intra-process message handling | whereas this seems sequential at the individual process level. So | maybe more like Communicating Sequential Processes [1]. | | For anyone more knowledgeable on the topic: is that right? And is | that model common across other proof systems like TLA? | | Second (bonus) question: why separate messages from user input? | | Thanks. | | [0] https://en.wikipedia.org/wiki/Actor_model [1] | https://en.wikipedia.org/wiki/Communicating_sequential_proce... | mjb wrote: | > And is that model common across other proof systems like TLA? | | Yes, it's pretty common, and is a really nice abstraction. | | TLA+ doesn't use that model, although a lot of TLA+ code is | implemented that way. TLA+'s model is basically "shared memory" | where any step can access or change any state in any way it | likes. That's super flexible, but can feel a little too | flexible. | | P, on the other hand, is based on a model of state machines | that communicate with messages. 
See this example for a clear | illustration of how that works: | https://p-org.github.io/P/tutorial/twophasecommit/ | mcguire wrote: | " _Even though in reality processes may run in parallel, we do | not need to model this parallelism since the only communication | between processes is by sending and receiving messages, and we | can assume that a process finishes processing one event before | starting to process the next event. Every parallel execution is | therefore equivalent to some linear sequence of execution steps. | Other formalisations of distributed systems, such as the TLA+ | language, also use such a linear sequence of steps._ " | | Ha-hah! Proving that you can treat "processing one event" as | globally atomic was the funnest thing in grad school. | mjb wrote: | > For this reason, formal proofs of correctness are valuable for | distributed algorithms. | | I have a massive amount of respect for Martin and his work, but I | think that this emphasis is the wrong one (if our goal is to | increase the correctness of deployed distributed algorithms). | | Instead, I like to think (building off work of folks like Ankush | Desai, an AWS colleague and creator of P) that the process of | specification, and its products, are more valuable than the | creation of a proof. Model checking - either exhaustive or | stochastic - is valuable for checking the properties of | specifications, but pushing the last step into full proof is | often not worth the additional effort. Instead, I think making | proof the focus and goal tends to turn most programmers off the | whole topic. | | I think we should be saying to folks "If you pick up these tools | you'll end up with a crisp specification that'll help you move | faster at development time, extremely clear documentation of your | protocol, and a deeper understanding of what your protocol does" | instead of "if you pick up these tools you'll end up with the | mathematical artifact of a proof". 
This has been a big shift in | my own thinking. When I first picked up Promela, Alloy, and | TLA+/TLC/TLAPS, my own focus was on proof (and remained so until | 2014-ish, when I spent a lot of time talking to Chris Newcombe | and Leslie Lamport as we were writing the "Formal Methods at AWS" | CACM paper). Over time I've found the journey much more valuable | than the artifact, and tools like PlusCal and P, which make | specification more approachable, more valuable than tools with | less familiar syntax and semantics. | | We shouldn't be surprised that few people can make progress when | we ask them to understand both Paxos and Isabelle at the same | time! Maybe if that's your focus as a PhD student, then it's OK. | | > The real challenge in verifying distributed algorithms is to | come up with the right invariant that is both true and also | implies the properties you want your algorithm to have. | Unfortunately, designing this invariant has to be done manually. | | This is true. Invariants are hard. But maybe I'm more optimistic: | I think that there are a lot of very real systems with | straightforward invariants. Lamport's three invariants for | consensus (section 2.1 of Paxos Made Simple), for example: | https://lamport.azurewebsites.net/pubs/paxos-simple.pdf | Similarly, invariants like Linearizability, and even formal | definitions of isolation. | | The research I would _love_ to see is work to synthesize | invariants from formalisms like Adya's or Crooks' work on | database isolation levels. I think that's a very tractable | project, and would be super useful to practitioners. | comfypotato wrote: | Are your focus and Dr. Kleppmann's mutually exclusive? I ask | because it seems like the Isabelle formulation emerges from the | protocol itself. The invariant is the only added work, yes? | | I'm effectively asking if the proof generation could replace | the model checking as a way of verifying that you have | appropriately specified your protocol. 
Big information exchange | protocols can be confusing! | | (Maybe I'm misunderstanding what an invariant is... I'm | thinking it's the property proved true by induction (and would | go something along the lines of "if and only if the good has | been sold, then the merchant has received payment for the | good")) | Nokurn wrote: | This is great stuff. I did something similar in a 2020 paper[1] | using Coq. I did not find Coq particularly suited for this type | of work. Isabelle/HOL looks promising. | | One thing to note is that these types of verifications can become | difficult when the systems are layered. Even the simplest real- | world systems are layered, when you factor in the underlying | transport for the messages. The TLC paper provided a way for | distributed systems to be composed in a way that reduces the | complexity of the proofs for each distributed component. | | [1] https://dl.acm.org/doi/10.1145/3409005 | allanrbo wrote: | For context, Kleppmann is the author of the book Designing Data- | Intensive Applications (DDIA). https://dataintensive.net/ | oa335 wrote: | I'm interested in learning how to formally express distributed | systems but have limited time. Does anyone have any insight into | which would be better to learn - TLA+/PlusCal (e.g. Hillel | Wayne's courses) or Isabelle/HOL? | ocschwar wrote: | TLA+ definitely. The lowest rung on the ladder is actually | reachable. | javcasas wrote: | Martin Kleppmann verifying distributed systems? That's as good as | it can get. | eatonphil wrote: | With a proof assistant written in Standard ML. :) | | https://isabelle.in.tum.de/ | javcasas wrote: | Yeah, a distributed system with proofs that it works under | some specific assumptions. This is the future. It's a future | that's hard and complicated to reach, because the tools and the | expertise are not here yet, but it's definitely the future. 
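Two ideas from the comments above can be combined into one tiny brute-force model check: mcguire's quoted point that, with atomic event handling, every parallel execution reduces to some linear sequence of message-delivery steps, and comfypotato's reading of an invariant as a property proved by induction over those steps. The sketch below (Python) is purely illustrative: the "sale" protocol, its handlers, and the invariant are invented here, loosely following the buyer/merchant example in the comment above, and it uses exhaustive exploration in the spirit of a model checker rather than any of the actual tools discussed (Isabelle/HOL, TLA+, P).

```python
# Toy "sale" protocol, modeled the way the article (and P, and TLA+
# specs written in that style) model distributed systems: each step
# atomically delivers one in-flight message to a process, which updates
# its local state and may send new messages. Exploring every delivery
# order then covers every possible parallel execution.

def handle(role, state, msg):
    """One atomic event-handling step for a single process.
    Returns (new_state, messages_to_send)."""
    if role == "buyer" and msg == "invoice":
        return dict(state, paid=True), [("merchant", "payment")]
    if role == "merchant" and msg == "payment":
        return dict(state, received=True, shipped=True), []
    return state, []  # unrecognized messages are ignored

def invariant(states):
    """Safety property: the good is shipped only if payment was received."""
    m = states["merchant"]
    return (not m["shipped"]) or m["received"]

def explore(states, network, seen):
    """DFS over all interleavings of message deliveries, checking the
    invariant in every reachable global state."""
    key = (repr(sorted(states.items())), tuple(sorted(network)))
    if key in seen:
        return
    seen.add(key)
    assert invariant(states), (states, network)
    for i, (dst, msg) in enumerate(network):
        new_state, sent = handle(dst, states[dst], msg)
        explore({**states, dst: new_state},
                network[:i] + network[i + 1:] + sent, seen)

init = {"buyer": {"paid": False},
        "merchant": {"received": False, "shipped": False}}
seen = set()
# Initially the merchant has billed the buyer; one message is in flight.
explore(init, [("buyer", "invoice")], seen)
```

If no assertion fires, the invariant holds in every reachable state of the model; a real tool adds the inductive argument (or, for TLC/P, a far more efficient search) on top of this same basic picture.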
| preseinger wrote: | My perspective from industry is that we're not really suffering | for a lack of formal verification, or anything at the | specification level. Basically all of the meaningful problems | that need help and attention are with the implementations. I've | been on both sides, and I can say it's much harder to implement a | bug free distributed system for large scale production use than | to specify one. ___________________________________________________________________ (page generated 2022-10-12 23:00 UTC)