[HN Gopher] CreuSAT: Formally verified SAT solver written in Rus... ___________________________________________________________________ CreuSAT: Formally verified SAT solver written in Rust and verified with Creusot Author : ingve Score : 220 points Date : 2022-06-17 15:57 UTC (7 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | joshlk wrote: | What are some practical applications for a SAT solver? | Jweb_Guru wrote: | Verifying programs, for one thing--CreuSAT actually calls out | to no less than _three_ different SAT solvers in the course of | automatically discharging its proof obligations :) | | More generally, SAT solving is useful whenever you have | something that can be represented as a circuit, which is an | extremely large class of problems. Indeed, SAT can be proven to | be polynomial-time equivalent to _any_ problem in NP (which you | can read as "any decision problem whose yes solution is | efficiently verifiable"), which includes many useful problems: | common examples include the traveling salesman problem, | detecting hamiltonian cycles, subgraph isomorphism | (interestingly, graph _isomorphism_ is no longer believed to be | NP-complete), graph coloring, and many others (which in turn | often have easy reductions from other extremely useful | problems). Since SAT solvers have received so much optimization | work, in practical cases these problems are very often solved | by reducing them to some variant of a satisfiability problem, | rather than trying to solve them directly. | | It is even more useful when combined with theories that can | compute special cases (like arithmetic on natural numbers) much | more efficiently, which is known as an SMT solver | (satisfiability modulo theory). Sometimes, the theories are | undecidable, which means the SMT solver is allowed to report | "don't know" or run forever in some cases (in practice, people | then massage the inputs to try to get them within the solvable | cases). This is the form in which SAT solvers are used most | often. There are a ton of applications, but an immediate one is | determining whether version constraints in a package manager | (which may include equalities, inequalities, etc.) can all be | simultaneously satisfied (and producing a satisfying | assignment). There are a _lot_ of good examples, though, which | is why SMT solvers are so ubiquitous in the literature when it | comes to "how do we do this really hard thing?" (e.g. they are | often critical components of program synthesis). | | Sorry if this wasn't that helpful--it's a very broad topic and | I'm sure many people here can give you a much better answer! | k0k0r0 wrote: | Another SAT developer here. Most prominently, SAT Solvers are | used in the design and construction of CPUs. See the EDA | Challenge of the SAT Competition 2021. | bjornsing wrote: | So can CreuSAT be used as a backend for Creusot? | | A "self-hosted" SAT-solver, in the sense that it can prove its | own correctness, that would certainly give you a speaking slot at | some conference. But also a philosophical dilemma I guess. :) | Sarek wrote: | This was asked in the Rust Zulip as well, and is actually the | original idea which this thesis evolved from (Xavier said | something along the lines of "My supervisor and I have talked | about how cool it would be to have a solver which verifies | itself"). | | The short answer is that it should be possible to write a Why3 | driver for CreuSAT which allows it to handle basic | propositional reasoning, and that one would probably need to | add some theories. | | The a bit longer answer: | | Getting it to work as a backend should probably not be all that | hard, and it should probably be possible to make it able to | prove some things as well. Getting it to verify itself would | probably be very hard. As it stands, the proof needs Alt-Ergo, | CVC4 and Z3 (all of which are really good) to pass, and some of | the obligations take a long time. A solver capable of verifying | itself would (I imagine) have to put much more emphasis on | having only simple proofs which were solvable using a minimal | amount of theories, and would probably have to run for a very | long time. I dunno, would be a fun project, and I guess one | could decide after having tried to prove for instance Friday | (the super naive solver) if it is at all worth pursuing. | netr0ute wrote: | Will there be a version of this for the College Board type of | SAT? | jerf wrote: | That's not possible; it's well known that the College Board SAT | is inconsistent: | | https://www.youtube.com/watch?v=POMX80Q11O0 | | https://www.youtube.com/watch?v=Xh5coE5wJ2I | bergenty wrote: | Those examples are mistakes the test makers made. It doesn't | make the test "inconsistent". | eutectic wrote: | I thought proof checking tools mostly negated the need for | verified sat solvers. | k0k0r0 wrote: | A former coworker of mine was in the process of publishing a | really nice paper about a framework that allows concurrent | resp. simultaneous verification of proofs generated by most | state-of-the-art SAT Solvers, which drastically reduced the | time needed to construct and then resp. simultaneously verify | UNSAT proofs. Unfortunately, he left before publishing the | paper. | | (Among other reasons he was unhappy with the academic world, | which I do understand perfectly.) Unfortunately, I have trouble | reaching him. I believe his basically finished paper would be | of so much use to some people. It's really sad. I believe his | paper really should be published in a journal or at least put | online somewhere. It's not totally groundbreaking or something, | but like hard work nobody had the determination to, yet. At | least not publicly. I feel so bad that I can not share his work | without his permission. | Jweb_Guru wrote: | Producing the proofs (especially for UNSAT) is often very time | consuming, and verifying them can be subtle (and if an | incorrect proof is produced, you still have to deal with that | case somehow...). It's not so much that they negate the need | for verified SAT solvers as that creating and maintaining a | verified SAT solver was considered much too difficult to | justify the proof effort required. If the goal can be | accomplished with an order of magnitude less effort, that | calculus changes somewhat. | zozbot234 wrote: | SAT solving is the subtle and time consuming part. A valid | SAT proof certificate can be verified efficiently. | Jweb_Guru wrote: | I'm referring to producing the certificate as the time- | consuming part, not the verification time (I disagree that | verification is not subtle, but I suppose that doesn't | matter as there are verified verifiers). For example, many | SAT solvers no longer attempt to provide "pure" resolution | proofs because they are too expensive to produce, which | restricts the kinds of techniques you see used in | competition. In fact, producing certificates efficiently is | an active research area, which wouldn't be the case if | certificate production had negligible cost, e.g. see | https://lmcs.episciences.org/9357/pdf. | xavxav wrote: | SAT comp gives you 5x the solving time to check your proof, | which is indicative that checking isn't so simple (though | it is on paper) | Sarek wrote: | Generating a proof and checking it is definitely a step up, but | it does have a few issues. | | - SAT is in NP, and UNSAT is in CoNP. This means that proofs | take a lot of space to store and a lot of time to check. Proofs | are usually run with 5x the time limit of the initial solve, | and still end up timing out. | | - Proofs are (currently) based on resolution, which means that | research on efficient solving techniques which can not be | efficiently modelled as resolution is not being prioritised. | This is touched on briefly in the first 2 talks of Randy Bryant | on: https://fm.csl.sri.com/SSFT22/#splogic. I think the talks | in general are among the better for understanding a SAT solver | like CreuSAT. | fjdh wrote: | >Proofs are (currently) based on resolution | | Proofs are actually based on a practical version of extended | resolution [1] called DRAT. Extended resolution is an | extremely strong proof system, and research on techniques | that are not efficiently modelled by resolution are in fact | being prioritised (see for instance [2]). This is not to say | that CreuSAT is not very impressive, but just a correction on | the current state of the things. | | [1]: https://en.wikipedia.org/wiki/Frege_system#Extended_Freg | e_sy... | | [2]: https://link.springer.com/chapter/10.1007/978-3-642-3136 | 5-3_... | thesz wrote: | This project uses one-way translation of Rust code into WhyML | through Creusot. This translation is not formally guaranteed to | be correct or semantics-preserving. | | Ideally, the code produced by Creusot should be translated back | to Rust by a tool from Creusot or someone else. This way it is | possible to use fixed point iteration, just like AST pretty- | printer and parser for programming language syntax. | | This is more or less an approach taken by verification team of | Buran (USSR's space shuttle) software. They took Fortran code and | compiled it into executable code. The executable code was then | manually translated back into Fortran and discrepancies analyzed. | This way they found several errors in Fortran compiler. Buran | made fully automatic landing in late 80-s. | | Contemporary verifiers should do the same automatically. | xavxav wrote: | Indeed, Creusot itself is not formally verified, but we have | verified the metatheory in our paper RustHornBelt: | https://people.mpi- | sws.org/~dreyer/papers/rusthornbelt/paper.... Actually | verifying the translation that Creusot produces would | definitely be interesting future work, but I'm not sure it's | where the best 'bang for the buck' is found right now, though I | try to ensure the underlying translation would remain simple to | formalize. | jasinjames wrote: | Fascinating anecdote. Where can I read more about this? | thesz wrote: | I don't know where you can read more. There are resources in | Russian, but buran.ru is not quite functioning right now. | | I remember that story from the time when I worked with guys | from Russian CQG team, who, in turn, worked in CQG with the | member of the Buran's software engineering team. It was 15 | years ago. | xavxav wrote: | Hi! I'm the author of Creusot, Sarek did some amazing work on | CreuSAT, he also had to teach himself formal verification and SAT | solving along the way. | | If you have any questions about CreuSAT or Creusot, I'd be happy | to answer | rch wrote: | At first glance I didn't see a license in the repo. Am I | missing it? | Sarek wrote: | Ah, I knew I forgot something! | | Thanks, I licensed it under the MIT license now:) | OscarCunningham wrote: | How does it compare to kissat in terms of speed? | newca12 wrote: | Unsurprisingly, we can see a growing interest in the Rust | ecosystem regarding formal verification. I try to keep | https://github.com/newca12/awesome-rust-formalized-reasoning up | to date. I will add CreuSAT shortly. The goal is to have an | overview of what is actually Rust powered. Long term goal is to | identify best practices and provide usefull common libraries. | hinkley wrote: | There's a whole undiscovered country of writing software that is | both mathematically sound and intuitively sound. I had a long | list of hopes and dreams in the late 90's of what I'd hoped | software would be by now. I got about half of my wishes, and some | of them were weird for me because I went from bitter struggles to | get even lip service to some of those things, to two jobs later | having them taken for granted. It's weird to brace for push back | and then have people nod and change the subject. | | This is one of the wishes I have for 2040, that we continue to | prove the ergonomics of type theory and escape analysis, and that | having succeeded there, we start normalizing other domains of | provably correct code. If you can do that for SAT solvers, maybe | we'll start writing other kinds of AI in such a manner. | danuker wrote: | There is TLA+ for verifying specifications. Here is a talk by | its creator: | | https://www.youtube.com/watch?v=-4Yp3j_jk8Q | asimpletune wrote: | I have wondered at times if the reason why we don't have those | things is because, in such an environment, 20th century | management might not work. Basically, "the curse of lisp", | although I don't mean lisp specifically in this case, but | rather languages and tooling that unleashes the power of formal | verification and, in a larger way, the power of math. | | It's conceivable that in an alternate universe an individual | engineer's productivity could be so high, because of math, that | one developer would replace a whole team from our universe, and | bugs would be extremely rare and mostly restricted to UI. From | that point of view, the theory says, it's harder to build teams | and these super charged developers are natural information | silos. Or, in other words, it feels like the tech choices made | have demonstrated a slight preference of less efficient | developers that all understand a shared problem (e.g. this team | is responsible for testing, or this team builds integrations | with APIs, or another team parses unstructured data into | structured data), over technology that would aid or formalize | many aspects of those kinds of problems and others. | | Also, in such a world, I feel that software engineering would | be much more abstract, and it probably wouldn't be approachable | to those who aren't legitimately fascinated and interested in | this stuff. Sort of like how I've never met anyone in my life | who studied Math just to get rich, even though you can become | phenomenally wealthy if you do. | | More likely though the answer is complicated and it's a mix of | historical accident, what I mentioned, and a slew of other | factors. Nonetheless, I too think it would be so cool if I | could spend my time writing programs that write programs that | write programs, all guaranteed to match how I stated the | problem's invariants. I could spend the majority of my time | really understanding the requirements and assumptions, as | opposed to actually implementing the features. | nemothekid wrote: | > _It 's conceivable that in an alternate universe an | individual engineer's productivity could be so high, because | of math, that one developer would replace a whole team from | our universe, and bugs would be extremely rare and mostly | restricted to UI._ | | I think this only applies for strictly theoretical problems | and code golf, but not in real world programming. It's likely | useful for low-level servers like compilers and parsers, but | most of the world isn't working on those problems. | | For any engineering task (not just software engineering) half | of the problem is _modeling the problem_. Most bugs I see in | systems by non-junior engineers are in this camp; the | engineer thought one thing but the world was actually | different (and this abstracts up to, where _management_ | thought one thing, but the world was different). | | For a parser, that problem is easily defined. For other | problems, the difficulty of programming just moves into | verifier. Even if your code was bug-free, it isn't the case | that your verifier is. For example lets say you are building | a search engine; how do you embed the fact that queries for | "Justin Beiber" should rank higher than "Justin Timberlake" | except for when the term "super bowl" in included? Or how | does the verifier embed the fact that local government | regulation says that queries for "Winnie the pooh" should | only return results for the kids cartoon character? Finally, | Godel's shows us every system will have useful results that | cannot be formally verified. | | On a more human level, I think verifiers have great use in | building more stable systems but I don't think we will see a | day where they make us more efficient. It's like believing | cars will lead to people staying home all day just because it | will only take 5 minutes to get to the grocery store. No, the | human mind will invent evermore complex systems on top of the | tools we have requiring even more engineers to manage them. | Jweb_Guru wrote: | In addition to the project itself being very cool and useful (you | can count the number of verified SAT solvers that can solve most | problems in modern SAT competitions on two fingers!), it's also | an impressive demonstration of its toolchain, which essentially | exploits the fact that the purely safe fragment of Rust | (augmented with a handful of types that use unsafe code like Vec, | which I believe have been proven compatible with the rest of | Creusot in another recent paper) behaves like a functional | language. This allows borrow-checked Rust code to be treated | purely functionally in safe contexts, which allows exploiting the | large amount of existing software tailored to proving purely | functional specifications, without having to worry about memory- | related side conditions, separation logic, etc. | | This has always been the hope for Rust verification tools (that | Rust would scale to these kinds of large proofs better than other | industrial imperative languages due to the restrictions of the | borrow checker), and I think this is the first real practical | indication that that is indeed the case. Both in terms of | verification time and additional verification code, it weighs in | at a small fraction of that of the only other verified solver | with comparable performance on SAT benchmarks (IsaSAT), which is | extracted using code generation from Isabelle (which is a full | proof assistant), while other attempts to produce verified SAT | solvers directly from imperative code fall far short of its | performance because they do not implement any of the difficult | optimizations (presumably due to the amount of extra proof effort | required). | | Incidentally, another paper has come out at basically the same | time making a similar observation, Aeneas: | https://arxiv.org/abs/2206.07185, which uses a resizable hash map | as its case study. Since it just came out, it does not have any | larger proof developments yet that could be used as comparison, | and it has somewhat different goals, but overall I am also | excited about it and am very hopeful for the future of | verification tools that integrate directly with Rust code like | these two, as I think they are one of the more promising avenues | towards the regular integration of verified code with more | "ordinary" projects. | asimpletune wrote: | It would be cool if the unsafe aspects of Rust that are | necessary (like Vec) could themselves be formally verified, | like when each edition of the language gets compiled or | something. I know some languages include proof tools but | they're not super popular, and Rust's compiler helps a lot. | Still, it would be really cool to _know_ , or at least know | that you can't know. LinkedList is another example. There's no | way to really write a linked list in the rust friendly way, so | you have to use some boxing. All of that is fine, it probably | works, but what if I forgot later, and changed something about | the implementation, introducing bugs around edge cases or what | not. | Jweb_Guru wrote: | This process is actually already underway, notably with the | RustBelt project: https://plv.mpi-sws.org/rustbelt/. It has | not yet achieved the kind of tight integration with the | source language that stuff like Creusot has, but everyone | would like it to move in that direction :) | | For the specific case of Vec (and a few other types--e.g. | Cell, Mutex, etc.), as I mentioned, a recent paper by xavxav | actually proves that the important parts of the unsafe | implementation of Vec _do_ actually fulfill the signatures | that are used by the CreuSAT proof. This is part of | https://people.mpi- | sws.org/~dreyer/papers/rusthornbelt/paper..., which I believe | won the distinguished paper award at this year's PLDI! | avgcorrection wrote: | Oh wow. So efforts like this can actually eventually trickle | down to more regular programming? | Jweb_Guru wrote: | Well, that's the research question! So far signs are | promising, IMO. I think this is at least the best chance | formal methods have ever had at being relatively easy to | integrate into larger codebases--Rust's type system is strong | enough to support nontrivial verification conditions, and | effectively encapsulate them, without needing to directly | deal with "exotic" logic, heavyweight encodings, pointer | obligations that need to be manually discharged, or linking | to code generated by some external language (a feature it | shares with functional programming languages, as well as | "hybrid" linear/dependently typed lanugages, like Dafny or | ATS). Unlike functional or dependently typed programming | languages, however, Rust has a lot of adoption | _independently_ of its suitability for formal methods, | meaning that ease of integration has a good chance of | benefiting projects that don 't have correctness as a | principle requirement. For example, assuming that a useful | crate can be extracted from this project, it would be quite | easy to integrate into a video game like Veloren (e.g. to | help improve some of its AI heuristics), which is about as | far from traditional high-assurance software as you can get! | | If we can make using high-assurance software as easy (or | almost as easy) as non-high-assurance software, without | sacrificing huge amounts of performance to do so, it will be | a major step in making software more reliable for everyday | programmers, rather than restricting these kinds of | techniques to critical libraries like cryptographic | primitives where correctness and performance justify pretty | much any number of proof engineering hours. This doesn't mean | everything would be verified, but with more and more verified | "building blocks," less and less engineering time would be | spent debugging third party code (which at least IME, is | usually much more painful than verifying code at the top | level). We have a long way to go to get there, but projects | like this are an excellent start. | jerf wrote: | Does the proof encompass some sort of proof of amount of RAM | consumption in it, presumably bounded by some characteristic of | the problem presented? | | Regardless, this is very impressive, and one of the first | things to raise my eyebrows w.r.t. formal proofs being possibly | useful in the real world in a long time. If safe Rust can | function as the bridge between the "practical" world and the | world of proofs, that alone would be enough to justify Rust's | existence. | Sarek wrote: | It does assume enough RAM, as it has no way of knowing how | much RAM the target machine will have. It is thus correct | with regards to the model specified by Rust, but a program | which is proven correct in Creusot may get an OOM-error. A | note here is that it will not provide the wrong answer due to | insufficient memory, but it will crash. | | Proving some upper bound on the amount of memory used (and | potentially gracefully exiting) should be possible, but I | have not prioritised it, as OOM has not been an issue. | jerf wrote: | "has no way of knowing how much RAM the target machine will | have" | | That's why I phrased it in terms of a bound based on some | characteristic of the problem. | | But I was just asking for information. I know it's a well- | known problem. To be honest the fact that we're even | seriously discussing this as something that may be doable | is a huge step forward. AIUI, previous state of the art on | real languages would have this as hopelessly pie-in-the- | sky. | | I think in many cases memory bound is uninteresting; one | could examine a vector implementation and be satisfied that | it won't explode unnecessarily. After all, we do exactly | that on non-proved implementations without too much fear | all the time. SAT just came to mind as a problem where that | could be an interesting thing to be able to prove. | Jweb_Guru wrote: | No, it does not. This is mentioned as a limitation in the | paper (it does, however, handle things like integer overflows | correctly). I believe with some suitable extensions, it would | be possible to prove invariants like this as well, but IMO | functional correctness is a pretty good start! | riedel wrote: | This discussion is also interesting wrt the rust verification: | https://news.ycombinator.com/item?id=30174827 . Creusot author | did comment on it. | | Seems like there finally will be alternatives to Ada SPARK. | ithkuil wrote: | > on two fingers | | In which base? Can I use my knuckles? | Jweb_Guru wrote: | Base 1, no knuckles required :) | andi999 wrote: | Why are verified SAT solvers interesting? | Jweb_Guru wrote: | SAT solvers often form (part of) the backbone of other proof | assistants and verified software and hardware, which makes | them producing correct output quite critical. At the same | time, production SAT solvers (the ones people actually use on | the critical path of these proof assistants) are very | performance-sensitive, which means tons of optimizations that | are nonobvious, subtle, and in some cases not obviously | memory safe. This makes them ideal candidates for formal | verification. | | Unfortunately, in the past, the proof effort required to | produce a competitive SAT solver was so high that this was | considered infeasible, so instead the industry has been | focused on non-verified SAT solvers that produce efficiently | verifiable _certificates_ for their solution that can be | checked by an independent verifier. This was an important | improvement that has allowed SAT solvers to continue to | evolve, but (as mentioned downthread) they have disadvantages | compared to a solver that can be fully trusted without | producing a certificate, including in "forcing" people to | use techniques that produce proofs which we know how to | efficiently verify. | | What this project promises is an order of magnitude reduction | in proof effort to achieve and maintain an extensible, | competitive, verified solver, which may help change the above | calculus to favor verified solvers. IMO, it is quite an | exciting development, even if it does not pan out! | jhgb wrote: | Maybe I'm missing something blindingly obvious, but why is | "trusting" a SAT solver necessary in the first place? The | solution itself either satisfies the constraints or it does | not. Are there situations where we can't even tell whether | the solution is correct by looking at the problem instance? | Jweb_Guru wrote: | For positive solutions, you are correct--verifying the | solution is incredibly easy. For negative solutions | (UNSAT), however, this is not the case--UNSAT is in Co- | NP, not NP. Finding an efficiently verifiable | representation of the UNSAT proof can be very | challenging, which is part of why solvers are given much | more time to produce and verify certificates than they | are to solve the problem in the first place. | jhgb wrote: | Ahh, I see now what you mean. Thanks, that makes much | more sense to me now. | dwheeler wrote: | Right, unsat is the interesting case where you especially | want a proof of the SAT solver. | | Unsat can be used, in turn, to prove other things. If you | want to prove that some condition C is always true in a | set of expressions, just ask the SAT solver to solve for | "not C". If the answer is UNSAT, then C is always true... | if the answer UNSAT is actually correct. | andi999 wrote: | Thanks. Do you have a link or reference to more material on | the certificates. I dont understand a bit, but I want to | learn more. | Jweb_Guru wrote: | I will just steal from the paper's author (Sarek) and | point you to the first two talks and lecture notes by | Randy Bryant at this link: | https://fm.csl.sri.com/SSFT22/#splogic. | Animats wrote: | Nice. ___________________________________________________________________ (page generated 2022-06-17 23:00 UTC)