[HN Gopher] Emitting Safer Rust with C2Rust
       ___________________________________________________________________
        
       Emitting Safer Rust with C2Rust
        
       Author : dtolnay
       Score  : 96 points
       Date   : 2023-03-14 05:32 UTC (1 days ago)
        
 (HTM) web link (immunant.com)
 (TXT) w3m dump (immunant.com)
        
       | Animats wrote:
       | DARPA is funding this. Good.
       | 
       | They haven't reached inter-procedural static analysis yet, which
       | means they can't solve the big problem: how big is an array? Most
       | of the troubles in C come from that. Whoever creates the array
       | knows how big it is. Everybody else is guessing.
       | 
       | A bit of machine learning might help here. If you see
       |     void dosomethingwitharray(int arr[], size_t n) {}
       | 
       | a good conjecture is that _n_ is the length of _arr_. So, the
       | question is, if this is translated to
       |     fn dosomethingwitharray(arr: &[i32]) {}
       | 
       | does it break anything? Both caller and callee have to be
       | analyzed. The C caller has the constraint
       |     assert_eq!(arr.len(), n);
       | 
       | That's a proof goal. If a simple SMT-type prover can prove that
       | true, then the call can be simplified to just use an ordinary
       | Rust slice. If not, conversion to Rust has to drop to those ugly
       | C pointer forms, preferably with a comment inserted. So you need
       | something that makes good guesses, which is a large language
       | model kind of thing, and something which checks them, which is a
       | formalism kind of thing.
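       | 
       | A minimal sketch of that conversion, assuming the function sums
       | its input so there is a result to compare. The names sum_raw,
       | sum_slice and sum_bridge are hypothetical, and a runtime assert
       | stands in for what the prover would establish statically:

```rust
use std::slice;

/// Pointer-plus-length form, the shape a literal translation of
/// `int arr[], size_t n` keeps. Hypothetical example function.
///
/// # Safety
/// `arr` must point to at least `n` readable, initialized `i32` values.
unsafe fn sum_raw(arr: *const i32, n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        // Nothing here can check that index `i` is in bounds.
        total += unsafe { *arr.add(i) };
    }
    total
}

/// Slice form: the length travels with the pointer, so `n` is gone
/// and every access is bounds-checked or iterator-driven.
fn sum_slice(arr: &[i32]) -> i32 {
    arr.iter().sum()
}

/// Bridge for callers whose "n is the length" conjecture is not yet
/// proven: the unsafe assumption is confined to one line.
///
/// # Safety
/// Same contract as `sum_raw`.
unsafe fn sum_bridge(arr: *const i32, n: usize) -> i32 {
    sum_slice(unsafe { slice::from_raw_parts(arr, n) })
}

fn main() {
    let data = vec![3, 1, 4, 1, 5];
    let n: usize = 5; // the length the C caller tracked separately

    // The proof goal from the caller's side. A prover would discharge
    // this statically; here it is only checked at run time.
    assert_eq!(data.len(), n);

    let raw = unsafe { sum_raw(data.as_ptr(), n) };
    let bridged = unsafe { sum_bridge(data.as_ptr(), n) };
    let safe = sum_slice(&data);

    assert_eq!(raw, safe);
    assert_eq!(bridged, safe);
    println!("all three forms agree: {safe}");
}
```

       | Callers where the goal is proven can call sum_slice directly;
       | the rest stay on the pointer form or go through the bridge.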
       | 
       | The process can be assisted by putting asserts in the original C,
       | as checks on the C and hints to the conversion process. That's
       | probably the cleanest way to provide human assistance.
       | 
       | I've wanted this for conversion of OpenJPEG code to Rust. That's
       | a tangle of code doing wavelet transforms, with long blocks of
       | touchy subscripting and arithmetic, plus encoders and decoders
       | for an overly complex binary format containing offsets and
       | lengths. Someone recently ran it through c2rust. The unsafe Rust
       | code works. It's compatible with the original C - it segfaults
       | for the same test cases which cause the C code to segfault. This
       | is why a naive transpiler isn't too helpful.
       | 
       | (The date at the bottom of the article is 2022-06-13. Has there
       | been further progress?)
        
         | meepmorp wrote:
         | > The date at the bottom of the article is 2022-06-13. Has
         | there been further progress?
         | 
         | The article links to their github repo:
         | 
         | https://github.com/immunant/c2rust
         | 
         | There are commits from the last hour, so at least some sign of
         | life.
        
       | mtlmtlmtlmtl wrote:
       | Has anyone put this to serious use? I played around with it at
       | some point when it was fairly new and at that time I was able to
       | transpile the C into Rust just fine, but that didn't help me
       | much. The idea was to be able to use the Rust toolchain to better
       | understand the code, but the resulting Rust code was even less
       | understandable, and also much harder to refactor. In this case I
       | wasn't attempting a rewrite per se, just trying to understand a C
       | codebase plagued with memory safety issues. Quickly gave up on
       | this avenue at that point and just started carefully refactoring
       | the C to make the bugs easier to shake out.
       | 
       | Would love to see a technical write up of someone outside
       | Immunant using this on a real world codebase for whatever
       | purpose.
        
       | diego_moita wrote:
       | I am very curious to see how these transpiler problems will be
       | handled by GPT-4 in the upcoming months.
        
       | boredumb wrote:
       | C2rust is really cool, but if you're familiar with writing Rust
       | and run even a trivial C function through it, it produces
       | something absolutely terrifying. I really enjoy Rust and pray I
       | don't find myself working in a code base someone just ran c2rust
       | against.
        
         | FridgeSeal wrote:
         | Isn't the point to generate _semantically_ equivalent Rust code
         | from C, so that you can just get it re-compiling under Rust,
         | and then from there you have a working base from which to start
         | rewriting into safer Rust?
        
           | masklinn wrote:
           | Yes, it's literally spelled out in TFA:
           | 
           | > this provides a starting point for manual refactoring into
           | idiomatic and safe Rust
        
       | FpUser wrote:
       | I don't know this particular tool, but some automated
       | language-to-language transpilers I've seen produce code one would
       | not be able to comprehend, never mind edit if the need arises.
        
         | masklinn wrote:
         | The goal of C2rust is not to provide a usable code base per se,
         | it's to provide a convenient base for conversion: once the
         | project is in unsafe rust it can be managed entirely via rust
         | tooling and is hopefully a lot easier to finish up than if you
         | keep having to redefine bindings as you move code from C to
         | Rust.
         | 
         | C2rust is a springboard; if you move C2rust-ed code to
         | production, you're doing it very wrong.
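         | 
         | A rough sketch of one such conversion step, assuming a small
         | string helper. The pointer-walking version is hand-written
         | in the style a literal C translation keeps; it is not actual
         | c2rust output, and all three function names are hypothetical:

```rust
use std::ffi::{c_char, CStr};

/// Step 0: pointer-walking code, as a literal translation of a C loop
/// over a NUL-terminated string would look. Illustration only.
///
/// # Safety
/// `s` must point to a valid NUL-terminated string.
unsafe fn count_spaces_raw(mut s: *const c_char) -> usize {
    let mut count = 0;
    unsafe {
        while *s != 0 {
            if *s as u8 == b' ' {
                count += 1;
            }
            s = s.add(1);
        }
    }
    count
}

/// Step 1: the logic rewritten as a safe core over a byte slice.
fn count_spaces(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b' ').count()
}

/// Step 2: a thin shim keeps the old pointer signature for callers
/// that have not been migrated yet. No C bindings are involved, since
/// both sides already live in the same Rust crate.
///
/// # Safety
/// `s` must point to a valid NUL-terminated string.
unsafe fn count_spaces_shim(s: *const c_char) -> usize {
    count_spaces(unsafe { CStr::from_ptr(s) }.to_bytes())
}

fn main() {
    let text = CStr::from_bytes_with_nul(b"a b c\0").unwrap();

    let raw = unsafe { count_spaces_raw(text.as_ptr()) };
    let shim = unsafe { count_spaces_shim(text.as_ptr()) };
    let safe = count_spaces(text.to_bytes());

    // The old and new versions can be compared in one ordinary test run.
    assert_eq!(raw, shim);
    assert_eq!(raw, safe);
    println!("spaces counted: {safe}");
}
```

         | Once every caller uses count_spaces, the raw version and the
         | shim can be deleted.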
        
           | 0cf8612b2e1e wrote:
           | On the other hand, if I have some working C dependency which
           | I never intend to modify (owing to its complexity or
           | stability), dropping in the autogenerated Rust code
           | simplifies my build step.
        
       | anticrymactic wrote:
       | What problem does C2Rust solve exactly? Isn't it just gonna
       | produce "garbage" Rust?
       | 
       | Calling C directly is already possible in Rust.
        
         | kelnos wrote:
         | This isn't about calling external C code from Rust; it helps
         | people "rewrite" their C code in Rust.
         | 
         | You can debate the merits of doing so, of course, but some
         | people do want to do that, and a tool to generate safe,
         | somewhat idiomatic Rust from C code would seem to be useful.
        
         | pohl wrote:
         | From c2rust.com:
         | 
         |  _The C2Rust project is being developed by Galois and Immunant.
         | This tool is able to translate most C modules into semantically
         | equivalent Rust code. These modules are intended to be compiled
         | in isolation in order to produce compatible object files. We
         | are developing several tools that help transform the initial
         | Rust sources into idiomatic Rust.
         | 
         | The translator focuses on supporting the C99 standard. C source
         | code is parsed and typechecked using clang before being
         | translated by our tool._
        
         | eptcyka wrote:
         | It helps by lowering the barrier to entry when working on
         | rewriting a codebase in rust.
        
         | masklinn wrote:
         | It moves the project directly into Rust land and tooling, which
         | hopefully makes it easier to convert it without needing to set
         | up multi-language tooling and a moving barrier / interface
         | between the two languages.
        
         | dureuill wrote:
         | From reading the article, I get that the latest version can
         | transform some C into _safe_ Rust.
         | 
         | This gains us machine-proved memory safety. This is huge.
        
         | kccqzy wrote:
         | The article shows what improvements they are thinking of so
         | that it _doesn't_ produce garbage rust. (If by garbage rust
         | you mean unsafe rust.)
        
         | hardwaregeek wrote:
         | The post does address this and shows their attempt to produce
         | higher quality Rust. I've also seen it used to move off of a C
         | toolchain and onto a pure Rust toolchain by porting C code to
         | Rust.
        
         | jandrese wrote:
         | It makes it easier to get your project on the front page of HN
         | as you can claim it is written in Rust.
        
       | hardwaregeek wrote:
       | I'm very excited at the possibilities for C2Rust! Dynamic
       | analysis to fill in the gaps of static analysis makes a lot of
       | sense. I've wanted something similar for inferring TypeScript
       | types via runtime analysis (would not be surprised if it exists
       | already).
       | 
       | I could see a really compelling use case in cross-compilation
       | where you compile your C code to Rust, then use a Rust toolchain
       | to cross compile. Or avoiding interop as well.
        
       | CharlesW wrote:
       | This seems like an interesting project to bridge the "boil the
       | ocean" approach of rewriting in Rust wholesale.
       | 
       | (For anyone else who found it slightly difficult to read, you can
       | remove the added 0.06em `letter-spacing` using your browser's
       | developer tools.)
        
       ___________________________________________________________________
       (page generated 2023-03-15 23:00 UTC)