[HN Gopher] Implementing a safe garbage collector in Rust
       ___________________________________________________________________
        
       Implementing a safe garbage collector in Rust
        
       Author : celeritascelery
       Score  : 70 points
       Date   : 2022-04-26 11:56 UTC (3 days ago)
        
 (HTM) web link (coredumped.dev)
 (TXT) w3m dump (coredumped.dev)
        
       | ridiculous_fish wrote:
       | I don't understand how the rooting works here. Usually GC
       | languages make the distinction between a "pointer" which is a
       | reference into the heap, and a "root" which is something known to
       | the GC that keeps a pointer live. I don't see that distinction
       | here. I think this is what the author is saying:
       | 
       | 1. You can allocate from an arena; you get an object.
       | 
       | 2. Allocated objects may be freed when a GC is triggered, unless
       | they are added as a root. It is your responsibility to add your
       | object as a root.
       | 
       | 3. Roots are in a LIFO stack and cannot be moved.
       | 
       | Ok, but we never actually see the root _type_. For example:
       | let rooted: Vec<Object<'root>> = ...; // we have a vec of root
       | objects         let object: Object<'arena> = arena.add("new"); //
       | object is bound to arena         rooted.push(object); // object
       | is now bound to rooted         arena.garbage_collect(); // Object
       | is marked as live and not freed
       | 
       | How is anything rooted here? The lifetime changed from 'arena to
       | 'root but I don't see a root being created.
       | 
       | Some other misc questions:
       | 
       |  _What if instead of storing them as a map, we store them as a
       | stack instead? When something is rooted, it is pushed on the
       | stack. When it drops, it is popped from the stack._
       | 
       | But later we see roots not obeying a LIFO order, under
       | "Preventing escapes" where roots are dynamically created and
       | destroyed in an arbitrary order.
       | 
       |  _The function takes a &mut Arena, and at the end it returns an
       | Object with the same lifetime_
       | 
       | But earlier it says "we have to make sure the object can't move."
       | I'm confused what is being returned here: an object reference, or
       | a root.
        
         | celeritascelery wrote:
         | > How is anything rooted here? The lifetime changed from 'arena
         | to 'root but I don't see a root being created.
         | 
         | In this example, the Vec has been rooted previously. So pushing
         | an object into the Vec will make it "transitively" rooted
         | (accessible from the root). You would root a struct with the
         | root_struct![1] macro, which works very similar to the root!
         | macro shown in the post.
         | 
         | However you made you realize one error; The rooted `Vec` in the
         | example you pointed is a by value type, but in the
         | implementation you can only get references to rooted structs,
         | so that example needs to be updated.
         | 
         | > But later we see roots not obeying a LIFO order, under
         | "Preventing escapes" where roots are dynamically created and
         | destroyed in an arbitrary order.
         | 
         | Objects are just a copyable wrapper around a pointer, so they
         | are not the part that has the LIFO semantics. inside the root!
         | macro[2] there is a `StackRoot` type that is the actual "root".
         | The object just borrows from that so that is has a 'root
         | lifetime and is valid post gc. The actual root struct is not
         | exposed outside of the macro.
         | 
         | I hope this makes the distinction between "roots" and "objects"
         | clearer. Objects are just pointers to heap data. When we root
         | an object we store the data it points to on the root stack and
         | create a new `StackRoot`. Then we say this object is rooted.
         | But the struct that "does the rooting" is inside the macro and
         | not exposed. Rooting a struct works similarly.
         | 
         | [1]
         | https://github.com/CeleritasCelery/rune/blob/5a616efbed763b9...
         | 
         | [2]
         | https://github.com/CeleritasCelery/rune/blob/5a616efbed763b9...
        
           | ridiculous_fish wrote:
           | The existence of the StackRoot type is definitely a missing
           | piece of the puzzle, thank you. So we have Object and
           | StackRoot. But then what is a "root object?"
           | let rooted: Vec<Object<'root>> = ...; // we have a vec of
           | root objects
           | 
           | I would expect this vector to just contain Objects?
           | 
           | I think conceptually you have two different Object types: a
           | "dangling" Object that is subject to collection, and a
           | "rooted" Object which is referenced by a root stored in the
           | RootSet. Part of my confusion is that there's no vocabulary
           | to distinguish these. In this code:
           | root!(value, gc)
           | 
           | Visually it appears to "do something" to value but in fact
           | there's two different variables, with two different types,
           | both named 'value'. It would be clearer at least to me to
           | reify that by giving them different types instead of calling
           | both Object.
           | 
           | In a moving GC this distinction becomes unavoidable: dangling
           | objects are raw pointers, while rooted objects refer to the
           | root location itself and require double indirection to
           | access.
           | 
           | Anyways it's just a suggestion, take it or leave it. It's
           | very cool how Rust's lifetimes let you enforce that no
           | collections happen while an Object dangles.
           | 
           | One last thought: because you're already on the hook for
           | passing in the GC when dereferencing an Object, you could
           | exploit this by making Object references just a 32 bit
           | offset, relative to the GC's heap. This would let you save
           | some memory.
        
       | amelius wrote:
       | This is cute but modern concurrent garbage collectors like the
       | ones used in VMs and languages such as Go are way more
       | complicated. In fact, the GC can be the most difficult aspect of
       | an entire environment including the compiler.
        
       | NeutralForest wrote:
       | Very cool project! What motivated you to start a Rust VM for
       | Emacs?
        
         | celeritascelery wrote:
         | I talk little bit about that in my introductory post[1].
         | Basically my interest in Emacs, Rust, and interpreters came to
         | together and I decided to see what I could do. I was also
         | inspired by the remacs project[2] and I think Rust has a lot to
         | offer as dynamic language backend. I don't know where this will
         | go, but I am going to continue to work on it.
         | 
         | [1]https://coredumped.dev/2021/10/21/building-an-emacs-lisp-
         | vm-...
         | 
         | [2]https://github.com/remacs/remacs
        
           | NeutralForest wrote:
           | I use Emacs a lot so it's always a treat to learn more about
           | the internals, one way or another, have fun!
        
       | gwbas1c wrote:
       | One thing I'm trying to understand is: Is this a garbage
       | collector _for_ Rust; or is this a garbage collector _implemented
       | in_ Rust for another environment.
       | 
       | It's confusing because the author refers to their Emacs VM; but
       | then refers to Rc in Rust.
        
         | nicoburns wrote:
         | It's currently the latter, but the author believes that this
         | approach could be cleaned up and generalised to also be the
         | former. I believe it would then be a general purpose GC library
         | in Rust that could be used for Rust code or any environment
         | written in Rust.
        
         | steveklabnik wrote:
         | The latter, implemented in Rust (in my understanding).
         | 
         | They bring up Rc<T> because using reference counting as their
         | implementation is in theory a possibility, which they then
         | quickly (and rightfully, imho) dismiss.
        
           | zamalek wrote:
           | Although you probably could use it for something else, Boa[1]
           | is a good demonstration of this (it uses the GC crate, but
           | the same principle probably applies).
           | 
           | [1]: https://github.com/boa-dev/boa
        
       ___________________________________________________________________
       (page generated 2022-04-29 23:00 UTC)