[HN Gopher] Ruby Garbage Collection Deep Dive: Tri-Color Mark an...
       ___________________________________________________________________
        
       Ruby Garbage Collection Deep Dive: Tri-Color Mark and Sweep
        
       Author : mooreds
       Score  : 90 points
       Date   : 2021-02-18 17:32 UTC (5 hours ago)
        
 (HTM) web link (jemma.dev)
 (TXT) w3m dump (jemma.dev)
        
       | jemmaissroff wrote:
       | Author here, let me know if you have any questions / suggestions
       | for future blog posts. Planning a series around Ruby's GC,
       | definitely with future posts on incremental marking, generational
       | gc and compaction.
       | 
       | Also! Writing a book about managed GC with a focus on Ruby. I'll
       | keep posting updates on twitter @jemmaissroff or you can sign up
       | for updates at https://buttondown.email/jemmaissroff
        
         | ammanley wrote:
         | Thank you for contributing to this. Very excited to see more of
         | this.
        
       | eklitzke wrote:
       | Does rvalue mean something different here than it normally means?
       | Normally rvalues refer to temporary values.
        
         | tenderlove wrote:
         | RVALUE is the base type in Ruby's heap. It refers specifically
         | to [this struct](https://github.com/ruby/ruby/blob/7bd93293621b
         | 85a87e7e117317...). All Ruby objects are instances of an RVALUE
         | struct.
         | 
         | Hope that helps!
        
           | WJW wrote:
           | In particular in Ruby it derives from "Ruby value" rather
           | than the C/C++ rvalues that can appear on the righthand side
           | of an assignment.
        
         | snvsn wrote:
         | Yes. In the Ruby VM, an RVALUE [0] is a union (a "generic
         | type") of structs that represent all the different kinds of
         | Ruby object types and a couple of runtime types.
         | 
         | [0] https://sonots.github.io/ruby-
         | capi/db/d8e/struct_r_v_a_l_u_e...
        
         | jemmaissroff wrote:
         | The first post in the series also explains RVALUES in the
         | context of the Ruby Heap if it's helpful
         | https://jemma.dev/blog/gc-internal
        
         | dragontamer wrote:
         | Rvalue seems to be a 40-byte "Ruby Object". (If an object takes
         | up more than 40-bytes, it gets an additional allocation, and
         | the first 40-bytes go into the rvalue "slot").
        
       | dragontamer wrote:
       | The tri-color invariant is "obvious" in single-threaded stop-the-
       | world mark and sweep algorithms. There's no need to track color
       | in single-threaded stop-the-world.
       | 
       | ----------
       | 
       | The reason why you need to formalize the tri-color invariant is
       | because of __multithreading__ issues. If Thread#2 changes an
       | object from "object.blah = object2", and "object" is black
       | ("already processed" by the garbage collector), then you MUST set
       | object2 to grey or black.
       | 
       | White-objects and grey-objects do NOT need to be processed in
       | this manner. (ex: if "object" is white, then the GC will process
       | object in the future. Ditto if object is grey)
       | 
       | ----------
       | 
       | That's the real secret of tri-color mark and sweep. Your explicit
       | color-tracking allows for parallel garbage collection on
       | multicore systems.
       | 
       | Now the method to hold the tri-color invariant depends on a
       | number of decisions: maybe there's a "read barrier", or maybe
       | there's a "write barrier" (not a barrier in a multithreaded
       | sense, but that's the terminology that garbage collector
       | textbooks use).
       | 
       | ------------
       | 
       | > There is one more nuance here. As of Ruby 3.0, if auto-
       | compaction is enabled, compaction will actually happen as part of
       | the sweeping phase. A more in depth explanation of how and why
       | this happens will follow in a later post about compaction in this
       | Garbage Collection Deep Dive Series.
       | 
       | That means this is not a Mark-and-Sweep algorithm, but instead a
       | Mark-and-Compact algorithm.
       | 
       | Mark-and-Compact has more technical issues than Mark-and-Sweep.
       | The devil is in the details, so to speak. If you're moving
       | pointers around at the lowest level, you need to have either:
       | 
       | 1. A level of indirection on all reads: because the GC may "move"
       | and object while you weren't looking.
       | 
       | 2. OR, a way for the GC to set all pointers in all registers,
       | variables, and objects, to the "new correct location" while your
       | code wasn't looking. (Aka: "Stop the world" methodology)
       | 
       | Compaction is great at reducing fragmentation (which makes future
       | "malloc" more efficient... and also causes the code to use less
       | memory). But it does take more effort to compact the heap.
        
         | tenderlove wrote:
         | > The reason why you need to formalize the tri-color invariant
         | is because of __multithreading__ issues.
         | 
         | Not specifically "multithreading" issues, but rather
         | concurrency. MRI implements incremental marking and lazy
         | sweeping. The GC and the mutator execute in the same thread,
         | but they execute concurrently. The tri-color algorithm ensures
         | consistency in a single threaded, concurrently executing
         | environment. It allows us to "pause" the GC in the middle of
         | marking and allow the mutator to continue in the same thread.
         | 
         | > 2. OR, a way for the GC to set all pointers in all registers,
         | variables, and objects, to the "new correct location" while
         | your code wasn't looking. (Aka: "Stop the world" methodology)
         | 
         | We're also able to do compaction concurrently with the mutator
         | via a read barrier.
         | 
         | > But it does take more effort to compact the heap.
         | 
         | Indeed it does!
        
           | ufo wrote:
           | This is very important to remember. A lot of the time, the
           | incremental GC is still single threaded. The difference is
           | that while a two-color "stop the world" algorithm has to
           | collect everything in one long GC pause, the three-color
           | incremental algorithm can spread the work over a series of
           | smaller pauses.
        
             | treis wrote:
             | How does it handle the case where a RVALUE is added to an
             | already processed black node in between pauses?
        
               | tenderlove wrote:
               | MRI implements a write barrier. You can check out the
               | write barrier implementation here:                 https:
               | //github.com/ruby/ruby/blob/7bd93293621b85a87e7e117317612
               | bb0a84efb7a/gc.c#L7802
               | 
               | The behavior for writes on a black object is defined in
               | this function:                 https://github.com/ruby/ru
               | by/blob/7bd93293621b85a87e7e117317612bb0a84efb7a/gc.c#L77
               | 66
               | 
               | Basically when a black -> white reference is written
               | during incremental marking, the white object will be
               | grey'd and added to the mark stack.
               | 
               | Happy Thursday!
        
               | ufo wrote:
               | Alternatively, you can also move the gc "backwards",
               | changing the black object. In Lua, sometimes the write
               | barrier recolors the white child object to grey and
               | sometimes it recolors the black parent object to grey.
               | 
               | IIRC, the motivation for going backwards is that if you
               | assign a large number of white children, recoloring the
               | parent once is faster than recoloring all the children.
        
       | alberth wrote:
       | Lots of research on Tri-Color from Mike Pall of LuaJIT fame.
       | 
       | http://wiki.luajit.org/New-Garbage-Collector
        
       ___________________________________________________________________
       (page generated 2021-02-18 23:00 UTC)