[HN Gopher] Ruby Garbage Collection Deep Dive: Tri-Color Mark an... ___________________________________________________________________ Ruby Garbage Collection Deep Dive: Tri-Color Mark and Sweep Author : mooreds Score : 90 points Date : 2021-02-18 17:32 UTC (5 hours ago) (HTM) web link (jemma.dev) (TXT) w3m dump (jemma.dev) | jemmaissroff wrote: | Author here, let me know if you have any questions / suggestions | for future blog posts. Planning a series around Ruby's GC, | definitely with future posts on incremental marking, generational | gc and compaction. | | Also! Writing a book about managed GC with a focus on Ruby. I'll | keep posting updates on twitter @jemmaissroff or you can sign up | for updates at https://buttondown.email/jemmaissroff | ammanley wrote: | Thank you for contributing to this. Very excited to see more of | this. | eklitzke wrote: | Does rvalue mean something different here than it normally means? | Normally rvalues refer to temporary values. | tenderlove wrote: | RVALUE is the base type in Ruby's heap. It refers specifically | to [this struct](https://github.com/ruby/ruby/blob/7bd93293621b | 85a87e7e117317...). All Ruby objects are instances of an RVALUE | struct. | | Hope that helps! | WJW wrote: | In particular in Ruby it derives from "Ruby value" rather | than the C/C++ rvalues that can appear on the righthand side | of an assignment. | snvsn wrote: | Yes. In the Ruby VM, an RVALUE [0] is a union (a "generic | type") of structs that represent all the different kinds of | Ruby object types and a couple of runtime types. | | [0] https://sonots.github.io/ruby- | capi/db/d8e/struct_r_v_a_l_u_e... | jemmaissroff wrote: | The first post in the series also explains RVALUES in the | context of the Ruby Heap if it's helpful | https://jemma.dev/blog/gc-internal | dragontamer wrote: | Rvalue seems to be a 40-byte "Ruby Object". (If an object takes | up more than 40-bytes, it gets an additional allocation, and | the first 40-bytes go into the rvalue "slot"). | dragontamer wrote: | The tri-color invariant is "obvious" in single-threaded stop-the- | world mark and sweep algorithms. There's no need to track color | in single-threaded stop-the-world. | | ---------- | | The reason why you need to formalize the tri-color invariant is | because of __multithreading__ issues. If Thread#2 changes an | object from "object.blah = object2", and "object" is black | ("already processed" by the garbage collector), then you MUST set | object2 to grey or black. | | White-objects and grey-objects do NOT need to be processed in | this manner. (ex: if "object" is white, then the GC will process | object in the future. Ditto if object is grey) | | ---------- | | That's the real secret of tri-color mark and sweep. Your explicit | color-tracking allows for parallel garbage collection on | multicore systems. | | Now the method to hold the tri-color invariant depends on a | number of decisions: maybe there's a "read barrier", or maybe | there's a "write barrier" (not a barrier in a multithreaded | sense, but that's the terminology that garbage collector | textbooks use). | | ------------ | | > There is one more nuance here. As of Ruby 3.0, if auto- | compaction is enabled, compaction will actually happen as part of | the sweeping phase. A more in depth explanation of how and why | this happens will follow in a later post about compaction in this | Garbage Collection Deep Dive Series. | | That means this is not a Mark-and-Sweep algorithm, but instead a | Mark-and-Compact algorithm. | | Mark-and-Compact has more technical issues than Mark-and-Sweep. | The devil is in the details, so to speak. If you're moving | pointers around at the lowest level, you need to have either: | | 1. A level of indirection on all reads: because the GC may "move" | and object while you weren't looking. | | 2. OR, a way for the GC to set all pointers in all registers, | variables, and objects, to the "new correct location" while your | code wasn't looking. (Aka: "Stop the world" methodology) | | Compaction is great at reducing fragmentation (which makes future | "malloc" more efficient... and also causes the code to use less | memory). But it does take more effort to compact the heap. | tenderlove wrote: | > The reason why you need to formalize the tri-color invariant | is because of __multithreading__ issues. | | Not specifically "multithreading" issues, but rather | concurrency. MRI implements incremental marking and lazy | sweeping. The GC and the mutator execute in the same thread, | but they execute concurrently. The tri-color algorithm ensures | consistency in a single threaded, concurrently executing | environment. It allows us to "pause" the GC in the middle of | marking and allow the mutator to continue in the same thread. | | > 2. OR, a way for the GC to set all pointers in all registers, | variables, and objects, to the "new correct location" while | your code wasn't looking. (Aka: "Stop the world" methodology) | | We're also able to do compaction concurrently with the mutator | via a read barrier. | | > But it does take more effort to compact the heap. | | Indeed it does! | ufo wrote: | This is very important to remember. A lot of the time, the | incremental GC is still single threaded. The difference is | that while a two-color "stop the world" algorithm has to | collect everything in one long GC pause, the three-color | incremental algorithm can spread the work over a series of | smaller pauses. | treis wrote: | How does it handle the case where a RVALUE is added to an | already processed black node in between pauses? | tenderlove wrote: | MRI implements a write barrier. You can check out the | write barrier implementation here: https: | //github.com/ruby/ruby/blob/7bd93293621b85a87e7e117317612 | bb0a84efb7a/gc.c#L7802 | | The behavior for writes on a black object is defined in | this function: https://github.com/ruby/ru | by/blob/7bd93293621b85a87e7e117317612bb0a84efb7a/gc.c#L77 | 66 | | Basically when a black -> white reference is written | during incremental marking, the white object will be | grey'd and added to the mark stack. | | Happy Thursday! | ufo wrote: | Alternatively, you can also move the gc "backwards", | changing the black object. In Lua, sometimes the write | barrier recolors the white child object to grey and | sometimes it recolors the black parent object to grey. | | IIRC, the motivation for going backwards is that if you | assign a large number of white children, recoloring the | parent once is faster than recoloring all the children. | alberth wrote: | Lots of research on Tri-Color from Mike Pall of LuaJIT fame. | | http://wiki.luajit.org/New-Garbage-Collector ___________________________________________________________________ (page generated 2021-02-18 23:00 UTC)