[HN Gopher] A Guide to the Go Garbage Collector
       ___________________________________________________________________
        
       A Guide to the Go Garbage Collector
        
       Author : ibraheemdev
       Score  : 149 points
       Date   : 2022-07-15 15:44 UTC (7 hours ago)
        
 (HTM) web link (tip.golang.org)
 (TXT) w3m dump (tip.golang.org)
        
       | cube2222 wrote:
       | This is a really great guide! Nice to have something official and
       | in-depth.
       | 
       | I have two tips I can share based on my experience optimizing
       | OctoSQL[0].
       | 
        | First, some applications have a fairly constant live heap size
        | at any given point in time, but perform a lot of allocations
        | (like OctoSQL, where each processed record is a new allocation,
        | but records might be consumed by a very slowly growing group
        | by). In that case the GC threshold (which is based on the
        | previous live heap size) can be low and result in very frequent
        | garbage collection runs, even though your application is using
        | just megabytes of memory. There, using debug.SetGCPercent at
        | startup to raise that threshold to roughly 10x the live heap
        | size will yield enormous performance benefits, while sacrificing
        | very little memory.
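        | 
        | For illustration, a minimal sketch of that startup tuning (the
        | value 1000 here is made up; the right multiplier depends on your
        | workload):
        | 
        |     package main
        | 
        |     import "runtime/debug"
        | 
        |     func main() {
        |         // Trigger GC only once the heap has grown by ~10x the
        |         // live set, instead of the default 1x (GOGC=100).
        |         debug.SetGCPercent(1000)
        | 
        |         // ... rest of the application ...
        |     }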
       | 
        | Second, even if the CPU profiler tells you the GC is consuming a
        | lot of time, that doesn't mean it's taking that time away from
        | your app: if your app is effectively single-threaded, the GC's
        | work can run on otherwise idle cores. `go tool trace` can give
        | you a much better overview of how computationally intensive and
        | problematic the GC really is, even though reading it takes some
        | getting used to.
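        | 
        | If you haven't used it, a minimal sketch of collecting a trace
        | (the file name is arbitrary); view the result with `go tool
        | trace trace.out`:
        | 
        |     package main
        | 
        |     import (
        |         "log"
        |         "os"
        |         "runtime/trace"
        |     )
        | 
        |     func main() {
        |         // Record an execution trace for the whole run.
        |         f, err := os.Create("trace.out")
        |         if err != nil {
        |             log.Fatal(err)
        |         }
        |         defer f.Close()
        | 
        |         if err := trace.Start(f); err != nil {
        |             log.Fatal(err)
        |         }
        |         defer trace.Stop()
        | 
        |         // ... the work you want to observe ...
        |     }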
       | 
       | [0]: https://github.com/cube2222/octosql
        
         | kccqzy wrote:
         | > Second, even if the CPU profiler tells you the GC is
         | consuming a lot of time, that doesn't mean it's taking it away
         | from your app
         | 
          | I have experienced the same issue here. Our load balancer used
          | CPU usage as a proxy for deciding how much traffic each
          | instance should be assigned. When the app was written in Go,
          | we consistently found that the GC was consuming a lot of CPU
          | time even though all other metrics, like request latency, were
          | very good (in the microseconds range). This was the case even
          | when the app was massively parallel with lots of goroutines,
          | yet the load balancer kept sloshing traffic around
          | unnecessarily based on its observation that the GC was
          | consuming a lot of CPU time.
        
           | tdudzik wrote:
           | > When the app was written in Go
           | 
           | Did you rewrite it to something else?
        
             | kccqzy wrote:
             | Yes we did. But the rewrite was not because of this issue.
        
               | silisili wrote:
               | Out of curiosity...which language did you choose and why?
               | How did it turn out?
        
               | oorza wrote:
               | They wrote it in PHP4 and it currently routes 2/3 of the
               | internet.
        
           | cube2222 wrote:
           | That does actually sound like it could be scenario one too.
           | 
            | If you have a lot of small requests, with only a few active
            | at the same time but many requests per second overall, each
            | making a few allocations, you will have a small live heap
            | size while quickly reaching the threshold for another GC.
           | 
            | This way you get a lot of GC runs. Latency isn't affected too
            | much because Go is quite good at keeping the stop-the-world
            | pauses short. You might see application work and GC work
            | interleaved in a 50/50 ratio of computation time (that's
            | something you can diagnose very easily with go tool trace,
            | btw).
           | 
            | Having a higher GOGC threshold might help a lot there, since
            | it will make stop-the-worlds less frequent, while keeping
            | their duration mostly unchanged (as that scales
            | proportionally to live heap size).
           | 
           | That's obviously just a guess based on the limited info I
           | have though.
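            | 
            | To make that concrete, a toy sketch of the failure mode (not
            | GP's actual workload): lots of short-lived garbage with a
            | tiny live heap.
            | 
            |     package main
            | 
            |     import (
            |         "fmt"
            |         "runtime"
            |     )
            | 
            |     // Package-level sink so the allocations escape to the
            |     // heap and actually create GC pressure.
            |     var sink []byte
            | 
            |     func main() {
            |         var before, after runtime.MemStats
            |         runtime.ReadMemStats(&before)
            | 
            |         // Simulate request handling: ~1 GB of garbage in
            |         // total, but only ~1 KB live at any moment.
            |         for i := 0; i < 1_000_000; i++ {
            |             sink = make([]byte, 1024)
            |         }
            | 
            |         runtime.ReadMemStats(&after)
            |         fmt.Println("GC cycles:", after.NumGC-before.NumGC)
            |         // Rerun with GOGC=1000 and the count drops sharply.
            |     }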
        
             | morelisp wrote:
              | > it will make stop-the-worlds less frequent, while
              | keeping their duration mostly unchanged (as that scales
              | proportionally to live heap size)
             | 
             | Go's STW phases are mostly not proportional to live heap
             | size; iirc one never is and the other is proportional to
             | something variable but only weakly correlated with heap
             | size (cleaning up cached spans).
             | 
             | It's hard to figure out what GP is describing exactly but I
             | don't think GOGC alone would necessarily address that. If
             | latency was still good it was probably not over-triggering
             | STW nor dragging other threads into mark assist.
             | 
              | I think they may have been hitting the pathological case
              | described in the guide, where stack roots were not counted
              | toward the GC trigger: a little allocation across a lot of
              | stack-heavy goroutines could be mistaken for a lot of
              | allocation, triggering a CPU-intensive mark phase that
              | neither cleaned much real memory in absolute terms nor
              | effectively counted a larger live set afterwards. Prior to
              | 1.18, GOGC might have had to be set dangerously large to
              | avoid that.
        
         | eatonphil wrote:
          | I'd love to read more about your experience profiling and how
          | your techniques work.
        
           | cube2222 wrote:
           | Thanks, I'll try to whip up an article about it in the not-
           | too-distant future.
           | 
            | Though I can already say that the biggest improvement to my
            | profiling flow was adding a `--profile` flag to OctoSQL
            | itself. This way I can easily create CPU/memory/trace
            | profiles of whole OctoSQL command invocations, which makes
            | experiments and debugging on weird inputs much quicker.
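            | 
            | A rough sketch of that kind of flag wiring (hypothetical, not
            | OctoSQL's actual code), using runtime/pprof for the CPU
            | profile:
            | 
            |     package main
            | 
            |     import (
            |         "flag"
            |         "log"
            |         "os"
            |         "runtime/pprof"
            |     )
            | 
            |     func main() {
            |         profile := flag.String("profile", "",
            |             "write a CPU profile to this file")
            |         flag.Parse()
            | 
            |         if *profile != "" {
            |             f, err := os.Create(*profile)
            |             if err != nil {
            |                 log.Fatal(err)
            |             }
            |             defer f.Close()
            | 
            |             if err := pprof.StartCPUProfile(f); err != nil {
            |                 log.Fatal(err)
            |             }
            |             // Stops profiling and flushes the file when the
            |             // command finishes.
            |             defer pprof.StopCPUProfile()
            |         }
            | 
            |         // ... run the command as usual, then inspect with
            |         // `go tool pprof <binary> <profile-file>` ...
            |     }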
        
       | omginternets wrote:
       | Has anyone tried "gc_details": true in VSCode? I've just gone
       | through the configuration steps, but I'm not seeing anything
       | obvious. What should I be looking for?
       | 
       | EDIT: found it at the top of the file.
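        | 
        | For anyone else setting this up, the settings.json entry looks
        | roughly like this (exact key names vary a bit across gopls
        | versions):
        | 
        |     "gopls": {
        |         "ui.codelenses": {
        |             "gc_details": true
        |         }
        |     }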
        
       | erik_seaberg wrote:
       | Hm, I was hoping for a roadmap that would talk about supporting
       | generations and more tuning options.
        
         | morelisp wrote:
         | Would generational support improve anything given a) 99% of the
         | nursery is probably already on the stack, and b) using
         | generations to inform any kind of compaction / relocation still
         | seems out of the question?
         | 
         | `GOMEMLIMIT` described in the document is a new tuning option.
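          | 
          | For reference, a minimal sketch of using it (the 2 GiB value
          | is arbitrary); the same knob is exposed as the GOMEMLIMIT
          | environment variable:
          | 
          |     package main
          | 
          |     import "runtime/debug"
          | 
          |     func main() {
          |         // Soft limit on the Go runtime's total memory use
          |         // (Go 1.19+). The GC runs more aggressively as the
          |         // process approaches the limit.
          |         debug.SetMemoryLimit(2 << 30) // 2 GiB
          | 
          |         // ... rest of the application ...
          |     }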
        
       ___________________________________________________________________
       (page generated 2022-07-15 23:00 UTC)