[HN Gopher] A Guide to the Go Garbage Collector
___________________________________________________________________
 
A Guide to the Go Garbage Collector
 
Author : ibraheemdev
Score  : 149 points
Date   : 2022-07-15 15:44 UTC (7 hours ago)
 
(HTM) web link (tip.golang.org)
(TXT) w3m dump (tip.golang.org)
 
| cube2222 wrote:
| This is a really great guide! Nice to have something official
| and in-depth.
|
| I have two tips I can share based on my experience optimizing
| OctoSQL[0].
|
| First, some applications might have a fairly constant live heap
| size at any given point in time, but do a lot of allocations
| (like OctoSQL, where each processed record is a new allocation,
| but records might be consumed by a very slowly growing group
| by). In that case the GC threshold (which is based on the last
| live heap size) can be low, resulting in very frequent garbage
| collection runs even though your application is using just
| megabytes of memory. Using debug.SetGCPercent at startup to move
| that threshold closer to 10x the live heap size will then yield
| enormous performance benefits while sacrificing very little
| memory.
|
| Second, even if the CPU profiler tells you the GC is consuming a
| lot of time, that doesn't mean it's taking that time away from
| your app, particularly if the app is single-threaded. `go tool
| trace` can give you a much better overview of how
| computationally intensive and problematic the GC really is, even
| though reading it takes some getting used to.
|
| [0]: https://github.com/cube2222/octosql
|
| kccqzy wrote:
| > Second, even if the CPU profiler tells you the GC is consuming
| a lot of time, that doesn't mean it's taking that time away from
| your app
|
| I have experienced the same issue. Our load balancer used CPU
| usage as a proxy for deciding how much traffic each instance
| should be assigned. When the app was written in Go, we
| consistently found that the GC was consuming a lot of CPU time
| even though all other metrics, like request latency, were very
| good, even in the microseconds range. This was the case even
| when the app was massively parallel with lots of goroutines. But
| the load balancer kept sloshing traffic around unnecessarily,
| based on its observation that the GC was consuming a lot of CPU
| time.
|
| tdudzik wrote:
| > When the app was written in Go
|
| Did you rewrite it in something else?
|
| kccqzy wrote:
| Yes we did. But the rewrite was not because of this issue.
|
| silisili wrote:
| Out of curiosity... which language did you choose, and why? How
| did it turn out?
|
| oorza wrote:
| They wrote it in PHP4 and it currently routes 2/3 of the
| internet.
|
| cube2222 wrote:
| That actually sounds like it could be scenario one, too.
|
| If you have a lot of small requests, with only a few requests
| active at the same time but many requests per second overall,
| and each makes a few allocations, you will have a small live
| heap while quickly reaching the threshold for another GC cycle.
|
| This way you get a lot of GC runs. Latency isn't affected much,
| because Go is quite good at keeping the stop-the-world pauses
| short, but you might see application work and stop-the-world
| phases interleaved in a 50/50 ratio of computation time (that's
| something you can diagnose very easily with `go tool trace`,
| btw).
|
| Having a higher GOGC threshold might help a lot there, since it
| will make stop-the-world pauses less frequent while keeping
| their duration mostly unchanged (as that scales proportionally
| to live heap size).
|
| That's obviously just a guess based on the limited info I have,
| though.
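For readers who want to try the first tip, here is a minimal
sketch. debug.SetGCPercent is the standard runtime/debug call; the
1000 value (roughly a 10x heap-growth allowance) is just the
starting point suggested above, not a universal constant:
 
      package main
 
      import "runtime/debug"
 
      func main() {
          // GOGC defaults to 100: a new GC cycle starts roughly when
          // the heap has grown 100% over the live heap left by the
          // previous cycle. For a small but allocation-heavy live
          // heap, raising the multiplier trades a little memory for
          // far fewer GC runs.
          debug.SetGCPercent(1000)
 
          // ... rest of the application ...
      }
 
Setting the GOGC environment variable (e.g. GOGC=1000) achieves the
same thing without a code change.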
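Likewise, a sketch of capturing the execution trace that both of
the comments above lean on. It wraps the whole run in runtime/trace
calls; a real application would more likely put this behind a flag:
 
      package main
 
      import (
          "log"
          "os"
          "runtime/trace"
      )
 
      func main() {
          // Write an execution trace for the whole run. Inspect it
          // afterwards with `go tool trace trace.out` to see when the
          // GC ran and how much time it took from application
          // goroutines.
          f, err := os.Create("trace.out")
          if err != nil {
              log.Fatal(err)
          }
          defer f.Close()
 
          if err := trace.Start(f); err != nil {
              log.Fatal(err)
          }
          defer trace.Stop()
 
          // ... workload under investigation ...
      }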
| morelisp wrote:
| > it will make stop-the-world pauses less frequent while keeping
| their duration mostly unchanged (as that scales proportionally
| to live heap size)
|
| Go's STW phases are mostly not proportional to live heap size;
| IIRC one never is, and the other is proportional to something
| variable but only weakly correlated with heap size (cleaning up
| cached spans).
|
| It's hard to figure out exactly what GP is describing, but I
| don't think GOGC alone would necessarily address it. If latency
| was still good, it was probably not over-triggering STW nor
| dragging other threads into mark assist.
|
| I think they may have been hitting the pathological case
| described in the guide, where stack roots were not being
| counted: a little allocation across a lot of stack-heavy
| goroutines could get confused for a lot of allocation and
| trigger a CPU-intensive mark phase, yet neither clean much real
| memory in absolute terms nor effectively count a larger live set
| afterwards. Prior to 1.18, GOGC might have to be set dangerously
| large to avoid that.
|
| eatonphil wrote:
| I'd love to read more about your experience profiling and how
| your techniques work.
|
| cube2222 wrote:
| Thanks, I'll try to whip up an article about it in the
| not-too-distant future.
|
| Though I can say that the biggest improvement to my profiling
| flow was adding a `--profile` flag to OctoSQL itself. This way I
| can easily create CPU/memory/trace profiles of whole OctoSQL
| command invocations, which makes experiments and debugging on
| weird inputs much quicker.
|
| omginternets wrote:
| Has anyone tried "gc_details": true in VSCode? I've just gone
| through the configuration steps, but I'm not seeing anything
| obvious. What should I be looking for?
|
| EDIT: found it at the top of the file.
|
| erik_seaberg wrote:
| Hm, I was hoping for a roadmap that would talk about supporting
| generations and more tuning options.
|
| morelisp wrote:
| Would generational support improve anything, given that a) 99%
| of the nursery is probably already on the stack, and b) using
| generations to inform any kind of compaction/relocation still
| seems out of the question?
|
| `GOMEMLIMIT`, described in the document, is a new tuning option.
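On that last point, a minimal sketch of the new knob, assuming the
runtime/debug API shipping with Go 1.19 that the guide describes
(the 512 MiB figure is an arbitrary example; GOMEMLIMIT can also be
set as an environment variable, e.g. GOMEMLIMIT=512MiB):
 
      package main
 
      import "runtime/debug"
 
      func main() {
          // Soft limit on the total memory the Go runtime manages.
          // As the process approaches it, the GC runs more
          // aggressively instead of being driven purely by the GOGC
          // growth ratio.
          debug.SetMemoryLimit(512 << 20) // 512 MiB
 
          // ... rest of the application ...
      }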
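And, picking up cube2222's `--profile` remark from earlier in the
thread, a generic sketch of wiring up such a flag with the standard
flag and runtime/pprof packages (the flag name and everything else
here are assumptions, not OctoSQL's actual implementation):
 
      package main
 
      import (
          "flag"
          "log"
          "os"
          "runtime/pprof"
      )
 
      func main() {
          // Hypothetical flag; OctoSQL's real one may differ.
          cpuProfile := flag.String("profile", "",
              "write a CPU profile to this file")
          flag.Parse()
 
          if *cpuProfile != "" {
              f, err := os.Create(*cpuProfile)
              if err != nil {
                  log.Fatal(err)
              }
              defer f.Close()
 
              if err := pprof.StartCPUProfile(f); err != nil {
                  log.Fatal(err)
              }
              defer pprof.StopCPUProfile()
          }
 
          // ... run the actual command ...
      }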