[HN Gopher] What should I know about garbage collection as a Jav...
___________________________________________________________________

What should I know about garbage collection as a Java developer?

Author : saikatsg
Score  : 37 points
Date   : 2023-01-10 05:38 UTC (17 hours ago)

(HTM) web link (www.azul.com)
(TXT) w3m dump (www.azul.com)

| turtledragonfly wrote:
| One thing that I hadn't fully understood until recently is that
| garbage collectors can actually allow you to write _more
| efficient_ code.
|
| Previously, I had the general understanding that you were trading
| convenience (not thinking about memory management or dealing with
| the related bugs) in exchange for performance (GC slows your
| program down).
|
| That's still broadly true, but there's an interesting class of
| algorithms where GC can give you a performance improvement:
| immutable data structures, typically used in high-concurrency
| situations.
|
| Consider a concurrent hash map: when you add a new key, the old
| revision of the map is left unchanged (so other threads can keep
| reading from it), and your additions create a new revision. Each
| revision of the map is immutable, and your "changes" to it are
| really creating new, immutable copies (with structural-sharing
| tricks to stay efficient).
|
| These data structures are great for concurrent performance, but
| there's a problem: how do you know when to clean up the memory?
| That is: how do you know when all users are done with the old
| revisions, so they can be freed?
|
| Using something like a reference count adds contention to this
| high-concurrency data structure, slowing it down. Threads have to
| fight over updating that counter, so you have now introduced
| shared mutable state, which was the very thing you were trying to
| avoid.
|
| But if there's a GC, you don't have to think about it. And the GC
| can choose a "good time" to do its bookkeeping in bulk, rather
| than making all of your concurrent accesses pay a price.
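The copy-on-write scheme described above can be sketched in Java. This is a hypothetical illustration, not code from the thread: `CowMap` is an invented name, and a real persistent map would share structure between revisions rather than copy the whole table on every write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a copy-on-write map. Readers grab the current revision with
// a single atomic read and never block; writers copy, modify, and publish
// a new revision with a CAS. Old revisions become garbage once the last
// reader drops them -- no reference count, so no contention on reads.
class CowMap<K, V> {
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(new HashMap<>());

    V get(K key) {
        // Lock-free read against one immutable revision.
        return current.get().get(key);
    }

    void put(K key, V value) {
        while (true) {
            Map<K, V> old = current.get();
            Map<K, V> next = new HashMap<>(old); // full copy for clarity;
            next.put(key, value);                // persistent maps share
            if (current.compareAndSet(old, next)) {
                return; // old revision is now unreachable garbage
            }
            // CAS failed: another writer published first; retry.
        }
    }
}
```

The key point is the last step: after the CAS, nobody has to decide when `old` is safe to free; the GC notices on its own once no reader still holds it.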
| So, if done properly, it's an overall performance win.
|
| Interestingly, a performant solution without using GC is "hazard
| pointers," which are essentially like adding a teeny tiny garbage
| collector devoted just to that data structure (concurrent map, or
| whatever).
| tadfisher wrote:
| Well put. I find it fascinating to watch memory-safe runtimes
| converge on automatic memory management (via GC or ARC) and
| owner/borrower models. I'm just not sure which I like better,
| or if I'm thinking too imperatively.
| bob1029 wrote:
| > But if there's a GC, you don't have to think about it. And
| the GC can choose a "good time" to do its bookkeeping in bulk,
| rather than making all of your concurrent accesses pay a price.
| So, if done properly, it's an overall performance win.
|
| In many environments, you can explicitly force a GC collection
| from application code. I've got a few situations where
| explicitly running GC helps reduce latency/jitter, since I can
| decide precisely where and how often it occurs.
|
| In my environment, calling GC.Collect more frequently than the
| underlying runtime would on its own typically results in the
| runtime-induced collections taking less time (and occurring
| less frequently). But there is a tradeoff: you are stopping the
| world more frequently (i.e. every frame or simulation tick),
| and theoretical max throughput drops off as a result.
|
| Batching is the best way to do GC, but it is sometimes
| catastrophic for the UX.
| mike_hearn wrote:
| Yeah, but it's actually deeper than just adding refcounts. The
| algorithms themselves can change in some cases.
|
| The issue is that the hardware can usually only do
| atomic/interlocked operations at the word level.
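In Java terms, that word-level operation is a single compareAndSet on an AtomicReference. A minimal hypothetical sketch (class and method names invented for illustration):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the word-level atomic update being described: with a GC,
// replacing one immutable snapshot with another is a single CAS on a
// pointer-sized reference. No refcount has to change in the same atomic
// step -- the old snapshot simply becomes garbage once the last reader
// drops it.
class Snapshot {
    private final AtomicReference<int[]> data =
            new AtomicReference<>(new int[] {1, 2, 3});

    int[] read() {
        // One atomic word read; the returned snapshot stays valid for
        // as long as this thread holds it, courtesy of the GC.
        return data.get();
    }

    boolean replace(int[] expected, int[] next) {
        // One word-sized CAS; no lock, no refcount, no lock ordering.
        return data.compareAndSet(expected, next);
    }
}
```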
| If you have a GC then you can atomically update a pointer from
| one thing to another and not think about the thing that was
| being pointed to previously: an object becomes unreachable
| atomically due to the guarantees provided by the GC (either via
| global pauses or write barriers or both). If you don't have
| that, then you need to update both a pointer and a refcount
| atomically, which goes beyond what the hardware can easily do
| without introducing locks, and that in turn creates new
| problems, such as lock-ordering issues.
| zackangelo wrote:
| Most JVMs take advantage of a thread-local "bump" allocator[1]
| as well, to avoid having to cross JVM or kernel boundaries to
| allocate memory, which can result in huge speedups for
| memory-intensive use cases.
|
| [1] https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/
| eldenring wrote:
| Bump allocators are incredibly fast, and are super efficient
| in generational GCs where compaction is cheap. However, almost
| all (maybe all) modern languages don't usually cross kernel
| boundaries when allocating memory, including C++'s malloc.
| Alifatisk wrote:
| I don't know exactly why, but I've always associated GC
| languages with slow performance. Today, I realized how wrong
| I was.
| mike_hearn wrote:
| Performance and GC is a tricky topic, partly because there are
| not many GC'd languages explicitly designed for performance
| above usability (maybe D would count? _maybe_ Go?). GC is
| normally chosen for usability reasons, and then the language
| has other usability features that reduce performance, and it
| gets difficult to disentangle them. Immutability is a common
| problem. GC makes allocating lots of objects easy, so people
| make immutable types (e.g.
| Java's String type), and that forces you to allocate lots of
| objects, which causes lots of cache misses as the young-gen
| pointer constantly moves forward, and that slows everything
| down, whereas a C++ dev might shorten a string by just
| inserting a NUL byte into the middle of it. Functional
| programming patterns are a common culprit because of their
| emphasis on immutability. You bleed performance in ways that
| don't show up on profiles because they're smeared out all
| over the program.
|
| Another complication is that people talk about the
| performance of languages, when often it's really about the
| performance of an implementation. The most stunning example
| of this is TruffleRuby, in which the GraalVM EE Ruby runtime
| often runs 50x faster or more than standard Ruby. Language
| design matters a lot, but how smart your runtime is matters
| a lot too.
|
| A final problem is that many people associate GC with
| scripting languages like Python, JavaScript, Ruby, PHP,
| etc., which often have poor or non-existent support for
| multi-threading. It's then hard to get good performance on
| modern hardware, of course, and that gets generalized to
| all GC languages.
| turtledragonfly wrote:
| Well, there's still truth to it in other cases, I think. One
| terrible thing GCs can do is make your performance
| _unpredictable_. In some performance-sensitive situations
| (e.g. video games), your worst-case perf matters more than
| your average case. Adding a GC can mess with that worst-case
| behavior, and in unpredictable ways.
|
| That being said, modern GCs are much better (less "stop the
| world" stuff) and more configurable. But it's still a real
| concern.
___________________________________________________________________
(page generated 2023-01-10 23:00 UTC)