[HN Gopher] Reference count, don't garbage collect Author : kcl Score : 57 points Date : 2022-07-29 13:29 UTC (9 hours ago) web link (kevinlawler.com) | assbuttbuttass wrote: | In practice, I don't see any reference counting approaches that | do cycle detection. | | I have an example from early in my career where I accidentally | created a memory leak in Python from a cyclic reference between a | closure and a function argument | | https://stackoverflow.com/questions/54726363/python-not-dele... | yyyk wrote: | Reference counting _is_ garbage collection, just a different | strategy - and all these strategies tend to blur into the same | methods eventually, offering either a latency-optimized GC or | a throughput-optimized GC. | | Swift is inferior here because it uses reference counting GC | without much work towards mitigating its drawbacks like cycles | (judging by some recent posts, some of its fans apparently aren't | even aware RC has drawbacks), while more established GC languages | had much more time to mitigate their GC drawbacks - e.g. Java's | ZGC mitigates latency by being concurrent. | habibur wrote: | It's interesting how we have come full circle from "Reference | counting is the worst of two worlds [manual and GC] and will | always be slower" to now "Well, we all know it's actually | faster." in like 10 years. | pclmulqdq wrote: | It's actually usually slower than both manual memory management | and GC. It's only coming back now because people are finally | learning how to make memory allocations large and rare. | | This blog post is an answer to: "Tell me you haven't learned | about cache coherence without telling me you haven't learned | about cache coherence." | rwmj wrote: | Hiring would certainly be a lot easier if more people were to | make bold, completely wrong blog posts like these.
I could | immediately give my negative recommendation without the time | and hassle of a phone interview. | pclmulqdq wrote: | I completely agree, but before we scare people away from | blogging too much, I will say that my big problem with this | post isn't the lack of knowledge, it's the willful | ignorance and lack of humility. It's clear that the author | doesn't really understand the position they are trying to | argue against. | [deleted] | hinkley wrote: | Has anyone done a good paper on how memory bank affinity for | processors affects these costs? | OskarS wrote: | > It's actually usually slower than both manual memory | management and GC | | [citation needed] | | You and the blog post are arguing opposite things, and | neither of you has shown any evidence. I get that you're | arguing that reference counted objects are bigger (to store | the reference count) and/or might use double indirection | (depending on implementation), which are both bad for caches. | It's not a bad argument. But the counter-argument that the | blog post makes is persuasive as well: it's expensive | running a GC that scans the heap looking for loose objects, | and reference counting does not need to do that. GC is also | "stop-the-world", as well as unpredictable and jittery in a way | reference counting is not. | | My instinct is that reference counting is actually faster | (which matches my personal experience), but really, this is | not an argument you can solve by arguing in the abstract, you | need actual data and benchmarks. | marginalia_nu wrote: | Eh, the GC you are describing sounds like something out of | 1994. Modern concurrent algorithms do not need to stop | execution. This type of code may paradoxically be slower in | benchmarks when there isn't any memory churn, as it | necessarily introduces barrier instructions at key points | to ensure the GC has a consistent view of the memory | (although memory consistency is also a concern for | reference counting).
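The cyclic-reference leak mentioned at the top of the thread is easy to reproduce. A minimal CPython sketch (class and variable names invented for illustration) shows that reference counting alone never reclaims a cycle, while the tracing collector does:

```python
import gc
import weakref

class Node:
    """Plain object that can hold a reference to another Node."""
    pass

gc.disable()  # simulate a pure reference-counting runtime

a, b = Node(), Node()
a.partner = b
b.partner = a           # a <-> b now form a reference cycle
probe = weakref.ref(a)  # weak ref: observes liveness without keeping `a` alive

del a, b
assert probe() is not None  # counts never reach zero, so the cycle leaks

gc.collect()            # CPython's tracing collector finds the cycle
assert probe() is None  # ...and the objects are finally reclaimed
gc.enable()
```

This is exactly the split CPython settled on: refcounting reclaims acyclic garbage immediately, and a backup tracing pass exists only for cycles.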
| adgjlsfhk1 wrote: | Note that if you can afford stop the world pauses, GCs | that use them are much more efficient. Parallel stop the | world garbage collectors are by far the best for | minimizing runtime, while concurrent GCs have higher | overhead, but can guarantee no significant pauses. | marginalia_nu wrote: | Yeah, that's true, but it shouldn't be held against GC | that it interrupts the execution, since it really doesn't | have to. | adgjlsfhk1 wrote: | The blog post is largely incoherent for several reasons. | | 1. It recommends 8 byte counters. This implies making every | object in your programming language 8 bytes bigger which is | pretty much a non-starter. Almost everyone that actually | uses reference counting uses more like 2-4 bits. | | 2. Reference counting can have arbitrarily long pauses as | well. If you decrease the reference count of an object to | zero, you have to decrease the reference count of all | referenced objects, which can do significant amounts of | work (specifically, it will do a lot of work in the cases | where regular GC does almost no work). | | 3. The blog states that atomic writes are "basically free", | but that ignores the fact that in multi-threaded code, | atomic writes can actually be fairly expensive since each | one requires communication between every thread (This is | why python still has a GIL) | | 4. Because no one uses 8 bytes for the count (since you | don't want to double your memory consumption), you still | need a GC anyway to deal with objects that get too many | references. | OskarS wrote: | > Almost everyone that actually uses reference counting | uses more like 2-4 bits. | | Here is an incomplete list of languages which use 8 bytes | for their reference count on 64-bit computers: | | 1. Rust, in Rc<T> and Arc<T> | | 2. C++, in std::shared_ptr<T> | | 3. Objective-C, in NSObject's retainCount | | 4. Swift, because of Objective-C | | 5. 
Python, where the reference count is a Py_ssize_t | | These were literally the first languages I thought of, | they all use 64-bit types. I would argue since reference | counting is much rarer than GC, these make up the bulk of | reference counting use in the real world. "Almost | everyone", huh? It's a bit rich accusing the author of | being "largely incoherent" and then saying this. | | > Reference counting can have arbitrarily long pauses as | well. | | _Deallocating_ can take arbitrarily long, but | deallocation does not stop the world. It stops the | current thread, which is a huge difference. Malloc can | take arbitrarily long as well, it's not like it's wait-free. | | In addition, the GC pauses in regular programming | languages are frequently orders of magnitude longer than | deallocation. You would have to deallocate an enormous | tree of objects at the root for this to be an issue. And | GCs have to do that as well, in addition to their regular | stop-the-world pauses. This argument is just irrelevant. | | > The blog states that atomic writes are "basically | free", but that ignores the fact that in multi-threaded | code, atomic writes can actually be fairly expensive | since each one requires communication between every | thread (This is why python still has a GIL) | | First off, this is not why Python has a GIL, but let's | leave that aside. | | Atomic writes are more expensive than non-atomic ones, | but they are not slow operations in the grand scheme of | things. If you properly implement acquire-release | semantics, they are not even that slow under high | contention. Compare this to a GC which literally STOPS | ALL THREADS, it's nothing. | | > you still need a GC anyway to deal with objects that | get too many references. | | This is just silly. In languages which have both reference | counting and traditional garbage collection (e.g. | Python), they do it to avoid reference cycles, not | because objects get "too many references".
That is a | ridiculous statement. | | In fact! I just realized we do have data on which one is | more performant: this article describes how Instagram | turned off GC for Python and just relied on reference | counting. They gained a 10% increase in performance: | | https://instagram-engineering.com/dismissing-python- | garbage-... | | I think it's still an open question if reference counting | is always faster than GC, but I do not believe you have | the technical expertise to evaluate such a claim. Your | comment is four paragraphs long and riddled with factual | errors. If you want to be convincing, show data that is | better than that Instagram case study. | kaba0 wrote: | > Deallocating can take arbitrarily long, but | deallocation does not stop the world | | And modern GCs only have to stop the current thread to | check for which thread-local objects are alive. The alive | ones are moved to another generation, making the space | reusable _for free_. | | And atomic writes need synchronization which is crazy | expensive, I honestly don't get why you think it isn't. | | Also, just try writing rust/c++ code that relies entirely | on RC vs Java in an object-heavy workload - I really | don't think it is an open question in any shape or form. | OskarS wrote: | > The alive ones are moved to another generation, making | the space reusable for free. | | It's pretty hilarious to me that you first say "they have | to move it to another generation" and then you say "it's | free!" It's like saying "I paid for my dinner, and now I | get to eat it for free!" | | Also: what do you think `free()` does? All modern memory | allocators do this trick, keeping thread-local caches. | This is not an advantage of GCs when reference counting | does it as well. | | Almost all modern GCs are stop-the-world in at least some | phases, and for good reason: stop-the-world GCs are | higher performance. You pay in other ways for skipping | that stop.
| | > And atomic writes need synchronization which is crazy | expensive, I honestly don't get why you think it isn't. | | Because I've actually benchmarked it: https://quick-bench.com/q/ISEetAHOohv-GaEuYR-7MajJgTc | | 18.5 nanoseconds fits under no reasonable definition of | "crazy expensive", not when a regular increment clocks in | at 5.9 nanoseconds. And there are extremely few situations | where you increment a reference count more than, like, 5 | times. It's just not an issue. | | This is like cargo cult programming: you've been told | these things and never tested them in the real world, and | you have all these preconceived notions that just don't | stand up to two minutes of scrutiny. | | > Also, just try writing rust/c++ code that relies | entirely on RC vs Java in an object heavy workload - I | really don't think it is an open question in any shape or | form. | | Yes, _of course_ garbage collectors are easier to use | than reference counting. Nobody has ever disputed this. | That is the whole raison d'être of garbage collectors. | This is not what the discussion is about, it's about | performance. | | I'm done with this thread now, unless anybody can show me | any actual _data_ to back anything you say up. It's | really tiring. I started this by saying "I don't actually | know", and everyone replying to me has been so darn | certain of everything they say while being outright | incorrect about most things, and without any actual data | to back up the rest. | hayley-patton wrote: | You'd get very different results doing atomic operations | on counters shared between multiple threads, with other | stuff to do that causes your CPU to serialise and run | less out-of-order. That single-threaded benchmark is | data, but it doesn't appear awfully relevant to how naive | RC barriers perform versus barriers for coalesced RC or | tracing.
| | Indeed the issue with measuring barriers is that | measuring the barrier doesn't suffice; one wants to | measure how the barrier affects the rest of execution. | This entails coming up with programs that are much less | trivial than repeatedly incrementing a counter. | barsonme wrote: | Is your benchmark running concurrently? Or single-threaded? | kaba0 wrote: | Moving to another generation can be done completely | asynchronously on another thread that likely doesn't do | any useful work on modern, highly parallel hardware. | `free` doesn't do much, but `malloc` does -- with the | method I am talking about (TLAB in the JVM), you get | allocations as fast as it gets, it's nothing more than a | NON-ATOMIC pointer bump. Meanwhile malloc has to find an | empty space that can fit the object at hand. | | > > Also, just try writing rust/c++ code that relies | entirely on RC vs Java in an object heavy workload - I | really don't think it is an open question in any shape or | form. > Yes, of course garbage collectors are easier to | use than reference counting. Nobody has ever disputed | this. That is the whole raison d'etre of garbage | collectors. This is not what the discussion is about, | it's about performance. | | I am talking about performance exactly. Java's GC will | smoke the hell out of C++'s shared pointers and Rust's | (A)RC. No one said anything about productivity/ease of | usage. | | And as mentioned by another commenter - your benchmark | didn't take into account anything related to _parallel_ | execution, which would be the point. | pclmulqdq wrote: | The blog post about this is pretty much incoherent, and | comes from a bad understanding of performance, atomics, GC | algorithms, and reference counting. | | The ONLY reason to reference count is when you need GC-like | behavior on a few objects, but do not want to impose a GC | on all objects.
It is a very valuable tool in performance | code not because it is fast, but because it allows you to | make other things fast. Suggesting that you should | reference count the world is patently ridiculous. This is | the reason for Rust's Arc and C++ shared_ptr, not that they | are faster than a GC. | | The blog post completely brushes aside the costs of | _atomic_ counter increments and decrements, calling them | "just increments." The "atomic" is the key performance | problem, not the increment. | | Reference counting also makes objects larger, so small | objects cannot fit inside a machine register. | | Modern GC algorithms are very efficient. They do not need | to pause the world, and they do not need to be jittery. | However, someone who doesn't understand how expensive | atomic ops are probably wouldn't understand this either. | kaba0 wrote: | With all due respect, why do you believe that your | experience is meaningful compared to people writing actual | industrial-scale runtimes, when RC is the hello world of GC | algorithms? It's not that they don't know about it, it's | that they studied the topic countless times and found | significant differences between the two approaches, to the | point that no language considered fast uses RC (or if they | do, they do so because it is a useful escape hatch to | manual memory management that doesn't need support from the | runtime). | flohofwoe wrote: | Except that "refcounting is faster than a GC" is mostly a myth, | both are equally bad if predictable performance matters. | klodolph wrote: | Looking at metrics from Go's garbage collector, what else do | you want? GC pauses are damn low, and you'll see numbers in | the sub 500ms range. | | If I needed hard real-time, I would avoid allocation entirely. | dpryden wrote: | This article is naive to the point of being flat-out wrong, since | it makes extremely naive assumptions about how a garbage | collector works.
This is basically another C++-centric programmer | saying that smart pointers work better than the Boehm GC -- which | is completely true but also completely misleading. | | I'm not saying that GC is _always_ the best choice, but this | article gets the most important argument wrong: | | > 1. Updating reference counts is quite expensive. > > No, it | isn't. It's an atomic increment, perhaps with overflow checks for | small integer widths. This is about as minimal as you can get | short of nothing at all. | | Yes, it is. Even an atomic increment is a _write_ to memory. That | is not "about as minimal as you can get short of nothing at | all". | | Additionally, every modern GC does generational collection, so | for the vast majority of objects, _the GC literally does "nothing | at all"_. No matter how little work it does, an RC solution has to | do O(garbage) work, while a copying GC can do O(not garbage) | work. | | Now, that's not to say that GC is automatically better. There are | trade-offs here. It depends on the workload, the amount of | garbage being created, and the ratio of read to write operations. | | The article says: | | > I've already stated I'm not going to do benchmarks. I am aware | of two orgs who've already run extensive and far-reaching | experiments on this: Apple, for use in their mobile phones, and | the Python project. | | I can counterpoint that anecdata: Google extensively uses Java in | high-performance systems, and _invented a new GC-only language_ | (Go) as a replacement for (their uses of) Python. | | The right answer is to do benchmarks. Or even better yet, don't | worry about this and just write your code! Outside of a | vanishingly small number of specialized use cases, by the time GC | vs RC becomes relevant in any meaningful way to your performance, | you've already succeeded, and now you're dealing with scaling | effects.
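In CPython the benchmark asked for above takes only a few lines. A toy micro-benchmark (workload and names invented; numbers are entirely workload-dependent) timing allocation churn with the cycle collector enabled and then disabled, in the spirit of the Instagram experiment cited upthread:

```python
import gc
import time

def churn(n=200_000):
    # short-lived containers: reclaimed immediately by refcounting either
    # way, but with the collector enabled they still feed generation-0 scans
    for _ in range(n):
        _ = {"payload": [1, 2, 3]}

def timed_churn():
    start = time.perf_counter()
    churn()
    return time.perf_counter() - start

gc.enable()
t_gc = timed_churn()

gc.disable()
t_nogc = timed_churn()
gc.enable()

print(f"collector on:  {t_gc:.3f}s")
print(f"collector off: {t_nogc:.3f}s")
```

Whether the gap is measurable depends on how much cyclic-looking garbage the workload creates; a toy loop like this proves nothing by itself, which is rather the point about needing real benchmarks.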
| dfox wrote: | One great advantage of garbage collection is that it removes the need | for thread synchronization in cases where it is only needed to | make sure that the object you are going to call free/decref on is not | in use in another thread. As a corollary, GC is the thing that | you need for many lock-free data structures to be practical and | not only of theoretical interest. | | It might seem that it is simply about pushing your | synchronization problems onto the GC, but the synchronization | issue that GC solves internally is different and usually more | coarse-grained, so in the end you have significantly smaller | synchronization overhead. | byefruit wrote: | This article provides very little evidence for its claims and | seems to only have a superficial understanding of modern GCs. | | "Increments and decrements happen once and at a predictable time. | The GC is running all the time and traversing the universe of GC | objects. Probably with bad locality, polluting the cache, etc." | | This is only the case with a mark-sweep collector; usually most | of your allocations die young in the nursery. With reference | counting you pay the counting cost for everything. | | "In object-oriented languages, where you can literally have a | pointer to something, you simply mark a reference as a weak | reference if it might create a cycle." | | As someone who has tried to identify memory leaks in production | where someone has forgotten to "simply" mark a reference in some | deep object graph as weak, this is naive. | | "With an 8-byte counter you will never overflow. So...you | know...just expand up to 8-bytes as needed? Usually you can get | by with a few bits." | | So now my "about as minimal as you can get short of nothing at | all" check has an unpredictable branch in it? | | "If you must overflow, e.g., you cannot afford an 8-byte counter | and you need to overflow a 4-byte counter with billions of | references, if you can copy it, you create a shallow copy."
| | I don't even know where to begin with this. | | "If GC is so good, why wouldn't Python just garbage collect | everything, which they already did once and could trivially do, | instead of going through the hassle of implementing reference | counting for everything but the one case I mentioned?" | | This probably has more to do with finalising resources and | deterministic destruction than anything else. | | -- | | Anyone who is interested in actually studying this area would | probably find | https://courses.cs.washington.edu/courses/cse590p/05au/p50-b... | interesting. Also https://gchandbook.org/ | knome wrote: | >If GC is so good, why wouldn't Python just garbage collect | everything, which they already did once and could trivially do | | I don't think Python ever did pure mark-and-sweep (CPython, at | least; I'm sure Jython and other alternate implementations have). | | My understanding was that they did pure reference counting, and | kludged on a sweep GC to do cycle breaking eventually, as | manually breaking cycles in early versions of Python was a pain | point. A quick lookup seems to indicate Python 1 was pure | reference counting, and they added the cycle breaking when they | released Python 2. | cogman10 wrote: | That's my recollection as well (though I've not found | evidence to support it). In fact, IIRC, as part of the Python | performance effort one of the topics was adding a proper | generational GC into Python for objects that don't refer to | pinned memory. | carry_bit wrote: | You can optimize reference counting: | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23... | | With allocating as dead, you're basically turning it into a | tracing collector for the young generation.
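The "simply mark a reference as weak" advice debated above looks like this in Python. A sketch with invented Parent/Child classes, where the weak back-reference keeps the parent-child pair acyclic so refcounting alone can reclaim it:

```python
import weakref

class Child:
    def __init__(self):
        self.parent = None  # will hold a weak reference to the parent

class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        child.parent = weakref.ref(self)  # weak back-reference: no cycle
        self.children.append(child)       # strong forward reference

p = Parent()
c = Child()
p.add_child(c)

assert c.parent() is p     # dereference the weakref while the parent lives
del p
assert c.parent() is None  # refcounting alone reclaimed the parent
```

The catch, as noted above, is that in a deep object graph it is not always obvious which edge is the "back" edge, and a single missed weakref reintroduces the leak.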
| jerf wrote: | From what I can see, the myth that needs to be debunked isn't | that garbage collection is super fast and easy with no | consequences, it's the myth that garbage collection always | automatically means your program is going to be spending 80% of | its time doing it and freezing for a second every five seconds | the instant you use a language with garbage collection. I see far | more "I'm writing a web server that's going to handle six | requests every hour but I'm afraid the garbage collector is going | to trash my performance" than people who believe it's magically | free. | | It's just another engineering decision. On modern systems, and | especially with any runtime that can do the majority of the GC | threaded and on an otherwise-unused core, you need to have some | pretty serious performance requirements for GC to ever get to | being your biggest problem. You should almost always know when | you're setting out to write such a system, and then, sure, think | about the GC strategy and its costs. However for the vast bulk of | programs the correct solution is to spend on the order of 10 | seconds thinking about it and realizing that the performance | costs of _any_ memory management solution are trivial and | irrelevant and the only issue in the conversation is what | benefits you get from the various options and what the non- | performance costs are. | | It is in some sense as big a mistake (proportional to the program | size) to write every little program like it's a AAA game as it is | to write a AAA game as if it's just some tiny little project, but | by the sheer overwhelming preponderance of programming problems | that are less complicated than AAA games, the former happens | overwhelmingly more often than the latter. | | Edit: I can be specific. I just greased up one of my production | systems with Go memstats. 
It periodically scans XML files via | network requests and parses them with a parser that cross-links | parents, siblings, and children using pointers and then runs a | lot of XPath on them, so, it's kinda pessimal behavior for a GC. | I tortured it far out of its normal CPU range by calling the | "give me all your data" JSON dump 100 times. I've clicked | around on the website it serves to put load on it a good 10x what | it would normally see in an hour, minimum. In 15 minutes of this | way-above-normal use, it has so far paused my program for 14.6 | milliseconds total. If you straight-up added 14.6 milliseconds of | latency to every page it scanned, every processing operation, and | every web page I loaded, I literally wouldn't be able to notice, | and of course that's not what actually happened. Every second | worrying about GC on this system would be wasted. | nickbauman wrote: | Yes; I have a friend who is part of a small team that wrote a | very successful stock market trading gateway in Java. Turns out | the JVM's GC can be tuned in very specific ways based on your | needs. And there are ways to avoid having to do JVM GC in | critical areas of the code as well. | marginalia_nu wrote: | There are several exchanges and clearing platforms running | Java[1], although I'm not sure how many are still around | after Nasdaq's hostile takeover. | | [1] https://www.marketswiki.com/wiki/TRADExpress | PaulHoule wrote: | Generally Java has made a huge investment in the garbage | collector over a long period of time to address the problems | that people have in some use cases. JDK 17 is much better | than JDK 8. If you were writing a GC from scratch you would | not do as well. | hinkley wrote: | To be fair, they definitely got into a rut in the JDK 6-7 | timeframe. I maintain it's no accident that memcache came | into its own during this period. That was a major pain | point, and going out-of-process shouldn't have been | necessary.
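The Go memstats exercise above has a CPython analogue: `gc.callbacks` can time the cyclic collector's pauses directly. A sketch (the churn loop is a contrived worst case of purely cyclic garbage, not a real workload):

```python
import gc
import time

pauses = []
_start = [0.0]

def _probe(phase, info):
    # CPython invokes each callback at the start and stop of a collection
    if phase == "start":
        _start[0] = time.perf_counter()
    else:
        pauses.append(time.perf_counter() - _start[0])

gc.callbacks.append(_probe)

# contrived worst case: lots of self-referential (cyclic) garbage
for _ in range(100_000):
    d = {}
    d["self"] = d

gc.collect()  # ensure at least one collection ran before we detach
gc.callbacks.remove(_probe)

print(f"{len(pauses)} collections, worst pause {max(pauses) * 1e3:.3f} ms")
```

Measuring on your own workload, as the parent comment did, is far more informative than any general claim about "GC pauses".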
| | kaba0 wrote: | The JVM GCs are absolutely insanely good. G1 can sustain | loads with heap sizes well into terabytes. | code_runner wrote: | The biggest GC issues I've personally seen manifest were arrays | of historical data that grew to tens of millions of entries and | due to array storage in .NET, the array was placed in the large | object heap. Swapping to a linked list actually fixed the issue | and the team lived to fight another day. | | Like a lot of premature optimization, it isn't a problem until | it is... but solutions aren't unattainable. | hinkley wrote: | I still mostly remember the day a coworker convinced me that | object pooling was dead because it tears the mature | generation a new one over and over. | | It's nice when the runtime solves a problem you've had to | solve yourself, but it also takes a bit of your fun away, | even if your coworkers are relieved not to have to deal with | 'clever' code anymore. | agentultra wrote: | A paper I quite enjoyed on automatic reference counting for pure, | immutable functional programming: | https://arxiv.org/abs/1908.05647 | | It can be quite "fast." | miloignis wrote: | Indeed, and this line of research has been continuing to | improve in follow-on work on Perceus in Koka: | https://xnning.github.io/papers/perceus.pdf and | https://www.microsoft.com/en-us/research/uploads/prod/2021/1... | | Very cool stuff! | shwestrick wrote: | This debate has gone round and round for decades. There are no | hard lines; this is about performance tradeoffs, and always will | be. | | Perhaps the biggest misconception about reference counting is | that people believe it avoids GC pauses. That's not true. | Essentially, whereas tracing GC has pauses while tracing live | data, reference counting has pauses while tracing garbage. | | Reference counting is really just another kind of GC. I'd highly | recommend perusing this paper for more details: A Unifying Theory | of Garbage Collection.
| https://web.eecs.umich.edu/~weimerw/2012-4610/reading/bacon-... | | One of the biggest issues with reference counting, from a | performance perspective, is that it turns reads into writes: if | you read a heap object out of a data structure, you have to | increment the object's reference count. Modern memory | architectures have much higher read bandwidth than write | bandwidth, so reference counting typically has much lower | throughput than tracing GC does. | goodpoint wrote: | It's not always a tradeoff: | | Nim switched from GC to RC and it even increased performance. | waterhouse wrote: | > Essentially, whereas tracing GC has pauses while tracing live | data, reference counting has pauses while tracing garbage. | | _This_ is pretty trivial to avoid. When your thread finds | itself freeing a big chain of garbage objects, you can have it | stop at any arbitrary point and resume normal work, and find | some way to schedule the work of freeing the rest of the chain | (e.g. on another thread). It's much more complex and expensive | to do this for tracing live data, because then you need to | manage the scenario where the user program is modifying the | graph of live objects while the GC is tracing it, using a write | or read barrier; whereas for garbage, by definition you know | the user _can't_ touch the data, so a simple list of "objects | to be freed" suffices. | | "Reads become writes" (indeed, they become atomic read-modify-writes | when multiple threads might be refcounting | simultaneously) is a problem, though. | arcticbull wrote: | RC may turn reads into writes, but of course, GC ends up having | to go through literally every piece of memory ever from bottom | to top once in a while. | | RC limits itself to modifying only relevant objects, whereas GC | reads _all_ objects in a super cache-unfriendly way. Yes, an | atomic read-modify-write is worse than a read, but it's not | worse than a linked-list traversal of all of memory all the | time.
| | And of course, not all kinds of object lend themselves to | garbage collection - for instance, file descriptors, since you | can't guarantee when or if they'll ever close. So you have to | build your own reference counting system on top of your garbage | collected system to handle these edge cases. | | There's trade-offs, yes, but the trade-off is simply that | garbage collected languages refuse to provide the compiler and | the runtime all the information they need to know in order to | do their jobs - and a massive 30 year long effort kicked off to | build a Rube Goldberg machine for closing that knowledge gap. | klodolph wrote: | > RC may turn reads into writes, but of course, GC ends up | having to go through literally every piece of memory ever | from time to time. | | Depends on the GC algorithm used. Various GC algorithms only | trace reachable objects, not unreachable ones. | | Reference counting does the opposite, more or less. When you | deallocate something, it's tracing unreachable objects. | | One of the problems with this is that reference counting | touches all the memory right before you're done with it. | | > And of course, not all kinds of object lend themselves to | garbage collection - for instance, file descriptors, since | you can't guarantee when or if they'll ever close. So you | have to build your own reference counting system on top of | your garbage collected system. | | This is not a typical solution. | | Java threw finalizers into the mix and everyone overused them | at first before they realized that finalizers suck. This is | bad enough that, in response to "too many files open" in your | Java program, you might invoke the GC. Other languages | designed since then typically use some kind of scoped system | for closing file descriptors. This includes C# and Go. | | Garbage collection does not need to be used to collect all | objects. 
| | > There's trade-offs, yes, but the trade-off is simply that | garbage collected languages refuse to provide the compiler | and the runtime all the information they need to know in | order to do their jobs - and a massive 30-year-long Rube | Goldberg machine was built around closing that gap. | | When I hear rhetoric like this, all I think is, "Oh, this | person really hates GC, and thinks everyone else should hate | GC." | | Embedded in this statement are usually some assumptions which | should be challenged. For example, "memory should be freed as | soon as it is no longer needed". | arcticbull wrote: | > When I hear rhetoric like this, all I think is, "Oh, this | person really hates GC, and thinks everyone else should | hate GC." | | Yeah, I think it's an inelegant, brute-force solution to a | language problem - and that we continue to throw good money | after bad improving it. | | We should be investing in removing the need for GC through | smarter compilers and through languages that allow us to | better express our intent - and our object graph. Instead, | we continue to improve on a giant linked-list traversal | through all of live memory to try and infer the object | graph at runtime. Without sufficient information to do it | efficiently. The fundamental issue is that we don't have | languages that express our goals in a way that allows | memory management to be elided efficiently. | | That doesn't mean I have some particular affinity for | reference counting, though. It has its own issues as you | rightly point out. I prefer it for its determinism, nothing | more. | hinkley wrote: | What is your read on the lack of discussion of escape analysis? | | My own read on this is that it blurs the line with deferred | collection/counting, because you could either use it to | complement deferral making it cheaper, or avoid deferral | because you're getting enough of the benefits of deferral by | proving objects dead instead of discovering that they are dead.
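Deferred collection of the kind discussed above can be sketched concretely. A toy Python model (all class and method names invented) where releasing an object graph just queues its root in O(1), and each subsequent allocation pays down a bounded amount of freeing work:

```python
from collections import deque

class Node:
    def __init__(self, children=()):
        self.children = list(children)

class AmortizedHeap:
    """Toy sketch: releasing a graph defers teardown to a queue; every
    later allocation frees at most FREE_BATCH queued objects, bounding
    worst-case allocation latency."""
    FREE_BATCH = 10

    def __init__(self):
        self._to_free = deque()
        self.freed = 0

    def release(self, root):
        self._to_free.append(root)  # O(1): no graph traversal here

    def alloc(self, factory):
        for _ in range(self.FREE_BATCH):
            if not self._to_free:
                break
            node = self._to_free.popleft()
            self._to_free.extend(node.children)  # children deferred too
            self.freed += 1
        return factory()

heap = AmortizedHeap()
heap.release(Node(Node() for _ in range(999)))  # 1000-node graph, O(1) release
for _ in range(100):
    heap.alloc(Node)  # each alloc reclaims at most 10 deferred nodes
assert heap.freed == 1000
```

Because the released graph is by definition unreachable, no barrier or synchronization is needed to walk it lazily, which is the asymmetry with incremental tracing noted upthread.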
| kaba0 wrote: | One great example would be a C++ program that runs fast, and | then just spends tens of seconds doing "nothing" while it | deallocates shared pointers' huge object graphs at the end. | They really are two sides of the same coin, with tracing GCs | being actually correct (you need cycle detection for a correct | RC implementation), and having much better throughput. It's not | an accident that runtimes with thousands of dev hours behind | them are polishing their GC solutions instead of using a much | more trivial RC. | hinkley wrote: | I don't know what the current state of the art is, but at one | point the answer to GC in a realtime environment was to | amortize free() across malloc(). Each allocation would clear | up to 10 elements from a queue of free-able memory locations. | That gives a reasonably tight upper bound on worst-case alloc | time, and most workflows converge on a garbage-free heap. Big | malloc after small free might still blow your deadlines, but | big allocations after bootstrapping are generally frowned | upon in realtime applications, so that's as much a social | problem as a technical one. | titzer wrote: | I read most of the article and it's just a lot of the same tired | old arguments and an extremely simplified worldview of both GC | and reference counting. I wish I had the author's address, because | I'd like to mail them a copy of the Garbage Collection Handbook. | They clearly have a very naive view of both garbage collection | and reference counting. And there isn't a single dang measurement | anywhere, so this can be completely dismissed IMHO. | cogman10 wrote: | Agreed. What I particularly disliked is how absent of nuance it | is. RC is a form of GC and all GC algorithms make tradeoffs. RC | trades throughput for latency. Compacting mark-and-sweep trades | latency (and usually memory) for throughput. | | The rant at the end can be boiled down to "I use confirmation | bias [1] to make my engineering decisions".
The OP has already | decided that "GC" is slow, so I'm sure every time a runtime | with it misbehaves it's "Well, that darn GC, I knew it was | bad!" and every time RC misbehaves it's likely "Oh, well you | should have nulled out your link here to break the cycle, | dummy!" | | I really don't like such absolutist thinking in software dev. | All of software dev is about making tradeoffs. RC and GC aren't | superior or inferior to each other, they are just different and | either (or both) could be valid depending on the circumstance. | | [1] https://en.wikipedia.org/wiki/Confirmation_bias | titzer wrote: | > absent of nuance | | Yes, this is a good point. It makes overly general claims. | | E.g. a GC proponent could claim "well, tracing collectors do | _no_ work for dead objects, so they have _no_ overhead!" | Which is a good point, but not the whole story. Tracing | collectors may need to repeatedly traverse live objects. | Sure. But then generational collectors only traverse modified | live objects that point to new objects. True. And concurrent | collectors can trace using spare CPU resources, incremental | collectors can break marking work up into small pauses, on | and on. There are zillions of engineering tradeoffs and the | GC Handbook covers most of them really well. | glouwbug wrote: | Isn't garbage collection needed to solve circular reference | counts? | arcticbull wrote: | Nope, you can just mark the back-reference as weak. | | GC is only required if you as a programmer (or programming | language) do not provide sufficient information to the compiler | or runtime to understand the object graph. | klodolph wrote: | It's not always obvious which reference to mark as | weak, and there's not necessarily a clear indication of which | reference is a back-reference. | | You can find various algorithms in journals or whatnot | written with the assumption that there's GC.
Algorithms | designed with this assumption may not have clear ownership | for objects, and those objects may have cyclic references. | | It's easy to say, "objects should have clear ownership | relationships" but that kind of maxim, like most maxims, | doesn't really survive if you try to apply it 100% of the | time. Ownership is a tool that is very often useful for | managing object lifetimes--it's not _always_ the tool that | you want. | flohofwoe wrote: | Such a claim really needs hard data to back it up. Reference | counting can be _very_ expensive, especially if the refcount | update is an atomic operation. It's hard to capture in profiling | tools because the performance overhead is smeared all over the | code base instead of centralized in a few hot spots, so most of | the time you don't actually know how much performance you're | losing because of refcounting overhead. | | The most performant approach is still manual memory management | with specialized allocators tuned for specific situations, and | then still only use memory allocation when actually needed. | okennedy wrote: | This. Exactly this. | | Garbage collection has a huge, and generally entirely | unappreciated win when it comes to threaded code. As with most | things, there are tradeoffs, but every reference counting | implementation that I've used has turned any concurrent access | to shared memory into a huge bottleneck. | arcticbull wrote: | > The most performant approach is still manual memory | management with specialized allocators tuned for specific | situations, and then still only use memory allocation when | actually needed. | | RAII gets you a lot of the way there. | cosmotic wrote: | > This is about as minimal as you can get short of nothing at | all. | | With GC, you _can_ do nothing at all. In a system with lots of | garbage, you can do a GC by copying everything accessible from | the GC root, then de-allocating all the garbage in a single free.
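cosmotic's copy-then-bulk-free description is essentially a semispace ("Cheney-style") collector. A toy sketch over a dict-based heap — the function name and heap encoding are invented for illustration:

```python
# Toy semispace collection: objects live in `from_space`; a collection
# copies everything reachable from the roots into `to_space`, then the
# old space is discarded in one step -- garbage is never touched.

def collect(roots, from_space):
    """roots: list of object ids; from_space: dict id -> (value, child_ids)."""
    to_space = {}
    worklist = list(roots)
    while worklist:                     # copy live objects; order is irrelevant
        oid = worklist.pop()
        if oid in to_space:
            continue                    # already copied (shared object)
        value, children = from_space[oid]
        to_space[oid] = (value, children)
        worklist.extend(children)
    return to_space                     # from_space is dropped wholesale

heap = {
    1: ("root", [2]),
    2: ("live child", []),
    3: ("garbage", [4]),                # unreachable...
    4: ("more garbage", [3]),           # ...and part of a cycle
}
live = collect([1], heap)
assert set(live) == {1, 2}              # the 3<->4 cycle vanished unvisited
```

Note that the unreachable cycle disappears without ever being traversed — the flip side of the RC-needs-cycle-detection point made elsewhere in the thread.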
| [deleted] | bitwize wrote: | Boy, I can't wait for theangeryemacsshibe (posts here as | hayley-patton) to tear into this one. | | But yeah, the correct way to handle resources (not just memory!) | is with value semantics and RAII. Because then you know the | object will be cleaned up as soon as it goes out of scope with | zero additional effort on your part. In places where this is not | appropriate, a simple reference counting scheme may be used, but | the idea is to keep the number of rc'd objects small. Do not use | cyclical data structures. Enforce a constraint: zero cycles. For | data structures like arbitrary graphs, keep an array of vertices | and an array of edges that reference vertices by index. | | If you use a language with GC, you're probably just contributing | to global warming. | kaba0 wrote: | Why not just write embedded programs with fixed size memory | allocation then if we are that okay with restricting the | programs we write? | bitwize wrote: | Because maybe we're not okay with restricting the programs we | write _that much_. | kaba0 wrote: | Nor am I okay with restricting my programs so much that a | tree-like-only allocation strategy would suffice; circular | data structures are not evil. | slavboj wrote: | People have been managing garbage collection schedules for | decades now. It's quite possible for many systems to have | completely deterministic performance, with the | allocation/deallocation performance made extremely fast, gc | restricted to certain times or a known constant overhead, etc. | Ironically, from a programming perspective it's incredibly easy | in a language like Java to see exactly what allocates and bound | those cases. | | Conversely, it's also possible for reference counting to have | perverse performance cases over a truly arbitrary reference graph | with frequent increments and decrements.
You're not just doing | atomic inc/dec, you're traversing an arbitrary number of pointers | on every reference update, and it can be remarkably difficult to | avoid de/allocations in something like Python where there's not | really a builtin notion of a primitive non-object type. | | Generally speaking, memory de/allocation patterns are the issue, | not the specific choice of reference counting vs gc. | mirekrusin wrote: | What about - garbage collect by reference counting, like Python? | knome wrote: | Reference counting can also have unpredictable hits if you | release any large data structures. Whoever drops the last | reference suddenly gets to sit through the entire deep set of | items to release ( unless you can hand off the release cascade to | a background thread ). | | I've never heard of a reference counting implementation that can | handle memory compaction. | | Every time you update a reference count, which is every time you | touch any object, you're going to have to write to that RAM, | which means stealing it from any other threads using it on any | other processors. If you share large trees of data between | threads, traversing that tree in different threads will always | end up with your threads constantly fighting with each other | since there's no such thing as read only memory in reference | counting. | | When releasing something like a huge list in reference counting, | how does the release avoid blowing the stack with recursive | releasing? My guess is this is just a "don't use a large list | whose release may blow the stack with recursive releasing" | situation. | mamcx wrote: | > hits if you release any large data structures. | | Well, that depends on how the RC is done. This is key to | understand, because if you can control it, the RC becomes | cheaper. | | You can see this approach in | | http://sblom.github.io/openj-core/iojNoun.htm | | i.e.: If instead of `[Rc(1), Rc(2)]` you do `Rc([1, 2])`, that | works great.
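knome's stack-overflow question and mamcx's `Rc([1, 2])` granularity point can both be shown with a toy (non-atomic, single-threaded) refcount cell — the class and names are hypothetical, not any real library:

```python
# Toy refcount cell (hypothetical, single-threaded, non-atomic).
# release() walks an explicit worklist instead of recursing, so
# freeing an arbitrarily deep structure cannot blow the stack.

FREED = []                              # record of freed values, for illustration

class RcCell:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)  # ownership of children is transferred in
        self.count = 1                  # the caller holds the single reference

    def release(self):
        worklist = [self]
        while worklist:                 # iterative, not recursive
            cell = worklist.pop()
            cell.count -= 1
            if cell.count == 0:
                FREED.append(cell.value)
                worklist.extend(cell.children)

# A chain deep enough to overflow any naively recursive release():
node = RcCell(0)
for i in range(1, 50_000):
    node = RcCell(i, [node])
node.release()                          # frees all 50,000 cells, flat stack
assert len(FREED) == 50_000

# mamcx's point: one cell around the whole list means ONE count update
# frees everything, instead of one refcount per element.
FREED.clear()
RcCell([1, 2]).release()
assert FREED == [[1, 2]]
```

CPython itself uses a comparable "trashcan" mechanism to bound deallocation recursion, which is why dropping a deep structure there doesn't crash — but the pause knome describes still happens, worklist or not.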
| kaba0 wrote: | How is that not the exact same for tracing GC? | mamcx wrote: | Well, you don't need the GC. That is the point of this: Rc | _could_ be piecemeal. That is something that could be | exploited very well when making an interpreter, for example | (that is what Array langs do under the hood) | exabrial wrote: | > I've already stated I'm not going to do benchmarks. | | Yikes | jsnell wrote: | > Basically, you attach the reference to the object graph once, | and then free it when you're done with it. | | So reference counting works by the programmer knowing the | lifetime of each object allowing them to only increment / | decrement the refcount once, and trusting that the raw uncounted | pointers they use elsewhere are always valid? There's another | word we have for this: manual memory management. It's unsafe and | unergonomic, and it's pretty telling that the author needs to use | this pattern to make RC appear competitive. It's because actually | doing reference counting safely is really expensive. | | > If GC is so good, why wouldn't Python just garbage collect | everything, which they already did once and could trivially do, | instead of going through the hassle of implementing reference | counting for everything but the one case I mentioned? | | Because they've made reference counting a part of their C | extension API and ABI. If they wanted to use a GC, they'd instead | need a very different API, and then migrate all the modules to | the new API. (I.e. a way for those native extensions to | register/unregister memory addresses containing pointers to | Python objects for the GC to see.) | | Early on, the deterministic deallocation given by reference | counting would also have been treated by programmers as a | language level feature, making it so that a migration would have | broken working code. But I don't think that was ever actually | guaranteed in the language spec, and anyway this was not carried | over to various alternative Python implementations.
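jsnell's description of Python's hybrid scheme is easy to watch in action: in CPython, acyclic objects die by refcount the instant the last reference drops, while cycles sit around until the gc module's tracing pass runs. A small demo:

```python
import gc
import weakref

gc.disable()                   # make the demo deterministic: no surprise GC runs

class Node:
    pass

# Acyclic: reclaimed immediately by refcounting when the last reference dies.
a = Node()
wa = weakref.ref(a)
del a
assert wa() is None            # gone, no collector pass needed

# Cyclic: the refcount never reaches zero, so only the cycle detector helps.
b = Node()
b.self_ref = b                 # a one-object cycle
wb = weakref.ref(b)
del b
assert wb() is not None        # leaked, as far as pure RC is concerned
gc.collect()                   # the tracing cycle detector (runs even when disabled)
assert wb() is None            # collected

gc.enable()
```

This is the same pattern as the closure leak in the StackOverflow link at the top of the thread: the cycle is invisible to refcounting and waits for `gc.collect()`.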
| melony wrote: | You are in for a fun time when you need to make circular data | structures with ARC. ___________________________________________________________________ (page generated 2022-07-29 23:00 UTC)