[HN Gopher] Reference count, don't garbage collect
       ___________________________________________________________________
        
       Reference count, don't garbage collect
        
       Author : kcl
       Score  : 57 points
       Date   : 2022-07-29 13:29 UTC (9 hours ago)
        
 (HTM) web link (kevinlawler.com)
 (TXT) w3m dump (kevinlawler.com)
        
       | assbuttbuttass wrote:
       | In practice, I don't see any reference counting approaches that
       | do cycle detection.
       | 
       | I have an example from early in my career where I accidentally
       | created a memory leak in Python from a cyclic reference between a
       | closure and a function argument
       | 
       | https://stackoverflow.com/questions/54726363/python-not-dele...
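
The leak described above can be reproduced in a few lines of Python: a reference cycle keeps both objects alive under pure reference counting, and only the cycle detector reclaims them. A minimal sketch (class and attribute names invented for illustration):

```python
import gc
import weakref

class Node:
    pass

def make_cycle():
    a = Node()
    b = Node()
    a.other = b              # a -> b
    b.other = a              # b -> a: a reference cycle
    return weakref.ref(a)    # weak probe, does not keep `a` alive

gc.disable()                 # rely on reference counting alone
probe = make_cycle()
assert probe() is not None   # the cycle keeps both objects alive

gc.enable()
gc.collect()                 # the cycle detector reclaims the pair
assert probe() is None
```

With `gc` disabled, the pair would leak for the life of the process, which is exactly the failure mode the Stack Overflow question describes.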
        
       | yyyk wrote:
       | Reference counting _is_ garbage collection, just a different
       | strategy - and all these strategies tend to blur into the same
       | methods, eventually offering either a latency-optimized GC or
       | a throughput-optimized GC.
       | 
       | Swift is inferior here because it uses reference counting GC
       | without much work towards mitigating its drawbacks like cycles
       | (judging by some recent posts, some of its fans apparently aren't
       | even aware RC has drawbacks), while more established GC languages
       | had much more time to mitigate their GC drawbacks - e.g. Java's
       | ZGC mitigates latency by being concurrent.
        
       | habibur wrote:
       | It's interesting how we have come full circle from "Reference
       | counting is the worst of two worlds [manual and GC] and will
       | always be slower" to now "Well, we all know it's actually
       | faster." in like 10 years.
        
         | pclmulqdq wrote:
         | It's actually usually slower than both manual memory management
         | and GC. It's only coming back now because people are finally
         | learning how to make memory allocations large and rare.
         | 
         | This blog post is an answer to: "Tell me you haven't learned
         | about cache coherence without telling me you haven't learned
         | about cache coherence."
        
           | rwmj wrote:
           | Hiring would certainly be a lot easier if more people were to
           | make bold, completely wrong blog postings like these. I could
           | immediately give my negative recommendation without the time
           | and hassle of a phone interview.
        
             | pclmulqdq wrote:
             | I completely agree, but before we scare people away from
             | blogging too much, I will say that my big problem with this
             | post isn't the lack of knowledge, it's the willful
             | ignorance and lack of humility. It's clear that the author
             | doesn't really understand the position they are trying to
             | argue against.
        
             | [deleted]
        
           | hinkley wrote:
           | Has anyone done a good paper on how memory bank affinity for
           | processors affects these costs?
        
           | OskarS wrote:
           | > Its actually usually slower than both manual memory
           | management and GC
           | 
           | [citation needed]
           | 
           | You and the blog post are arguing opposite things, and
           | neither of you have shown any evidence. I get that you're
           | arguing that reference counted objects are bigger (to store
           | the reference count) and/or might use double indirection
           | (depending on implementation), which are both bad for caches.
           | It's not a bad argument. But the counter-argument that the
           | blog post makes is persuasive as well: it's expensive
           | running a GC that scans the heap looking for loose objects,
           | and reference counting does not need to do that. GC is also
           | "stop-the-world", as well as unpredictable and jittery in a
           | way reference counting is not.
           | 
           | My instinct is that reference counting is actually faster
           | (which matches my personal experience), but really, this is
           | not an argument you can solve by arguing in the abstract, you
           | need actual data and benchmarks.
        
             | marginalia_nu wrote:
             | Eh, the GC you are describing sounds like something out of
             | 1994. Modern concurrent algorithms do not need to stop
             | execution. This type of code may paradoxically be slower in
             | benchmarks when there isn't any memory churn, as it
             | necessarily introduces barrier instructions at key points
             | to ensure the GC has a consistent view of the memory
             | (although memory consistency is also a concern for
             | reference counting).
        
               | adgjlsfhk1 wrote:
               | Note that if you can afford stop the world pauses, GCs
               | that use them are much more efficient. Parallel stop the
               | world garbage collectors are by far the best for
               | minimizing runtime, while concurrent GCs have higher
               | overhead, but can guarantee no significant pauses.
        
               | marginalia_nu wrote:
               | Yeah, that's true, but it shouldn't be held against GC
               | that it interrupts the execution, since it really doesn't
               | have to.
        
             | adgjlsfhk1 wrote:
             | The blog post is largely incoherent for several reasons.
             | 
              | 1. It recommends 8-byte counters. This implies making
              | every object in your programming language 8 bytes bigger,
              | which is pretty much a non-starter. Almost everyone that
              | actually uses reference counting uses more like 2-4 bits.
             | 
             | 2. Reference counting can have arbitrarily long pauses as
             | well. If you decrease the reference count of an object to
             | zero, you have to decrease the reference count of all
             | referenced objects, which can do significant amounts of
             | work (specifically, it will do a lot of work in the cases
             | where regular GC does almost no work).
             | 
              | 3. The blog states that atomic writes are "basically
              | free", but that ignores the fact that in multi-threaded
              | code, atomic writes can actually be fairly expensive,
              | since each one requires communication between every
              | thread. (This is why Python still has a GIL.)
             | 
             | 4. Because no one uses 8 bytes for the count (since you
             | don't want to double your memory consumption), you still
             | need a GC anyway to deal with objects that get too many
             | references.
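
Point 2 above is easy to observe in CPython: dropping the last reference to a long chain pays for the entire teardown at that moment. A rough sketch (timings vary by machine; the node count is arbitrary):

```python
import time

class Node:
    __slots__ = ("next",)
    def __init__(self, next=None):
        self.next = next

# Build a long singly linked list.
head = None
for _ in range(200_000):
    head = Node(head)

t0 = time.perf_counter()
head = None          # dropping the root decrefs every node in the chain
elapsed = time.perf_counter() - t0
print(f"freed 200,000 nodes in {elapsed * 1000:.1f} ms")
```

The whole cascade runs synchronously on the thread that dropped the last reference, which is the "reference counting can pause too" point.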
        
               | OskarS wrote:
               | > Almost everyone that actually uses reference counting
               | uses more like 2-4 bits.
               | 
               | Here is an incomplete list of languages which use 8 bytes
               | for their reference count on 64-bit computers:
               | 
               | 1. Rust, in Rc<T> and Arc<T>
               | 
               | 2. C++, in std::shared_ptr<T>
               | 
               | 3. Objective-C, in NSObject's retainCount
               | 
               | 4. Swift, because of Objective-C
               | 
               | 5. Python, where reference count is Py_ssize_t
               | 
                | These were literally the first languages I thought of;
                | they all use 64-bit types. I would argue that, since
                | reference counting is much rarer than GC, these make up
                | the bulk of reference counting use in the real world.
                | "Almost everyone", huh? It's a bit rich to accuse the
                | author of being "largely incoherent" and then say this.
               | 
               | > Reference counting can have arbitrarily long pauses as
               | well.
               | 
               |  _Deallocating_ can take arbitrarily long, but
               | deallocation does not stop-the-world. It stops the
               | current thread, which is a huge difference. Malloc can
                | take arbitrarily long as well; it's not like it's wait-
               | free.
               | 
               | In addition, the GC pauses in regular programming
               | languages are frequently orders of magnitude longer than
               | deallocation. You would have to deallocate an enormous
                | tree of objects at the root for this to be an issue. And
               | GCs have to do that as well, in addition to their regular
               | stop-the-world pauses. This argument is just irrelevant.
               | 
               | > The blog states that atomic writes are "basically
               | free", but that ignores the fact that in multi-threaded
               | code, atomic writes can actually be fairly expensive
               | since each one requires communication between every
               | thread (This is why python still has a GIL)
               | 
                | First off, this is not why Python has a GIL, but let's
               | leave that aside.
               | 
               | Atomic writes are more expensive than non-atomic ones,
               | but they are not slow operations in the grand scheme of
               | things. If you properly implement acquire-release
               | semantics, they are not even that slow under high
               | contention. Compare this to a GC which literally STOPS
               | ALL THREADS, it's nothing.
               | 
               | > you still need a GC anyway to deal with objects that
               | get too many references.
               | 
                | This is just silly. In languages that have both reference
               | counting and traditional garbage collection (e.g.
               | Python), they do it to avoid reference cycles, not
               | because objects get "too many references". That is a
               | ridiculous statement.
               | 
               | In fact! I just realized we do have data for which is
               | more performant: this article describes how Instagram
                | turned off GC for Python and just relied on reference
                | counting. They gained a 10% increase in performance:
               | 
               | https://instagram-engineering.com/dismissing-python-
               | garbage-...
               | 
               | I think it's still an open question if reference counting
               | is always faster than GC, but I do not believe you have
               | the technical expertise to evaluate such a claim. Your
               | comment is four paragraphs long and riddled with factual
               | errors. If you want to be convincing, show data that is
               | better than that Instagram case study.
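
The Instagram experiment boils down to: keep reference counting, drop the periodic cycle collection. A sketch of the idea (not their actual code, which per the article also involved per-process tuning):

```python
import gc
import weakref

gc.disable()                 # stop periodic cycle collection
assert not gc.isenabled()

class Obj:
    pass

o = Obj()
probe = weakref.ref(o)
del o                        # refcount hits zero: reclaimed immediately,
                             # no tracing pass required
assert probe() is None

gc.enable()                  # restore the default for the rest of the run
```

Acyclic garbage is still freed promptly by refcounting alone; only genuinely cyclic garbage would leak while the collector is off.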
        
               | kaba0 wrote:
               | > Deallocating can take arbitrarily long, but
               | deallocation does not stop-the-world
               | 
               | And modern GCs only have to stop the current thread to
                | check for which thread-local objects are alive. The alive
               | ones are moved to another generation, making the space
               | reusable _for free_.
               | 
               | And atomic writes need synchronization which is crazy
               | expensive, I honestly don't get why you think it isn't.
               | 
               | Also, just try writing rust/c++ code that relies entirely
               | on RC vs Java in an object heavy workload - I really
               | don't think it is an open question in any shape or form.
        
               | OskarS wrote:
                | > The alive ones are moved to another generation, making
               | the space reusable for free.
               | 
               | It's pretty hilarious to me that you first say "they have
               | to move it to another generation" and then you say "it's
               | free!" It's like saying "I paid for my dinner, and now I
               | get to eat it for free!"
               | 
               | Also: what do you think `free()` does? All modern memory
               | allocators do this trick, keeping thread-local caches.
               | This is not an advantage of GCs when reference counting
               | does it as well.
               | 
                | Almost all modern GCs are stop-the-world in at least some
                | phases, and for good reason: stop-the-world GCs are
               | higher performance. You pay in other ways for skipping
               | that stop.
               | 
               | > And atomic writes need synchronization which is crazy
               | expensive, I honestly don't get why you think it isn't.
               | 
                | Because I've actually benchmarked it:
                | https://quick-bench.com/q/ISEetAHOohv-GaEuYR-7MajJgTc
               | 
               | 18.5 nanoseconds fits under no reasonable definition of
               | "crazy expensive", not when a regular increment clocks in
                | at 5.9 nanoseconds. And there are extremely few situations
               | where you increment a reference count more than, like, 5
               | times. It's just not an issue.
               | 
               | This is like cargo cult programming: you've been told
               | these things and never tested them in the real world, and
               | you have all these preconceived notions that just don't
               | stand up to two minutes of scrutiny.
               | 
               | > Also, just try writing rust/c++ code that relies
               | entirely on RC vs Java in an object heavy workload - I
               | really don't think it is an open question in any shape or
               | form.
               | 
               | Yes, _of course_ garbage collectors are easier to use
               | than reference counting. Nobody has ever disputed this.
                | That is the whole raison d'etre of garbage collectors.
               | This is not what the discussion is about, it's about
               | performance.
               | 
               | I'm done with this thread now, unless anybody can show me
                | any actual _data_ to back anything you say up. It's
               | really tiring. I started this by saying "I don't actually
               | know", and everyone replying to me has been so darn
               | certain of everything they say while being outright
               | incorrect about most things, and without any actual data
               | to back up the rest.
        
               | hayley-patton wrote:
               | You'd get very different results doing atomic operations
               | on counters shared between multiple threads, with other
               | stuff to do that causes your CPU to serialise and run
               | less out-of-order. That single-threaded benchmark is
               | data, but it doesn't appear awfully relevant to how naive
               | RC barriers perform versus barriers for coalesced RC or
               | tracing.
               | 
               | Indeed the issue with measuring barriers is that
               | measuring the barrier doesn't suffice; one wants to
               | measure how the barrier affects the rest of execution.
               | This entails coming up with programs that are much less
               | trivial than repeatedly incrementing a counter.
        
               | barsonme wrote:
               | Is your benchmark running concurrently? Or single-
               | threaded?
        
               | kaba0 wrote:
                | Moving to another generation can be done completely
                | asynchronously on another thread that likely doesn't do
                | any useful work on modern, highly parallel hardware.
                | `free` doesn't do much, but `malloc` does -- with the
                | method I am talking about (TLAB in the JVM), allocation
                | is as fast as it gets: nothing more than a NON-ATOMIC
                | pointer bump. Meanwhile malloc has to find an empty
                | space that can fit the object at hand.
               | 
                | > > Also, just try writing rust/c++ code that relies
                | > > entirely on RC vs Java in an object heavy workload -
                | > > I really don't think it is an open question in any
                | > > shape or form.
                | 
                | > Yes, of course garbage collectors are easier to use
                | > than reference counting. Nobody has ever disputed
                | > this. That is the whole raison d'etre of garbage
                | > collectors. This is not what the discussion is about,
                | > it's about performance.
               | 
               | I am talking about performance exactly. Java's GC will
               | smoke the hell out of C++'s shared pointers and Rust's
                | (A)RC. No one said anything about productivity/ease of
               | usage.
               | 
               | And as mentioned by another commenter - your benchmark
               | didn't take into account anything related to _parallel_
               | execution, which would be the point.
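
The "allocation is just a pointer bump" claim above can be sketched with a toy arena. This is purely illustrative (nothing like the JVM's actual TLAB code, and the class is invented for this example):

```python
class BumpArena:
    """Toy nursery: allocation is a bounds check plus a pointer bump."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.top = 0

    def alloc(self, n: int) -> memoryview:
        if self.top + n > len(self.buf):
            # A real runtime would trigger a minor GC here, evacuate
            # survivors, and bump-allocate into the emptied nursery.
            raise MemoryError("nursery full")
        off = self.top
        self.top += n          # the whole "malloc" is this increment
        return memoryview(self.buf)[off:off + n]

arena = BumpArena(1024)
a = arena.alloc(16)
b = arena.alloc(16)
assert arena.top == 32         # two allocations, two bumps
```

A free-list `malloc`, by contrast, has to search for a suitably sized hole, which is the asymmetry the comment is pointing at.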
        
             | pclmulqdq wrote:
              | The blog post is pretty much incoherent, and comes from a
              | bad understanding of performance, atomics, GC algorithms,
              | and reference counting.
             | 
             | The ONLY reason to reference count is when you need GC-like
             | behavior on a few objects, but do not want to impose a GC
             | on all objects. It is a very valuable tool in performance
             | code not because it is fast, but because it allows you to
             | make other things fast. Suggesting that you should
             | reference count the world is patently ridiculous. This is
             | the reason for Rust's Arc and C++ shared_ptr, not that they
             | are faster than a GC.
             | 
             | The blog post completely brushes aside the costs of
             | _atomic_ counter increments and decrements, calling them
             | "just increments." The "atomic" is the key performance
             | problem, not the increment.
             | 
             | Reference counting also makes objects larger so small
             | objects cannot fit inside a machine register.
             | 
             | Modern GC algorithms are very efficient. They do not need
             | to pause the world, and they do not need to be jittery.
             | However, someone who doesn't understand how expensive
             | atomic ops are probably wouldn't understand this either.
        
             | kaba0 wrote:
             | With all due respect, why do you believe that your
             | experience is meaningful compared to people writing actual
             | industrial-scale runtimes, when RC is the hello world of GC
             | algorithms? It's not that they don't know about it, it's
              | that they have studied the topic countless times and found
              | significant differences between the two approaches, to the
             | point that no language considered fast uses RC (or if they
             | do, they do so because it is a useful escape hatch to
             | manual memory management that doesn't need support from the
             | runtime).
        
         | flohofwoe wrote:
         | Except that "refcounting is faster than a GC" is mostly a myth,
         | both are equally bad if predictable performance matters.
        
           | klodolph wrote:
            | Looking at metrics from Go's garbage collector, what else do
            | you want? GC pauses are damn low; you'll see numbers in the
            | sub-500 microsecond range.
           | 
           | If I needed hard-realtime, I would avoid allocation entirely.
        
       | dpryden wrote:
       | This article is naive to the point of being flat-out wrong, since
       | it makes extremely naive assumptions about how a garbage
       | collector works. This is basically another C++-centric programmer
       | saying that smart pointers work better than the Boehm GC -- which
       | is completely true but also completely misleading.
       | 
       | I'm not saying that GC is _always_ the best choice, but this
       | article gets the most important argument wrong:
       | 
        | > 1. Updating reference counts is quite expensive.
        | 
        | > No, it isn't. It's an atomic increment, perhaps with overflow
        | > checks for small integer widths. This is about as minimal as
        | > you can get short of nothing at all.
       | 
        | Yes, it is. Even an atomic increment is a _write_ to memory.
        | That is not "about as minimal as you can get short of nothing
        | at all".
       | 
       | Additionally, every modern GC does generational collection, so
       | for the vast majority of objects, _the GC literally does "nothing
       | at all"_. No matter how little work it does, a RC solution has to
       | do O(garbage) work, while a copying GC can do O(not garbage)
       | work.
       | 
       | Now, that's not to say that GC is automatically better. There are
       | trade-offs here. It depends on the workload, the amount of
       | garbage being created, and the ratio of read to write operations.
       | 
       | The article says:
       | 
       | > I've already stated I'm not going to do benchmarks. I am aware
       | of two orgs who've already run extensive and far-reaching
       | experiments on this: Apple, for use in their mobile phones, and
       | the Python project.
       | 
       | I can counterpoint that anecdata: Google extensively uses Java in
       | high-performance systems, and _invented a new GC-only language_
       | (Go) as a replacement for (their uses of) Python.
       | 
       | The right answer is to do benchmarks. Or even better yet, don't
       | worry about this and just write your code! Outside of a
       | vanishingly small number of specialized use cases, by the time GC
       | vs RC becomes relevant in any meaningful way to your performance,
       | you've already succeeded, and now you're dealing with scaling
       | effects.
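
The generational point above can be illustrated even from CPython, although there it applies only to the cycle collector (CPython's primary reclamation remains refcounting): three generations with allocation-count thresholds, young objects scanned often and cheaply, survivors promoted and scanned ever more rarely.

```python
import gc

# Per-generation collection thresholds and current allocation counts.
thresholds = gc.get_threshold()
counts = gc.get_count()
print("thresholds:", thresholds)   # commonly (700, 10, 10) by default
print("counts:", counts)

assert len(thresholds) == 3        # three generations
```

The exact default thresholds vary by Python version, so the printed values are shown only as an example.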
        
       | dfox wrote:
        | One great advantage of garbage collection is that it removes the
        | need for thread synchronization in cases where it is only needed
        | to make sure that the object you are going to call free/decref
        | on is not in use in another thread. A corollary to that: GC is
        | the thing you need for many lock-free data structures to be
        | practical and not of only theoretical interest.
       | 
        | It might seem that it is simply about pushing your
        | synchronization problems onto the GC, but the synchronization
        | issue that GC solves internally is different and usually more
        | coarse-grained, so in the end you have significantly smaller
        | synchronization overhead.
        
       | byefruit wrote:
        | This article provides very little evidence for its claims and
       | seems to only have a superficial understanding of modern GCs.
       | 
       | "Increments and decrements happen once and at a predictable time.
       | The GC is running all the time and traversing the universe of GC
       | objects. Probably with bad locality, polluting the cache, etc."
       | 
       | This is only the case with a mark-sweep collector, usually most
       | of your allocations die young in the nursery. With reference
       | counting you pay the counting cost for everything.
       | 
       | "In object-oriented languages, where you can literally have a
       | pointer to something, you simply mark a reference as a weak
       | reference if it might create a cycle."
       | 
       | As someone who has tried to identify memory leaks in production
       | where someone has forgotten to "simply" mark a reference in some
       | deep object graph as weak, this is naive.
       | 
       | "With an 8-byte counter you will never overflow. So...you
       | know...just expand up to 8-bytes as needed? Usually you can get
       | by with a few bits."
       | 
       | So now my "about as minimal as you can get short of nothing at
        | all" check has an unpredictable branch in it?
       | 
       | "If you must overflow, e.g., you cannot afford an 8-byte counter
       | and you need to overflow a 4-byte counter with billions of
       | references, if you can copy it, you create a shallow copy."
       | 
       | I don't even know where to begin with this.
       | 
       | "If GC is so good, why wouldn't Python just garbage collect
       | everything, which they already did once and could trivially do,
       | instead of going through the hassle of implementing reference
       | counting for everything but the one case I mentioned?"
       | 
       | This probably has more to do with finalising resources and
       | deterministic destruction than anything else.
       | 
       | --
       | 
       | Anyone who is interested in actually studying this area would
       | probably find
       | https://courses.cs.washington.edu/courses/cse590p/05au/p50-b...
       | interesting. Also https://gchandbook.org/
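
For reference, the "simply mark it weak" mechanism quoted above, in the easy case of a single obvious back-edge (the hard part in production, as the comment says, is finding the right edge in a deep object graph). Class names are invented for illustration:

```python
import gc
import weakref

class Child:
    def __init__(self, parent):
        # Hold the parent weakly so parent <-> child forms no cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent()   # None once the parent is gone

class Parent:
    def __init__(self):
        self.children = [Child(self)]

gc.disable()                    # refcounting alone must clean this up
p = Parent()
probe = weakref.ref(p)
assert p.children[0].parent is p

del p                           # no strong cycle, so everything is freed
assert probe() is None
gc.enable()
```

Forget the `weakref.ref` and the pair leaks under pure refcounting, which is exactly the production failure byefruit describes.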
        
         | knome wrote:
         | >If GC is so good, why wouldn't Python just garbage collect
         | everything, which they already did once and could trivially do
         | 
          | I don't think Python ever did pure mark-and-sweep (CPython,
          | at least; I'm sure Jython and other alternate implementations
          | have).
         | 
         | My understanding was that they did pure reference counting, and
         | kludged on a sweep GC to do cycle breaking eventually, as
         | manually breaking cycles in early versions of python was a pain
         | point. A quick lookup seems to indicate python1 was pure
         | reference counting, and they added the cycle breaking when they
         | released python2.
        
           | cogman10 wrote:
           | That's my recollection as well (though I've not found
           | evidence to support it). In fact, IIRC, as part of the python
           | performance thing one of the topics was adding a proper
           | generational GC into python for objects that don't refer to
           | pinned memory.
        
       | carry_bit wrote:
       | You can optimize reference counting:
       | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23...
       | 
       | With allocating as dead, you're basically turning it into a
       | tracing collector for the young generation.
        
       | jerf wrote:
       | From what I can see, the myth that needs to be debunked isn't
       | that garbage collection is super fast and easy with no
       | consequences, it's the myth that garbage collection always
       | automatically means your program is going to be spending 80% of
       | its time doing it and freezing for a second every five seconds
       | the instant you use a language with garbage collection. I see far
       | more "I'm writing a web server that's going to handle six
       | requests every hour but I'm afraid the garbage collector is going
       | to trash my performance" than people who believe it's magically
       | free.
       | 
       | It's just another engineering decision. On modern systems, and
       | especially with any runtime that can do the majority of the GC
       | threaded and on an otherwise-unused core, you need to have some
       | pretty serious performance requirements for GC to ever get to
       | being your biggest problem. You should almost always know when
       | you're setting out to write such a system, and then, sure, think
       | about the GC strategy and its costs. However for the vast bulk of
       | programs the correct solution is to spend on the order of 10
       | seconds thinking about it and realizing that the performance
       | costs of _any_ memory management solution are trivial and
       | irrelevant and the only issue in the conversation is what
       | benefits you get from the various options and what the non-
       | performance costs are.
       | 
       | It is in some sense as big a mistake (proportional to the program
       | size) to write every little program like it's a AAA game as it is
       | to write a AAA game as if it's just some tiny little project, but
       | by the sheer overwhelming preponderance of programming problems
       | that are less complicated than AAA games, the former happens
       | overwhelmingly more often than the latter.
       | 
       | Edit: I can be specific. I just greased up one of my production
       | systems with Go memstats. It periodically scans XML files via
       | network requests and parses them with a parser that cross-links
       | parents, siblings, and children using pointers and then runs a
       | lot of XPath on them, so, it's kinda pessimal behavior for a GC.
          | I tortured it far out of its normal CPU range by calling the
          | "give me all your data" JSON dump 100 times. I've clicked
       | around on the website it serves to put load on it a good 10x what
       | it would normally see in an hour, minimum. In 15 minutes of this
       | way-above-normal use, it has so far paused my program for 14.6
       | milliseconds total. If you straight-up added 14.6 milliseconds of
       | latency to every page it scanned, every processing operation, and
       | every web page I loaded, I literally wouldn't be able to notice,
       | and of course that's not what actually happened. Every second
       | worrying about GC on this system would be wasted.
        
         | nickbauman wrote:
         | Yes; I have a friend who is part of a small team that wrote a
         | very successful stock market trading gateway in Java. Turns out
         | the JVM's GC can be tuned in very specific ways based on your
         | needs. And there are ways to avoid having to do JVM GC in
         | critical areas of the code as well.
        
           | marginalia_nu wrote:
            | There are several exchanges and clearing platforms running
            | Java[1], although I'm not sure how many are still around
           | after Nasdaq's hostile takeover.
           | 
           | [1] https://www.marketswiki.com/wiki/TRADExpress
        
           | PaulHoule wrote:
           | Generally Java has made a huge investment in the garbage
           | collector over a long period of time to address the problems
           | that people have in some use cases. JDK 17 is much better
           | than JDK 8. If you were writing a GC from scratch you are not
           | going to do as well.
        
             | hinkley wrote:
             | To be fair, they definitely got into a rut in the JDK 6-7
             | timeframe. I maintain it's no accident that memcache came
             | into its own during this period. That was a major pain
             | point, and going out-of-process shouldn't have been
             | necessary.
        
           | kaba0 wrote:
           | The JVM GC's are absolutely insanely good. G1 can sustain
           | loads with heap sizes well into TERAbytes.
        
         | code_runner wrote:
         | The biggest GC issues I've personally seen manifest were arrays
         | of historical data that grew to tens of millions of entries and
          | due to array storage in .NET, the array was placed in the large
         | object heap. Swapping to a linked list actually fixed the issue
         | and the team lived to fight another day.
         | 
         | Like a lot of premature optimization, it isn't a problem until
         | it is... but solutions aren't unattainable.
        
           | hinkley wrote:
           | I still mostly remember the day a coworker convinced me that
           | object pooling was dead because it tears the mature
           | generation a new one over and over.
           | 
           | It's nice when the runtime solves a problem you've had to
           | solve yourself, but it also takes a bit of your fun away,
           | even if your coworkers are relieved not to have to deal with
           | 'clever' code anymore.
        
       | agentultra wrote:
       | A paper I quite enjoyed on automatic reference counting for pure,
       | immutable functional programming:
       | https://arxiv.org/abs/1908.05647
       | 
       | It can be quite "fast."
        
         | miloignis wrote:
         | Indeed, and this line of research has been continuing to
          | improve in follow-on work on Perceus in Koka:
         | https://xnning.github.io/papers/perceus.pdf and
         | https://www.microsoft.com/en-us/research/uploads/prod/2021/1...
         | 
         | Very cool stuff!
        
       | shwestrick wrote:
       | This debate has gone round and round for decades. There are no
       | hard lines; this is about performance tradeoffs, and always will
       | be.
       | 
       | Perhaps the biggest misconception about reference counting is
       | that people believe it avoids GC pauses. That's not true.
       | Essentially, whereas tracing GC has pauses while tracing live
       | data, reference counting has pauses while tracing garbage.
       | 
       | Reference counting is really just another kind of GC. I'd highly
       | recommend perusing this paper for more details: A Unifying Theory
       | of Garbage Collection.
       | https://web.eecs.umich.edu/~weimerw/2012-4610/reading/bacon-...
       | 
       | One of the biggest issues with reference counting, from a
       | performance perspective, is that it turns reads into writes: if
       | you read a heap-object out of a data structure, you have to
       | increment the object's reference count. Modern memory
       | architectures have much higher read bandwidth than write
       | bandwidth, so reference counting typically has much lower
       | throughput than tracing GC does.
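shwestrick's "reads become writes" effect is directly observable in CPython, where even reading an object out of a container bumps its reference count. A minimal sketch:

```python
import sys

class Node:
    pass

obj = Node()
container = [obj]

# Baseline: references from 'obj', 'container[0]', plus getrefcount's
# own temporary argument.
base = sys.getrefcount(obj)

# Merely reading the object out of the container creates a new
# reference, which CPython records by *writing* to the object's header.
alias = container[0]
assert sys.getrefcount(obj) == base + 1
```

Under a tracing GC, that same read would touch no object headers at all.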
        
         | goodpoint wrote:
         | It's not always a tradeoff:
         | 
         | Nim switched from GC to RC and it even increased performance.
        
         | waterhouse wrote:
         | > Essentially, whereas tracing GC has pauses while tracing live
         | data, reference counting has pauses while tracing garbage.
         | 
         |  _This_ is pretty trivial to avoid. When your thread finds
         | itself freeing a big chain of garbage objects, you can have it
         | stop at any arbitrary point and resume normal work, and find
         | some way to schedule the work of freeing the rest of the chain
          | (e.g. on another thread). It's much more complex and expensive
         | to do this for tracing live data, because then you need to
         | manage the scenario where the user program is modifying the
         | graph of live objects while the GC is tracing it, using a write
         | or read barrier; whereas for garbage, by definition you know
          | the user _can't_ touch the data, so a simple list of "objects
         | to be freed" suffices.
         | 
         | "Reads become writes" (indeed, they become atomic read-modify-
         | writes when multiple threads might be refcounting
         | simultaneously) is a problem, though.
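The deferred-free scheme waterhouse describes can be sketched in Python. The `free_queue`/`reclaimer` names are illustrative, not a real API, and in CPython the GIL blunts the actual parallelism; the point is only the shape of the handoff:

```python
import queue
import threading

# Hand the root of a dead object chain to a background "reclaimer"
# thread instead of freeing it inline.
free_queue = queue.Queue()

def reclaimer():
    while True:
        dead = free_queue.get()
        if dead is None:        # shutdown sentinel
            break
        # By definition no other thread can reach 'dead', so no locking
        # is needed; dropping the reference lets refcounting free the
        # whole chain here, off the mutator's critical path.
        del dead
        free_queue.task_done()

worker = threading.Thread(target=reclaimer, daemon=True)
worker.start()

# Mutator: build a long chain, then retire it with an O(1) handoff
# instead of an O(n) teardown pause.
chain = None
for i in range(100_000):
    chain = (i, chain)
free_queue.put(chain)
chain = None            # the reclaimer now holds the only reference
free_queue.join()       # wait for the deferred free (for the demo)
free_queue.put(None)
worker.join()
```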
        
         | arcticbull wrote:
         | RC may turn reads into writes, but of course, GC ends up having
         | to go through literally every piece of memory ever from bottom
         | to top once in a while.
         | 
         | RC limits itself to modifying only relevant objects, whereas GC
         | reads _all_ objects in a super cache-unfriendly way. Yes, an
          | atomic read-modify-write is worse than a read, but it's not
         | worse than a linked-list traversal of all of memory all the
         | time.
         | 
         | And of course, not all kinds of object lend themselves to
         | garbage collection - for instance, file descriptors, since you
         | can't guarantee when or if they'll ever close. So you have to
         | build your own reference counting system on top of your garbage
         | collected system to handle these edge cases.
         | 
         | There's trade-offs, yes, but the trade-off is simply that
         | garbage collected languages refuse to provide the compiler and
         | the runtime all the information they need to know in order to
         | do their jobs - and a massive 30 year long effort kicked off to
         | build a Rube Goldberg machine for closing that knowledge gap.
        
           | klodolph wrote:
           | > RC may turn reads into writes, but of course, GC ends up
           | having to go through literally every piece of memory ever
            | from bottom to top once in a while.
           | 
           | Depends on the GC algorithm used. Various GC algorithms only
           | trace reachable objects, not unreachable ones.
           | 
           | Reference counting does the opposite, more or less. When you
           | deallocate something, it's tracing unreachable objects.
           | 
           | One of the problems with this is that reference counting
           | touches all the memory right before you're done with it.
           | 
           | > And of course, not all kinds of object lend themselves to
           | garbage collection - for instance, file descriptors, since
           | you can't guarantee when or if they'll ever close. So you
           | have to build your own reference counting system on top of
           | your garbage collected system.
           | 
           | This is not a typical solution.
           | 
           | Java threw finalizers into the mix and everyone overused them
           | at first before they realized that finalizers suck. This is
           | bad enough that, in response to "too many files open" in your
           | Java program, you might invoke the GC. Other languages
           | designed since then typically use some kind of scoped system
           | for closing file descriptors. This includes C# and Go.
           | 
           | Garbage collection does not need to be used to collect all
           | objects.
           | 
           | > There's trade-offs, yes, but the trade-off is simply that
           | garbage collected languages refuse to provide the compiler
           | and the runtime all the information they need to know in
            | order to do their jobs - and a massive 30 year long effort
            | kicked off to build a Rube Goldberg machine for closing that
            | knowledge gap."
           | 
           | When I hear rhetoric like this, all I think is, "Oh, this
           | person really hates GC, and thinks everyone else should hate
           | GC."
           | 
           | Embedded in this statement are usually some assumptions which
           | should be challenged. For example, "memory should be freed as
           | soon as it is no longer needed".
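The "scoped system" klodolph mentions is what Python's `with` statement provides: the descriptor's lifetime is tied to a lexical scope, not to the collector. A minimal sketch:

```python
import os
import tempfile

# Create a scratch file to demonstrate on.
fd, path = tempfile.mkstemp()
os.close(fd)                  # close the raw fd mkstemp handed us

with open(path, "w") as f:
    f.write("hello")          # f is open only inside this block

assert f.closed               # closed deterministically at scope exit
os.remove(path)
```

C#'s `using` and Go's `defer` play the same role; none of them wait for finalization.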
        
             | arcticbull wrote:
             | > When I hear rhetoric like this, all I think is, "Oh, this
             | person really hates GC, and thinks everyone else should
             | hate GC."
             | 
             | Yeah, I think it's an inelegant, brute-force solution to a
             | language problem - and that we continue to throw good money
             | after bad improving it.
             | 
             | We should be investing in removing the need for GC through
             | smarter compilers and through languages that allow us to
             | better express our intent - and our object graph. Instead,
             | we continue to improve on a giant linked-list traversal
             | through all of live memory to try and infer the object
             | graph at runtime. Without sufficient information to do it
             | efficiently. The fundamental issue is that we don't have
             | languages that express our goals in a way that allows
             | memory management to be elided efficiently.
             | 
             | That doesn't mean I have some particular affinity for
             | reference counting, though. It has its own issues as you
             | rightly point out. I prefer it for its determinism, nothing
             | more.
        
         | hinkley wrote:
         | What is your read on the lack of discussion of escape analysis?
         | 
         | My own read on this is that it blurs the line with deferred
         | collection/counting, because you could either use it to
         | complement deferral making it cheaper, or avoid deferral
         | because you're getting enough of the benefits of deferral by
         | proving objects dead instead of discovering that they are dead.
        
         | kaba0 wrote:
         | One great example would be a C++ program that runs fast, and
         | then just spends 10s of seconds doing "nothing" while it
         | deallocates shared pointers' huge object graphs at the end.
         | They really are two sides of the same coin, with tracing GCs
         | being actually correct (you need cycle detection for a correct
         | RC implementation), and having much better throughput. It's not
          | an accident that runtimes with thousands of dev hours are
         | polishing their GC solutions instead of using a much more
         | trivial RC.
        
           | hinkley wrote:
           | I don't know what the current state of the art is, but at one
           | point the answer to GC in a realtime environment was to
           | amortize free() across malloc(). Each allocation would clear
           | up to 10 elements from queue of free-able memory locations.
           | That gives a reasonably tight upper bound on worst case alloc
           | time, and most workflows converge on a garbage-free heap. Big
           | malloc after small free might still blow your deadlines, but
           | big allocations after bootstrapping are generally frowned
           | upon in realtime applications, so that's as much a social
           | problem as a technical one.
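A toy Python sketch of the amortization hinkley describes. The `Pool` class and the constant `K` are illustrative only, not a real allocator API:

```python
from collections import deque

# "Amortize free() across malloc()": each allocation also retires up
# to K entries from a queue of freeable blocks, which bounds the
# worst-case cost of any single allocation.
K = 10

class Pool:
    def __init__(self):
        self.pending = deque()       # blocks whose count hit zero

    def release(self, block):
        self.pending.append(block)   # O(1): defer the real work

    def alloc(self, size):
        # Do a bounded amount of deferred freeing on every allocation.
        for _ in range(min(K, len(self.pending))):
            self.pending.popleft()   # drop the ref; CPython frees it
        return bytearray(size)

pool = Pool()
for _ in range(1000):
    pool.release(bytearray(64))

buf = pool.alloc(64)
# One allocation processed at most K deferred frees.
assert len(pool.pending) == 1000 - K
```

A steady allocation rate then drains the backlog, so most workloads converge on an empty queue.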
        
       | titzer wrote:
       | I read most of the article and it's just a lot of the same tired old
       | arguments and an extremely simplified worldview of both GC and
       | reference counting. I wish I had the author's address, because
       | I'd like to mail them a copy of the Garbage Collection Handbook.
       | They clearly have a very naive view of both garbage collection
       | and reference counting. And there isn't a single dang measurement
       | anywhere, so this can be completely dismissed IMHO.
        
         | cogman10 wrote:
         | Agreed. What I particularly disliked is how absent of nuance it
         | is. RC is a form of GC and all GC algorithms make tradeoffs. RC
          | trades throughput for latency. Compacting mark-and-sweep
          | trades latency (and usually memory) for throughput.
         | 
         | The rant at the end can be boiled down to "I use confirmation
         | bias [1] to make my engineering decisions". The OP has already
         | decided that "GC" is slow, so I'm sure every time a runtime
         | with it misbehaves it's "Well, that darn GC, I knew it was
         | bad!" and every time RC misbehaves it's likely "Oh, well you
         | should have nulled out your link here to break the cycle
         | dummy!"
         | 
         | I really don't like such absolutist thinking in software dev.
         | All of software dev is about making tradeoffs. RC and GC aren't
         | superior or inferior to each other, they are just different and
         | either (or both) could be valid depending on the circumstance.
         | 
         | [1] https://en.wikipedia.org/wiki/Confirmation_bias
        
           | titzer wrote:
           | > absent of nuance
           | 
           | Yes, this is a good point. It makes overly general claims.
           | 
           | E.g. a GC proponent could claim "well, tracing collectors do
            | _no_ work for dead objects, so they have _no_ overhead!"
           | Which is a good point, but not the whole story. Tracing
           | collectors may need to repeatedly traverse live objects.
           | Sure. But then generational collectors only traverse modified
           | live objects that point to new objects. True. And concurrent
           | collectors can trace using spare CPU resources, incremental
           | collectors can break marking work up into small pauses, on
           | and on. There are zillions of engineering tradeoffs and the
           | GC Handbook covers most of them really well.
        
       | glouwbug wrote:
       | Isn't garbage collection needed to solve circular reference
       | counts?
        
         | arcticbull wrote:
         | Nope, you can just mark the back-reference as weak.
         | 
         | GC is only required if you as a programmer (or programming
         | language) do not provide sufficient information to the compiler
         | or runtime to understand the object graph.
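A minimal CPython sketch of breaking a cycle with a weak back-reference, as arcticbull suggests:

```python
import weakref

class Node:
    def __init__(self):
        self.parent = None
        self.children = []

parent = Node()
child = Node()
parent.children.append(child)
# Store the back-reference weakly so it doesn't keep the cycle alive.
child.parent = weakref.ref(parent)

alive = weakref.ref(child)
del parent, child
# Refcounts alone reclaimed both nodes; no cycle collector needed.
assert alive() is None
```

The catch, as the reply below notes, is knowing which edge is the back-reference.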
        
           | klodolph wrote:
           | It's not always obvious to know which reference to mark as
           | weak, and there's not necessarily a clear indication of which
           | reference is a back-reference.
           | 
           | You can find various algorithms in journals or whatnot
           | written with the assumption that there's GC. Algorithms
           | designed with this assumption may not have clear ownership
            | for objects, and those objects may have cyclic references.
           | 
           | It's easy to say, "objects should have clear ownership
           | relationships" but that kind of maxim, like most maxims,
           | doesn't really survive if you try to apply it 100% of the
           | time. Ownership is a tool that is very often useful for
           | managing object lifetimes--it's not _always_ the tool that
           | you want.
        
       | flohofwoe wrote:
       | Such a claim really needs hard data to back it up. Reference
       | counting can be _very_ expensive, especially if the refcount
       | update is an atomic operation. It's hard to capture in profiling
       | tools because the performance overhead is smeared all over the
       | code base instead of centralized in a few hot spots, so most of
       | the time you don't actually know how much performance you're
       | losing because of refcounting overhead.
       | 
       | The most performant approach is still manual memory management
       | with specialized allocators tuned for specific situations, and
       | then only allocating when actually needed.
        
         | okennedy wrote:
         | This. Exactly this.
         | 
         | Garbage collection has a huge, and generally entirely
         | unappreciated win when it comes to threaded code. As with most
         | things, there are tradeoffs, but every reference counting
         | implementation that I've used has turned any concurrent access
         | to shared memory into a huge bottleneck.
        
         | arcticbull wrote:
         | > The most performant approach is still manual memory
         | management with specialized allocators tuned for specific
         | situations, and then still only use memory allocation when
         | actually needed.
         | 
         | RAII gets you a lot of the way there.
        
       | cosmotic wrote:
       | > This is about as minimal as you can get short of nothing at
       | all.
       | 
       | With GC, you _can_ do nothing at all. In a system with lots of
       | garbage, you can do a GC by copying everything accessible from
       | the GC root, then de-allocating all the garbage in a single free.
        
       | [deleted]
        
       | bitwize wrote:
       | Boy, I can't wait for theangeryemacsshibe (posts here as hayley-
       | patton) to tear into this one.
       | 
       | But yeah, the correct way to handle resources (not just memory!)
       | is with value semantics and RAII. Because then you know the
       | object will be cleaned up as soon as it goes out of scope with
       | zero additional effort on your part. In places where this is not
       | appropriate, a simple reference counting scheme may be used, but
       | the idea is to keep the number of rc'd objects small. Do not use
       | cyclical data structures. Enforce a constraint: zero cycles. For
       | data structures like arbitrary graphs, keep an array of vertices
       | and an array of edges that reference vertices by index.
       | 
       | If you use a language with GC, you're probably just contributing
       | to global warming.
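bitwize's index-based graph representation might look like this in Python (a sketch; `neighbors` is an illustrative helper):

```python
# Represent an arbitrary graph without cyclic object references:
# vertices live in one array, and edges are index pairs into it.
vertices = ["a", "b", "c"]
edges = [(0, 1), (1, 2), (2, 0)]  # a->b, b->c, c->a: the *data* is
                                  # cyclic, but no object refers back
                                  # to another object

def neighbors(v_index):
    """Indices of vertices reachable in one hop from v_index."""
    return [dst for (src, dst) in edges if src == v_index]

assert neighbors(0) == [1]
assert neighbors(2) == [0]
```

Ownership stays strictly tree-shaped (the two arrays own everything), so plain refcounting suffices.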
        
         | kaba0 wrote:
         | Why not just write embedded programs with fixed size memory
         | allocation then if we are that okay with restricting the
         | programs we write?
        
           | bitwize wrote:
           | Because maybe we're not okay with restricting the programs we
           | write _that much_.
        
             | kaba0 wrote:
              | Nor am I okay with restricting my programs so much that a
              | tree-like-only allocation strategy would suffice;
              | circular data structures are not evil.
        
       | slavboj wrote:
       | People have been managing garbage collection schedules for
       | decades now. It's quite possible for many systems to have
       | completely deterministic performance, with the
       | allocation/deallocation performance made extremely fast, gc
       | restricted to certain times or a known constant overhead, etc.
       | Ironically, from a programming perspective it's incredibly easy
       | in a language like Java to see exactly what allocates and bound
       | those cases.
       | 
       | Conversely, it's also possible for reference counting to have
       | perverse performance cases over a truly arbitrary reference graph
       | with frequent increments and decrements. You're not just doing
       | atomic inc/dec, you're traversing an arbitrary number of pointers
       | on every reference update, and it can be remarkably difficult to
       | avoid de/allocations in something like Python where there's not
       | really a builtin notion of a primitive non-object type.
       | 
       | Generally speaking, memory de/allocation patterns are the issue,
       | not the specific choice of reference counting vs gc.
        
       | mirekrusin wrote:
       | What about - garbage collect by reference counting, like Python?
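That hybrid is directly observable: CPython reclaims most objects by refcount and runs a backup cycle detector (the `gc` module) for the rest. A small demonstration:

```python
import gc

# Keep the automatic collector out of the way so the division of
# labor is explicit.
gc.disable()

class Node:
    pass

# A cycle defeats pure refcounting: after 'del', both objects are
# unreachable but their counts never hit zero.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b

collected = gc.collect()  # the backup cycle detector reclaims them
assert collected >= 2
gc.enable()
```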
        
       | knome wrote:
       | Reference counting can also have unpredictable hits if you
       | release any large data structures. Whoever drops the last
       | reference suddenly gets to sit through the entire deep set of
       | items to release ( unless you can hand off the release cascade to
       | a background thread ).
       | 
       | I've never heard of a reference counting implementation that can
       | handle memory compaction.
       | 
       | Every time you update a reference count, which is every time you
       | touch any object, you're going to have to write to that RAM,
       | which means stealing the cache line from any other threads using
       | it on any other processors. If you share large trees of data
       | between threads, traversing that tree in different threads will
       | always end up with your threads constantly fighting with each
       | other, since there's no such thing as read-only memory under
       | reference counting.
       | 
       | When releasing something like a huge list in reference counting,
       | how does the release avoid blowing the stack with recursive
       | releasing? My guess is this is just a "don't use a large list whose
       | release may blow the stack with recursive releasing" situation.
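On knome's last question: CPython answers it with its internal "trashcan" mechanism, which bounds the C-stack depth of cascading deallocations by flattening them into iteration past a fixed depth. A quick check, with a nesting far deeper than any C stack could handle if teardown were naively recursive:

```python
# Build a 500,000-level nesting of one-element lists.
depth = 500_000
node = None
for _ in range(depth):
    node = [node]

# Drop the last reference: all 500k levels are freed right here. With
# naive recursive deallocation this line would overflow the C stack
# and crash the interpreter; CPython survives it.
node = None
```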
        
         | mamcx wrote:
         | > hits if you release any large data structures.
         | 
          | Well, that depends on how the RC is done. This is key to
          | understand, because if you can control it, the RC becomes
          | cheaper.
          | 
          | You can see it done this way in
          | 
          | http://sblom.github.io/openj-core/iojNoun.htm
          | 
          | ie: If instead of `[Rc(1), Rc(2)]` you do `Rc([1, 2])`, that
          | works great.
        
           | kaba0 wrote:
           | How is that not the exact same for tracing GC?
        
             | mamcx wrote:
              | Well, you don't need the GC. That is the point of this: Rc
              | _could_ be piecemeal. That is something that could be
              | exploited very well when making an interpreter, for example
              | (that is what array languages do under the hood).
        
       | exabrial wrote:
       | > I've already stated I'm not going to do benchmarks.
       | 
       | Yikes
        
       | jsnell wrote:
       | > Basically, you attach the reference to the object graph once,
       | and then free it when you're done with it.
       | 
       | So reference counting works by the programmer knowing the
       | lifetime of each object allowing them to only increment /
       | decrement the refcount once, and trusting that the raw uncounted
       | pointers they use elsewhere are always valid? There's another
       | word we have for this: manual memory management. It's unsafe and
       | unergonomic, and it's pretty telling that the author needs
       | this pattern to make RC appear competitive. It's because actually
       | doing reference counting safely is really expensive.
       | 
       | > If GC is so good, why wouldn't Python just garbage collect
       | everything, which they already did once and could trivially do,
       | instead of going through the hassle of implementing reference
       | counting for everything but the one case I mentioned?
       | 
       | Because they've made reference counting a part of their C
       | extension API and ABI. If they wanted to use a GC, they'd instead
       | need a very different API, and then migrate all the modules to
       | the new API. (I.e. a way for those native extension to
       | register/unregister memory addresses containing pointers to
       | Python objects for the GC to see.)
       | 
       | Early on the deterministic deallocation given by reference
       | counting would also have been treated by programmers as a
       | language level feature, making it so that a migration would have
       | broken working code. But I don't think that was ever actually
       | guaranteed in the language spec, and anyway this was not carried
       | over to various alternative Python implementations.
        
       | melony wrote:
       | You are in for a fun time when you need to make circular data
       | structures with ARC.
        
       ___________________________________________________________________
       (page generated 2022-07-29 23:00 UTC)