[HN Gopher] Taming Go's Memory Usage, or How We Avoided Rewritin... ___________________________________________________________________ Taming Go's Memory Usage, or How We Avoided Rewriting Our Client in Rust Author : jeanyang Score : 198 points Date : 2021-09-21 18:25 UTC (4 hours ago) (HTM) web link (www.akitasoftware.com) (TXT) w3m dump (www.akitasoftware.com) | notamy wrote: | Bit confused by this part of the article: | | > PRO-REWRITE: Rust has manual memory management, so we would | avoid the problem of having to wrestle with a garbage collector | because we would just deallocate unused memory ourselves, or more | carefully be able to engineer the response to increased load. | | > ANTI-REWRITE: Rust has manual memory management, which means | that whenever we're writing code we'll have to take the time to | manage memory ourselves. | | Isn't part of the point of Rust that you _don't_ manage memory | yourself, and rather that the compiler is smart enough to manage | it for you? | saghm wrote: | There are already a lot of replies to this comment explaining | the ideas behind Rust memory management in different ways, but | I'll throw in my handwavy explanation as well: | | In GC languages, memory management is generally done at runtime by | the interpreter/runtime. In C, memory management is generally | done at programming time by the (human) programmer. In Rust, | memory management is generally done at compile time by the | compiler. There are exceptions in all three cases, but the | "default" paradigm of a language informs a lot about how it's | designed and used. | NovemberWhiskey wrote: | Go = you do no explicit memory management and the GC/runtime | takes care of it for you | | Rust = when writing your code, you explicitly describe the | ownership and lifetime of your objects and how your functions | are allowed to consume/copy etc.
them and get safety as a | result | | C = when writing your code, you explicitly allocate and free | your objects and you get no assistance from the language about | when it is safe to copy/dereference/free/etc. a | pointer/allocation | throwaway894345 wrote: | I prefer to think that in Go you don't do explicit memory | management _by default_, while in Rust you do. Although you | _can_ laboriously opt out of explicit memory management | (e.g., by tagging everything Rc<> or Gc<> and all of the | ceremony that entails). | Spartan-S63 wrote: | I feel that this is one of those common misconceptions about | Rust. Rust's memory management is nothing like C or non-modern | C++'s with malloc/free or new/delete. Rust uses modern-C++'s | RAII model, typically, to allocate memory. The compiler is | smart enough to know when to call drop() (which is essentially | free/delete, but with the possibility of additional behavior). | You can also call drop() yourself. | | What I think people _should_ focus on with Rust versus Go (et | al) is that Rust allows you to choose where you _place_ memory. | You can choose the stack or the heap. The placement can matter | in hot regions of code. Additionally, Rust is pretty in-your- | face when it comes to concurrency and sharing memory across | thread/task boundaries. | jcelerier wrote: | It kills me that RAII is considered modern c++. It's been there | since 1983, aha, what do you think fstream and std::vector are | if not RAII wrappers over files or memory? | bluGill wrote: | before unique_ptr we didn't have a good way to handle raii | for a lot of things. I wrote a lot of RAII wrappers for | various things (still do, but a lot less). Attempts like | auto_ptr show just how hard it is to make raii work well | before C++11. | | Yes we had RAII, but it didn't work for a lot of cases | where we needed it.
| oconnor663 wrote: | I think before the introduction of move semantics in C++11, | there were a lot of cases where you needed new and delete | to get basic things working. (Moving an fstream around is a | relevant example.) So the modern rule of "don't use new and | delete in application code" really wasn't practical before | that. | jcelerier wrote: | No, pretty much everything could be done with swap (like | moving an fstream as you say). Sure, it's a bit more | cumbersome, but it was still RAII. | brink wrote: | > Additionally, Rust is pretty in-your-face when it comes to | concurrency and sharing memory across thread/task boundaries. | | Use channels whenever possible. | angelzen wrote: | Tangentially, I did a bit of Rust work recently. I was sadly | unable to find a concise credible answer to a rather | elementary best-practices question: How does ownership | interact with nested datastructures? Is it possible to build | a heap tree without Boxing every node explicitly? | miloignis wrote: | This question is a bit subtle, it depends on exactly what | you mean. You could make a tree using only borrow checked | references and the compiler would make sure that parent | nodes go out of scope at the same time or before the child | nodes they point to, but I don't think that's what you're | talking about. | | In general, if it's a datastructure where you have to use | pointers, you'll have them Box'ed, but you would try to | avoid that if you can. In your example of a heap, you'd | want to use an array-based implementation, probably backed | by a growable Vec, and use indexes internally. A peek | function would still return a normal Rust reference to the | data, and the borrow checker would make sure that you don't | mutate the heap's backing array while that reference was | still in use, etc. | slaymaker1907 wrote: | I never thought about using a Vec for these, but that is | a great idea for keeping the memory management sane for | tree/linked lists. 
| | One thing I would add that you need to be wary of | destructors with large pointer data structures in Rust | since it can easily stack overflow. When using | Option<Box<T>> you need to be careful to call | Option::take on the pointers in a loop to avoid stack | overflow. | steveklabnik wrote: | You'd do the same stuff you'd do in C++ here; allocate | every node explicitly, use an arena, whatever you want. | slaymaker1907 wrote: | While some commenters have pointed out that you still need to | deal with lifetimes/thinking about where stuff lives, in | practice you can avoid almost all of this by using Rc<Type> | instead of Type everywhere (or Arc in a multithreaded | scenario). | | Yes Rc and equivalents have a performance overhead, but for | many use cases the overhead really isn't that bad since you | typically aren't creating tons of copies. In practice, I've | found one can ignore lifetimes in almost all cases even when | using references except when storing them in structs or | closures. So really you would just need to increment the Rc | counter for structs/closures outside of allocation/deallocation | which is dominated by calls to malloc/free. | throwaway894345 wrote: | I've tried this before and it was so laborious that I | regretted it. I'm not sure I saved myself any time over | writing "vanilla" Rust or whatever one might call the default | alternative. If I was really interested in writing Rust more | quickly, I would just clone everything rather than Rc it, but | in whichever case you're still moving quite a lot slower than | you would in Go. | sgift wrote: | I also was confused about that part but for another reason: The | whole post is basically "despite go having a GC we had to | manually manage the memory to make it work" and then the anti- | rewrite is "go does memory management for us". IMO people | sometimes have really weird ideas what is and isn't part of | managing memory. 
| steveklabnik wrote: | Yes, Rust kinda doesn't fit super cleanly into a very | black/white binary here. It is automatic in the sense that you | do not generally call malloc/free. The compiler handles this | for you. At the same time, you have a lot more control than you | do in a language with a GC, and so to some people, it feels | more manual. | | It's also like, a perception thing in some sense. Imagine | someone writes some code. They get a compiler error. There are | two ways to react to this event: | | "Wow the compiler didn't make this work, I have to think about | memory all the time." | | "Ah, the compiler caught a mistake for me. Thank goodness I | don't have to think about this for myself." | | Both perceptions make sense, but seem to be in complete and | total opposition. | throwaway894345 wrote: | "Manual vs automatic" is mostly just a semantic problem IMHO. | We could say "runtime versus compile time" to be more | precise, but maybe there are problems there as well. The more | interesting question to me is "how much time/energy do I | spend thinking about memory management, and is that how my | time is best spent?". In cases of high performance code, you | might spend more time fighting with the GC than you would | with the borrow checker to get the performance you need, but | for everything else the hot paths are so few and far between | you're most likely better off fighting with the GC 1% of the | time and not fighting anything the other 99%. | | The Rust community has done laudable work in bringing down | the cognitive threshold of "manual / compile-time" memory | management, but I think we're finding out that the returns | are diminishing quickly and there's still quite a chasm | between borrow checking and GC with respect to developer | velocity. | steveklabnik wrote: | "developer velocity" is also, in some sense, a semantic | question. 
I am special, of course, but basically, if you | include things like "time fixing bugs that would have been | prevented in Rust in the first place", my velocity is | higher in Rust than in many GC'd languages I've used in the | past. It just depends on so many factors it's impossible to | say definitively one way or another. | tptacek wrote: | I have trouble believing this, at least in any | generalizable way. I'm comfortable in both Go and Rust at | this point (my Rust has gotten better since last year | when I was griping about it on HN), and it's simply the | case that I have to think more carefully about things in | Rust because Go takes care of them for me. It's not a | "think more carefully and you're rewarded with a program | that runs more reliably and so you make up the time in | debugging" thing; it's just slower to write a Rust | program, because the memory management is much fiddlier. | | This seems pretty close to objective. It doesn't seem | like a semantic question at all. These things are all | "knowable" and "catalogable". | | (I like Rust more now than I did last year; I'm not | dunking on it.) | steveklabnik wrote: | I know you're not :) I try to be explicit that I'm only | talking about my own experience here. I try not to write | about my experiences with Go because it was a _very_ long | time ago at this point, and I find it a bit distasteful | to talk about for various reasons, but we apparently have | quite different experiences. | | Maybe it depends on other factors too. But in practice, I | basically never think about memory management. I write | code. The compiler sometimes complains. When it does, | 99.9% of the time I go "oh yeah" and then fix it. It's | not a significant part of my experience when writing | code. It does not slow me down, and the 0.1% of the time | when it does, it's made up for it in some other part of | the process. | | I wish there was a good way to _actually_ test these | sorts of things. 
| throwaway894345 wrote: | This jibes very well with my experience. I _like_ writing | Rust, but I do so well aware that I could write the same | thing in Go and still have quite a lot of time left over | for debugging issues. | | I can also get user feedback sooner and thus pivot my | implementation more quickly, which is a more subtle angle | that is so rarely broached in these kinds of | conversations. | | The places where I think the gap between Go and Rust is | the smallest (due to Rust's type system) are things like | compilers where you have a lot of algebraic data types to | model--Rust's enums + pattern matching are great here. | tptacek wrote: | I always miss match and options (I could go either way on | results, which tend to devolve into a shouting match | between my modules with the type system badly | refereeing). But my general experience is, I switch from | writing in Rust to Go, and I immediately notice how much | more quickly I'm getting code into the editor. It's | pretty hard to miss the difference. | smoldesu wrote: | It's very much a confusing process. If C-styled memory | management is skydiving and Python is parachuting, Rust can | feel a bit like bungee-jumping. It's neither working for nor | against you, but it will behave in a specific way that you | have to learn to work around. Your reward for getting better | at that system is less mental hassle overall, but it's | _definitely_ a strange feeling, particularly if you're | already comfortable with traditional memory management. | sreque wrote: | I'm not a rust user, but I would argue you are still managing | memory manually, you're just doing a lot of it through rust's | type system, which can check for errors at compile time, rather | than through runtime APIs like the C or C++ standard library. | The question then becomes whether it is easier to manage memory | through Rust's type system versus via standard runtime APIs.
| | From what I've read, Rust memory management actually requires | _more_ work but provides fantastic safety guarantees. This | could mean that rust actually lowers productivity at first, but | as the complexity of the code base grows, some of that | productivity is restored, or even surpasses that of C/C++, because you | spend no time chasing runtime memory bugs. | | For some products or projects, the costs of shipping a security | flaw caused by a memory bug exploit could be high enough that a | drop in productivity from Rust relative to C is still more than | justified due to external costs that Rust mitigates. | oconnor663 wrote: | I think sometimes the "compiler manages memory for you" concept | gets overplayed a bit. It's not as complex as that description | makes it sound. If you understand C++ destructors, it's really | the same thing. Objects get destroyed when they go out of | scope, and any memory or other resources they own get freed. | The differences come up when you look at what happens when you | make a mistake, like holding a pointer to a freed object. | (Rust catches these mistakes at compile time, which does indeed | involve some new complexity.) | pjmlp wrote: | Try to implement a data structure that works across async | runtimes, or a couple of GUI widgets, then you will see why | some of us complain about the borrow checker, even | with decades of experience in C and C++. | dgb23 wrote: | You are still managing memory in Rust, it's just more | constrained, statically checked and inferred. Within those | constraints you have full control. | pjc50 wrote: | You can also kind of do your own management of memory in GC | languages, you just have to be extremely careful in code review | to spot inadvertent allocations in the hot path. A great | example is the "LMAX Disruptor" in Java: | https://lmax-exchange.github.io/disruptor/ | | The trick is to pre-allocate all your objects and buffers and | reuse them in a ring buffer.
Similar techniques work in zero-malloc | embedded C environments. | bilboa wrote: | While you may not have to directly call malloc and free in | Rust, the memory management still feels very manual compared to | a language with GC. When I want to pass an object around I have | to decide whether to pass a &_, a Box<_>, Rc<_>, or | Rc<RefCell<_>>, or a &Rc<RefCell<_>>, etc. And then there are | lifetime parameters, and having to constantly be aware of | relative lifetimes of objects. Those are all manual decisions | related to memory management that you have to constantly make | in Rust that you wouldn't need to think about in Go or Python | or Java. | | Similarly, idiomatic modern C++ rarely needs new and delete | calls, but I'd still say it has manual memory management. | | I suppose it's reasonable to talk about degrees of manual-ness, | and say that memory management in Rust or modern C++ is less | manual than C, but more manual than Go/Python/Java. | mullr wrote: | > Isn't part of the point of Rust that you don't manage memory | yourself, and rather that the compiler is smart enough to | manage it for you? | | For trivial cases, kind of. But once you start to do anything | remotely sophisticated, no. Everything you do in Rust is | _checked_ w.r.t. memory management, but you still need to make | many choices about it. All the stuff about lifetimes, | borrowing, etc: that's memory management. The compiler's | checking it for you, but you still need to design stuff sanely, | with memory management (and the checking thereof) in mind. It's | easy to back yourself into a corner if you ignore this. | jgrant27 wrote: | After using Rust for a few years professionally it's my take that | people who really want to use it haven't had much experience | with it on real world projects. It just doesn't live up to the | hype that surrounds it. | | The memory and CPU savings are negligible between Go and Rust in | practice no matter what people might claim in theory.
However, | the side effect of making your team less productive by using | Rust is a much higher price to pay than just running your Go | service on more powerful hardware. | | There are many other non-obvious problems with going to Rust that | I won't get into here but they can be quite costly and invisible | at first and impossible to fix later. | | Simple is better. Stay with Go. | adamnemecek wrote: | Can you name some non-obvious problems? | angelzen wrote: | Explicitly managed memory is useful for handling buffers. | Everything else is peanuts anyways and could use a GC for | ergonomics reasons. That being said, some really prefer the | ergonomics of working with Result and combinators compared with | the endless litany "x, err := foo(); if err != nil". IMHO | there is still room for significant progress in this space, | neither Rust nor Go have hit the sweet spot yet. | IshKebab wrote: | Why do you say "less productive with Rust"? In my experience | I'm more productive with Rust because its very strong type | system catches so many bugs. | srcreigh wrote: | > But our profile wasn't ever showing us 500GB of live data, just | a little bit more than 200MB in the worst cases. This suggested | to me that we'd done all we could with live objects. | | Is this a typo? Weren't seeing 500 MB of live data, just a little | more than 200MB in the worst case? | | EDIT: Btw, I read the entire article. It was fascinating, thank | you! | [deleted] | markgritter wrote: | Yes, that's a typo, thanks! | henning wrote: | > Rust has manual memory management, which means that whenever | we're writing code we'll have to take the time to manage memory | ourselves. | | No. | arsome wrote: | Yeah, sounds like someone doesn't understand lifetimes and | RAII. Even in modern C++ the number of times you have to | actually think about memory management instead of lifetimes is | basically zero unless you have to work with old libraries.
| tsimionescu wrote: | But thinking about lifetimes and RAII is 90% of memory | management. | | Basically whether you write C, C++, or Rust, you have to | track ownership the same ways, the only thing that changes is | how much the compiler helps you with that. However, if you | write your program in Java, Lisp or Haskell, you simply do | not care about ownership for memory-only objects, and can | structure your program significantly differently. | | This can have significant impact on certain types of | workflows, especially when it comes to shared objects. A | well-known example is when implementing lock-free data | structures based on compare-and-swap, where you need to free | the old copy of the structure after a successful compare-and- | swap; but, you can't free it since you don't know who may | still be reading from it. Here is an in-depth write-up from | Andrei Alexandrescu on the topic [0]. | | Note: I am using "object" here in the sense from C - | basically any piece of data that was allocated. | | [0] http://erdani.org/publications/cuj-2004-10.pdf | bluGill wrote: | With modern C++ your memory checklist is two steps: put it | on the stack, put it in a unique_ptr on the stack. There | are more steps after that, but you almost never get to them | and wouldn't remember them if you discovered the need for | them (which is okay because you never get there). | tsimionescu wrote: | Your checklist is only covering the simplest case, direct | ownership of small data structures. | | I'm not going to put a large array on the stack. I'm not | going to pass unique_ptr (exclusive ownership) of every | resource I allocate to every caller. I still need to | decide between passing a copy, a unique_ptr, a reference, | or a shared_ptr. 
When I design a data structure with | interior pointers, I need to define some ownership | semantics and make sure they are natural (for example, in | a graph that supports cycles, there is no natural notion | of ownership between graph nodes). | | These are all questions that are irrelevant in a GC | language, for memory resources. | pjmlp wrote: | Not really irrelevant when said GC language also does | value types, e.g. // C# | Span<byte> buffer = stackalloc byte[1024]; | UncleEntity wrote: | > ... put it on the stack, put it in a unique_ptr on the | stack. | | What happens when the stack frame gets destroyed but you | kept a reference to the data around somewhere because you | needed it for further compilation? | | I, for one, am a fan of using the heap when doing the C++ | things... | pjmlp wrote: | I guess something like Android Oboe, macOS DriverKit, Windows | Runtime C++ Template Library, or C++/WinRT could be | considered old libraries then. | david422 wrote: | Even then, just add a wrapper and off you go. | tptacek wrote: | The big wins in this article, in what I believe was the order of | impact: | | * They do raw packet reassembly using gopacket, and gopacket | keeps TCP reassembly buffers that can grow without bound when you | miss a TCP segment. They capped the buffers, and the huge 5GB | spikes went away. | | * They were reading whole buffers into memory before handing them | off to YAML and JSON parsers. They passed readers instead. | | * They were using a protobuf diffing library that used `reflect` | under the hood, which allocates. They generated their own | explicit object inspection thingies. | | * They stopped compiling regexps on the fly and moved the regexps | to package variables. (I actually don't know if this was a | significant win; there might just be the three big wins.) | | This is a great article. But none of these seem Go-specific+, or | even GC-specific.
They're doing something really ambitious | (slurping packets up off the wire against busy API servers, | reassembling them in userland into streams, and then parsing the | contents of the streams). Memory usage was going to be fiddly no | matter what they built with. The problems they ran up against | seem pretty textbook. | | Frankly I'm surprised Go acquitted itself as well as it did here. | | + _Maybe the perils of `reflect` count as a Go thing; it's worth | noting that there's folk wisdom in Go-land to avoid `reflect` | when possible._ | jrockway wrote: | Agree strongly here. These are common sources of memory leaks | in any language, and it's very likely that rewriting this code | in Rust would lead to the exact same problems. (Other cases on | HN, like Discord's in-memory cache and Twitch's "memory | ballast" thing, are pretty Go specific -- the identical C | program wouldn't have those particular bugs. But, the Go | developers read these incident reports and do fix the | underlying causes; I think Twitch's need for the "memory | ballast" got fixed a few years ago, but well after the "don't | use Go for that" meme was popularized.) | | Buffering is a pretty common bad habit. As programmers, we know | stuff is going to go wrong, and we don't want to tell the user | "come back later" (or in this case, undercount TCP stream | metrics)... we want to save the data and automatically process | it when we can so they don't have to. But, unfortunately it's | an intrinsic Law Of The Universe that if data comes in at X | bytes per second, and leaves at X-k bytes per second, then | eventually you will use all storage space in the Universe for | your buffer, and then you have the same problem you started | with. (Storage limits in mirror may be closer than they | appear.) Getting it into your mind that you have to apply back | pressure when the system is out of its design specification is | pretty crucial.
Monitor it, alert on it, fix it, but don't | assume that X more bytes of RAM will solve your problem -- | there will eventually be a bigger event that exceeds those | bounds. | | Incidentally, the reason why you can make Zoom calls and use | SSH while you download a file is because people added software | to your networking stack that drops packets even though buffer | space in your consumer-grade router is available. That tells | your download to chill out so SSH and video conferencing | packets get a chance to be sent to the network. The people that | made the router had one focus -- get the highest possible | Speedtest score. Throughput, unfortunately, comes at the cost | of latency (buffer size / bandwidth of queueing delay for every | single packet!), and it's not the right decision overall. | | I don't know where I was going with this rant but ... when your | system is overloaded, apply backpressure to the consumers. A | packet monitoring system can't do that (people wouldn't accept | "monitoring is overloaded, stop the main process"), but it does | have to give up at some point. If you don't have any more | memory to reassemble TCP connections, mark the stream as an | error and give up. If you're dumping HTTP requests into a | database, and the database stops responding, you'll just have | to tell the HTTP client at the other end "too many requests" or | "temporarily unavailable". To make the system more reliable, | keep an eye on those error metrics and do work to get them | down. Don't just add some buffers and cross your fingers; | you'll just increase latency and still be paged to fight some | fire when an upstream system gets slow ;) | | Edit to add: I have a few stories here. One of them is about | memory limits, which I always put on any production service I | run. sum(memory limits) < sum(memory installed in the machine), | of course. One time I had Prometheus running in a k8s cluster, | with no memory limit.
Sometimes people would run queries that | took a lot of RAM, and there was often slack space on the | machine, so nothing bad happened. Then someone's mouse driver | went crazy, and they opened the same Grafana tab thousands of | times. On a high memory query. Obviously, Prometheus used as | much RAM as it could, and Linux started OOM killing everything. | Prometheus died, was rescheduled on a healthy node, and the | next group of tabs killed it. Eventually, the OOM killer had | killed the Kubelet on every node, and no further progress could | be made. The moral of the story is that it would have been | better to serve that user 1000 "sorry, Prometheus died horribly | and we can't serve your request right now", which memory limits | would have achieved. Instead, we used up all the RAM in the | Universe to try to satisfy them, and still failed. (What was | the resolution? I think we killed the bad browser, which | happened to be a dashboard-displaying TV next to our desks. | Then kubelets restarted, and I of course updated Prometheus to | have a 4G memory limit. Retried 1000 tabs with an expensive | query, and Prometheus died and the frontend proxy served 990 of | the tabs an error message. Back pressure! It works! You can | imagine how fun this story would have been if I had cluster | autoscaling, though. Would have just eventually come back to a | $1,000,000 AWS bill and a 1000 node Kubernetes cluster ;) | Karrot_Kream wrote: | > it's an intrinsic Law Of The Universe that if data comes in | a X bytes per second, and leaves at X-k bytes per second, | then eventually you will use all storage space in the | Universe for your buffer, | | This is known as Little's Law. Using Little's Law, you know | that if the average time spent in queue is more than the | average time it takes for a new entry to be added to the | queue, then your queue fills up. 
| kevingadd wrote: | Reflection APIs seem to be pretty messy and slow in every | runtime I've ever used, perhaps because the idea of optimizing | them might encourage more use. The C# reflection APIs also | allocate a lot. | tptacek wrote: | A thing you can ding Go for is that you can find yourself | relying on `reflect` (under the hood) more than you expect, | because it's how you do things like read struct tags for | things like JSON. | | But that's not what the problem was here; the product they | were building was using `reflect` in anger. They were relying | on something that did magic, pulling a rabbit out of its hat | to automatically compare protobuf thingies. They used it on a | hot path. The room quickly filled with rabbit corpses. I | guess you can blame Go for the existence of those kinds of | libraries, but most perf-sensitive devs know that they're a | risk. | atombender wrote: | Reflection is also typically needed for anything that needs | to be generic over types. For example, if you want to write | a function that can traverse or transform a map or slice, | where the actual types aren't known at compile time. We | have a lot of this in our Go code at the company I work | for. I'm really looking forward to generics, which will | help us rip out a ton of reflect calls. | tptacek wrote: | That kind of code is generally non-idiomatic in Go. An | experienced Go programmer looks at something that is | generic over types and does something interesting and | instinctively asks "what gives, where are the dead | rabbits?". | | I'm less excited about generics. There's a cognitive cost | to them, and the constraint current Go has against | writing type-generic code is often very useful, the same | way a word count limit is useful when writing a column. | It changes the way you write, and often for the better. | josephg wrote: | I'm so conflicted on that point. 
I've been writing a high | performance CRDT in rust for the last few months, and I'm | leaning heavily on generics. For example, one of my types | is a special b-tree for RLE data. (So each entry is a | simple range of values). The b-tree is used in about 3-4 | different contexts, each time with a different type | parameter depending on what I need. Without generics I'd | need to either duplicate my code or do something simpler | (and slower). I can imagine the same library in | javascript with dynamic types and I think the result | would be easier to read. But the resulting executable | would run much slower, and the code would be way more | error prone. (I couldn't lean on the compiler to find | bugs. Even TS wouldn't be rich enough.) | | Generics definitely make code harder to write and | understand. But they can also be load bearing - for | compile time error checking, specialisation and | optimization. I'm not convinced it's worth giving that | up. | tptacek wrote: | If we can be at a place where reasonable people can | disagree about generics, I'm super happy, and think we've | moved the discourse forward. There are things I like | about generics, particularly in Rust (I've had the | displeasure of dealing with them in C++, too). They're | just not an unalloyed good thing. | pgwhalen wrote: | This argument is getting a little tiresome though, isn't | it? It isn't enough to simply call something "non- | idiomatic" to gloss over a deficiency. There's a | cognitive cost to all language features, but most other | general purpose statically typed programming languages | seem to have come to the conclusion that the benefit | outweighs the cost for some form of generics. | | I am by no means a Go basher, it is one of my favorite | languages. But I eagerly await generics. | tptacek wrote: | I could have written this more clearly.
The fact that | things that are generic over types are non-idiomatic | today in Go has nothing to do with whether the upcoming | generics feature is good or bad. They're unrelated | arguments. | | The latter argument is subjective and you might easily | disagree. The former argument, about experienced Go | programmers being wary when an API is generic over types, | is pretty close to an objective fact; it is a true | statement about conventional Go code. | pgwhalen wrote: | That's a fair point. Knowing not to try to write generic | code (since you don't have the tools) is the sign of an | experienced Go programmer. | | That being said, I'm curious how much kubernetes (a | large, famous, Go codebase) still has code that does | this. I used to read that it used a ton of interface{} | and type assertion, but maybe that narrative is out of | date (or never really true). I was never too familiar | with the codebase myself. | ComputerGuru wrote: | The usual C# reflection APIs that devs turn to allocate a | lot, but there are ways to make them almost performant by | (re)using delegates and expressions. There are a number of | good libraries to use reflection faster, as well. | aidenn0 wrote: | Before writing Clojure, Rich Hickey wrote FOIL[1], which used | sockets to communicate between common lisp and the JVM (or | CLR). When asked about making it in-process, Rich observed | that the reflection overhead on the JVM was often as large, | or larger, than the serialization overhead, so the gains to | be had were limited. | | 1: http://foil.sourceforge.net/ | hinkley wrote: | From what I recall, the Java team copped to the | intentionally slow accusation, but that started to change | when they decided to embrace the notion of other languages | besides Java running on the JVM. Unfortunately that would | have been shortly after Clojure was born. 
It took a few | releases for them to really improve that situation, and | that was still shortly before they started doing faster | releases. | titzer wrote: | > Frankly I'm surprised Go acquitted itself as well as it did | here. | | As opposed to, e.g. Java, which, as I ranted elsewhere in the | thread, is a trashy mess. I programmed for over a decade in | Java, and yeah, it's only gotten worse over the years. They | would have done _even more_ custom processing and bypassing of | the layers underneath due to Java's typical copy-happiness. | pvg wrote: | This kind of analysis and remediation would work just as well | in Java and is often a more rigorous and effective approach | than the author's somewhat Java-inspired initial idea of | fiddling with GC parameters. | | One big difference is that the Java runtime design intent is | more in the vein of 'converting memory into performance'. On | HN, Ron Pressler ('pron) has written a bunch of interesting | stuff about that over the years | | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu. | .. | marricks wrote: | Yeah, and perhaps someone who knows rust well could argue some | things are easier to do right in rust. For example, in the | second bullet, passing readers could be more of the norm in | libraries since rust is a systems programming language. The | third bullet makes a similar point. | | I'm not saying rust is better or they made the wrong choice, | sounds like C++ would let users easily make the same "wrong" | choices, just interesting to carry the thoughts through a bit | further. | Thaxll wrote: | io.Reader() and io.Writer() are used everywhere in Go, it's | really a standard practice. | | https://tour.golang.org/methods/21 | Thaxll wrote: | It's a question I ask often in interviews: how do you upload a | 5GB file over the network with only 1MB of memory? | jhgb wrote: | > with only 1MB of memory | | Is that total system memory?
| brundolf wrote: | "How we avoided rewriting in Rust" feels like clickbait given | that the answer is "our problems were algorithmic, not | language-specific" | gameswithgo wrote: | Memory issues are amplified a bit by garbage collection | though, in that every pointer must be stored twice, and | collection will take time and evict things from cpu cache | etc. | | If you were struggling with this, turning to Rust might be a | thing people would try, even if it wasn't fixing the first | order problems, and only addressing the 2nd order ones. | tptacek wrote: | The whole post is about how Rust turned out not to be the | answer to exactly this problem. | akira2501 wrote: | A bit yea, but it is somewhat telling that their first | instinct was to find a GC "knob" and twist it around until | they could go back to ignoring their basic architecture. | | Go and Rust are great in that they let you write code at good | speed, although, I think this just highlights the well known | problems of over optimizing a single metric. | throwaway894345 wrote: | I assume it's tongue-in-cheek; because "rewrite in Rust to | improve performance" is such a meme, the headline is subtly | calling attention to the fact that this is rarely good advice | and certainly not the first lever an engineer should reach | for upon running into a performance problem. | brundolf wrote: | It's not the first lever an engineer should reach for | regardless of the languages involved. Calling out Rust | specifically feels like a bit of a cheap shot | pjmlp wrote: | To be fair, that is now the common "I rewrote X in Y" | theme, which followed upon the Y ∈ { Ruby, Clojure, | Scala, Kotlin, ... } from previous years. | Zababa wrote: | And Go too! It's always fun to see posts from around | 2014/2015 complaining about how every submission to | Hacker News is now "I wrote X in Go", while now Go is the | boring stuff and Rust is the hot new thing. I wonder what | will be the next Rust though.
| tptacek wrote: | BPF-verified C. | pjmlp wrote: | Some GC based language with dependent types. | throwaway894345 wrote: | It's a shot at the "just rewrite it in Rust" meme, not at | Rust or the Rust community. | maleldil wrote: | > subtly calling attention | | That's generous. I'd call it clickbait. | wibagusto wrote: | Well consider all the projects titled "blah blah blah... | written in Rust" | | Who gives a shit what it's written in--what does it do? | marcos100 wrote: | People who are interested in rust may want to see how it | was used. | | The author could have just kept "Taming Go's Memory | Usage". | | Maybe they never considered rewriting in rust. The pros | and cons look like just some random arguments to add | rust to the title. | typical182 wrote: | Very nice write up. | | _Go's focus on simplicity means that there is only a single | parameter, SetGCPercent, which controls how much larger the heap | is than the live objects within it_. | | FWIW, there is a new proposal from a member of the core Go team | to add a second GC knob in the form of a soft limit on total | memory: | | https://github.com/golang/proposal/blob/master/design/48409-... | | It includes some provisions to make sure that the application can | keep making progress and avoid death spirals (part of the reason | why it is a "soft" limit), and also includes some new GC-related | telemetry. | | From the blog write up, a second GC knob with a soft limit might | have only been a minor help here, with the bigger wins coming | from the code changes they described in the blog. | option_greek wrote: | I have a feeling that they will end up eventually rewriting this | in Rust as the use case they describe is where a non GC language | can definitely provide more performance (beyond the case they | solved). APM tools usually need to be more performant to ensure | they add as little overhead to the actual service as possible.
I | guess what's helping here is that this is passive monitoring, | which allows a little lag in the system. The question relevant | here is whether there will be more issues with memory in general | given their current roadmap. | rossmohax wrote: | Every article on Go allocations can benefit from a heap escape | analysis section. I was hoping to find one here, but no luck. | Stack allocation is a powerful technique to reduce GC times. | CraigJPerry wrote: | > For our application, it would be acceptable to simply exit when | memory usage gets too large | | Could you not just set a ulimit on memory usage of the process in | that case? (And use another process as the parent, e.g. a | supervisor or init, to avoid exiting the container and just | restart the process instead) | [deleted] | void_mint wrote: | Rebuilding in a different language is just trading one problem | set for another. Making better use of the tools you've already | taken on is a much better strategy if you don't have the money to | hire a whole new set of devs or a year to burn onboarding onto a | new language. | geodel wrote: | Well, good for the author that they were able to fix the issue. | | However I think writing efficient code even in managed memory | languages for a large, heavily used service is kind of a normal | thing and not above and beyond normal work. | favorited wrote: | If I was going to write a satire piece representing a typical HN | post, I would 100% start it with the same opening 2 sentences. | wrs wrote: | Buried in here are great examples of why rewrites don't help: | | "The module that does this inference was recompiling those | regular expressions each time it was asked to do the work." | | "The reason for the allocation was a buffer holding decompressed | data, before feeding it to a parser. ...the output of the | decompression could be fed directly into the parser, without any | extra buffer." | | The problem here isn't that the language has GC, it's that memory | usage was just not considered.
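The first quote above is the classic fix of hoisting regular-expression compilation out of the hot path. A minimal Go sketch — the pattern and the `looksLikeUUID` helper are hypothetical stand-ins; the article doesn't show its actual inference code:

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package init and reused on every call. Calling
// regexp.MustCompile inside the hot path instead would redo the parse
// and reallocate the compiled program on every invocation.
var uuidRe = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// looksLikeUUID is a hypothetical type-inference check; the fix is only
// about *where* the compile happens, not what the pattern matches.
func looksLikeUUID(s string) bool {
	return uuidRe.MatchString(s)
}

func main() {
	fmt.Println(looksLikeUUID("123e4567-e89b-12d3-a456-426614174000"))
	fmt.Println(looksLikeUUID("not-a-uuid"))
}
```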
If you want performance, you have | to pay attention to allocations no matter what kind of memory | management your language has. And as the article demonstrates, if | you pay attention, you can _get_ performance no matter what kind | of memory management your language has. | coliveira wrote: | That is correct. | | In the worst case, you can always (even in GC'd languages) | pre-allocate buffers and do your work without new memory | requests. But you need to plan for this, in the same way you'd do | in a language without GC. | zamadatix wrote: | Rewrites can definitely help but rushing into them before doing | these other things is going to net you a lot less gain for the | time. | olau wrote: | > The problem here isn't that the language has GC, it's that | memory usage was just not considered. | | While I agree with the gist of what you're saying, I do think | runtimes based on the we'll-clean-it-up-some-day GC paradigm | make it more important to consider memory allocation than less | laissez-faire paradigms (like RAII or reference counting), | contrary to how it's presented in the glamorous brochures. | jerf wrote: | Put it this way: each of the things mentioned in that post | were errors that could just as easily have been made in Rust, | and that Rust would not necessarily have helped avoid. At best | you can make a case for the errors being more explicit, but in | my personal experience even that would be weak. | | The last error in particular, using byte buffers instead of a | streaming abstraction, is _pervasive_ in programming. I don't | know if Rust is necessarily any worse than Go's library | environment for dealing with that problem but I doubt it's | any better. By having io.Reader in the standard library from | the beginning (and not because of any other particular virtue | of the language, IMHO) it has had one of the best ecosystems | for dealing with streams without having to manifest them as | full bytes around [1].
| | It amounts to, the root problem is that they didn't have the | problem they thought they had. Rust will blow the socks off | the competition w.r.t. memory efficiency of lots of small | objects, which is why it's so solid in the browser space. But | that's not the problem they were having. Go's just fine where | they seem to have ultimately ended up, stream processing | things with transient per-object processing. Even if you do | some allocation in the processing, the GC ends up not being a | big deal because the runs end up scanning over not much | memory not all that frequently. This is why Go is so popular | in network servers. Could Rust do better? Yes. Absolutely, | beyond a shadow of a doubt. But not enough to matter, in a | lot of cases. | | [1]: An expansion on that thought if you like: | https://news.ycombinator.com/item?id=28368080 | tptacek wrote: | I think the Rust and Go stories with buffers vs. readers are | pretty comparable. They both have good support for readers, | and too-good support for reading whole messages into slices | or Vec<u8>'s. | jerf wrote: | Good to hear. I hope it's something all new languages | have going forward, because like I mentioned in my | extended post it's almost all about setting the tone | correctly early in the standard library & culture, rather | than any sort of "language feature" Go had. | | As mostly-a-network engineer it's a major pet peeve of | mine when I have to step back into some environment where | everything works with strings. I can just feel the memory | screaming. | sreque wrote: | More importantly, GC'ed languages tend to use at least 2x the | memory of un-GC'ed languages and have to deal with the | consequences of GC-induced pauses and generally inferior | native code interop. Whether that matters to you or not | depends on your application.
No one is going to use a GC'ed | language in the Linux Kernel, but practically 100% of backend | applications are written in GC'ed languages because the | productivity benefits of automatic memory management are | massive. | fiddlerwoaroof wrote: | I'm not really sure if that 2x figure is accurate. I've | seen charts on both sides of this and a lot here depends on | your programming language and the things it can optimize: | with Linear/Affine types, I'm fairly sure Haskell could, in | theory, eliminate GC deterministically from the critical | sections of your code-base without forcing you to adopt | manual memory management universally. | | But, there's just the fact that people writing | real-time/near real-time systems do, in fact, choose GC | languages and make it work: video games are one example | with Minecraft and Unity being the major examples. But also | HFT systems: Jane Street heavily uses Ocaml and other | companies use Java/etc. with specialized GCs. | | This is not even to mention the microbenchmarks that seem | to indicate that Common Lisp and Java can match or exceed | Rust for tasks like implementing lock-free hash maps and | various other things | https://programming-language-benchmarks.vercel.app/problem/s... | sreque wrote: | I am aware that you can hit really good latency targets | with GC'ed languages, like in the video game and finance | industry. Whenever I investigate examples, though, I find | the devs have to go through a ton of effort to avoid | memory allocations, and then I ask if using the GC'ed | language was even worth it in the first place? | | I'm actually fascinated with the idea of going off-heap | in the hotspots of GC'ed languages to get better | performance. Netty, for instance, relies on off-heap | allocations to achieve better networking performance. | But, once you do so, you start incurring the | disadvantages of languages like C/C++, and it can get | complicated mixing the two styles of code.
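In Go, the usual first step before reaching for off-heap tricks like Netty's is reusing buffers through `sync.Pool`, which keeps steady-state allocation near zero while staying inside the GC'd world. A small illustrative sketch — `process` is a hypothetical hot-path function, not from the article:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch buffers. Hot paths Get a buffer,
// Reset it, use it, and Put it back, so steady-state traffic creates
// almost no new garbage for the GC to chase.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// process is a stand-in for per-request work that needs scratch space.
func process(payload []byte) int {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset() // clear any state left by the previous user
	defer bufPool.Put(b)
	b.Write(payload)
	return b.Len()
}

func main() {
	fmt.Println(process([]byte("hello")))
}
```

The `Reset` call is the "clearing its internal state" discipline mentioned later in the thread: pooled objects must be scrubbed before reuse or stale data leaks between requests.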
| vp8989 wrote: | "Whenever I investigate examples, though, I find the devs | have to go through a ton of effort to avoid memory | allocations" | | Yep, also the median dev in a GC'ed language is simply | incapable of writing super efficient code in these | languages because they rarely have to. You would have to | bring in the best of the best people from those | communities or put your existing devs through a pretty | significant education process that is similar in | difficulty to just learning/using Rust. | | The resulting code will be very different to what typical | code looks like in those languages, so the supposed | homogeneity benefits of just writing fast C#/Java when | it's needed are probably not quite true. You'd basically | have to keep that project staffed up with these kinds of | people and ensure they have very good Prod observability | to ensure regressions don't appear. | sreque wrote: | Yes, and I think one important aspect to this is the | necessary CI/CD changes needed to support these kinds of | optimizations. If your performance targets are tight | enough that you are making significant non-standard | optimizations in your GC'ed language, you're probably | going to want some automated performance regression | testing in your deployment pipeline to ensure you don't | ship something that falls down under load. In my | experience, building and maintaining those pipeline | components is not easy. | tsimionescu wrote: | I mostly agree with what you're saying, but I'll also add | that GC pauses are mostly a problem of yesteryear unless | you're either managing truly enormous amounts of memory or | have hard real-time requirements (and even then it's | debatable). Modern GCs, as seen in Go, Java 11+, .NET 4.5+, | guarantee sub-millisecond pauses on terabyte-large heaps | (I believe the JS GC does as well, but I'm less sure). | sreque wrote: | I downvoted you at first and then changed my mind.
I think I | would like your comment more if it were worded more like: | "buried in here are great examples of important optimizations | that did not require a rewrite". Or something like: "this | article does a great job of showing that you can hit many | reasonable performance targets while using a GC'ed language | like Go." | | You can pretty much always get better performance with more | control over memory, and more importantly, you can dramatically | lower overall memory usage and avoid GC pauses, but you have to | weigh that against the fact that automated memory management is | one of the few programming language features that is basically | proven to give a massive developer productivity boost. In my | corner of the industry, everyone chooses the GC'ed languages | and performance isn't really a major concern most of the time. | [deleted] | xondono wrote: | > Buried in here are great examples of why rewrites don't help | | That has not been my experience. Rewrites do _sometimes_ help, | because in a lot of codebases there are too many "pet" modules | or badly designed frozen interfaces. | | Rewrites _can_ help in those situations, because there are no | sacred cows anymore. The issue is that a lot of people do | rewrites as translations, without touching structures. | laumars wrote: | This is where profiling helps more. Find the weak parts of | the code, try to optimise those. If the language proves to be | a barrier then you have a justification for a rewrite. | | All too often people don't understand how to performance tune | software properly and instead blame other things first (eg | garbage collection) | bluGill wrote: | Most slow languages make escape to C easy for cases where | the language is the issue. Most fast languages make writing | a C-compatible interface easy, so if the language is your issue | just rewrite the parts where that is the problem.
| | Of course eventually you get to the point where enough of | the code is in a fast language that writing everything in | the fast language to avoid the pain of language interfaces | is worth it. | pjmlp wrote: | Except when the program is actually written in C, then | better hold the Algorithms and Data Structures book and | dust it off, or Intel/AMD/ARM/... manuals. | laumars wrote: | And there are times when even C isn't sufficient and a | developer needs to resort to inlined assembly. But most | of the time the starting language (whatever that might | be) is good enough. Even here, the issue wasn't the | language, it was the implementation. And even where the | problem is the language, there will always be hot paths | that need hardware performant code (be that CPU, memory, | or sometimes other devices like disk IO) and there will | be other parts in most programs that need to be optimised | for developer performance. | | Not everyone is writing sqlite or kernel development | level software. Most software projects are a trade off of | time vs purity. | | That all said, backend web development is probably the | edge case here. But even there, that's only true if | you're trying to serve several thousand requests a second | on a monolithic site in something like CGI/Perl. _Then_ | I'd argue there's no point fixing any hot paths; just | rewrite the entire thing. But even then, there's still no | need to jump straight to C, skipping Go, Java, C#, and | countless others. | silisili wrote: | Agreed with this 100%. | | So many posts here over the years of examples of 'how we | rewrote from x to y and saw 2000% gains', where x and y are | languages. Such examples are 100% meaningless. Rewrites from | the ground up -should- always be way faster, since it's all | greenfield. If trying to make a language comparison, rewrite | the entire thing in both languages! | josephg wrote: | Yes absolutely.
I wrote an article a couple months ago | which was trending here where I got a 5000x performance | improvement over an existing system. One of the changes I | made was moving to rust, and some people seemed to think | the takeaway was "rewriting the code in rust made it 5000x | faster". It wasn't that. Automerge already had a rust | version of their code which ran a benchmark in 5 minutes. | Yjs does the same benchmark in less than 1 second in | javascript. | | Yjs is so fast because it makes better choices with its | data structures. A recent PR in automerge-rs brought the | same 5 minute test down to 2 seconds by changing the data | structure it uses. | | Rust/C/C++ give you more tools to write high performance | code. But if you put everything on the heap with copies | everywhere, your code won't necessarily be any faster than | it would in JS / python / ruby. And on the flip side, you | can achieve very respectable performance in dynamic | languages with a bit of care along the hot path. | coliveira wrote: | This is less an argument for a rewrite than an argument for | redesigning parts of your codebase, which can be done much | more easily than a complete rewrite. | xondono wrote: | The tricky thing is that it's easy to end up with a result | that's not far off. Some modules will improve, but a lot of | the time these kinds of bottlenecks tend to happen because | the performant version is not very idiomatic (feels weird), | it's too verbose, or it's too confusing to think through. | | Unless you have the same team (and they learned the lesson | the first time), it's very likely to end up with modules | that perform in a similar way. | | Sometimes changing the language makes thinking about the | problems easier. | bluGill wrote: | Last time I was in a rewrite the boss had the old software on | a computer next to him with the label "Product owner of | rewrite". When asked how to do something, he regularly looked | at what it did.
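A concrete instance of "allocation patterns, not language, dominate": the same loop in Go with and without a preallocated slice. `testing.AllocsPerRun` makes the difference measurable; the sizes here are arbitrary:

```go
package main

import (
	"fmt"
	"testing"
)

// grow builds a slice from zero capacity, forcing repeated
// reallocation and copying as append outgrows each backing array.
func grow(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// preallocated sizes the backing array up front, so the loop
// allocates exactly once regardless of n.
func preallocated(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	g := testing.AllocsPerRun(10, func() { _ = grow(1 << 16) })
	p := testing.AllocsPerRun(10, func() { _ = preallocated(1 << 16) })
	fmt.Printf("grow: %.0f allocs, preallocated: %.0f allocs\n", g, p)
}
```

Both versions are valid Go; the second just gives the GC far less to do, which is the kind of change the thread argues matters more than switching languages.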
| hinkley wrote: | I would argue that rewrites help when the information | architecture for the original code is proven to be wrong, | _and_ there is either no way to refactor the old code to the | new model, or employee turnover has resulted in nobody having | an emotional attachment to the old code. | | That said, to slot in a new implementation you often have to | make the external API very similar to the old one, which can | complicate making the improvements you're after. | bluGill wrote: | > there is either no way to refactor the old code to the | new model | | That doesn't happen. Write facades as needed. Even if they | are slower than everything else, write the facades so you | can keep it in production all along. | hinkley wrote: | If you get the object ownership and the internal state | model wrong (information architecture), facades don't help | you. | | You can't put an idempotent or pure functional wrapper | around a design that isn't re-entrant and expect anything | good to come from it. If you get it to work, it'll be dog | slow. | wrs wrote: | Quite true, a rewrite can help if it is also a "rethink". But | you don't have to switch languages to get that effect--in | fact you'll probably do _better_ if you don't throw a new | language/library into the mix. | | My point was that, contrary to what is apparently a common | impulse, rewriting the same thing in a different language | while maintaining the lack of attention to performance | considerations that was present in the first version isn't | going to help much. | jjoonathan wrote: | Right, but GC encourages you to not think about memory at all | until the program starts tipping over and fixing the underlying | cause of the leak now requires an architecture change because | the "we hold onto everything" assumption got baked into the | structure in 2 places that you know about and 5 that you don't.
| | I don't miss the rote parts of manual memory management, but it | had the enormously beneficial side effect of making people | consider object lifetimes upfront (to keep the retain graph | acyclic) and cultivate occasional familiarity with leak | tracking tools. Problematic patterns like the undo queue or | query correlator that accidentally leak everything tended to | become obvious when writing the code, rather than while running | it. These days, I keep seeing those same memory management | anti-patterns show up when I ask interviewees to tell a | debugging war story. Sometimes I even see otherwise capable | devs shooting in the dark and missing when it comes to the | "what's eating RAM" problem. | | I feel like GC in long-form program development substitutes a | small problem for a big one. Short-form programming can get | away with just leaking everything, which is what GC does | anyway, so I'm not sure there's any benefit there either. | | tl;dr: get off my lawn. | titzer wrote: | GC will not fix trashy programming. The problem is that many | GC'd languages have adopted a style guide that commits to a | lot of unnecessary allocations. For example, in Java, you | can't parse an integer out of the middle of a string without | allocating in-between. Ditto with lots of other common | operations. Java has oodles of trashy choices. With | auto-boxing, allocations are hidden. Without reified (let's say, | type-specialized) generics, all the collection classes carry | extra overhead for boxing values. | | I write almost all of my code in Virgil these days. It is | fully garbage-collected but nothing forces you into a trashy | style. E.g. I use (and reuse) StringBuilders, DataReaders, | and TextReaders that don't create unnecessary intermediate | garbage. It makes a big difference. | | Sometimes avoiding allocation means reusing a data structure | and "resetting" or clearing its internal state to be empty. | This works if you are careful about it.
It's a nightmare if | you are _not_ careful about it. | | I'm not going back to manual memory management, and I don't | want to think about ownership. So GC. | | edit: Java also highly discourages reimplementing common JDK | functionality, but I've found building a customized | data structure that fits exactly my needs (e.g. an intrusive | doubly-linked list) can work wonders for performance. | jjoonathan wrote: | > many GC'd languages have adopted a style guide that | commits to a lot of unnecessary allocations. | | Oh, that too. I forgot to rant about that. | | > Virgil | | Unfortunately I'd rather live with a crummy language that | has a strong ecosystem, tooling, and developer availability, | so I'll never really know. It does sound nice, though. | pjmlp wrote: | Yeah, but that was one of Java's 1.0 mistakes, that | thankfully Go, .NET, D, Swift, among others, did not make. | | Now let's see if Valhalla actually happens. | josephg wrote: | > Right, but GC encourages you to not think about memory at | all | | I've come to a new obvious realisation with this sort of | thing recently: if you care about some metric, make a test | for it early and run it often. | | If you care about correctness, grow unit tests and run them | at least every commit. | | If you care about performance, write a benchmark and run it | often. You'll start noticing what makes performance improve | and regress, which over time improves your instincts. And | you'll start finding it upsetting when a small change drops | performance by a few percent. | | If you care about memory usage, do the same thing. Make a | standard test suite and measure it regularly. Ideally write | the test as early as possible in the development process. | Doing things in a sloppy way will start feeling upsetting | when it makes the metric get worse. | | I find when I have a clear metric, it always feels great when | I can make the numbers improve.
And that in turn makes it | really effortless to bring my attention to performance work. | tptacek wrote: | Plenty of C programs do the equivalent of ioutil.ReadAll; | it's not a GC thing. | jjoonathan wrote: | "Leak everything because we can get away with it here" is a | fine memory management strategy. "Why does my program keep | getting killed?" isn't. | tptacek wrote: | This has nothing to do with leaking (nothing "leaked"; | it's a garbage-collected runtime). It's about memory | pressure, which, I promise you, is a very real perf | problem in C programs, and why we memory profile them. | The difference between incremental and one-shot reads is | not a GC vs. non-GC thing. ___________________________________________________________________ (page generated 2021-09-21 23:00 UTC)