[HN Gopher] Generics can make your Go code slower ___________________________________________________________________ Generics can make your Go code slower Author : tanoku Score : 327 points Date : 2022-03-30 15:47 UTC (7 hours ago) (HTM) web link (planetscale.com) (TXT) w3m dump (planetscale.com) | YesThatTom2 wrote: | People that demanded generics don't care about performance. | | They care about making excuses about not using Go. | throwoutway wrote: | The first code-to-assembly highlighting example here is | beautiful. Question to the authors-- is that custom just for this | article? | | Is there an open source CSS library or something that does this? | tanoku wrote: | Hey, author here. Thanks for the kind words! This is a custom | pipeline that I designed for the article. It's implemented as a | Node.js library using SVG.js and it statically generates the | interactive SVGs directly in the static site generator I was | using (Eleventy) by calling out to the Go compiler and | extracting assembly for any lines you mark as interesting. It | turned out very handy for iterating, but it's not particularly | reusable I'm afraid! | BeeOnRope wrote: | I came here to ask about the same thing. Very cool! I would | be very interested even in a blog post just on how you did | the SVG generation. | msla wrote: | I agree with the commenter you're replying to; I'd only add | that Intel syntax is much more readable than AT&T. | mcronce wrote: | FWIW, I'm fairly sure this is the assembly syntax used by | Go - the author may not have made a decision to use this vs | another | eatonphil wrote: | Key tldr from me: | | > Ah well. Overall, this may have been a bit of a disappointment | to those who expected to use Generics as a powerful option to | optimize Go code, as it is done in other systems languages. We | have learned (I hope!) a lot of interesting details about the way | the Go compiler deals with Generics. 
Unfortunately, we have also | learned that the implementation shipped in 1.18, more often than | not, makes Generic code slower than whatever it was replacing. | But as we've seen in several examples, it needn't be this way. | Regardless of whether we consider Go as a "systems-oriented" | language, it feels like runtime dictionaries was not the right | technical implementation choice for a compiled language at all. | Despite the low complexity of the Go compiler, it's clear and | measurable that its generated code has been steadily getting | better on every release since 1.0, with very few regressions, up | until now. | | And remember: | | > DO NOT despair and/or weep profusely, as there is no technical | limitation in the language design for Go Generics that prevents | an (eventual) implementation that uses monomorphization more | aggressively to inline or de-virtualize method calls. | jatone wrote: | I agree. I find this snippet interestingly incorrect. | | > with very few regressions, up until now. | | The idea that this is a regression is silly. You can't have a | regression unless old code is slower as a result, which is | clearly not the case. It's just a less-than-ideal outcome for | generics, which will likely get resolved. | nvarsj wrote: | I'd argue that golang is inherently not a systems language, with | its mandatory GC-managed memory. I think it's a poor choice for | anything performance- or memory-sensitive, especially a database. | I know people would disagree (hence all the DBs written in golang | these days, and Java before it), but I think C/C++/Rust/D are all | superior for that kind of application. | | All of which is to say, I don't think it matters. Use the right | tool for the job - if you care about generic overhead, golang is | not the right thing to use in the first place. | jksmith wrote: | Sure, so can Modula-2 or Ada. Point is, degrees of separation | in our biases, especially when it comes to C-sugared languages.
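[Editor's note: the dictionaries-versus-monomorphization point quoted above can be made concrete with a small sketch. The names below are illustrative, not from the article, and the comments describe Go 1.18's implementation as the article reports it.]

```go
package main

import "fmt"

// Number is an illustrative constraint for this sketch.
type Number interface {
	~int | ~float64
}

// Min is a generic function. Under Go 1.18's "GC shape stenciling
// with dictionaries", instantiations are grouped by GC shape (for
// example, all pointer types share one shape), and each group gets a
// single compiled body plus a runtime dictionary. That shared body is
// why calls made through type parameters may not be inlined or
// devirtualized the way a fully monomorphized copy would allow.
func Min[T Number](a, b T) T {
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Println(Min(3, 5))     // T inferred as int
	fmt.Println(Min(2.5, 1.5)) // T inferred as float64
}
```

The article's measurements compare generic instantiations like this against their non-generic equivalents; a `go test -bench` comparison against a hand-written `int` version is the usual way to observe the difference.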
| JamesBarney wrote: | I've used a couple of DBs written in Go before and they were | great. Loved using InfluxDB; it was robust and performant. | mhh__ wrote: | Go is a system (no s) language IMO. | DoctorOW wrote: | I've said this before, and I'll say it again. People lump Go in | with C/C++/Rust because they all (can) produce static binaries. | I don't need to install Go's runtime like I install | Java/NodeJS/Python runtimes. Honestly, I think it speaks so | much to Go's accomplishments that it performs so well people | intuitively categorize it with the systems languages rather | than other managed languages. | pjmlp wrote: | Managed languages like Oberon, D, Modula-3, System C#... | Thaxll wrote: | There are very fast DBs written in Go, so this comment is | irrelevant. What is the equivalent of | https://github.com/VictoriaMetrics/VictoriaMetrics in another | language? | chakkepolja wrote: | There's highly tuned Java software too, like Lucene; do you | call Java a systems language? | | All in all, I think the semantics debate is irrelevant. No one | is going to use Go for an OS only because someone on the internet | calls it a systems language. | jpgvm wrote: | Gorilla, which many of VM's ideas are based on, is in C++. | | Druid is Java and very fast, but not like-for-like, as it's an | event database, not a timeseries database. Pinot is in the | same vein. | | Most of the very big and very fast databases you have used | indirectly through web services like Netflix (Cassandra), etc. | are written in Java. | samatman wrote: | Large programs require memory management. | | Are you writing an application where Go's garbage collector | will perform poorly relative to rolling your own memory | management? | | Maybe those applications exist, but maybe not; it shouldn't be | presumed.
| | I'm more open to the argument from definition, which might be | what you mean by 'inherently': there isn't an RFC we can point | to for interpreting what a systems language _is_ , and it could | be useful to have a consensus that manual memory management is | a necessary property of anything we call a systems language. | | No such consensus exists, and arguing that Go is a poor choice | for $thing isn't a great way to establish whether it is or is | not a systems language. | | Go users certainly seem to think so, and it's not a molehill I | wish to die upon. | zozbot234 wrote: | Rust itself will most likely get some form of support for | local, "pluggable" garbage collection in the near future. It's | needed for managing general graphs with possible cycles, which | might come up even in "systems programming" scenarios where | performance is a focus - especially as the need for auto- | managing large, complex systems increases. | nu11ptr wrote: | `Rc` and `Arc` have weak references which have worked just | fine for me to break cycles in a graph. Not saying my use | case is the most complex, but I haven't noticed this as a | problem yet. YMMV | zozbot234 wrote: | But the whole point of weak references is that they don't | get shared ownership (i.e. extend the lifetime) of their | referenced values. That's not doing GC. They're fine when | there's always some other (strong) reference that will | ensure whatever lifetime is needed, but that's not the | general case that GC is intended for. | nu11ptr wrote: | Weak pointers don't create a strong reference, correct, | and that is exactly what is needed to break a cycle. | Since it is a cycle, there is some other "owning" strong | reference out there. Every use case I've seen generally | has an obvious strong and weak reference (usually parent | vs child). I'm sure there are trickier corner cases, but | that is the typical IMO. 
| | For everything else, Rust has no need for a tracing GC, | as it has "compile time" GC via static lifetime analysis | which is much better IMO already and often avoid heap | allocation all together. | jerf wrote: | If your data is "mostly a tree but has the occasional back | reference I can easily identify as a back reference", that | works great and I've used it in other languages. | | But in the limit, as your data becomes maximally "graphy" | and there isn't much you can do about it, this ceases to be | a viable option. You have to be able to carve a "canonical | tree" out of your graph for this to work and that's not | always possible. (Practically so, I mean. Mathematically | you can always define a tree, by simple fiat if nothing | else, but that doesn't help if the only available | definitions are not themselves practical.) | nu11ptr wrote: | Fair point - so far I've been lucky enough to avoid such | use cases | bborud wrote: | Let's start at the beginning. What is a <<systems language>> | and for what is it typically used? | sophacles wrote: | A systems language is a language used (typically) to write | systems :P | | Jokes aside, this is kind of a fundamental problem with the | term, and many terms around classifying programs. Also worth | noting - "program" is a term that is a lot looser than people | who typically live above the kernel tend to think. | jeffffff wrote: | i agree with you that garbage collected languages are bad for | systems programming but it's not because garbage collection is | inherently bad, it's because gc doesn't handle freeing | resources other than memory. for better or worse i've spent | most of my professional career writing databases in java and i | will not start a database product in java or any other garbage | collected language again. getting the error handling and | resource cleanup right is way harder in java or go than c++ or | rust because raii is the only sane way to do it. 
| baq wrote: | this has been argued ad nauseum a decade ago and it boils down | to your definition of 'systems'. at google scale, a system is a | mesh of networked programs, not a kernel or low-level bit- | banging tool. | stingraycharles wrote: | By that definition, Java is a systems language as well. | | I think Go makes a better trade-off than Java, but I struggle | to come up with decent examples of projects one could write | in Go and not in Java. Most of the "systems" problems that | Java is unsuitable for, also apply to Go. | coder543 wrote: | > By that definition, Java is a systems language as well. | | I would fully agree that Java is a systems language. | | However, the definition of "systems language" has been a | contentious issue for a very long time, and that debate | seems unlikely to be resolved in this thread. I don't think | the term itself is very useful, so it's probably better if | everyone focuses on discussing things that actually matter | to the applications they're trying to develop instead of | arguing about nebulous classifications. | phplovesong wrote: | Go compiles to static binaries. Java needs the JVM. That | already is a HUGE difference in "picking the right tool". | | Also, the JVM is way more heavy and resource intensive than | a typical Go program. Go is great for cli tools, servers, | and the usual "microservices" stuff, whatever it means to | you. | pjmlp wrote: | Java can just be AOT compiled as Go, the only difference | is that until recently it wasn't a free beer option to do | so. | cb321 wrote: | >until recently | | Actually, for a long time (almost 20 years, I think), the | gcc subsystem gcj let you do AOT compilation of Java to | ELF binaries. [1] I think they had to be dynamically | linked, but only to a few shared objects (unless you | pulled in a lot via native dependencies, but that's kind | of "on you"). | | I don't recall any restrictions on how to use generated | code, anymore than gcc-generated object code. 
So, I don't | think the FSF/copyleft on the compiler itself nullifies a | free beer classification. :-) gcj may not have done a | great job of tracking Java language changes. So, there | might be a "doesn't count as 'real Java'" semantic issue. | | [1] https://manpages.ubuntu.com/manpages/trusty/man1/gcj. | 1.html | ptsneves wrote: | The interesting thing I always heard in the Java world is | that AOT was actually a drawback, as the virtual machine | allowed for just-in-time optimizations according to | hotspots. Actually, if I remember correctly, the word | "hotspot" itself was even used as a tech trademark. | | I was always a bit skeptical, but given I was never much | into Java, I just assumed my skepticism was out of | ignorance. Now, with what I know about profile-guided | compilation, I can see it happening: a JIT language should | have a performance advantage, especially if the optimal | code paths change dynamically according to workload. Not | even profile-guided compilation can easily handle that, | unless I am ignorant of more than I thought. | pjmlp wrote: | Sun was ideologically against AOT, whereas all commercial | vendors always had some form of either AOT or JIT caches. | | In fact, the JIT caches in OpenJDK come from Oracle/BEA's | JRockit, while IBM's OpenJ9 AOT compilation is from the | WebSphere Real Time JVM implementation. | fluoridation wrote: | I've heard the exact opposite. The supposed performance | benefits of JIT compared to AOT (profile-guided | optimization, run-time uarch-targeted optimization) never | really materialized. There's been a lot of research since | the late '90s into program transformation and it turned | out that actually the most effective optimizations are | architecture-independent and too expensive to be | performed over and over again at startup or when the | program is running.
At the same time, deciding when it's | worthwhile to reoptimize based on new profiling data | turned out to be a much more difficult problem than | expected. | | So the end result is that while both AOT (GCC, LLVM) and | JIT (JVM, CLR) toolchains have been making gradual | progress in performance, the JIT toolchains never caught | up with the AOT ones as was expected in the '90s. | pjmlp wrote: | Good luck with inlining and devirtualization across DLLs | with AOT. | | JIT caches with PGO get most of AOT benefits, that is why | after the short stint with AOT on Android, Google decided | to invest in JIT caches instead. | | The best toolchains can do both, so it is never a matter | of either AOT or JIT. | | GCC and clang aren't investing in JIT features just for | fun. | fluoridation wrote: | What's with the snappy tone? | | >Good luck with inlining and devirtualization across DLLs | with AOT. | | An AOT compiler/linker is unable to inline calls across | DLL boundaries because DLLs present a black-box | interface. A JIT compiler would run into the exact same | problem when presented with a DLL whose interface it | doesn't understand or is incompatible with. If you really | want a call inlined the solution is to link the caller | and function statically (whether the native code | generation happens at compile- or run-time), not to | depend on unknown capabilities of the run-time. | | >The best toolchains can do both, so it is never a matter | of either AOT or JIT. | | You're refuting a false dichotomy no one raised. | cb321 wrote: | Theory and practice can diverge and it's easy to over- | conclude based on either with such complex systems. For | example, I have seen gcc PGO make the very same training | case used to measure the profile run _more slowly_. One | might think that impossible naively, but maybe it sounds | more plausible if I put it differently - "steering the | many code generation heuristics with the profile failed | in practice in that case". 
As with almost everything in | computer systems, "it all depends...." | pjmlp wrote: | I seldom mention it, because gcj was abandoned in 2009, | when most contributors moved to the newly released | OpenJDK; it was eventually removed from the GCC tree, and it | never was as foolproof as the commercial versions. | tptacek wrote: | People do huge amounts of systems programming in Java, | including in systems that are incredibly performance- | sensitive. | jpgvm wrote: | Go is strictly less useful than Java because it has | strictly less power. This is true for general-purpose | programming (though somewhat remedied by the | introduction of generics); it's doubly true for "systems" | applications: | | No access to raw threads. No ability to allocate/utilize | off-heap memory (without CGo and nonsense, at least). Low | throughput compared to the Java JIT (unsuitable for CPU- | intensive tasks). | | The only thing I can think of in its favor is lower memory | usage by default, but this is mostly just a JVM | misconception; you can totally tune it for low memory usage | (in a constrained env) or high memory efficiency - especially | if using off-heap structures. | | On a stdlib level Java mostly wins, but Go has some | highlights; it has an absolutely rock-solid and well-built | HTTP and TLS/X.509/ASN1 stack, for instance, and also more | batteries vs Java. | | Overall, I think if the requirement is "goes fast" I will | always choose Java. | | I may pick Go if the brief calls for something like a | lightweight network proxy that should be I/O-bound rather | than CPU-bound and everything I need is in the stdlib and I | don't need any fancy collections etc. | jen20 wrote: | > On a stdlib level Java mostly wins | | This isn't even true compared to other comparable | platforms like .NET, let alone Go, which has hands down | the most useful and well-constructed standard library in | existence (yes, even better than Python). | jpgvm wrote: | Yeah, I don't buy that.
| | Especially not when things like this exist: | https://pkg.go.dev/container | | And things like this don't: https://docs.oracle.com/en/ja | va/javase/17/docs/api/java.base... | | As I mentioned Go does have a great HTTP and TLS stack | but that doesn't do enough to put it on the same level. | throwaway894345 wrote: | I think you're mistaken on nearly every count. :) | | First of all, Go and Java exist at roughly the same | performance tier. It will be less work to make Java beat | Go for some applications and vice versa for other | applications. Moreover, typical Go programs use quite a | lot less memory than typical Java programs (i.e., there's | more than one kind of performance). | | Secondly, Go can make syscalls directly, so it absolutely | can use raw-threads and off-heap memory. These are | virtually never useful for the "systems" domain (as | defined above). | | Thirdly, I think Go's stdlib is better if only because it | isn't riddled with inheritance. It also has a standard | testing library that works with the standard tooling. | | Lastly, I think you're ignoring other pertinent factors | like maintainability (does a new dev have to learn a new | framework, style, conventions, etc to start | contributing?), learning curve (how long does it take to | onboard someone who is unfamiliar with the language? Are | there plugins for their text editor or are they going to | have to learn an IDE?), tooling (do you need a DSL just | to define the dependencies? do you need a DSL just to | spit out a static binary? do you need a CI pipeline to | publish source code or documentation packages?), runtime | (do you need a GC tuning wizard to calibrate your | runtime? does it "just work" in all environments?), etc. | jpgvm wrote: | I disagree. | | Go is definitely not as fast as Java for throughput. It's | gotten pretty good for latency sensitive workloads but | it's simply left in the dust for straight throughput, | especially if you are hammering the GC. 
| | Sure it can make syscalls directly but if you are going | to talk about a maintainability nightmare I can't think | of anything worse than trying to manipulate threads | directly in Go. I had to do this in a previous Go project | where thread pinning was important and even that sucked. | | That is just taste. Objectively collections and many | other aspects of the Java stdlib completely destroy Go, I | pointed out the good bits already. | | Again, taste. Java has a slightly steeper and longer | learning curve but that is a cost you pay once and is | amortized over all the code that engineer will contribute | over their tenure. | | Using an IDE (especially if everyone is using the same | one) is actually a productivity improvement, not an | impairment but again - taste. Some people just don't like | IDEs or don't like that you need to use a specific one to | get the most out of a specific tech stack. | | Build systems in Java by and large fall into only 3 | camps, Maven, Gradle and a very small (but | loud/dedicated) Bazel camp. Contrast that to Go which is | almost always a huge pile of horrible Makefiles, CMake, | Bazel or some other crazy homebrewed bash build system. | | You don't escape CI because you used Go, if you think you | did then you are probably doing Go wrong. | | Java runtime trades simplicity for ability to be tuned, | again taste. I personally prefer it. | | So no, I don't think I am mistaken. I think you just | prefer Go over Java for subjective reasons. Which is | completely OK but doesn't invalidate anything I said. | Thaxll wrote: | > Build systems in Java by and large fall into only 3 | camps, Maven, Gradle and a very small (but | loud/dedicated) Bazel camp. Contrast that to Go which is | almost always a huge pile of horrible Makefiles, CMake, | Bazel or some other crazy homebrewed bash build system. | | Well Go does not need a book of 400+ pages to understand | Maven. 
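[Editor's note: on the thread-pinning point raised above, Go does expose OS-thread pinning via the runtime package, though only at goroutine granularity. A minimal sketch, with an illustrative helper name:]

```go
package main

import (
	"fmt"
	"runtime"
)

// pinned runs fn on a goroutine wired to a single OS thread. Between
// LockOSThread and UnlockOSThread the scheduler will not migrate the
// goroutine to another thread, which is what C libraries with
// thread-local state (or thread-affinity requirements) typically need.
func pinned(fn func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		fn()
	}()
	<-done
}

func main() {
	pinned(func() {
		fmt.Println("running on a locked OS thread")
	})
}
```

This is workable but coarser than Java's raw `Thread` handles, which is roughly the trade-off being argued about above: Go hides the OS thread by default and makes you opt in.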
| throwaway894345 wrote: | > Go is definitely not as fast as Java for throughput. | It's gotten pretty good for latency sensitive workloads | but it's simply left in the dust for straight throughput, | especially if you are hammering the GC. | | Java is better for _GC throughput_ , but your claim was | about _compute throughput_ in general. Moreover, Go doesn | 't lean nearly as hard on GC as Java does in the first | place (idiomatic value types, less boxing, etc), so GC | throughput doesn't imply overall throughput. | | > Sure it can make syscalls directly but if you are going | to talk about a maintainability nightmare I can't think | of anything worse than trying to manipulate threads | directly in Go. I had to do this in a previous Go project | where thread pinning was important and even that sucked. | | Thread pinning is a very rare requirement, typically you | only need it when you're calling some poorly-written C | library. If this is your requirement, then Go's solution | will be less maintainable, but for everyone else the | absence of the foot-gun is the more maintainable solution | (i.e., as opposed to an ecosystem of intermingled OS | threads and goroutines). | | > That is just taste. Objectively collections and many | other aspects of the Java stdlib completely destroy Go, I | pointed out the good bits already. | | Agreed that it's taste. Agreed that Java has more | collections than Go, but I think it's a good thing that | Go pushes people toward slices and hashmaps because those | are the right tool for the job 90% of the time. I think | there's some broader point here about how Java doesn't do | a good job of encouraging people away from misfeatures | (e.g., inheritance, raw threads, off-heap memory, etc). | | > Again, taste. Java has a slightly steeper and longer | learning curve but that is a cost you pay once and is | amortized over all the code that engineer will contribute | over their tenure. 
| | Java has a _significantly_ steeper /longer curve--it's | not only the language that you must learn, but also the | stdlib, runtime, tools, etc and these are typically | considerably more complicated than Go. Moreover, it's a | cost an engineer pays once, but it's a cost an | organization pays over and over (either because they have | to train people in Java or narrow their hiring pool). | | > Build systems in Java by and large fall into only 3 | camps, Maven, Gradle and a very small (but | loud/dedicated) Bazel camp. Contrast that to Go which is | almost always a huge pile of horrible Makefiles, CMake, | Bazel or some other crazy homebrewed bash build system. | | Go has one build system, `go build`. Some people will | wrap those in Makefiles (typically very lightweight | makefiles e.g., they just call `go build` with a few | flags). A minuscule number of projects use Bazel--for all | intents and purposes, Bazel is not part of the Go | ecosystem. I haven't seen any "crazy homebrewed bash | build system" either, I suspect this falls into the "for | all intents and purposes not part of the Go ecosystem" | category as well. I've been writing Go regularly since | 2012. | | > You don't escape CI because you used Go, if you think | you did then you are probably doing Go wrong. | | I claimed the CI burden is lighter for Go than Java, not | that it goes away entirely. | | > Java runtime trades simplicity for ability to be tuned, | again taste. I personally prefer it. | | I think it's difficult to accurately quantify, but I | don't think it's a matter of taste. Specifically, I would | wager that Go's defaults + knobs are less work than Java | for something like 99% of applications. | | > So no, I don't think I am mistaken. I think you just | prefer Go over Java for subjective reasons. Which is | completely OK but doesn't invalidate anything I said. 
| | I agree that some questions are subjective, but I think | on many objective questions you are mistaken (e.g., | performance, build tool ecosystem, etc). | philosopher1234 wrote: | I think your use of "subjective" is avoiding discussing | things that are harder to prove but matter a great deal. | philosopher1234 wrote: | This is an argument from edge case capabilities that | completely ignores maintenance costs + development time. | Seems very naive to me. | jpgvm wrote: | It's not. If you are building a database or other | "systems" software these are very relevant capabilities. | | Also development time of Java may be slightly longer in | the early stages but I generally find refactoring of Java | projects and shuffling of timelines etc is a ton easier | than Go. So I think Java wins out over a longer period of | time even if it starts off a bit slower. | | It's far from naive. I have written a shitton of Go code | (also a shitton of Java if that wasn't already apparent). | philosopher1234 wrote: | You may not personally be naive, but i was talking about | your analysis, not you. | | >Also development time of Java may be slightly longer in | the early stages but I generally find refactoring of Java | projects and shuffling of timelines etc is a ton easier | than Go. So I think Java wins out over a longer period of | time even if it starts off a bit slower. | | I think this topic is far too large to be answered in | this brief sentence. I also think it deserves a higher | allocation of your words than what you spared for java's | capabilities :) | | But yes, I see now that you are interested purely in | performance in your argument and definition of systems | software, in which case what you're saying may be true. | BobbyJo wrote: | Totally agree. If the argument is strictly more power is | always better, then C++ would always win. Why doesn't it? | Exactly what you reference, dev time and maintenance. | | Go was designed for simplicity. 
Of course it's not the | fastest or most feature-rich. Its strong suit is that I | can pop open any Go codebase and understand what's going | on fairly quickly. I went from not knowing any Go to | working with it effectively in a large codebase in a | couple of weeks. Not the case with Java. Not the case with | most languages. | jpgvm wrote: | That wasn't the argument, though; you are attacking a | strawman. The argument was much more nuanced, if you | bothered to read it. | | Essentially it boils down to this. If I am writing | -systems- software and I'm going to choose between Go or | Java then the list of things I pointed out are the main | differentiating features along with raw throughput which | matters for things like databases which need to be able | to do very fast index/bitmap/etc operations. | | Go is great for being simple and easy to get going with. | However, that is completely worthless in systems software | that requires years of background knowledge to | meaningfully contribute to. The startup cost of learning | a new codebase (or an entirely new programming language) | pales in comparison to the requisite background | knowledge. | BobbyJo wrote: | > Go is strictly less useful than Java because it has | strictly less power. | | Literally sentence one, so calling my argument a straw | man is dishonest. | | > Essentially it boils down to this. If I am writing | -systems- software and I'm going to choose between Go or | Java then the list of things I pointed out are the main | differentiating features along with raw throughput which | matters for things like databases which need to be able | to do very fast index/bitmap/etc operations. | | All true. In my experience, though, the long tail of | maintenance and bug fixes tends to result in decreasing | performance over time, as well as a slowing of new | feature support. | | All of that being said, these are all fairly pointless | metrics when we can just look at the DBs being adopted | and why people are adopting them.
Plenty of projects use | Go because of Go's strengths, so saying "that is | completely worthless in systems software" is verifiably | false. It's not worthless in any software, worth less | maybe, but not worthless. | throwaway894345 wrote: | I don't think it's useful to frame "fitness for a given | domain" as a binary, but yes, Java is often used | successfully for this domain (although personally I think | Go is an even better fit for a variety of reasons). | geodel wrote: | Go has user defined value types which Java does not yet. It | makes huge difference in memory density for typical data | structures. This makes Go more suitable to low overhead web | services, cli tools running on few MBs which Java at least | needs few hundred MBs | pionar wrote: | > Go has user defined value types which Java does not | yet. | | C# has this. A lot of people overlook C# in this area, | probably because until recently, it was not cross- | platform. | socialdemocrat wrote: | I would say Go is a systems programming language. A systems | programming language is for creating services used by actual | end user applications. That is pretty much what Go is being | used for. Who is writing editors or drawing applications in Go? | Nobody. | | Go does contain many of the things of interest to systems | programmers such as pointers and the ability to specify memory | layout of data structures. You can make your own secondary | allocators. In short it gives you far more fine grained control | over how memory is used than something like Java or Python. | | https://erik-engheim.medium.com/is-go-a-systems-programming-... | xyproto wrote: | The GC in Go is not mandatory. | ptman wrote: | In the language? In the implementation? | xyproto wrote: | Both. It can be paused or disabled at runtime. | titzer wrote: | You really haven't given any supporting information for your | argument other than a vague feeling that GC is somehow bad. 
In | fact, you just pointed out many counterexamples to your own | argument, so I'm not sure what to take away. | | I've seen this sentiment a lot, and I never see specifics. "GC | is bad for a systems language" is a tribalist, | firmly-held belief that is unsupported by hard data. | | On the other hand, huge, memory-intensive and garbage-collected | systems have been deployed in vast numbers by thousands of | different companies for decades, long before Go, within | acceptable latency bounds. And shoddy, poorly performing | systems have been written in C/C++ and failed spectacularly for | all kinds of reasons. | throwaway894345 wrote: | Virtually invariably, "GC is bad" assumes (1) lots of garbage and | (2) long pause times. Go has idiomatic value types (so it | generates much less garbage) and a low-latency garbage | collector. People who argue against GC are almost always | arguing against some Java GC circa 2005. | MaulingMonkey wrote: | This is the No True Scotsman argument. I mean, no true | modern GC. And it's bullshit. Let's be topical and pick on | Go, since that's the language in the title: | | https://blog.twitch.tv/en/2019/04/10/go-memory-ballast- | how-i... | | 30% of CPU spent on GC, individual GC pauses already in the | milliseconds, despite a tiny half-gig heap, in 2019. For | gamedev, a single millisecond in the wrong place can be | enough to miss vsync and cause unacceptable framerate | stutter. In the right place, it's "merely" 10% of your | entire frame budget for VR-friendly, nausea-avoiding | framerates near 100fps. Or perhaps 90%, if "single digit | milliseconds" might include 9ms. | | Meanwhile, the last professional project I worked on had | 100ms pauses every 30 seconds because we were experimenting | with duktape, which is still seeing active commits. Closer | to a 32GB heap for that project, but most of that was | textures.
Explicit allocation would at least show where the | problematic garbage/churn was in any profiler, but garbage | collection meant a single opaque codepath for _all_ garbage | deallocation... without even the benefit of explicit static | types to narrow down the problem. | titzer wrote: | From your link (which I remember reading at the time): | | > So by simply reducing GC frequency, we saw close to a | ~99% drop in mark assist work, which translated to a ~45% | improvement in 99th percentile API latency at peak | traffic. | | Did you look at the actual article? (Because it doesn't | support your point). They added a 10GB memory ballast to | keep the GC pacer from collecting too much. That is just | a bad heuristic in the GC, and should have a tuning knob. | I'd argue a tuning knob isn't so bad, compared to | rewriting your entire application to manually malloc/free | everything, which would likely result in oodles of bugs. | | Also: | | > And it's bullshit. | | Please, we can keep the temperature on the conversation | down a bit by just keeping to facts and leaving out a few | of these words. | MaulingMonkey wrote: | > Please, we can keep the temperature on the conversation | down a bit by just keeping to facts and leaving out a few | of these words. | | Sure. Let's avoid some of these words too: | | > unsupported, tribalist, firmly-held belief that is | unsupported by hard data. | | Asking for examples is fine and great, but painting broad | strokes of the dissenting camp before they have a chance | to respond does nothing to help keep things cool. | [deleted] | MaulingMonkey wrote: | > Did you look at the actual article? (Because it doesn't | support your point). | | I did and it does for the point I intended to derive from | said article: | | >> However, the GC pause times before and after the | change were not significantly different.
Furthermore, our | pause times were on the order of single digit | milliseconds, not the 100s of milliseconds improvement we | saw at peak load. | | They were able to improve times via tuning. Individual GC | pause times were still in the milliseconds. Totally | acceptable for twitch's API servers (and in fact drowned | out by the several hundred millisecond response times), | but those numbers mean you'd want to avoid doing anything | at all in a gamedev render thread that could potentially | trigger a GC pause, because said GC pause will trigger a | vsync miss. | | > I'd argue a tuning knob isn't so bad, compared to | rewriting your entire application to manually malloc/free | everything, which would likely result in oodles of bugs. | | Memory debuggers and RAII tools have ways to tackle this. | | I've also spent my fair share of time tackling oodles of | bugs from object pooling, meant to workaround performance | pitfalls in GCed languages, made worse by the fact that | said languages treated manual memory allocation as a | second class citizen at best, providing inadequate | tooling for tackling the problem vs languages that treat | it as a first class option. | titzer wrote: | > but those numbers mean you'd want to avoid doing | anything at all in a gamedev render thread that could | potentially trigger a GC pause, because said GC pause | will trigger a vsync miss. | | You might want to take a look at this: | | https://queue.acm.org/detail.cfm?id=2977741 | MaulingMonkey wrote: | I have, it's a decent read - although somewhat | incoherent. E.g. they tout the benefits of GCing when | idle, then trash the idea of controlling GC: | | > Sin two: explicit garbage-collection invocation. | JavaScript does not have a Java-style System.gc() API, | but some developers would like to have that. Their | motivation is proactively to invoke garbage collection | during a non-time-critical phase in order to avoid it | later when timing is critical. [...] 
| | So, no explicitly GCing when a _game knows_ it's idle. | Gah. The worst part is these are entirely fine points... | and somewhat coherent in the context of webapps and | webpages. But then when one attempts to embed v8 - as one | does - and suddenly you the developer are the one that | might be attempting to time GCs correctly. At least then | you have access to the appropriate native APIs: | | * https://v8docs.nodesource.com/node-7.10/d5/dda/classv8_ | 1_1_i... * https://v8docs.nodesource.com/node-7.10/d5/dda | /classv8_1_1_i... | | A project I worked on had a few points where it had to | explicitly call GC multiple times back to back. | Intertwined references from C++ -> Squirrel[1] -> C++ -> | Squirrel meant the first GC would finalize some C++ | objects, which would unroot some Squirrel objects, which | would allow some more C++ objects to be finalized - but | only one layer at a time per GC pass. | | Without the multiple explicit GC calls between unrooting | one level and loading the next, the game had a tendency | to "randomly"[2] ~double its typical memory budget | (thanks to uncollected dead objects and the corresponding | textures they were keeping alive), crashing OOM in the | process - the kind of thing that would fail console | certification processes and ruin marketing plans. | | [1]: http://squirrel-lang.org/ | | [2]: quite sensitive to the timing of "natural" | allocation-triggered GCs, and what objects might've | created what reference cycles etc. | titzer wrote: | > So, no explicitly GCing when a game knows it's idle. | | I mean, that is literally what the idle time scheduler in | Chrome does. It has a system-wide view of idleness, which | includes all phases of rendering and whatever else | concurrent work is going on.
| | > Intertwined references from C++ -> Squirrel[1] -> C++ | -> Squirrel meant the first GC would finalize some C++ | objects, which would unroot some Squirrel objects, which | would allow some more C++ objects to be finalized - but | only one layer at a time per GC pass. | | This is a nasty problem and it happens a lot interfacing | two heaps, one GC'd and one not. The solution isn't less | GC, it's more. That's why Chrome has GC of C++ (Oilpan) | and is working towards a unified heap (this may already | be done). You put the blame on the wrong component here. | throwaway894345 wrote: | I don't think you know what "no true Scotsman" means--I'm | not asserting that Go's GC is the "true GC" but that it | is one permutation of "GC" and it defies the conventional | criticisms levied at GC. As such, it's inadequate to | refute GC in general on the basis of long pauses and lots | of garbage, you must refute each GC (or at least each | type/class of GC) individually. Also, you can see how | cherry-picking pathological, worst-case examples doesn't | inform us about the normative case, right? | [deleted] | MaulingMonkey wrote: | >> And it's bullshit. | | > _cherry picks worst-case examples and represents them | as normative_ | | Neither of my examples are anywhere near worst-case. All | texture data bypassed the GC entirely, for example, | contributing to neither live object count nor GC | pressure. I'm taking numbers from a modern GC with value | types that _you yourself_ said should be fine, and pointed | out that, hey, it's actually pretty not OK for anything that | might touch the render loop in modern game development, even | if it's not being used as the primary language GC.
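The value-types point made upthread (a slice of structs is a single backing-array allocation, while a slice of pointers allocates one heap object per element) can be measured with `runtime.MemStats`. A minimal sketch; the `point` type, element count, and helper names here are illustrative, not from the thread:

```go
package main

import (
	"fmt"
	"runtime"
)

// point is an illustrative value type; in a []point the elements live
// inline in the backing array instead of being separate heap objects.
type point struct{ x, y, z float64 }

var sink int // keeps the compiler from optimizing the work away

// countAllocs reports how many heap allocations f performs.
func countAllocs(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

func fillValues() {
	// One backing-array allocation; elements are not GC-tracked objects.
	vals := make([]point, 100_000)
	for i := range vals {
		vals[i] = point{float64(i), float64(i), float64(i)}
	}
	sink = len(vals)
}

func fillPointers() {
	// One allocation per element; every element is future work for the GC.
	ptrs := make([]*point, 100_000)
	for i := range ptrs {
		ptrs[i] = &point{float64(i), float64(i), float64(i)}
	}
	sink = len(ptrs)
}

func main() {
	fmt.Println("value slice allocations:  ", countAllocs(fillValues))
	fmt.Println("pointer slice allocations:", countAllocs(fillPointers))
}
```

On a typical run the pointer-slice fill performs roughly one allocation per element, while the value-slice fill performs a handful, which is the "much less garbage" effect being debated above.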
| MaulingMonkey wrote: | > I don't think you know what "no true Scotsman" means-- | I'm not asserting that Go's GC is the "true GC" | | At no point in invoking | https://en.wikipedia.org/wiki/No_true_Scotsman does one | bother to define what a true Scotsman is, only what it is | not by way of handwaving away any example of problems | with a category by implying the category excludes them. | It's exactly what you've done when you state "People who | argue against GC are almost always arguing against" some | ancient, nonmodern, unoptimized GC. | | Modern GCs have perf issues in some categories too. | | > As such, it's inadequate to refute GC in general on the | basis of long pauses and lots of garbage, you must refute | each GC (or at least each type/class of GC) individually. | | I do not intend to refute the value of GCs in general. I | will happily use GCs _in some cases_. | | I intend to refute your overbroad generalization of the | anti-GC camp, for which specific examples are sufficient. | | > Also, you can see how cherry-picking pathological, | worst-case examples doesn't inform us about the normative | case, right? | | My examples are neither pathological nor worst case. They | need not be normative - but for what it's worth, they | _do_ exemplify the normative case of my own experiences | in game development across multiple projects with | different teams at different studios, when language level | GCs were used for general purposes, despite being | bypassed for bulk data. | | It's also exactly what titzer was complaining was missing | upthread: | | > I've seen this sentiment a lot, and I never see | specifics. "GC is bad for systems language" is an | unsupported, tribalist, firmly-held belief that is | unsupported by hard data. | Karrot_Kream wrote: | Right.
In my experience just taking a few steps (like | pre-allocating buffers or arrays) decreases GC pressure enough | that GC runs don't actually affect performance enough to | matter (as long as you're looking at ~0.5-1 ms P99 response | times). But there's always the strident group who says GCs | are bad and never offer any circumstance where that could be | true. | titzer wrote: | Indeed. What really kills is extremely high allocation | rates and extremely high garbage production. I've seen | internal numbers from $megacorp that show that trashy C++ | programs (high allocation + deallocation rates) look pretty | much the same to CPUs as trashy Java programs, but are far | worse in terms of memory fragmentation. Trashy C++ programs | can end up spending 20+% of their execution time in | malloc/free. That's a fleetwide number I've seen on | clusters > 1M cores. | | I will admit that the programming _culture_ is different | for many GC'd languages' communities, sometimes | encouraging a very trashy programming style, which | contributes to the perception that GC _itself_ is the | problem, but based on my experience in designing and | building GC'd systems, I _don't_ blame GC itself. | ninkendo wrote: | > I will admit that the programming culture is different | for many GC'd languages' communities, sometimes | encouraging a very trashy programming style, which | contributes to the perception that GC itself is the | problem | | For some languages (I'm looking at you, Java), there's | not much of a way to program that _doesn't_ generate a | bunch of garbage, because only primitives are treated as | value types, and for Objects, heap allocations can only | be avoided if escape analysis can prove the object | doesn't outlast its stack frame (which isn't reliable in | practice.) (Edit: or maybe it doesn't happen _at all_.
| Apparently escape analysis isn't used to put objects on | the stack even if they are known to not escape: | https://www.beyondjava.net/escape-analysis-java) | | I honestly can't imagine much of a way to program in Java | that doesn't result in tremendous GC pressure. You could | technically allocate big static buffers and use a | completely different paradigm where every function is | static and takes data at a defined offsets to said | buffers, but... nobody really does this and the result | wouldn't look anything like Java. | | Sometimes it's appropriate to blame the language. | _ph_ wrote: | It is indeed a huge problem of Java, that it often makes | it difficult to avoid generating garbage. However, one | still can reduce it a lot if trying hard. And be it by | reimplementing selected parts of the standard libraries. | | But the job of avoiding garbage is much easier in Go :) | titzer wrote: | > I'm looking at you, Java | | > Sometimes it's appropriate to blame the language. | | Oh, I know, I was just being vague to be diplomatic. Java | being generally trashy has been one of the major | motivators for me to do Virgil. In Java, you can't even | parse an integer without allocating memory. | osigurdson wrote: | GC is bad if the problem domain you are working on requires | you to think about the behaviour of the GC all the time. GC | behaviour can be subtle, may change from version to version | and some behaviours may not even be clearly documented. If | one has to think about it all the time, it is likely better | just to use a tool where memory management is more explicit. | However, I think for many examples of "systems software" | (Kubernetes for example), GC is not an issue at all but for | others it is an illogical first choice (though it often can | be made to work). | throwaway894345 wrote: | Even in performance-critical software, you're not thinking | about GC "all the time" but only in certain hot paths. 
| Also, value types and some allocation semantics (which Go | technically lacks, but the stack analyzer is intuitive and | easily profiled so it has the same effect as semantics) | make the cognitive GC burden much lower. | emtel wrote: | My argument against GC (which applies similarly to | JIT-based runtimes) is that the problems caused by GC pauses have | non-local causes. If a piece of code ran slowly because of a | GC pause, the cause of the pause is in some sense _the entire | rest of the system_. You can't fix the problem with a | localized change. | | Programs in un-managed languages can be slow too, and | excessive use of malloc() is a frequent culprit. But the | difference is that if I have a piece of code that is slow | because it is calling malloc() too much, I can often (or at | least some of the time) just remove the malloc() calls from | that function. I don't have to boil the ocean and | significantly reduce the rate at which my entire program | allocates memory. | | I think another factor that gets ignored is how much you care | about tail latency. I think GC is usually fine for servers | and other situations where you are targeting a good P99 or | P99.9 latency number. And indeed, this is where JVM, Go, | node.js, and other GCed runtimes dominate. | | But, there are situations, like games, where a bad P99.9 | frame time means dropping a frame every 15 seconds (at | 60fps). If you've got one frame skip every 10 seconds because | of garbage collection pauses and you want to get to one frame | skip every minute, that is _not_ an easy problem to fix. | | (Yes, I am aware that many commercial game engines have | garbage collectors). | bcrosby95 wrote: | I don't want to try to bring up an exception that disproves | your rule, but what about something like BEAM, where it has | per-process (process = lightweight thread) heaps and GC.
| emtel wrote: | I don't know anything about BEAM, but I don't think | single-threading of any form really addresses the | underlying problem. If you go to allocate something, and | the system decides a GC is necessary in order to satisfy | your allocation, then the GC has to run before your | allocation returns. | rbranson wrote: | You can't share objects across threads (called | "processes") in BEAM, so it's very different. The GC only | ever needs to pause one call stack at a time to do a full | GC cycle. Memory shared across processes is generally | manually managed, typically more akin to a database than | an object heap. | fauigerzigerk wrote: | _> I've seen this sentiment a lot, and I never see specifics. | "GC is bad for systems language" is an unsupported, | tribalist, firmly-held belief that is unsupported by hard | data._ | | I would argue it's not (very) hard data that we need in this | case. My opinion is that the resource usage of infrastructure | code should be as low as possible so that most resources are | available to run applications. | | The economic viability of application development is very | much determined by developer productivity. Many applications | aren't even run that often if you think of in-house business | software for instance. So application development is where we | have to spend our resource budget. | | Systems/infrastructure code on the other hand is subject to | very different economics. It runs all the time. The ratio of | development time to runtime is incredibly small. We should | optimise the heck out of infrastructure code to drive down | resource usage whenever possible. | | GC has significant memory and CPU overhead. I don't want to | spend double digit resource percentages on GC for software | that could be written differently without being uneconomical. | titzer wrote: | > Systems/infrastructure code on the other hand is subject | to very different economics. It runs all the time. 
The | ratio of development time to runtime is incredibly small. | We should optimise the heck out of infrastructure code to | drive down resource usage whenever possible. | | I will assume by "infrastructure code" you mean things like | kernels and network stacks. | | Unfortunately there are several intertwined issues here. | | First, we pay in completely different ways for writing this | software in manually-managed languages. Security | vulnerabilities. Bugs. Development time. Slow evolution. I | don't agree with the tradeoffs we have made. This software | is important and needs to be memory-safe. Maybe Rust will | deliver, who knows. But we currently have a lot of latent | memory management bugs here that have consistently clocked | in at 2/3 to 3/4 of critical CVEs over several decades. That's | a real problem. We aren't getting this right. | | Second, infrastructure code does not consume a lot of | memory. Infrastructure code mostly manages memory and | buffers. The actual heap footprint of the Linux kernel is | pretty small; it mostly indexes and manages memory, | buffers, devices, packets, etc. _That_ is where | optimization should go; manage the most resources with the | lowest overhead just in terms of data structures. | | > GC has significant memory and CPU overhead. I don't want | to spend double digit resource percentages on GC for | software that could be written differently without being | uneconomical. | | Let's posit 20% CPU for all the things that a GC does. And | let's posit 2X for enough heap room to keep the GC running | concurrently well enough that it doesn't incur a lot of | mutator pauses. | | If all that infrastructure is taking 10% of CPU and 10% of | memory, we are talking adding 2% CPU and 10% memory. | | ABSOLUTE BARGAIN in my book! | | The funny thing is that people made these same arguments | back in the heyday of Moore's law when we were getting 2X | CPU performance every 18 months.
2% CPU back then was a | matter of weeks of Moore's law. Now? Maybe a couple of | months. We consistently choose to spend the performance | dividends of hardware performance improvements on...more | performance? And nothing on safety or programmability? I | seriously think we chose poorly here due to some | significant confusion in priorities and real costs. | fauigerzigerk wrote: | _> I will assume by "infrastructure code" you mean things | like kernels and network stacks._ | | That, and things like database systems, libraries that | are used in a lot of other software or language runtimes | for higher level languages. | | _> The actual heap footprint of the Linux kernel is | pretty small_ | | And what would that footprint be if the kernel was | written in Java or Go? What would the performance of all | those device drivers be? | | You can of course write memory efficient code in GC | languages by manually managing a bunch of buffers. But I | have seen and written quite a bit of that sort of code. | It's horribly unsafe and horribly unproductive to write. | It's far worse than any C++ code I have ever seen. It's | the only choice left when you have boxed yourself into a | corner with a language that is unsuitable for the task. | | _> First, we pay in completely different ways for | writing this software in manually-managed languages. | Security vulnerabilities. Bugs. Development time_ | | This is not a bad argument, but I think there has always | been a very wide range of safety features in non-GC | languages. C was never the only language choice. We had | the Pascal family of languages. We had Ada. We got | "modern" C++, and now we have Rust. | | If safety was ever a good enough reason to use GC languages | for systems/infrastructure, that time is now over.
| It's horribly unsafe and horribly unproductive to write. | | Go uses buffers pretty idiomatically and they don't seem | unsafe or unproductive. Maybe I'm not following your | meaning? | | > If safety was ever good enough reason to use GC | languages for systems/infrastructure, that time is now | over. | | I don't know that I want GC for OS kernels and device | drivers and so on, but typically people arguing against | GC are assuming lots of garbage and long pause times; | however, Go demonstrates that we can have low-latency GC | and relatively easy control over how much garbage we | generate and where that garbage is generated. It's also | not hard to conceive of a language inspired by Go that is | more aggressively optimized (for example, fully | monomorphized generics, per this article or with a more | sophisticated garbage collector). | | I think the more compelling reason to avoid a GC for | kernel level code is that it implies that the lowest | level code depends on a fairly complex piece of software, | and that _feels wrong_ (but that 's also a weak criticism | and I could probably be convinced otherwise). | titzer wrote: | > It's also not hard to conceive of a language inspired | by Go that is more aggressively optimized (for example, | fully monomorphized generics, per this article or with a | more sophisticated garbage collector). | | Standard ML as implemented by MLton uses full | monomorphization and a host of advanced functional | optimizations. That code can be blazingly fast. MLton | does take a long time to compile, though. | | I've been working on a systems language that is GC'd and | also does full monomorphization--Virgil. I am fairly | miserly with memory allocations in my style and the | compiler manages to compile itself (50KLOC) at full | optimization in 300ms and ~200MB of memory, not | performing a single GC. My GC is really dumb and does a | Cheney-style semispace copy, so it has horrible pauses. 
| Even so, GC is invisible so the algorithm could be | swapped out at any time. | | For an OS kernel, I think a GC would have to be pretty | sophisticated (concurrent, highly-parallel, on-the-fly), | but I think this is a problem I would love to be working | on, rather than debugging the 19000th use-after-free bug. | | Go's GC is _very_ sophisticated, with very low pause | times. It trades memory for those low pause times and can | suffer from fragmentation because it doesn't compact | memory. Concurrent copying is still a hard problem. | | Again, a problem I'd rather we had than the world melting | down because we chose performance rather than security. | fauigerzigerk wrote: | My argument is about the economics of software | development more than about any of the large number of | interesting technical details we could debate for a very | long time. | | There are higher level features that cause higher | resource consumption. GC is clearly one such feature. No | one denies that. So how do we decide where it makes more | sense to use these features and where does it make less | sense? | | What I'm saying is that we should let ourselves be guided | by the ratio development_time / running_time. The smaller | this ratio, the less sense it makes to use such "resource | hogging" features and the more sense it makes to use | every opportunity for optimisation. | | This is not only true for infrastructure/systems | software. This is just one case where that ratio is very | small. Another case would be application software that is | used by a very large number of people, such as web | browsers. | titzer wrote: | I understand your argument, it's been made for decades. | Put in a lot of effort to save those resources. But it isn't | about _effort_. We put in a lot of effort _and still got | crap, even worse, security_. We put effort into the wrong | things! | | We've majorly screwed up our priorities.
Correctness | should be so much higher up the priority list, probably | #1, TBH. When it is a high priority, we _should_ be | willing to sacrifice performance to actually get it. The | correct question is not _if_ we should sacrifice | performance, but _how much_. We didn't even get that | right. | | But look, I know. Security doesn't sell systems, never | has--benchmarks do. The competitive benchmarking | marketplace is partly responsible. And there, there's | been so much FUD on the subject that I feel we've all | been hoodwinked and conned into putting performance at | the top to all of our detriment. That was just dumb on | our (collective) part. | | Let me put it another way. Go back to 1980. Suppose I | offered you two choices. Choice A, you get a 1000x | improvement in computer performance and memory capacity, | but your system software is a major pain in the ass to | write and full of security vulnerabilities, to the point | where the world suffers hundreds of billions of dollars | of lost GDP due to software vulnerabilities. Choice B, | you get an 800x improvement in computer performance, a | 500x improvement in memory capacity, and two thirds of | that GDP loss just doesn't happen. Also, writing that | software isn't nearly as much of a pain in the ass. | | Which did we choose? Yeah. That's where the disagreement | lies. | zimpenfish wrote: | > mandatory GC managed memory | | It can be a right old bugger - I've been tweaking gron's memory | usage down as a side project (e.g. 1M lines of my sample file | into original gron uses 6G maxrss/33G peak, new tweaked uses | 83M maxrss/80M peak) and there's a couple of pathological cases | where the code seems to spend more time GCing than parsing, | even with `runtime.GC()` forced every N lines. In C, I'd know | where my memory was and what it was doing but even with things | like pprof, I'm mostly in the dark with Go.
| randomdata wrote: | _> I'd argue that golang is inherently not a systems language_ | | First you'd have to establish what "systems" means. That, | you'll find, is all over the place. Some people see systems as | low level components like the kernel, others the userland that | allows the user to operate the computer (the set of Unix | utilities, for example), you're suggesting databases and things | like that. | | The middle one, the small command line utilities that allow you | to perform focused functions, is a reasonably decent fit for | Go. This is one of the places it has really found a niche. | | What's certain is that the Go team comes from a very different | world to a lot of us. The definitions they use, across the | board, are not congruent with what you'll often find elsewhere. | Systems is one example that has drawn attention, but it doesn't | end there. For example, what Go calls casting is the opposite | of what some other languages call casting. | slrz wrote: | What is it that Go supposedly calls casting? The term (or its | variations) does not show up in the language specification. | | People sometimes use it for type conversions but that's in | line with usage elsewhere, no? | jjtheblunt wrote: | > with its mandatory GC managed memory | | is that factual, in the general case? | | it seems there exists a category of Go programs for which | escape analysis entirely obviates heap allocations, in which | case if there is any garbage collection it originates in the | statically linked runtime. | IgorPartola wrote: | Hold the phone. Where did the leap from "golang is not a | systems language" to "poor choice for anything performance or | memory sensitive" come from? | | That is a huge leap you are making there that I don't think is | exactly justified. | synergy20 wrote: | I agree. golang is not really a systems programming language. | It's more like java, a language for applications.
| | It does have one niche, that it includes most if not everything | you need to run a network-based service (or micro-service), e.g. | http, even https, dns... are baked in. You no longer need to | install openssl on windows for example, in golang one binary | will include all of those (with CGO disabled too). | | I do system programming in c and c++, maybe rust later when I | have time to grasp that, there is no way for using Go there. | | For network related applications, Go thus far is my favorite, | nothing beats it, one binary has it all, can't be easier to | upgrade in the field too. | svnpenn wrote: | Typical gatekeeping. I like Go, because it lets me get stuff | done. You could say the same about JavaScript, but I think Go | is better because of the type system. C, C++ and Rust are | faster in many cases, but man are they awful to work with. | | C and C++ don't really have package management to speak of, it's | basically "figure it out yourself". I tried Rust a couple of | times, but the Result/Option paradigm basically forces you into | this deeply nested code style that I hate. | dahfizz wrote: | > C and C++ don't really have package management to speak of | | I hear this complaint often, but I consider it a feature of | C. You end up with much less third party dependencies, and | the libraries you do end up using have been battle tested for | decades. I much prefer that to having to install hundreds of | packages just to check if a number is even, like in JS. | pjmlp wrote: | Ah, that is why all major OSes end up having some form of | POSIX support to keep those C applications going. | dahfizz wrote: | You need to have some OS API, what is wrong with POSIX? | | And what does POSIX have to do with package management? | remexre wrote: | my big personal nit is poor async support; e.g. async | disk IO is recent in Linux, and AFAIK all the Unices | implement POSIX aio as a threadpool anyway.
not being | able to wait on "either this mutex/semaphore has been | signaled, or this IO operation has completed" is also | occasionally very annoying... | pjmlp wrote: | POSIX is UNIX rebranded as C runtime for OSes that aren't | UNIX. | throwaway894345 wrote: | I think you're contradicting yourself. You end up with | fewer third party dependencies in C because C developers | end up rewriting from scratch what they would otherwise | import, and these rewrites have much _less_ battle-testing | than popular libraries in other languages. Moreover, they | also have more opportunity for failure since C is so much | less safe than other languages. Even in those few libraries | which have received "decades of battle-testing" we still | see critical vulnerabilities emerge. Lastly, you're | implying a dichotomy between C and JS in a thread about Go, | which doesn't have the same dependency sprawl as JS. | dcgudeman wrote: | Hmm yes, why stop there? Why have functions? Just | reimplement business logic all over your codebase. That way | each block of code has everything you need to know. Sure, | functions have been adopted by every other | language/ecosystem and are universally known to be useful | despite a few downsides but you could say the same about | package management and that hasn't deterred you yet. | dahfizz wrote: | You're arguing against a straw man. I could just as | easily say "why not make every line of code its own | function?" | | C has libraries, and my comment made it clear that they | are useful. | | Argue against my actual point: | | By not having a bespoke package manager, and instead | relying on the system package manager, you end up with | higher quality dependencies and with dramatically less | bloat in C than other language ecosystems. It is all the | benefit and none of the drawbacks of npm-like ecosystems. | dcgudeman wrote: | I don't agree with your assessment that libraries in C | are higher quality.
Additionally I have yet to see a
| system package manager that enables developers to install
| dependencies at a specific version solely for a project
| without a lot of headaches. All the venv stuff in Python is
| necessary because Python dependencies are installed
| system-wide. The idea that the C/C++ ecosystem is better off
| because it doesn't have its own package manager is a
| bizarre idea at best.
| steveklabnik wrote:
| > You end up with much less third party dependencies,
|
| https://wiki.alopex.li/LetsBeRealAboutDependencies
| averagedev wrote:
| I've found Go to be much simpler than Rust, especially
| syntax-wise. However, in Rust you can use the ? operator,
| which propagates errors. In Go you have to check err != nil.
| svnpenn wrote:
| > Rust you can use the ? operator
|
| That doesn't work with all types:
|
| https://stackoverflow.com/a/65085003
| monocasa wrote:
| Only automatically printing something when returning it
| from main doesn't work with all types with the ?
| operator. And frankly 'handling errors by auto print and
| exit' is a bit of a code smell anyway; it's not much
| better than just .unwrap() on everything in main.
| jdmnd wrote:
| It works for any type that implements the
| `std::error::Error` trait, which is something you can
| easily implement for your own types. If you want your
| errors to be integers for some reason, you can wrap that
| type in a zero-sized "newtype" wrapper, and implement
| `Error` for that.
|
| The Stack Overflow answer you linked seems to be claiming
| that it's simply easier to return strings, but I wouldn't
| say this is a restriction imposed by the language.
| svnpenn wrote:
| > easily implement for your own types
|
| have you ever actually done that? I have, it's not easy.
| Please don't try to hand-wave away the negatives of the
| Rust type system.
| mcronce wrote:
| I do it frequently. It is indeed easy.
| TheDong wrote:
| > have you ever actually done that? I have, it's not easy.
|
| Yes. I do it frequently.
"#[derive(Error, Debug)]": | https://github.com/dtolnay/thiserror#example | | Much easier than implementing the error interface in go. | | Rust is powerful enough to allow macros to remove | annoying boiler-plate, and so most people using rust will | grab one of the error-handling crates that are de-facto | standard and remove the minor pain you're talking about. | | In go, it's not really possible to do this because the | language doesn't provide such macros (i.e. the old third- | party github.com/pkg/errors wanted you to implement | 'Cause', but couldn't provide sugar like 'this-error' | does for it because go is simply less powerful). | | I've found implementing errors in go to be much more | error-prone and painful than in rust, and that's not to | mention every function returning untyped errors, meaning | I have no clue what callers should check for and handle | new errors I add. | svnpenn wrote: | > Much easier than implementing the error interface in | go. | | is this a joke? You have to import a third party package, | just to implement an error interface? Here is Go example, | no imports: type errorString string | func (e errorString) Error() string { return | string(e) } | TheDong wrote: | It was not a joke. | | Let's look at a common example: you want to return two | different types of errors and have the caller distinguish | between them. Let me show it to you in rust and go. 
|
| Rust:
|
|     #[derive(Error, Debug)]
|     pub enum MyErrors {
|         #[error("NotFound: {0}")]
|         NotFound(String),
|         #[error("Internal error")]
|         Internal(#[source] anyhow::Error),
|     }
|
| The equivalent go would be something like:
|
|     type NotFoundErr struct {
|         msg string
|     }
|
|     func (err NotFoundErr) Error() string {
|         return "NotFound: " + err.msg
|     }
|
|     func (err NotFoundErr) Is(target error) bool {
|         if target == nil {
|             return false
|         }
|         // All NotFoundErrs are considered the same, regardless of msg
|         _, ok := target.(NotFoundErr)
|         return ok
|     }
|
|     type InternalErr struct {
|         wrapped error
|     }
|
|     func (err InternalErr) Error() string {
|         return fmt.Sprintf("Internal error: %s", err.wrapped)
|     }
|
|     func (err InternalErr) Unwrap() error {
|         return err.wrapped
|     }
| svnpenn wrote:
| I don't think you realize how ridiculous this comment is.
| You're comparing 10 lines of Go with 200 of Rust:
|
| https://github.com/dtolnay/thiserror/blob/master/src/lib.rs
| pornel wrote:
| Nobody's saying you can't use Go or must use C/C++/Rust. If
| Go works for you, that's great.
|
| The issue is about the positioning of Go as a language. It's
| confusing due to being (formerly) marketed as a "systems
| programming language", which is typically the domain of
| C/C++/Rust, but technically Go fits closer to the
| capabilities of Java or TypeScript.
| pjmlp wrote:
| Is writing a compiler, linker, kernel emulation layer,
| TCP/IP stack or a GPU debugger systems programming?
| ohYi55 wrote:
| git clone?
|
| I mean do we need bespoke package management tooling for
| everything now?
|
| Seems like an outdated systems admin meme that violates KISS,
| explodes dependency chains, risks security, etc. IT feels
| infected by sunk cost fallacy.
|
| It's electron state in machines. The less altogether the
| better.
| mcronce wrote:
| What, specifically, do you mean when you say Rust is "awful
| to work with"? With C and C++ I agree, but I've had a
| _drastically_ better development experience in Rust than Go.
| svnpenn wrote:
| You should probably read the rest of the comment...
| mcronce wrote:
| Are Result and Option really the only thing? Because
| nesting scopes based on Err/None is rarely the right
| choice, just like nesting a scope based on `if err ==
| nil` isn't typically something you want to do in Go, or
| `if errno == 0` in C:
|
| - You can panic trivially with `.unwrap()`
| - You can propagate the condition up trivially with `?`
| - `?` doesn't work with all types, but it does work with
|   Option, and it does work with the vast majority of error
|   types - making your custom error work with it is very
|   easy (if you're whipping up an application or prototype
|   and want your error handling very simple,
|   `anyhow::Error` works great here)
| - You can convert None to an Err condition trivially with
|   `.ok_or()?`
| - In cases where it makes sense, you can trivially use a
|   default value with `.unwrap_or_default()`
|
| And all of these require a _lot_ less code than `if err !=
| nil { return nil, err }`
|
| And all of these allow you to use the Ok/Some value
| directly in a function call, or in a method chain, while
| still enabling the compiler to force you to handle the
| Err/None case
|
| The common theme here being "trivial" :) Result/Option
| are a big piece of that better developer experience.
|
| I think what's more important for the systems programmer is (1)
| the ability to inspect the low-level behavior of functions,
| like through their disassembly; (2) to be reasonably confident
| of how code will compile; and (3) to have some dials and levers
| to control aspects of compiled code and memory usage. All of
| these things can be, and are, present not only in some
| garbage-collected languages, but also in garbage-collected
| languages with a dynamic type system!
|
| Yes, there are environments so spartan and so
| precision-oriented that even a language's built-in allocator
| cannot be used (e.g., malloc), in which case using a GC'd
| language is going to be an unwinnable fight for control. But
| if you only need to do precision management of memory that
| isn't pervasive in all of your allocation patterns, then using
| a language like C feels like throwing the baby out with the
| bath water. It's very rarely "all or nothing" in a modern,
| garbage-collected language.
| no_circuit wrote:
| The article is from a database company, so I'll assume that
| approximates the scope. My scope for the GC discussion would
| include other parts that could be considered similar
| software: the cluster-control plane (Kubernetes), other
| databases, and possibly the first level of API services that
| implement a service like internal users/profiles or auth
| endpoints.
|
| The tricky thing is GC works most of the time, but if you are
| working at scale you really can't predict user behavior, and
| so all of those GC-tuning parameters that were set six months
| ago no longer work properly. A good portion of production
| outages are likely related to cascading failures due to too
| long GC pauses, and a good portion of developer time is spent
| testing and tuning GC parameters. It is easier to remove
| and/or just not allow GC languages at these levels in the
| first place.
| On the other hand IMO GC languages at the frontend level are
| OK since you'd just need to scale horizontally.
| initplus wrote:
| It's impossible to spend any time tuning Go's GC parameters,
| as the Go team intentionally provides almost none.
|
| Go's GC is optimized for latency; it doesn't see the same
| kind of 1% peak latency issues you get in languages with a
| long tail of high-latency pauses.
|
| Also consider API design - Java APIs (both in standard &
| third party libs) tend to be on the verbose side and build
| complex structures out of many nested objects. Most Go
| applications will have less nesting depth, so it's
| inherently an easier GC problem.
|
| System designs that rely on allocating a huge amount of
| memory to a single process exist in a weird space - big
| enough that perf is really important, but small enough that
| single-process is still a viable design. Building massive
| monoliths that allocate hundreds of GBs at peak load just
| doesn't seem "in vogue" anymore.
|
| If you are building a distributed system, keeping any
| individual process's peak allocation to a reasonable size
| is almost automatic.
| erik_seaberg wrote:
| You tune Go's GC by rewriting your code. It's like
| turning a knob but slower and riskier.
| coder543 wrote:
| You tune GC in Go by profiling allocations, CPU, and
| memory usage. Profiling shows you where the problems are,
| and Go has some surprisingly nice profiling tools built
| in.
|
| Unlike turning a knob, which has wide-reaching and
| unpredictable effects that may cause problems to just
| move around from one part of your application to another,
| you can address the actual problems with near-surgical
| precision in Go. You can even add tests to the code to
| ensure that you're meeting the expected number of
| allocations along a certain code path if you need to
| guarantee against regressions... but the GC is so rarely
| the problem in Go compared to Java, it's just not
| something to worry about 99% of the time.
| If knobs had a "fix the problem" setting, they would
| already be set to that value. Instead, every value is a
| trade-off, and since you have hundreds of knobs, you're
| playing an _impossible_ optimization game with hundreds
| of parameters to try to find the set of parameter values
| that make your _entire_ application perform the way you
| want it to. You might as well have a meta-tuner that just
| randomly turns the knobs to collect data on all the
| possible combinations of settings... and just hope that
| your next code change doesn't throw all that hard work
| out the window. Go gives you the tools to tune different
| parts of your code to behave in ways that are optimal for
| them.
|
| It's worth pointing out that languages like Rust and C++
| also require you to tune allocations and deallocations...
| this is not strictly a GC problem. In those languages,
| like in Go, you have to address the actual problems
| instead of spinning knobs and hoping the problem goes
| away.
|
| The one time I have actually run up against Go's GC, when
| writing code that was trying to push the absolute limits
| of what could be done on a fleet of rather
| resource-constrained cloud instances, I wished I was
| writing Rust for this particular problem... I definitely
| wasn't wishing I could be spinning Java's GC knobs. But I
| was still able to optimize things to work in Go the way I
| needed them to even in that case, even if the level of
| control isn't as granular as Rust would have provided.
| exdsq wrote:
| I think I toggled with the GC for less than a week in my
| eight years of experience, including some systems stuff -
| maybe this is true at FAANG scale but not for me!
| coder543 wrote:
| Go doesn't offer a bunch of GC tuning parameters. Really
| only one parameter, so your concerns about complex GC
| tuning here seem targeted at some other language like Java.
| This is a drawback in some cases, since one size never
| truly fits all, but it dramatically simplifies things for
| most applications, and the Go GC has been tuned for many
| years to work well in most places where Go is commonly
| used. The developers of Go continue to fix shortcomings
| that are identified.
|
| Go's GC prioritizes very short STWs and predictable
| latency instead of total GC throughput, and Go makes GC
| throughput more manageable by stack-allocating as much as
| it can to reduce GC pressure.
|
| Generally speaking, Go is also known for using very little
| memory compared to Java.
| no_circuit wrote:
| Yes, my comments were targeted at Java and Scala. Java
| has paid the bills for me for many years. I'd use Java
| for just about anything except for high-load
| infrastructure systems. And if you're in, or want to be
| in, that situation, then why risk finding out two years
| later that a GC-enabled app is suboptimal?
|
| I'd guess you'd have no choice if, in order to hire
| developers, you had to choose a language that people
| found fun to use.
| astrange wrote:
| Is Go's GC not copying/generational? I think "stack
| allocation" doesn't really make sense in a generational
| GC, as everything sort of gets stack allocated. Of
| course, compile-time lifetime hints might still be useful
| somehow.
| coder543 wrote:
| > Is Go's GC not copying/generational?
|
| Nope, Go does not use a copying or generational GC. Go
| uses a concurrent mark-and-sweep GC.
|
| Even then, generational GCs are not as cheap as stack
| allocation.
| socialdemocrat wrote:
| Java _needs_ lots of GC tuning parameters because you
| have practically no way of tuning the way your memory is
| used and organized in Java code. In Go you can actually
| do that. You can decide how data structures are nested;
| you can take pointers to the inside of a block of
| memory. You could make e.g. a secondary allocator,
| allocating objects from a contiguous block of memory.
| | Java doesn't allow those things, and thus it must instead | give you lots of levers to pull on to tune the GC. | | It is just a different strategy of achieving the same | thing: | | https://itnext.io/go-does-not-need-a-java-style-gc- | ac99b8d26... | apalmer wrote: | > A good portion of production outages are likely related | to cascading failures due to too long GC pauses, and a good | portion of developer time is spent testing and tuning GC | parameters. | | Can't really accept that without some kind of quantitative | evidence. | no_circuit wrote: | No worries. It is not meant to be quantitative. For a few | years of my career that has been my experience. For this | type of software, if I'm making the decision on what | technology to use, it won't be any GC-based language. I'd | rather not rely on promises that GC works great, or is | very tunable. | | One could argue that I could just tune my services from | time to time. But I'd just reduce the surface area for | problems by not relying upon it at all -- both a | technical and a business decision. | EdwardDiego wrote: | > A good portion of production outages are likely related | to cascading failures due to too long GC pauses, and a good | portion of developer time is spent testing and tuning GC | parameters | | After 14 years in JVM dev in areas where latency and | reliability are business critical, I disagree. | | Yes, excessive GC stop the world pauses can cause latency | spikes, and excessive GC time is bad, and yes, when a new | GC algorithm is released that you think might offer | improvements, you test it thoroughly to determine if it's | better or worse for your workload. | | But a "good portion" of outages and developer time? | | Nope. 
Most outages occur for the same old boring reasons -
| someone smashed the DB with an update that hits a
| pathological case and deadlocks processes using the same
| table, a DC caught fire, someone committed code with a very
| bad logical bug, someone who was considered a guru heard
| that gRPC was cool and used it without adequate code review
| and didn't understand that gRPC's load balancing defaults
| to pick-first, etc. etc.
|
| The outages caused by GC were very, very few.
|
| Outages caused by screw-ups or by a lack of understanding
| of the subtleties of a piece of tech are as common here as
| they are in every other field of development.
|
| Then there's the question of what outages GCed languages
| _don't_ suffer.
|
| I've never had to debug corrupted memory, or how a
| use-after-free bug let people exfiltrate data.
| throwaway894345 wrote:
| > The tricky thing is GC works most of the time, but if you
| are working at scale you really can't predict user
| behavior, and so all of those GC-tuning parameters that
| were set six months ago no longer work properly. A good
| portion of production outages are likely related to
| cascading failures due to too long GC pauses, and a good
| portion of developer time is spent testing and tuning GC
| parameters. It is easier to remove and/or just not allow GC
| languages at these levels in the first place.
|
| Getting rid of the GC doesn't absolve you of the problem;
| it just means that rather than tuning GC parameters, you've
| encoded usage assumptions in thousands of places scattered
| throughout your code base.
| kubb wrote:
| Reading the title I'm worried - should I keep using reflection
| instead?
| jerf wrote:
| If the information in this article is make-or-break for your
| program, you probably shouldn't have chosen Go.
|
| In the grand space of all programming languages, Go is fast. In
| the space of compiled programming languages, it's on the slower
| end. If you're in a "counting CPU ops" situation it's not a
| good choice.
|
| There is an intermediate space in which one is optimizing a
| particular tight loop, certainly, I've been there, and this can
| be nice to know. But if it's beyond "nice to know", you have a
| problem.
|
| I don't know what you're doing with reflection, but the odds
| are that it's wildly slower than anything in that article,
| because of how it works. Reflection is basically like a
| dynamically-typed programming language runtime you can use as a
| library in Go, and does the same thing dynamically-typed
| languages (modulo JIT) do on their insides, which is
| essentially to deal with everything through an extra layer of
| indirection. Not just a function call here or there...
| _everything_. Reading a field. Writing a field. Calling a
| function, etc. Everywhere you have runtime dynamic behavior,
| there is the need to check for a lot of things to be true, and
| everything operates through extra layers of pointers and table
| structs. Where the article is complaining about an extra CPU
| instruction here and an extra pointer indirection there,
| you've signed up for extra function calls and pointer
| indirections by the dozens. If you can convert reflection to
| generics it will almost certainly be a big win.
|
| (But if you cared about performance you were probably also
| better off with an interface that didn't fully express what you
| meant and some extra type switches.)
| shadowgovt wrote:
| This is good high-level advice as well as low-level advice.
|
| Go is positioned to be most useful as an alternative to Java,
| and to C++ where performance isn't the key factor (i.e.
| projects where C++ would be chosen because "Enh, it's a big
| desktop application and C++ is familiar to a lot of
| developers," not because the project actually calls for being
| able to break out into assembly language easily or where
| fine-tuning performance is more important than tool-provided
| platform portability).
| azth wrote:
| In practice, it's used as an alternative to Python, Ruby, and
| Node.js. It can't fully do what Java or C# do.
| geodel wrote:
| I mean, of course. I have not seen IBM WebSphere Server
| 6.0.1 written in Go. Neither is there a full-fledged
| SOAP/WSDL engine in Go. So clearly Go is less capable.
| coder543 wrote:
| Well, that is simply not true at all.
|
| Go is a perfectly capable replacement for Java and C#. Many
| huge projects that would likely never be written in Python
| have been written in Go when they would have otherwise been
| written in Java or C# in years past: Kubernetes, Prometheus,
| HashiCorp Vault and Terraform, etcd, CoreDNS, TiDB, Loki,
| InfluxDB, NATS, Docker, Caddy, Gitea, Drone CI, Faktory,
| etc. The list goes on and on.
|
| What, exactly, are you saying that Go can't do that Java
| can?
|
| Go is _not_ a perfectly capable replacement for Rust, for
| example, because Rust offers extremely low-level control
| over all resource usage, making it much easier to use for
| situations where you need every last ounce of
| performance, but neither C# nor Java offers the
| capabilities Rust offers either.
|
| I like C# just fine (Java... not so much), but your
| comment makes no sense. Certainly, I would rather use Go
| than most scripting languages; having static types and
| great performance makes a lot of tasks easier. But that
| doesn't mean Go is somehow less capable than Java or
| C#... it is a great alternative to both. If someone needs
| more than Go can provide, they're going to rewrite in
| Rust, C++, or C, not Java or C#.
| marwatk wrote:
| > What, exactly, are you saying that Go can't do that
| Java can?
|
| Runtime library addition (plugins) and dependency
| injection are two big ones. (We can argue the merits
| separately, but they're not possible in Go.)
|
| I think if Java had easily distributable static binaries,
| k8s would have stayed Java (it started out as Java).
| jerf wrote: | Plugins are barely possible and utterly impractical, so | no objection there. | | DI is totally possible, just about every system I build | is nothing but dependency injection. What confuses people | in the Java world is that you don't need a framework for | it, you just _do it_. You could say the language simply | supports a simple version of it natively. | | If you want something much more complicated like the Java | way, there are some libraries that do it, but few people | find them worthwhile. They are a lot of drama for what is | in Go not that much additional functionality. | | This is one of the many places the interfaces not | requiring declaration of conformance fundamentally | changes Go vs. Java and leaves me still preferring Go | even if Java picks up every other thing from Go. You | don't need a big dependency injection framework; you just | declare yourself an interface that matches what you use | out of some 3rd-party library, then pass the value from | the 3rd-party library in to your code naturally. | Dependency injected. All other things you may want to do | with that dependency, like swap in a testing | implementation, you just do with Go code. | | (And I personally think that if Java's interfaces were | satisfied like Go's, there would never have been a Go.) | coder543 wrote: | > Runtime library addition (plugins) | | https://pkg.go.dev/plugin | | Linux only, but it exists and it works... I just wouldn't | recommend that particular pattern for almost anything. | | Either some kind of RPC or compile-time plugins would be | better for almost all cases. | | - With RPC plugins (using whatever kind of RPC that you | prefer), you get the benefit of process isolation in case | that plugin crashes, the plugin can be written in any | language, and other programs can easily reuse those | plugins if they desire. 
The Language Server Protocol is a
| great example of an RPC plugin system, and it has had a
| huge impact throughout the developer world.
|
| - With compile-time plugins, you get even better
| performance due to the ahead-of-time optimizations that
| are possible. Go programs compile so quickly that it's
| not a big deal to compile variants with different plugins
| included... this is what Caddy did early on, and this
| plugin architecture still works out well for them last I
| checked.
|
| > dependency injection
|
| https://go.dev/blog/wire
|
| Java-style DI isn't very idiomatic for Go, and it's just
| a pattern (the absence of which would not prevent
| applications from being developed, the purpose of this
| discussion)... but there are several options for doing DI
| in Go, including this one from Google.
| randomdata wrote:
| _> Runtime library addition (plugins)_
|
| I don't see anything inherent to Go that would prevent it.
| gc even added rudimentary support some time ago,
| fostering the addition of the plugin package[1], but
| those doing the work ultimately determined that it wasn't
| useful enough to dedicate further effort towards
| improving it.
|
| There was a proposal to remove it, but it turns out that
| some people are using runtime library addition, and so it
| remains.
|
| [1] https://pkg.go.dev/plugin
| shadowgovt wrote:
| I believe Go supports DI via Wire
| (https://github.com/google/wire).
| lalaithion wrote:
| Very few people are actually answering your question, so I'll
| answer it: Generics are slower than concrete types, and are
| slower than simple interfaces. However, the article does not
| bother to compare generics with reflection, and my intuition
| says that generics will be faster than reflection.
| pdpi wrote:
| Definitely not. In the general case, you will make things
| simpler and faster by turning reflection-based code into
| generic code.
| | What this article says is that a function that is generic on an | interface introduces a tiny bit of reflection (as little as is | necessary to figure out if a type conforms to an interface and | get an itab out of it), and that tiny bit of reflection is | quite expensive. This means two things. | | One, if you're not in a position where you're worried about | what does or does not get devirtualized and inlined, this isn't | a problem for you. If you're using reflection at all, this | definitely doesn't apply to you. | | Two, reflection is crazy expensive, and the whole point of the | article is that the introduction of that tiny bit of reflection | can make function calls literally twice as slow. If you are in | a position where you care about the performance of function | calls, you're never really going to improve upon the situation | by piling on even more reflection. | LanceH wrote: | If you're worried you should benchmark the differences on your | requirements. | AYBABTME wrote: | Use generics if it makes your dev experience better. Profile if | it's slow. Optimize the slow bits. | morelisp wrote: | If you're using reflection or storing a bare interface{}, you | should probably instead try using generics. | | If you're using real interfaces, you should keep using | interfaces. | | If you care about performance, you should not try to write | Java-Streams / FP-like code in a language with no JIT and a | non-generational non-compacting GC. | linkdd wrote: | Premature optimization is a bad thing. | | Just implement naively, then if you have performance issues | identify the bottleneck. | morelisp wrote: | Ignorance of how your language works is a bad thing. | | Knowing where performance issues with certain techniques | might arise is not premature optimization. Implement with an | appropriate level of care, including performance concerns. 
| Not every kind of poor performance appears as a clear spike
| in a call graph, and even fewer can be fixed without
| changing any external API.
| linkdd wrote:
| > Ignorance of how your language works is a bad thing.
|
| And I never said anything remotely close to contradicting
| this statement.
|
| > Knowing where performance issues with certain techniques
| might arise is not premature optimization.
|
| It is:
|
| - Python: should I use a for loop, a list comprehension or
|   the map function?
| - C++: should I use a std::list, std::vector, ...?
| - Go: should I use interface{} or generics?
|
| The difference between those options is subtle and
| completely unrelated to the problem you want to solve.
|
| > Implement with an appropriate level of care, including
| performance concerns.
|
| Step 1: solve your problem naively, aka: make it work
| Step 2: add tests, separate business logic from
|         implementation details, aka: make it right
| Step 3: profile / benchmark to see where the chokepoints
|         are and optimize them, aka: make it fast
|
| Chances are that if you have deeply nested loops, generics
| vs interface{} will be the last of your problems.
|
| To take the C++ example again, until you have implemented
| your algorithm, you don't know what kind of operations (and
| how often) you will do with your container. So you can't
| know whether std::list or std::vector fits best.
|
| In Go, until you have implemented your algorithm, you don't
| know how often you will have to use generics / reflection,
| so you can't know what will be the true impact on your
| code.
|
| The "I know X is almost always faster so I'll use it
| instead of Y" approach will bite you more often than you
| can count.
|
| > Not every kind of poor performance appears as a clear
| spike in a call graph
|
| CPU usage, memory consumption, idling/waiting times, etc...
| Those are the kinds of metrics you care about when
| benchmarking your code. No one said you only look at spikes
| in a call graph.
|
| But still, to look for such information, you need to have
| at least a first implementation of your problem's solution.
| Doing this beforehand is a waste of time and energy because
| 80% of the time, your assumptions are wrong.
|
| > and even fewer can be fixed without changing any external
| API.
|
| This is why you "make it work" and "make it right" before
| you "make it fast".
|
| This way you have a clear separation between your API and
| your implementation details.
| morelisp wrote:
| You're giving fine advice for well-scoped tasks with
| minimal design space (well, sort of - using std::list
| _ever_ is laughable - but if you had said unordered_map
| vs. map, sure, so I take the broad point). But some of us
| have been around the block a few times though, and now
| need to make sure those spaces are delineated for others
| in a way that won't force them into a performance corner.
|
| > until you have implemented your algorithm, you don't
| know what kind of operations (and how often) you will do
| with your container... until you have implemented your
| algorithm, you don't know how often you will have to use
| generics / reflection, so you can't know what will be the
| true impact on your code.
|
| I don't mean to brag, but I guess I'm a lot better at
| planning ahead than you. I don't usually have the whole
| program written in my head before I start, but I also
| can't remember any time I had to reach for a hammer as
| big as reflect and didn't expect to very early on, and
| most of the time I know what I intend to do to my data!
|
| > This is why you "make it work" and "make it right"
| before you "make it fast"... This way you have a clear
| separation between your API and your implementation
| details.
|
| This is not possible. APIs force performance constraints.
| Maybe wait until your API works before micro-optimizing
| it, but also maybe think about how many pointers you're
| going to have to chase and methods your users will need
| to implement in the first place because you probably
| don't get to "optimize" those later without breaking the
| API. You write about "the bottleneck", but there's not
| always a single bottleneck distinct from "the API".
| Sometimes there's a program that's slow because there's a
| singular part that takes 10 seconds and could take 1
| second. But sometimes it's slow because every different
| bit of it is taking 2ns where it could take 1ns.
|
| Consider the basic read-some-bytes API in Go vs. Python
| (translated into Go, so the difference is obvious):
| type GoReader interface { Read([]byte) (int, error) }
| type PyReader interface { Read(int) ([]byte, error) }
|
| You're never going to make an API like PyReader anywhere
| near as fast as GoReader, no matter how much optimization
| you do!
| linkdd wrote:
| > using std::list ever is laughable
|
| https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-
| vecto...
|
| > some of us have been around the block a few times,
| and now need to make sure those spaces are
| delineated for others in a way that won't force them into
| a performance corner.
|
| This, just like the rest of your comment, is just
| patronizing and condescending.
|
| > I don't mean to brag, but I guess I'm a lot better at
| planning ahead than you
|
| See previous point...
|
| > I also can't remember any time I had to reach for a
| hammer as big as reflect and didn't expect to very early
| on
|
| This is not what I said at all. Let's say you know early
| on, before any code is written, you will need reflection.
| Can you tell me how many calls to the reflection API will
| happen beforehand? Is it `n`? `n log n`? `n^2`? Will
| you use reflection at every corner, or just on the
| boundaries of your task? Once implemented, could it be
| refactored in a simpler way?
| You don't know until you've
| written the code.
|
| > most of the time I know what I intend to do to my data
|
| "what" is the spec, "how" is the code, and there are
| multiple answers to the "how"; until you write them and
| benchmark them, you can't know for sure which one is the
| best, you can only have assumptions/hypotheses. Unless
| you're constantly doing exactly the same thing.
|
| > but also maybe think about how many pointers you're
| going to have to chase and methods your users will need
| to implement in the first place because you probably
| don't get to "optimize" those later without breaking the
| API.
|
| Basically, "write the spec before jumping into code".
| Which is the basis of "make it work, make it right, make
| it fast" because if you don't even know what problem
| you're solving, there is no way you can do anything
| relevant.
|
| > You write about "the bottleneck", but there's not
| always a single bottleneck distinct from "the API".
|
| I never implied there is a single bottleneck. But if you
| separate the implementation details from the High-Level
| API, they sure are distinct. For example, you can solve
| the N+1 problem in a GraphQL API without changing its
| schema.
|
| If your implementation details leak into your API, it just
| means it's poorly separated.
|
| > You're never going to make an API like PyReader
| anywhere near as fast as GoReader, no matter how much
| optimization you do!
|
| Because Python is interpreted and Go is compiled. Under
| the hood, the OS uses the `ssize_t read(int fd, void *buf,
| size_t count)` syscall, and there is an upper limit to the
| `count` parameter (specific to the OS/kernel).
|
| Python's IO API knows this and allocates a buffer only
| once under the hood; it would be equivalent to having a
| PyReader implementation using a GoReader interface +
| preallocated []byte slice.
|
| I can't tell you which one is faster without a benchmark
| because the difference is so subtle, so I won't.
| 8note wrote:
| The language is a tool for a job.
|
| If I'm using low torque, I don't need to know the yield
| strength of my wrench
| slackfan wrote:
| Meh. The people who screamed loudest about Generics missing in Go
| aren't going to be using the language now that the language has
| them, and are going to find something new to complain about.
|
| The language will suffer now with additional developmental and
| other overhead.
|
| The world will continue turning.
| trey-jones wrote:
| This is a really long and informative article, but I would
| propose a change to the title here, since "Generics can make your
| Go code slower" seems like the expected outcome, where the
| conclusion of the article leans more towards "Generics don't
| always make your code slower", while also enumerating some good
| ways to use generics and some anti-patterns.
| [deleted]
| SomeCallMeTim wrote:
| In C++, generics (templates) are zero-cost abstractions.
|
| So no, generics do _not_ de facto make code slower.
| BobbyJo wrote:
| I think his point was that they definitely won't make it
| faster (more abstraction means more indirection), so the
| expectation from most (myself included) would be that using
| them incurs a performance penalty, maybe not directly via
| their implementation, but via their use in broader terms.
| SomeCallMeTim wrote:
| Using templates in C++ _can_ make code faster, though.
| Because you can write the same routine with more
| abstraction and _less_ indirection.
|
| I've used C++ templates effectively as a code generator to
| layer multiple levels of abstractions into completely
| customized code throughout the abstraction.
| chakkepolja wrote:
| We don't know of a way to implement generic types without
| (vtable dispatch + boxing) cost AND without monomorphization
| cost. Some languages do the former, some the latter, and some
| a combination of the two.
|
| Monomorphization: * code bloat * slow compiles * debug builds
| may be slow (esp c++)
|
| Dynamic dispatch & boxing (Usually both are needed): * not
| zero cost
|
| Pick your poison
| SomeCallMeTim wrote:
| "Zero-cost" in that context refers to runtime performance.
| It _always_ refers to runtime performance.
|
| And code bloat, as I've said elsewhere, is vastly overblown
| as a problem. Another commenter pointed out that link-time
| optimization removes most of the bloat. The rest is
| customized code that's optimized per-instantiation.
|
| Slow compiles _are_ an issue with C++ templates. They're
| literally a Turing-complete code-generation language of
| their own, and they can perform complex calculations at
| compile time, so yes, they tend to make compiles take
| longer when you're using them extensively. But the point I
| was making was about runtime performance. That's why C++
| compilers often perform incremental compilation, which can
| limit the development time cost.
|
| Debug builds can simply be slow in C++ with or without
| templates. C++ templates really don't affect debug build
| runtime performance in any material fashion; writing the
| code out customized for each given type should have
| identical performance to the template-generated version of
| the code, unless there's some obscure corner case I'm not
| considering.
| monocasa wrote:
| There are no true zero cost abstractions under all
| situations. In the general case they make things faster, but
| I've personally made C++ code faster by un-templating code to
| relieve I$ pressure, and also allow the compiler to make
| smarter optimizations when it has less code to deal with. The
| optimizer passes practically have a finite window they can
| look at because of the complexity class of a lot of optimizer
| algorithms.
| josefx wrote:
| C++ performance can suffer from template bloat
| in two ways:
|
| Templated symbol names are gigantic.
| This can impact program
| link and load times significantly in addition to the inflated
| binary size.
|
| Duplication of identical code for every type: for example, the
| methods of std::vector<int> and std::vector<unsigned int>
| should compile to the same instructions. There are linker
| flags that allow some deduplication but those have their own
| drawbacks; another trick is to actively use void pointers for
| code parts that do not need to know the type, allowing them
| to be reused behind a type-safe template-based API.
| fbkr wrote:
| > There are linker flags that allow some deduplication but
| those have their own drawbacks
|
| As long as you use --icf=safe I don't see any drawback, and
| most of the time it results in almost identical reductions
| to --icf=all since not many real programs compare addresses
| of functions.
| josefx wrote:
| I think that requires separate function sections, which
| themselves may cause bloat and data duplication.
| gmfawcett wrote:
| That's only 99% of the story. :) Having too many
| specializations of a C++ template can lead to code bloat,
| which can degrade cache locality, which can degrade
| performance.
| pjmlp wrote:
| Depends if LTO is used.
| mcronce wrote:
| You're definitely right. While it's not a particularly
| common problem, it does exist; one thing I'd really like to
| see enter the compiler world is an optimization step to use
| vtable dispatch (or something akin to Rust's enum_dispatch,
| since all concrete types should be knowable at compile
| time) in these cases.
|
| I expect it would require a fair amount of tuning to become
| useful, but could be based on something analogous to the
| function inliner's cost model, along with number of calls
| per type. Could possibly be most useful as a PGO-type step,
| where real-world call frequency with each concrete type is
| considered.
| nu11ptr wrote:
| enum dispatch in Rust is one of my favorite tricks.
Most | of the time you have a limited number of implementations, | and enum dispatch is often more performant and even less | limiting (than say trait objects) | mcronce wrote: | I'm a huge fan. It's very little work to use, as long as | all variants can be known to the author, and as long as | you aren't in a situation where uncommon variants | drastically inflate the size of your common variants, | it's a performance win, often a big one, compared to a | boxed trait object. | | Even when you have to box a variant to avoid inflating | the size of the whole enum, that's still an improvement | over a `dyn Trait` - it involves half as much pointer | chasing | | It'd be cool to see this added as a compiler optimization | - even for cases where the author of an interface can't | possibly know all variants (e.g. you have a `pub fn` that | accepts a `&dyn MyTrait`), the compiler can | SomeCallMeTim wrote: | In my experience, code bloat from templates is overblown. | | Inlining happens with or without template classes. | gmfawcett wrote: | That's fair. I guess if you need the functionality in | your program, you need the functionality: the codegen | approach doesn't matter that much. And like pjmlp said, | LTO can make a difference too. Thanks for your thoughts, | these kinds of exchanges make me smarter. :) | asvitkine wrote: | Zero cost from runtime performance, but you pay binary size | for it. It's a trade off between the two... | addcninblue wrote: | Is it the expected outcome? I was under the initial impression | that the author also noted: | | > Overall, this may have been a bit of a disappointment to | those who expected to use Generics as a powerful option to | optimize Go code, as it is done in other systems languages. | | where the implementation would smartly inline code and have | performance no worse than doing so manually. I quite | appreciated the call to attention that there's a nonobvious | embedded footgun. 
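For readers more at home in Go than Rust, the closed-set dispatch idea discussed above can be sketched with a type switch over known concrete types. This is an illustrative sketch only, not Rust's enum_dispatch macro itself, and all names are made up:

```go
package main

// shape is the open, interface-based version: calls through it go
// via an itable and are hard for the compiler to devirtualize.
type shape interface{ area() float64 }

type circle struct{ r float64 }
type square struct{ s float64 }

func (c circle) area() float64 { return 3.14159 * c.r * c.r }
func (s square) area() float64 { return s.s * s.s }

// areaSwitch is the closed-set version: because every variant is
// listed explicitly, each branch is a static call the compiler can
// inline, which is the benefit enum_dispatch provides in Rust.
func areaSwitch(v shape) float64 {
	switch t := v.(type) {
	case circle:
		return t.area()
	case square:
		return t.area()
	default:
		return v.area() // fall back to dynamic dispatch
	}
}

func main() {
	_ = areaSwitch(square{s: 3})
}
```

The obvious limitation, as the commenters note, is that this only works when the author can enumerate all variants; an automated compiler pass (or PGO) would be needed for open interfaces.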
|
| (As a side note, this design choice is quite interesting, and I
| appreciate the author diving into their breakdown and thoughts
| on it!)
| Ensorceled wrote:
| Interestingly the original title and your proposed title imply,
| to me, the opposite of what I think they imply to you. This
| suggestion is really unclear.
| cube2222 wrote:
| Great article, just skimmed it, but will definitely dive deeper
| into it. I thought Go was doing full monomorphization.
|
| As another datapoint I can add that I tried to replace the
| interface{}-based btree that I use as the main workhorse for
| grouping in OctoSQL[0] with a generic one, and got around 5% of a
| speedup out of it in terms of records per second.
|
| That said, compiling with Go 1.18 vs Go 1.17 got me a 10-15%
| speedup by itself.
|
| [0]: https://github.com/cube2222/octosql
| morelisp wrote:
| > That said, compiling with Go 1.18 vs Go 1.17 got me a 10-15%
| speedup by itself.
|
| Where did you see this speedup? Other than `GOAMD64` there
| wasn't much in the release notes about compiler or stdlib
| performance improvements so I didn't rush to get 1.18-compiled
| binaries deployed, but maybe I should...
|
| (I do expect some nice speedups from using Cut and
| AvailableBuffer in a few places, but not without some
| rewrites.)
| cube2222 wrote:
| I've experienced that speedup on an ARM MacBook Pro. I've
| just checked on Linux AMD64 and there's no performance
| difference there.
| hencq wrote:
| It's probably because of the new register-based calling
| convention. From https://tip.golang.org/doc/go1.18
|
| > Go 1.17 implemented a new way of passing function
| arguments and results using registers instead of the stack
| on 64-bit x86 architecture on selected operating systems.
| Go 1.18 expands the supported platforms to include 64-bit
| ARM (GOARCH=arm64), big- and little-endian 64-bit PowerPC
| (GOARCH=ppc64, ppc64le), as well as 64-bit x86 architecture
| (GOARCH=amd64) on all operating systems.
| On 64-bit ARM and
| 64-bit PowerPC systems, benchmarking shows typical
| performance improvements of 10% or more.
| ptomato wrote:
| Yeah, 1.17 got register (instead of stack) calling
| convention on amd64; 1.18 expanded that to arm64, which
| should be responsible for most of that performance
| improvement.
| coder543 wrote:
| GOAMD64 _could_ be significant, so I'm not sure why your
| comment seems to dismiss it?
|
| Also, as the article mentions, Go 1.18 can now inline
| functions that contain a "range" for loop, which previously
| was not allowed, and this would contribute performance
| improvements for some programs by itself. The new register-
| based calling convention was extended to ARM64, so if you're
| running Go on something like Graviton2 or an Apple Silicon
| laptop, you could expect to see a measurable improvement from
| that too. (edit: the person you replied to confirmed they're
| using Apple Silicon, so definitely a major factor.)
|
| The Go team is always working on performance improvements, so
| I'm sure there are others that made it into the release
| without being mentioned in the release notes.
| jatone wrote:
| what I expect to happen, now that golang has generics and
| reports like these are showing up, is that golang will explore
| monomorphizing generics and get hard numbers. they may also
| choose to spend some of the compilation speed they've gained
| from linker optimizations on generics.
|
| I can't imagine monomorphizing being that big of a deal during
| compilation if the generation is deferred and results are cached.
| whimsicalism wrote:
| I am unfamiliar with Go. This article discusses that they have
| decided to go for runtime lookup. Is there any reason why that
| implementation might make monomorphizing more difficult?
| jatone wrote:
| nope. it was an intentional trade off with respect to
| compilation speed. once generics have baked for a bit with
| real world usage said decision will almost certainly be
| revisited.
|
| edit: for example one could envision the compiler generates
| the top n specializations per generic function based on usage
| and then uses the current non-specialized version for
| the rest.
| jimmaswell wrote:
| > you create an exciting universe of optimizations that are
| essentially impossible when using boxed types
|
| Couldn't JIT do this?
| fulafel wrote:
| > Inlining code is great. Monomorphization is a total win for
| systems programming languages: it is, essentially, the only form
| of polymorphism that has zero runtime overhead
|
| Blowing your icache can result in slowdowns. In many cases it's
| worth having smaller code even if it's a bit slower when
| microbenchmarked cache-hot, to avoid evicting other frequently
| used code from the cache in the real system.
| masklinn wrote:
| The essay is missing a "usually", but it's true that
| monomorphisation is a gain in the vast majority of situations
| because of the data locality and optimisation opportunities
| offered by all the calls being static. Though obviously that
| assumes a pretty heavy optimisation pipeline (so languages like
| C++ or Rust benefit a lot more than a language with a lighter
| AOT optimisation pipeline like Java).
|
| Much as with JITs (though probably with higher thresholds),
| issues occur for megamorphic callsites (when a generic function
| has a ton of instances), but that should be possible to dump
| for visibility, and there are common and pretty easy solutions
| for at least some cases e.g. trampolining through a small
| generic function (which will almost certainly be inlined) to
| one that's already monomorphic is pretty common when the
| generic bits are mostly a few conversions at the head of the
| function (this sort of trampolining is common in Rust, where
| "conversion" generics are often used for convenience purposes
| so e.g. a function will take a `T: AsRef<str>` so the caller
| doesn't have to extract an `&str` themselves).
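Translated into Go 1.18 terms, the trampoline pattern described above might look something like the following sketch. The names are hypothetical; the point is that the generic wrapper only does a conversion, while the hot code stays monomorphic:

```go
package main

import "strings"

// countVowels is the monomorphic workhorse: only one copy of this
// code needs to exist no matter how many types call into it.
func countVowels(s string) int {
	n := 0
	for _, r := range s {
		if strings.ContainsRune("aeiou", r) {
			n++
		}
	}
	return n
}

// CountVowels is the thin generic "trampoline": it accepts anything
// string-like, performs the conversion at the head of the function,
// and immediately calls the monomorphic version. Being tiny, it is a
// prime candidate for inlining.
func CountVowels[T ~string | ~[]byte](v T) int {
	return countVowels(string(v))
}

func main() {
	_ = CountVowels("banana")
	_ = CountVowels([]byte("kiwi"))
}
```

This is the Go analogue of the Rust `AsRef<str>` convenience idiom the comment mentions: callers get a flexible signature, but the body that matters is compiled once.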
| ki_ wrote:
| Generics are a generic solution, but they are absolutely
| necessary in my opinion.
| sedatk wrote:
| This is a great article yet with an unnecessarily sensationalist
| headline. Generics can be improved in performance over time, but
| a superstition like "generics are slow" (not the exact headline,
| but what it implies to the reader) can remain stuck in our heads
| forever. I can see developers sticking to the dogma of "never use
| generics if you want fast code", resorting to terrible
| duplication and more bugs.
| zellyn wrote:
| (off-topic) Anyone else using Firefox know why the text starts
| out light gray and then flashes to unreadably dark gray after the
| page loads? (The header logo and text change from gray to blue
| too)
| wtetzner wrote:
| I'm using Firefox and don't see that issue. Maybe some kind of
| plugin you have installed?
| brundolf wrote:
| This is super interesting and well-written. Also, wow, that
| generated-assembly-viewer widget is slick.
| nunez wrote:
| yeah the formatting on this article was insanely good for a
| technical blog post. Good job, Planetscale marketing!
| socialdemocrat wrote:
| Really well written article. I liked that the author tried to
| keep the language simple around a fair amount of complex topics.
|
| Although the article paints the Go solution for generics somewhat
| negatively, it actually made me more positive to the Go solution.
|
| I don't want generic code to be pushed everywhere in Go. I like
| Go to stay simple and it seems the choices the Go authors have
| made will discourage overuse of Generics. With interfaces you
| already avoid code duplication so why push generics? It is just a
| complication.
|
| Now you can keep generics to the areas where Go didn't use to work
| so great.
|
| Personally I quite like that Go is trying to find a niche
| somewhere between languages such as Python and C/C++.
| You get
| better performance than Python, but they are not seeking zero-
| overhead at any cost like C++, which dramatically increases
| complexity.
|
| Given the huge number of projects implemented with Java, C#,
| Python, Node etc there must be more than enough cases where Go
| has perfectly good performance. In the more extreme cases I
| suspect C++ and Rust are the better options.
|
| Or if you do number crunching and more scientific stuff then
| Julia will actually outperform Go, despite being dynamically
| typed. Julia is a bit the opposite of Go. Julia has generics
| (parameterized types) for performance rather than type safety.
|
| In Julia you can create functions taking interface types and
| still get inlining and max performance. Just throwing it out
| there, as many people seem to think that to achieve max
| performance you always need a complex statically typed language
| like C++/D/Rust. No you don't. There are also very high speed
| dynamic languages (well only Julia I guess at the moment.
| Possibly LuaJIT and Terra).
| AtNightWeCode wrote:
| Is there any large project that has done an in-place replacement
| to use generics that has been benchmarked? I doubt that the change
| is even measurable in general.
| ctvo wrote:
| Bravo on the informative content and presentation. That component
| that shows the assembly next to syntax highlighted code? _chef's
| kiss_
| maxekman wrote:
| Similar to how the GC has become faster and faster with each
| version, we can expect the generics implementation to be too. I
| wouldn't pay much attention to conclusions about performance from
| the initial release of the feature. The Go team is quite open
| with their approach.
| JaggerFoo wrote:
| throwaway894345 wrote:
| The article is like 9k words and it only mentions Rust twice in
| passing.
| zachruss92 wrote:
| For me Go has replaced Node as my preferred backend language.
| The
| reason is the power of static binaries, the confidence
| that the code I write today can still run ten years from now, and
| the performance.
|
| The difference in the code I'm working with is being able to
| handle 250 req/s in node versus 50,000 req/s in Go without me
| doing any performance optimizations.
|
| From my understanding Go was written with developer ergonomics
| first and performance as a lower priority. Generics undoubtedly
| make it a lot easier to write and maintain complex code. That may
| come at a performance cost but for the work I do even if it cuts
| the req/s in half I can always throw more servers at the problem.
|
| Now if I was writing a database or something where performance is
| paramount I can understand where this can be a concern, it just
| isn't for me.
|
| I'd be very curious what orgs like CockroachDB and even K8s think
| about generics at the scale they're using them.
| RedShift1 wrote:
| 250 vs. 50000 req/s seems like too big of a difference to me.
| Sure Go is faster than Node but Node is no slouch either, you
| might want to dig a bit deeper into why you only got 250 req/s
| with Node.
| akvadrako wrote:
| That could mostly be due to multithreading. That comes free
| with go but requires a different model in node.
| KwisaksHaderach wrote:
| Doesn't sound unrealistic if you have a mixed load of IO and
| raw processing.
| zelphirkalt wrote:
| Go was created with simplicity of feature set in mind, which
| does not translate into developer ergonomics automatically. It
| rather offers a least-common-denominator feature set that most
| devs can handle, even if they previously only worked in other
| languages like Java and similar. This way Google aimed at
| attracting those devs. They'd not have to learn much to make
| the switch.
|
| True developer ergonomics, as far as a programming language
| itself goes, stems from language features that make goals
| easy to accomplish in a small amount of code, in a readable way,
| using well crafted concepts of the language. Having to go to
| great lengths because your language does not support certain
| features (like generics in Go for a long time) is not
| developer ergonomics.
|
| There is the aspect of tooling for a language, of course, but
| that does not necessarily have to do with programming language
| design. Same goes for the standard library.
| vorpalhex wrote:
| > The difference in the code I'm working with is being able to
| handle 250 req/s in node versus 50,000 req/s in Go without me
| doing any performance optimizations.
|
| Your node code should be in the 2k reqs/s range trivially, with
| many frameworks comfortably offering 5k+.
|
| It is never going to be as fast as go, but it will handle most
| cases.
| throwaway894345 wrote:
| How can you make these claims without knowing how his
| application handles requests? Not everything is a trivial
| database read/write op.
| hu3 wrote:
| Related. The introduction of Generics in Go revived an issue
| about the ergonomics of typesafe Context in a Go HTTP framework
| called Gin: https://github.com/gin-gonic/gin/issues/1123
|
| If anyone can contribute, please do.
| kevwil wrote:
| Seems obvious; like, did someone expect all the extra abstraction
| would make Go faster?
| throwaway894345 wrote:
| The article articulates why it's reasonable to expect that
| generics _would_ make Go faster. From TFA:
|
| > Monomorphization is a total win for systems programming
| languages: it is, essentially, the only form of polymorphism
| that has zero runtime overhead, and often it has _negative_
| performance overhead. It makes generic code _faster_.
| __s wrote:
| What extra abstraction?
|
| I'd expect that without monomorphization the code would perform
| the same as interface{} code, perhaps minus type cast error
| handling overhead. That's the model where generics are passing
| interface{} underneath, & exist only as a type check _(a la
| Java type erasure)_
| anonymoushn wrote:
| Yes? We used code generators to monomorphize our code in like
| 2015 and it was faster than using interfaces. Generics could
| reasonably produce the same code we did in 2015, but they
| don't.
| [deleted]
| morelisp wrote:
| > there's no incentive to convert a pure function that takes an
| interface to use Generics in 1.18.
|
| Good. I saw a lot of people suggesting in late 2021 that you
| could use generics as some kind of `#pragma
| force-devirtualization`, and that would be awful if it became
| common.
| mcronce wrote:
| Why would that be awful?
| ramesh31 wrote:
| Well sure. Not writing hand-tuned assembly can make your code
| slower, too. Go's value as a language is how it fills the niche
| between Rust and Python, giving you low level things like manual
| memory control, while still making tradeoffs for performance and
| developer experience.
| mrweasel wrote:
| I might have worded it differently, but yeah, of course generics
| can make your code slower, what did people expect?
| zellyn wrote:
| I don't know about you, but when I imagine what compilers do
| with generic code, I typically imagine monomorphization,
| which (aside from increasing cache pressure a little) should
| generally not make things slower, but rather introduce
| possibilities for inlining that could make it faster.
| mrweasel wrote:
| Apparently I scrolled right past that bit of the article.
| I'm a little unsure how it's supposed to make the code
| faster, but maybe I'm comparing it wrong. The
| alternative to generics is writing all the different
| functions by hand, in my mind at least.
| I don't fully
| understand how generics are supposed to be faster than
| a custom function for that datatype.
| wtetzner wrote:
| I think the reasoning is that for something that's
| commonly used for many different types, you won't go
| through the effort of re-implementing that function for
| each type (it may not even be feasible to do so). Which
| means you'll end up with some sort of indirection to make
| it generic.
| masklinn wrote:
| > I don't fully understand how generics are supposed to be
| faster than a custom function for that datatype.
|
| The point is that a monomorphized generic function should
| not be _slower_ than the custom-function-per-datatype,
| but because Go's generics are not fully monomorphized
| they can be, and in fact can be slower than a
| function-for-interface.
| mcronce wrote:
| The point is they shouldn't be _slower_ than a manually-
| copied implementation for that concrete type. They also
| should be _faster_ than vtable dynamic dispatch in the
| vast majority of cases. (I also fail to see a compelling
| reason that they couldn't have been implemented by
| passing the fat pointer directly, making the codegen the
| same as passing an interface, instead of having that
| business with the extra layer of indirection.)
|
| If there are specialization opportunities when hand-
| implementing the function for a given concrete type, I
| would indeed expect that to be faster than a
| monomorphized generic function.
| tsimionescu wrote:
| It depends what they are replacing. Typically, generics used
| to replace runtime polymorphism (using [T any] []T instead of
| []any) would be a speed boost in C#, C++, or Rust; and would
| have no impact on speed in Java.
| morelisp wrote:
| And it is also a speed boost in Go, assuming you don't call
| any methods. (Which, if you were really using [T any], you
| either weren't or you were dissembling about your
| acceptable types.)
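A tiny sketch of the trade-off being discussed: the `[]any` version boxes every element and pays a type assertion on each one, while a generic function instantiated for a concrete element type keeps the slice unboxed. Illustrative only; actual speedups depend on the Go version and the types involved:

```go
package main

// sumBoxed takes a slice of boxed values: each element carries a
// type word plus a data pointer, and every iteration pays for a
// type assertion.
func sumBoxed(xs []any) int {
	total := 0
	for _, x := range xs {
		total += x.(int)
	}
	return total
}

// sumGeneric, instantiated as sumGeneric[int], operates directly
// on a flat []int: no boxing, no per-element assertion.
func sumGeneric[T int | int64 | float64](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	_ = sumBoxed([]any{1, 2, 3})
	_ = sumGeneric([]int{1, 2, 3})
}
```

Note that this is the favorable case morelisp describes: the generic body calls no methods on T. As the article details, generic code that does call methods through a constraint can still go through runtime dictionaries in Go 1.18.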
|
| whimsicalism wrote:
| > I might have worded it differently, but yeah, of course
| generics can make your code slower, what did people expect?
|
| ? In most languages, it is compile-time overhead, not
| runtime.
| masklinn wrote:
| I wouldn't say "most", it's very variable. Also not unlike
| Go I think C# uses a hybrid model, where all reference
| types get the same instance of the generic function.
| wtetzner wrote:
| From the article:
|
| > Monomorphization is a total win for systems programming
| languages: it is, essentially, the only form of polymorphism
| that has zero runtime overhead, and often it has negative
| performance overhead. It makes generic code faster.
|
| The point is that the way Go implements generics is in such a
| way that it can make your code slower, even though there is a
| well-known way that will not make your code slower (at the
| cost of compile times).
| ramesh31 wrote:
| >even though there is a well-known way that will not make
| your code slower (at the cost of compile times).
|
| That's the point though. The Golang team was surely aware
| of both approaches, and chose what they did as a conscious
| design decision to prefer faster compile times. People love
| Go because of the iteration speed compared to C++. And
| these little things start to add up if you don't have a
| clear product vision about what your language is meant for.
| marssaxman wrote:
| I would have expected generics to make the compiler take
| longer, not the compiled program.
| pphysch wrote:
| My first use of Go generics has been for a concurrent "ECS" game
| engine. In this case, the gains are pretty obvious. I think.
|
| I get to write one set of generic methods and data structures
| that operate over arbitrary "Component" structs, and I can
| allocate all my components of a particular type contiguously on
| the heap, then iterate over them with arbitrary, type-safe
| functions.
|
| I can't fathom that doing this via a Component interface would be
| even close to as fast, because it would destroy cache performance
| by introducing a bunch of Interface tuples and pointer
| dereferencing for every single instance. Not to mention the
| type-unsafe code being yucky. Am I wrong?
|
| FWIW I was able to update 2,000,000 components per (1/60s) frame
| per thread in a simple Game of Life prototype, which I am quite
| happy with. But I never bothered to evaluate if Interfaces would
| be as fast
| nikki93 wrote:
| Sweet! I've been using it for the same. Example game project
| (did it for a game jam): https://github.com/nikki93/raylib-5k
| -- in this case the Go gets transpiled to C++ and runs as
| WebAssembly too. Readme includes a link to play the game in the
| browser. game.gx.go and behaviors.gx.go kind of show the ECS
| style.
|
| It's worked with 60fps performance on a naive N^2 collision
| algo over about 4200 entities -- but also I tend to use a
| broadphase for collisions in actual games (there's an "externs"
| system to call to other C/C++ and I use Chipmunk's broadphase).
| folago wrote:
| Sounds interesting, is it available somewhere?
| pphysch wrote:
| Still want to hit some milestones before releasing anything,
| so not quite
| siftrics wrote:
| Assuming your generic functions take _pointers_ to Components
| as input, full monomorphization does not occur and you're
| suffering a performance hit similar in magnitude, if not
| strictly greater empirically, to interface "dereferences".
|
| On this basis, I don't believe your generic implementation is
| as much faster than an interface implementation as you claim.
| pphysch wrote: | You're right, here's what one of my hot loops looks like: |
|     func (cc *ComponentContainer[T]) ForEach(f func(*Component[T])) {
|         for _, page := range cc.pool.pages {
|             for i := range page {
|                 if page[i].IsActive() {
|                     f(&page[i])
|                 }
|             }
|         }
|     }
| | Still, the interface approach is a total nightmare from a | readability + runtime error perspective so I won't be going | back & will just hope for some performance freebies in 1.19 | or later :^) | dmullis wrote: | lokar wrote: | It seems like the code size vs speed trade-off would be well | managed by FDO. | komuW wrote: | Go does have some form of monomorphization implemented in Go 1.18; | it is just behind a feature flag (compiler flags). | | Look at the assembly difference between these two examples: | | 1. https://godbolt.org/z/7r84jd7Ya (without monomorphization) | | 2. https://godbolt.org/z/5Ecr133dz (with monomorphization) | | If you don't want to use godbolt, run the command `go tool | compile '-d=unified=1' -p . -S main.go` | | I guess that the flag is not documented because the Go team has | not committed to either implementation. | masklinn wrote: | FWIW you can have two different compilers (and outputs) for the | same input: https://godbolt.org/z/bb1oG9TbP in the compiler | pane just click "add new" and "clone compiler" (you can | actually drag that button to immediately open the pane in the | right layout instead of having your layout move to vertical | thirds and needing to move the pane afterwards). | | Learned that watching one of Matt's cppcon talks (A+, would do | again), as you can expect this is useful to compare different | versions of a compiler, or different compilers entirely, or | different optimisation settings. | | But wait, there's more! Using the top left Add dropdown, you | can get _a diff view between compilation outputs_ : | https://godbolt.org/z/s3WxhEsKE (I maximised it because a diff | view getting only a third of the layout is a bit narrow).
| komuW wrote: | thanks! | klodolph wrote: | I'm excited about generics that give you a tradeoff between | monomorphization and "everything is a pointer". The "everything | is a pointer" approach, like Haskell's, is incredibly inefficient | wrt execution time and memory usage, while the "monomorphize | everything" approach can explode your code size surprisingly | fast. | | I wouldn't be surprised if we get some control over | monomorphization down the line, but if Go started with the | monomorphization approach, it would be impossible to back out of | it because it would cause performance regressions. Starting with | the shape stenciling approach means that introducing | monomorphization later can give you a performance improvement. | | I'm not trying to predict whether we'll get monomorphization at | some future point in Go, but I'm just saying that at least the | door is open. | tines wrote: | Haskell does monomorphization as well. See | https://reasonablypolymorphic.com/blog/specialization/ | SomeCallMeTim wrote: | > "monomorphize everything" approach can explode your code size | surprisingly fast. | | It can in the naive implementation. Early C++ was famous for | code bloat and (apparently) hasn't shaken that outdated | impression. | | In practice, monomorphization of templates hasn't been a | serious issue in C++ for a long time. The compiler and linker | technologies have advanced significantly. | asdfasgasdgasdg wrote: | > Early C++ was famous for code bloat and (apparently) hasn't | shaken that outdated impression. | | It's not an outdated impression. C++ generics can and do | interact very poorly with inlining and other language | features to cause extremely large binary sizes, especially if | you do anything complex inside them. They also harm | compilation performance since each copy of the generic code | needs to be optimized.
| | Generics in C++ are reasonably efficient when there is | relatively little code generated per generic, but when this | is not true, they can be a problem. | titzer wrote: | > The compiler and linker technologies have advanced | significantly. | | AFAICT the linker de-duplicates identical pieces of machine | code. You still can get multi-megabyte object files for every | source file. I used to work on V8. Almost every .o is 3+MB. | Times hundreds, plus bigger ones, it's more than a gigabyte | of object files for a single build. That's absurd. Not V8's | fault--stupid C++ compilation and linking model. | zozbot234 wrote: | Yes, they seem to have shipped an MVP first, which is a sensible | approach. Controlling the extent of monomorphization requires | changes in how the code is written, so if they had offered that | exclusively it would've been a pitfall for existing users. By | boxing everything, they keep their MVP closer to the previously | idiomatic interface{} pattern. | rowanG077 wrote: | Why is speed part of the Go language contract but footprint | of the executable is not? I, for one, would be quite miffed if | an update of the Go compiler meant an application would no | longer fit on my MCU. That is worse than the application | running slower. | gbear605 wrote: | Ideally it could be a compiler flag. Even more ideally, you | could tell your compiler what the max size you want is and | then it would optimize for the best speed at a given | executable footprint. | masklinn wrote: | > Why is speed part of the Go language contract but | footprint of the executable is not? | | Because footprint of the executable has pretty literally | never been part of it; Go has always had deficient DCE and | generated huge executables. | rowanG077 wrote: | It also generates pretty slow executables. That doesn't | invalidate the point. | qznc wrote: | A hybrid approach would be monomorphization for native types | like int and pointers for records.
C# is doing that, if I | remember correctly. | Ar-Curunir wrote: | IMO that's a bad trade-off for many performance-sensitive | applications, since it means that you can't rely on newtypes | and structs for correctness. | dse1982 wrote: | This is a very interesting article. I was, however, a bit confused | by the lingo calling everything generics. As I understood it, the | main point of the article quite precisely matched the distinction | between generics and templates as I learned it. Therefore, what | surprised me most was the fact that Go monomorphizes generic code | sometimes. This makes sense given the way Go's module | system works - i.e. imported modules are included in the | compilation - but doesn't fit my general understanding of | generics. ___________________________________________________________________ (page generated 2022-03-30 23:00 UTC)