[HN Gopher] Making a Go program faster with a one-character change ___________________________________________________________________ Making a Go program faster with a one-character change Author : hcm Score : 358 points Date : 2022-11-14 14:47 UTC (8 hours ago) (HTM) web link (hmarr.com) (TXT) w3m dump (hmarr.com) | sakras wrote: | A while ago at my company we switched from GCC to Clang, and | noticed a couple of massive regressions (on the order of 50%?) in | performance having to do with floating point. | | After profiling for a bit, I discovered that suddenly a lot of | time was spent in isinf on Clang and no time in GCC... Clang was | emitting a function call where GCC wasn't. I happened to randomly | change isinf to std::isinf (it's a random habit of mine to put | std:: in front of these C functions). Suddenly the regression | disappeared! I guess on Clang only std::isinf was a compiler | intrinsic while GCC recognized both? Anyway, that's my small- | change optimization story. | 10000truths wrote: | C defines isinf as a macro, whereas C++'s std::isinf is a | function. Perhaps the discrepancy has to do with differences in | how they're evaluated? | ludiludi wrote: | > If you read the title and thought "well, you were probably just | doing something silly beforehand", you're right! | | Don't feel too silly. Russ Cox, one of the technical leads on the | Go language, made the same mistake in the regexp package of the | standard library. | | https://go-review.googlesource.com/c/go/+/355789 | lxe wrote: | This is the kind of stuff that the compiler needs to really | understand. If all this de-referencing and referencing magic is | at the control of the user, it needs to have meaningful effect on | what the code does. Otherwise we might as well just write C. | silisili wrote: | The compiler does understand it and did what was asked - it was | just written rather poorly. 
| | There are valid use cases for wanting to take a copy, and then | pass along a pointer of the copy. Perhaps to go through a | series of modification methods that don't touch the original. | I'd sure hate it if the compiler tried to outsmart me on that | and changed the behavior away from what I'd written. | enedil wrote: | Went from 4.139s to 2.413s. I fail to see how it is 70%. I think | it is explained as 4.139/2.413 = 1.7 which of course doesn't make | sense here. | CodesInChaos wrote: | I do think saying it's 71% faster makes sense here, since "x% | faster" and "speed increased by x%" mean the same thing. This | reduces the runtime by 42%, but that doesn't mean it's just 42% | faster. | bmicraft wrote: | It would probably be more accurate to say it can do 70% more | stuff in the same time. Or that it takes 42% less runtime | xnorswap wrote: | But 70% more stuff in the same time is 70% faster. | OJFord wrote: | Let's say you do 10 things in 100 time (100/10 =10 time per | thing) to start with: | | 70% more stuff in the same time is 17 things in 100 time | (100/17 =5.9 time per thing); | | 70% faster is 10 things in 30 time (30/10 =3 time per | thing) - or, I would argue incorrectly, perhaps said to | mean 10 things in 70 time (70/10 =7 time per thing). | morelisp wrote: | This is an extremely common mistake in reporting performance | numbers. That the old version is 70% slower does not make the | new version 70% faster. | wizofaus wrote: | 70% slower is a bit ambiguous though - it could mean 70% | extra runtime or it could mean 30% of the new speed. Whereas | 70% faster would always suggest to me that it can do 70% more | work in the same amount of time, i.e. a 1.7x increase in | speed. | enedil wrote: | I do not agree. When you benchmark you usually measure the | differences between times needed to complete. This is | because it is highly non-obvious that if you increase | workload twice, the time increases also twice. Perhaps the | algorithm is not linear. 
Perhaps if you have more data, you | suddenly need to swap memory. Perhaps something (like disk | access in parallel) means that actually it takes less than | 2x time. This means that a concept of speed per unit of | work is undefined. So the only reasonable interpretation of | "70% faster" means "spends 30% time of original". | yongjik wrote: | But under your definition, if something becomes "three | times as fast" (i.e., 200% faster), it will have to | finish its task in negative time! | wizofaus wrote: | I can categorically state I've never thought of or | understood 70% faster as meaning that, and certainly not | 100% faster as meaning "completes instantly". | | I see the OP has solved the problem by removing any | references to how much faster from the article title! | | You're right about non-linear algorithms though. If an | O(n^2) algorithm is 2x / 100% faster, it can't process | 100% more items in the same time, but I'd understand it | to mean taking half the time for the same n. | barbegal wrote: | It's gone from 14.5 operations per minute to 24.9 operations | per minute so a 70% speedup. | hcm wrote: | Doh! Thanks for pointing out another silly mistake - I'll fix | that. | xnorswap wrote: | You had it right the first time, 1.7x speed _is_ 70% faster. | | If something that previously took 4s now takes 2s then it's 100% | faster. | | Think of driving 10 miles. If you drive at 20mph then it takes | 30 minutes. If you drive twice as fast, 40mph, it takes 15 | minutes. | | 40mph is 100% faster than 20mph. | | Half the time is twice as fast! | [deleted] | wizofaus wrote: | Glad it wasn't just me thinking this! | markoman wrote: | @hcm: Would have loved to see the 'after' flamegraph just for | comparison purposes! I'm still trying to get used to grokking | flamegraphs when optimizing. They're a somewhat new tool, | IMO. | wizofaus wrote: | I disagree, I think the 70% is right, and matches what you | still describe as a 1.7x speed increase.
If it originally | took 4 seconds and now takes 2, I'd call that a 100% speed | increase, i.e. twice as fast. | t3estabc wrote: | cbsmith wrote: | As an old C/C++ programmer, I'm always surprised by how often | software developers are surprised by the performance costs of | inopportune value semantics (C and C++ even more so, punishes you | severely for using value semantics when you shouldn't). I | increasingly see the wisdom of languages with implicit reference | semantics. | | It's not that value semantics can't be better (they most | assuredly can be), or that reference semantics don't cause their | own complexity problems, but rather that so often we | thoughtlessly imply/impose value semantics through interfaces in | ways that negatively impact performance; getting interfaces wrong | is a much tougher bell to unring. | | The vast majority of my mental energy when I define an interface | in C++ is carefully thinking through a combination of ownership | contracts and value vs. reference semantics that I can mostly | ignore in languages with implicit reference semantics. While | occasionally ignoring those contracts while developing in | Java/Python/whatever comes back to bite me, the problem isn't | nearly as common or problematic as when I unintentionally impose | value semantics in a language that allows me to. | dleslie wrote: | I would also lay a chunk of blame on the use of type inference | which causes the information about the behaviour to be hidden | from view. | | Had the author been looking at the type information within the | syntax of the code the profile output may not have been a | surprise. Perhaps the problem would never have existed in the | first place. | adrianmonk wrote: | Yeah. Often, writing out the type explicitly is just | busywork. But it seems it would have paid off here. | | If you were forced to stop and think what type to declare, I | bet you'd write "var rule *Rule". 
Even if you don't think | deeply and just look at the return type. | | And then if you assigned "r[i]" to "rule", you'd get a type | error. | bitexploder wrote: | Becoming good at Go is mostly knowing all the sharp edges with | the built-in types IMO. Of course goroutines and concurrency | primitives are difficult to master, but that is a different | beast and if you understand concurrency from some other | languages Go just makes that easy. But knowing all the | behaviors of slices, and really intuiting how they work makes | your life a lot easier and a lot less prone to bugs. And slice | behavior almost all comes down to what is a slice internally | and how and where am I copying things as I write my code. | Generally the copying is fine, but in some cases it is not or | it is in a tight performance critical section where you need to | be thinking about it. | | Edit: Also, pointers into slices will probably leave you sad. | You get a pointer to the slice storage, not a pointer to the | thing. And if the slice resizes your pointer now references a | dead slice. Basically, pointers and slices are not friends. | Unless you have a slice of pointers, which idiomatic Go avoids | unless there is a decent reason for it :) | [deleted] | tylerhou wrote: | The performance issue here is not value semantics, it's the | overhead of automatic lifetime management. The copy is cheap. | The lifetime tracking is not because it forces a heap | allocation and creates additional GC pressure. In fact, | assuming Rule is small, if Match returned by _value_ , the code | would be similarly as fast. | cbsmith wrote: | > The performance issue here is not value semantics | | There's no performance cost to value semantics, so of course | not. | | > The copy is cheap. The lifetime tracking is not because it | forces a heap allocation and creates additional GC pressure. | In fact, assuming Rule is small, if Match returned by value, | the code would be similarly as fast.
| | I'm referring more to how this stuff seeps in without the | programmer realizing it. It's the implicit nature of all this | behaviour that is the problem. | tylerhou wrote: | Sorry, _because_ of value semantics. (Also, you wrote | "performance costs of inopportune value semantics". So the | correction is annoying; you know what I meant.) | | This stuff seeps in without the programmer realizing it | because Go made a deliberate design decision to have | automatic lifetime management. In other words, this is a | feature of the language and not a bug. The only way for | this to not seep in is if Go forced programmers to specify | most lifetimes, which would make the language much more | cumbersome to use. | | I.e., this is not a value-vs-reference semantics issue. | It's a manual-lifetime-management vs automatic-lifetime-management issue. The solution is to either 1) write in a | language with more explicit lifetimes if performance is | that important, or 2) profile your code and reduce heap | allocations, which is what the person who wrote the article | did. | cbsmith wrote: | Sorry for being annoying. I wasn't trying to be. | | I would disagree about this stuff seeping in because of | automatic lifetime management. The equivalent code in | Java, Smalltalk, etc., would either not have the | performance problem or it would be much more obvious that | the code wasn't as efficient as it could be. | tylerhou wrote: | > Sorry for being annoying. I wasn't trying to be. | | Thanks, no worries. | | You're right that Java doesn't have this performance | problem because it (apart from primitives) has no | implicit copies. (So the code wouldn't copy, it would | return a reference, hence no allocation.) | | I think what you're trying to say is: programs written in | languages with value semantics can implicitly create more | objects, which (in some cases) have additional | allocation/lifetime tracking in languages with automatic | lifetime management.
So maybe one solution is to prevent | copies in most circumstances, and when you do need to | copy, make them explicit. | | But I think this conflicts with Go's philosophy as well. | First, in highly concurrent programs, copies are | necessary. A mutex/atomic will become a bottleneck; | sharing is bad for performance. This means that copies | should be part of most types in the language. (If they | aren't, you'll run into issues like not being able to | pickle when using multiprocessing in Python [1].) | | OK, so we need copies. But clearly, I shouldn't have to | write `a = b.Copy()` if b is an integer. That's too | verbose. And if everything has a `.Copy()` call, that | leads to visual noise and it's not clear when copies are | actually expensive. So what set of "blessed" types should | be implicitly copyable? Java picks primitives, but this | means it's basically impossible to create an int-like | object with value semantics. I think this is bad. C++ is | at the other extreme with "(almost) all objects are | implicitly copyable" but some people dislike how | forgetting a & on a std::vector type might lead to a huge | copy. | | Go has chosen a reasonable middle ground -- dynamically | sized types like arrays and maps are not implicitly | copyable, but "cheap" objects like structs are implicitly | copyable. This means that when you see a Copy call, it's | meaningful. But this means you run into these situations | when an implicit copy interacting with lifetimes/garbage | collection causes a slowdown. But I don't see any other | alternative. Getting rid of implicit copies for cheap | types will cause visual noise. Getting rid of GC makes | your language harder to use. Both of these alternatives | seem worse to me. | | To be clear, I'm not a huge fan of Go. Especially around | the lack of monadic error handling. But I think they have | done a decent job wrt. their design goals. | | [1] https://stackoverflow.com/questions/8804830/python- | multiproc... 
| [deleted] | Jemaclus wrote: | I'm a dumb dumb. Can you define "implicit reference semantics" | and "value semantics"? You use the phrases several times in | your post, but I don't really understand what you mean. If it | helps, I'm not a C++ programmer, but I am familiar with higher | level languages like Go, Python, Ruby, PHP and Javascript. | scubbo wrote: | Thank you for asking - I, too, was a dumb dumb. | gdwatson wrote: | "Implicit reference semantics" means that variables | ordinarily refer to objects rather than containing them. | "Value semantics" means that variables contain values rather | than references, though there are pointer types that let you | explicitly store references when you want them. (Often this | is discussed in terms of parameter passing, for historical | reasons, but the same ideas apply to variables more | generally.) | | If your language has implicit reference semantics, "x = y" | will cause x and y to refer to the same object. If it has | value semantics, x will be a copy of y. | Jemaclus wrote: | Thank you | masklinn wrote: | "Implicit reference semantics" = everything (or near enough) | is implicitly a reference. That's what you get in Python, | Ruby, or Javascript: when you pass something around, you're | really passing a reference to it, a pointer. Everyone ends up | referring to the same thing, and sharing it. | | "Value semantics": you pass the value itself around, which | means you're shallow-copying it all the time. That's what you | get when you pass or return a bare non-interface type in Go. | | PHP is a bit of a mess, objects are passed by reference (= | implicit reference semantics) but arrays are passed by value, | and you can opt them into pass-by-reference. You can also | pass non-arrays by reference, though I don't think that's | very common. | strifey wrote: | If you're familiar with Python and Go, you'll likely be able | to quickly spot the differences in how they handle parameter | passing. 
Python passes references and Go passes values. | erik_seaberg wrote: | Go maps and channels have reference semantics. Slices are | passed and returned by value, but backing arrays are often | shared (not copied) in an error-prone way, so it's safest | to pretend they have move semantics and treat only one copy | as valid. | | IMHO they should have used pointers everywhere they didn't | want value semantics with deep copies. | klodolph wrote: | Languages like Python, Java, and C# have implicit reference | semantics. When you create an object, you get a pointer to | that object (usually, or commonly). | | In languages like C, C++, Go, Rust... references are more | explicit. If you want a pointer to an object, you have to &, | or something similar. | | It gets a bit fuzzy. | Jemaclus wrote: | Thank you | TheRealPomax wrote: | It's the difference between assigning/passing around "copies | of the data" vs. assigning/passing around "the memory address | for that data" under the hood. | | PHP, for example, has explicit references. If you have an | `$arr1=array(1,2,3)` and an `$arr2 = $arr1`, that second | array is a full copy of the first array, and updating $arr1 | does nothing to $arr2. Similarly, `function | update_array($arr) { $arr[0] = 'cake'; }` called with `$arr1` | will create a _copy_ of $arr1 for use inside that function. | Any changes to `$arr` inside the function only apply to that | copy, not to $arr1. _Unless_ you explicitly tell PHP you want | to work with references, by using `function update_array(&$arr) { ... }` instead. PHP uses value semantics. | | JS, on the other hand, uses implicit reference semantics. If | you have a `const arr1 = [1,2,3];` then `arr1` is an _alias_, simply pointing to a memory location, and declaring a | `const arr2 = arr1`, makes arr2 _also_ be an alias for that | same memory location. They both reference the same thing, but | neither "is" the thing.
Similarly, if you have a `function | updateArray(arr) { arr[0] = 'cake'; }` then that `arr` is | _also_ just an alias pointing to the memory location of | whatever you passed in, and changes to `arr` change the | underlying data, so whether you try to access that through | `arr1`, `arr2` and `arr`, it's the same thing. JS uses | implicit reference semantics. | | (But note that many languages with implicit reference | semantics make exceptions for immutable primitives like | numbers or strings) | datalopers wrote: | That's not technically correct with regards to PHP. Your | statement that any changes to $arr1 or $arr2 only impact | the one in question, however, is accurate. If no changes | are made they still refer to the same data in memory. It's | copy-on-write semantics. | | $arr1 = [1,2,3]; // $arr1 is a pointer to a zval array | [1,2,3] and refcount:1 | | $arr2 = $arr1; // $arr1 and $arr2 are pointers to the same | zval array but incremented refcount to 2. | | $arr2[] = 4; // at this point, $arr2 creates a new zval | array, copies over the data from the first, sets the | refcount to 1, and appends int(4). The [1,2,3] array has its | refcount decremented to 1 as it's still referenced by | $arr1. | | [1] http://hengrui-li.blogspot.com/2011/08/php-copy-on-write-how... | TheRealPomax wrote: | true, good point. | morelisp wrote: | This isn't semantics, though, but implementation. As far | as I'm aware there's no way to "observe" the copy short | of digging into runtime debugging tools, so I think it's | still fair to say the language has pass-by-value | semantics. In C++ we still refer to things returned-by-value as returned by value despite the reality of NRVO, | etc. | johannes1234321 wrote: | Actually that is outdated since at least PHP 7. In modern | PHP the engine uses copy on write only for "larger" | things, but "small" things like integers or booleans live | on the stack and are copied around, avoiding heap | allocations etc.
| Jemaclus wrote: | Thank you | unfunco wrote: | PHP has implicit references too, objects are passed by | reference by default (primitives by value, like your note) | marcosdumay wrote: | a = [1, 2] | | b = a | | b[0] = 3 | | print(a) | | What does the above print? If the language implements | reference semantics, it prints [3, 2]. If it implements value | semantics, it prints [1, 2]. | Jemaclus wrote: | Thank you | brundolf wrote: | Having used C++ relatively little, and a long time ago (and | never used Go), I don't think I ever realized that value | semantics could be used to implicitly clone _heap_ objects | | Yeesh | masklinn wrote: | Normally they're not, that's really specific to C++ nonsense. | | In a normal value-semantics system if you have a pointer you | just copy the pointer. Obviously if the language doesn't have | your back it also means you've now fucked up your ownership, | but you're in C so that was to be expected. | brundolf wrote: | Yeah, this was my assumption when I read the original | comment, so I was confused about what it was saying (I | thought it was suggesting you should pass references | instead of doing a bitwise-copy most of the time, which is | just plausible enough that I believed it for a minute) | until I read further down in the thread and realized they | meant implicit heap-cloning | titzer wrote: | I think the main issue in C++ isn't value semantics, it's deep | copy semantics. E.g. in a functional language ADTs are | immutable and don't have identity. They can be freely copied, | or not, passed by reference or by value, but they are never | _deep copied_. Comparison may be deep, but not passing them. | | That is to say, I think I mostly am agreeing with you. In Java, | objects are always passed by reference, never by value, and | never implicitly copied. But Java doesn't have _any_ value | types other than primitives.
When I added ADTs to Virgil, I | wanted to get the best of both worlds; objects are passed by | reference, like Java, and ADTs are immutable, identity-less, so | they are deep compared but never deep copied. (ADTs in Virgil | can also sometimes be flattened, so they don't necessarily | result in heap allocation). | nyanpasu64 wrote: | I still don't see why value structs need to be immutable; | ints are mutable in all languages, and structs are mutable in | C, C++, and Rust (if you `let mut`) and it's a feature of the | language. | cbsmith wrote: | In a functional language, you don't have to worry about bits | of code mutating your data. ;-) On the flip side, there's a | lot of cognitive load that comes with functional languages, | so while they do address the problem neatly... | | I'd have to take a look at Virgil to appreciate your | approach, but I'm always leery of implicit value vs. | reference semantics tied to types (aside from the whole array | fiasco, easily the ugliest part of Java's type system). So | often the particular semantics you want are driven by the | context of what you're doing, rather than what you're | doing it with. | thomascgalvin wrote: | > I increasingly see the wisdom of languages with implicit | reference semantics. | | I spend most of my time in a JVM language of one flavor or | another, and when I was learning Go, the first thing that stuck | out at me was, "why would I _ever_ want the compiler to | invisibly copy a data structure for me?" | | I suppose the primary reason is to prevent the callee from | modifying the caller's data out from under them; unless you | pass a reference value, you _know_ the callee cannot modify | your data. | | But, as someone who leans heavily into "everything should be as | immutable as possible," the second thing that stuck out at me | was "wait, a struct can't have const fields?"
| | When I write code, it's common to have references to immutable | classes thrown around with wild abandon, heedless of ownership, | threads, or good taste, because the data just can't change. But | that's a paradigm that Go simply doesn't support. | cbsmith wrote: | > When I write code, it's common to have references to | immutable classes thrown around with wild abandon, heedless | of ownership, threads, or good taste, because the data just | can't change. | | If there's anything I wish languages with implicit reference | semantics would adopt, it's implicit immutability. I wish | Java would be so much nicer with a keyword that is halfway | between "final" and "volatile" that means, "yes, you can | actually mutate this" and then make final semantics the | default for fields & variables. | cogman10 wrote: | Agreed. | | Could you imagine a Java where you have a `Map` and a | `MutableMap` and that's what you put at your API? It'd make | it SO much clearer how safe any individual API is to call. | spockz wrote: | Scala has had this for ages. You can have it today. Even | in Java either through the Google collection library or | through a library that mimics fp style programming. The | name eludes me for the moment. | gowld wrote: | Guava, but it's not fully typesafe. | | Mutable is not part of the Java meta-language used for | typing. | | ImmutableMap implements Map | | https://guava.dev/releases/23.0/api/docs/com/google/common/c... | | and throws exceptions on mutation. | | https://guava.dev/releases/23.0/api/docs/src-html/com/google... | foverzar wrote: | Kotlin has this | thomascgalvin wrote: | Kotlin has this, but the Map is (usually) a MutableMap | under the covers, because it's Java bytecode at the lower | levels. You have to go out of your way to footgun | yourself, but it's still possible. | morelisp wrote: | In general this doesn't work, the history rule says | mutable types are not proper subtypes of immutable ones | (and the converse is obvious).
If you want to capture | mutability in your type system, it needs to be orthogonal | to subtyping (like C/C++ const). | cogman10 wrote: | > the history rule | | I'm unfamiliar with this rule (and not finding anything | good to google). Can you elaborate? | | I can't really think of a scenario where an immutable | datastructure isn't a subset of actions against a mutable | datastructure. | kapep wrote: | I had to look it up too, it apparently is a constraint | for subtypes defined in Liskovs substitution principle | [1]. From Wikipedia: | | > History constraint (the "history rule"). Objects are | regarded as being modifiable only through their methods | (encapsulation). Because subtypes may introduce methods | that are not present in the supertype, the introduction | of these methods may allow state changes in the subtype | that are not permissible in the supertype. The history | constraint prohibits this. It was the novel element | introduced by Liskov and Wing. A violation of this | constraint can be exemplified by defining a mutable point | as a subtype of an immutable point. This is a violation | of the history constraint, because in the history of the | immutable point, the state is always the same after | creation, so it cannot include the history of a mutable | point in general. Fields added to the subtype may however | be safely modified because they are not observable | through the supertype methods. Thus, one can define a | circle with immutable center and mutable radius as a | subtype of an immutable point without violating the | history constraint. | | [1]: https://en.wikipedia.org/wiki/Liskov_substitution_pr | inciple#... | gowld wrote: | [deleted] | a_t48 wrote: | C++ still has this problem - | std::unordered_map<std::string, std::string>` and | `std::unordered_map<std::string, const std::string>` are | basically unrelated types - you can't const-cast the | templated const away. 
(I may be misunderstanding here) | masklinn wrote: | > you can't const-cast the templated const away. | | That seems like a good thing. If you're handed a map to | const values you can't just go "imma gunna mutate them | anyway". | a_t48 wrote: | Yup, it's definitely a bit of a code smell if you do. The | issue is more the reverse, though - I can't make a | mutable map, then hand it by pointer/reference to | something that says it wants an immutable map later. | masklinn wrote: | Not necessarily a bad thing either, things can get odd if | you're not the only owner and it's mutated under you | unexpectedly. | [deleted] | a_t48 wrote: | That only matters if you're storing it. The big issue is | 99% of code out there uses mutable template types for | containers, and if you ever declare a container that | doesn't have a mutable template type, you stub your toe | as your new container isn't compatible with anyone else's | code. You can't even easily copy your way out.
|
|     std::unordered_map<std::string, const std::string> m;
|     m["foo"] = "bar";
|     std::unordered_map<std::string, std::string> m2 = m;
|
| doesn't compile. | morelisp wrote: | Normally (when containers are not involved) this is | exactly the point of a const cast. | titzer wrote: | > When I write code, it's common to have references to | immutable classes thrown around with wild abandon, heedless | of ownership, threads, or good taste, because the data just | can't change. But that's a paradigm that Go simply doesn't | support. | | You might get a kick out of Virgil. It's easy (and terse!) to | define immutable classes and you can have immutable ADTs too. | (plus tuples, generics with separate typechecking, etc). | throwaway894345 wrote: | I also have a background in C/C++, etc and I've only ever found | myself missing value semantics when I use languages with | implicit reference semantics. I guess I always figured the | solution was "value semantics with better education / tooling".
| Education: people should understand value semantics. Tooling: | imagine an IDE that highlights allocation points automatically | (or perhaps the problem is implicit allocations rather than | value semantics?). | cbsmith wrote: | > I've only ever found myself missing value semantics when I | use languages with implicit reference semantics. | | Oh, I miss it every time. ;-) | | I will say though that some newer languages seem to have a | confused idea about how to offer mixed semantics. A bunch of | them tie semantics to types. The ideal interface can vary by | usage context. It's hard enough getting the semantics right | as the callee (as opposed to caller), let alone when you're | defining a type that will be used by who knows how many | interfaces. | | > I guess I always figured the solution was "value semantics | with better education / tooling". | | I've always thought much the same, but I have slowly come to | appreciate that it's more than just education & tooling. Even | with good education & tooling, there's a cognitive load that | comes with getting interfaces right that for the general case | is just not worth it. | adgjlsfhk1 wrote: | I think this is half right. For anything 64 bits or | smaller, value semantics are pretty much always going to be | better. That said, being able to choose between value and | reference semantics for larger objects per object is a | pretty useful feature. | cbsmith wrote: | > For anything 64 bits or smaller, value semantics are | pretty much always going to be better. | | That's assuming a 64-bit CPU (which admittedly seems like | a reasonable assumption). The nice thing about the | abstraction though is that there's nothing preventing the | runtime from applying value semantics for those trivial | small-object cases where they're obviously more | efficient. | throwaway894345 wrote: | > I will say though that some newer languages seem to have | a confused idea about how to offer mixed semantics.
A bunch | of them tie semantics to types. | | Curious about what you mean here. This sounds like C#'s | class/struct distinction to me. | cbsmith wrote: | That's exactly the example I was thinking of. | Serow225 wrote: | The JetBrains IDEs can do this, at least for .NET | ok123456 wrote: | > Tooling: imagine an IDE that highlights allocation points | automatically | | Rider does this already for C#. | TremendousJudge wrote: | >or perhaps the problem is implicit allocations rather than | value semantics | | To me, this sounds like this is it. _Explicit is better than | implicit_ is a very useful truism | cbsmith wrote: | The counter argument to the "explicit is better than | implicit" is that abstraction & encapsulation are such | significant force multipliers. If done properly, implicit | is good. It's just that in case of copying, doing it | "properly" is well nigh impossible. | kazinator wrote:
|
|            explicit   implicit
|     good      *     <    *
|               v     v    v
|     bad       *     >    *
|
| Good implicit is better than good explicit. (If all is | good, go for implicit.) | | Bad explicit is better than bad implicit. (If all is bad, | go for explicit; don't hide bad explicit with bad | implicit.) | | Good explicit or implicit is better than bad explicit or | implicit. | codeflo wrote: | > perhaps the problem is implicit allocations rather than | value semantics? | | I think that's true. Expensive copies should never have been | implicit. There was a story some time ago about a single | keypress in the address bar of Chrome causing thousands of | memory allocations. The culprit: lots of std::string | arguments up and down the call stack. | | Rust gets this right, with the hindsight of C++'s example: "a | = b" is a move operation by default and clone() is always | explicit, except for plain data types where copying is | literally memcpy -- and those are clearly marked as such by | the type system. | cbsmith wrote: | IMHO, implicit allocations is a bit of a red herring.
Yes, | in C/C++ heap allocations are proportionately pretty | expensive, but I've seen Java programs with just ridiculous | amounts of implicit allocations and there really isn't much | of a problem. | | But _allocations_ aren't the same as copies, and the | argument for reference semantics has always been that | implicit copies are problematic. In your std::string | example, having that many String copies in a Java program | would be similarly terrible (and this sometimes happens by | accident because of abstraction layers that hide all the | copying going on under the covers). | | I do think Rust gets a lot of stuff right, but Rust's | cognitive load is broadly recognized. I tend to see it as | C++ with a lot fewer foot guns. ;-) | codeflo wrote: | I'm not sure I get what you mean. You wouldn't have that | many String copies in Java by passing an unchanged String | down the call stack. My point is that it's too easy to | make this mistake in C++. | cbsmith wrote: | In Java, the mistake happens only when there's an | abstraction that hides the copying from you, so it isn't | implicit in the same way, but it's still implicit. | morelisp wrote: | Allocations are as damaging as your free function is | slow. | | Java has a tremendously good GC, so can cope with lots of | allocations. Go has an OK one, so needs some help (but | mollifying it often pays dividends elsewhere in locality | and memory usage too). C++ has your default system heap, | good luck. | throwaway894345 wrote: | Historically Java has traded long pause times for fast | allocations, although I'm of the impression that it has | recently found a way to have its cake and eat it. | klodolph wrote: | Java has been tunable for a long time. Periodically, the | recommended tuning changes, or new GC algorithms become | available, etc. But it has long been possible to get | short pause times with various combinations of choosing | the right algorithm and writing your program the right | way.
| | I think what _really_ throws people off here is that | getting good performance out of a Java application | involves some skills which are alien to C++ programmers, | and vice versa. You take an experienced C++ programmer | and drop them into a Java codebase, they may have a very | poor sense of what is expensive and what is cheap. Vice | versa... experienced Java programmers don't do well in | C++ either. | | The result is that you have precious few people with any | significant, real-world experience fixing performance | issues in both languages. | throwaway894345 wrote: | Agreed, but usually tuning for short pause times involves | trading off throughput or allocation performance. But at | the end of the day, if you aren't allocating a bunch of | garbage in the first place, then you don't need to be as | concerned about the costs of allocating or cleaning up | the garbage. I wish Go did more to make allocations | explicit so they could be more easily recognized and | avoided; I dislike Java's approach of making allocations | even more implicit/idiomatic while trying to fix the cost | problem in the runtime (although I admire the ambition). | lazide wrote: | What do you consider a 'long' pause time? | | I've had no issues with Java 17+ under heavy | allocation/garbage collection (data encryption pipeline I | haven't tuned to reuse buffers yet), and its pause times | are on the order of a handful of milliseconds, without | meaningful tuning. I think it's doing something like a | GB/s of garbage collection. | | And the JVM in question is doing a LOT more than just | this, so it's coping with millions of allocated objects | at the same time. | throwaway894345 wrote: | > Yes, in C/C++ heap allocations are proportionately | pretty expensive, but I've seen Java programs with just | ridiculous amounts of implicit allocations and there | really isn't much of a problem.
| | Java programs make "ridiculous amounts of implicit | allocations" because allocations are cheap in Java. And | they need to be cheap because Java doesn't have value | semantics so it leans hard on escape analysis + cheap | allocations. | | I agree with the rest of your comment, although I think | most of Rust's "cognitive load" amounts to borrow- | checker-vs-garbage-collection. You could envision a Rust | with explicit allocations and a GC, and that language | would have a "cognitive load" approaching that of Go | while also being a fair bit more performant insofar as | people can much more easily reason about allocations and | thus performance. | jhoechtl wrote: | A long long time ago Rust was a GC language. | cbsmith wrote: | > Java programs make "ridiculous amounts of implicit | allocations" because allocations are cheap in Java. And | they need to be cheap because Java doesn't have value | semantics so it leans hard on escape analysis + cheap | allocations. | | Yes, but that's kind of the point, right? Implicit | allocation isn't really a problem because a runtime that | optimizes the allocations magically for you is a lot | easier to build than a runtime that optimizes whether you | really need to be copying objects as much as you do. | throwaway894345 wrote: | > Implicit allocation isn't really a problem because a | runtime that optimizes the allocations magically for you | is a lot easier to build | | As far as I know, Java's (default) runtime gives cheap | allocations at the cost of long GC pause times. | | > than a runtime that optimizes whether you really need | to be copying objects as much as you do | | It's not "copying", it's "allocating", and avoiding | allocations isn't that much work (and frankly I'm | surprised it's such a minor problem that no one has | bothered to build an IDE plugin that highlights these | allocation points automatically--or at least I haven't | heard of such a thing). 
Anyway, "a runtime that minimizes | allocations" is just an escape analyzer and Java has one | of these too, and IIRC it's a lot more sophisticated than | Go's (but it's also a lot harder to reason about as a | consequence). | cbsmith wrote: | > As far as I know, Java's (default) runtime gives cheap | allocations at the cost of long GC pause times. | | "long GC pause times" is kind of vague, so I guess you | could be correct, but in practice there's a LOT of | different ways the memory management can be handled, many | of which are deemed "pauseless GC" (though the term is | somewhat misleading). | | My statement was considering that reality though. While | not true for some use cases, in the vast majority of | cases, the runtime optimizes the allocations more than | sufficiently. | | > It's not "copying", it's "allocating" | | Allocators can do a pretty good job of minimizing the | overhead of allocation, to the point the amortized cost | isn't much more than a single machine instruction. | Allocating gigabytes of memory quickly is possible. | Copying the data can be a lot more work, and often | objects have copy semantics that add a lot more | additional work. | | > Anyway, "a runtime that minimizes allocations" is just | an escape analyzer and Java has one of these too, and | IIRC it's a lot more sophisticated than Go's (but it's | also a lot harder to reason about as a consequence). | | I think you're implicitly saying "a runtime that | minimizes _heap_ allocations" there, in which case I'd | agree. | throwaway894345 wrote: | > in practice there's a LOT of different ways the memory | management can be handled, many of which are deemed | "pauseless GC" (though the term is somewhat misleading). | | Yes, but I'm pretty sure those "pauseless GC" schemes | impose other tradeoffs. | | > My statement was considering that reality though.
While | not true for some use cases, in the vast majority of | cases, the runtime optimizes the allocations more than | sufficiently. | | I'm not sure I follow. The same could be said for Go--in | the vast majority of cases, Go's tradeoffs (slow | allocations, low latency / non-moving GC) are also | suitable. | | > Allocators can do a pretty good job of minimizing the | overhead of allocation, to the point the amortized cost | isn't much more than a single machine instruction. | | As far as I know, speeding up allocations to this degree | _requires_ a moving GC which imposes a bunch of other | constraints (including copying a bunch of memory around). | | > Allocating gigabytes of memory quickly is possible. | Copying the data can be a lot more work, and often | objects have copy semantics that add a lot more | additional work. | | Yes, but the bottleneck here wasn't the copying, it was | the allocations. And if you optimized away allocation | cost entirely such that only the copy cost remained, that | cost would be so small that the OP would never have | bothered to profile because copying small objects like | this is so cheap compared to everything else (even if it | is expensive compared to bump allocating). | | > I think you're implicitly saying "a runtime that | minimizes heap allocations" there, in which case I'd | agree. | | Yes, the allocator and GC are concerned with heap | allocations and not stack allocations. I'm using | "allocations" as a shorthand for "heap allocations". | cbsmith wrote: | In hindsight, I think I chose how to present this poorly, | because yes, in this case, the allocation is what is | killing the performance. I look at it, and I just see | unnecessary implied behaviour creating a performance | problem. Usually it isn't the allocations themselves that | kill you, but it certainly is the case here. | throwaway894345 wrote: | Allocating isn't "an expensive copy"; it's not analogous to | clone() in Rust. 
The copy isn't the problem, it's the | allocation. | cbsmith wrote: | I'd argue quite the reverse. Allocation can be quite | efficient if done properly, but copying involves a lot of | other work. | throwaway894345 wrote: | I disagree--the bottleneck here is entirely the | allocation. The copying is just a memcpy and it's very | fast for small structs like this; like I said, it's not | the same as a clone() in Rust, which is a deep copy. If | you optimized the allocation away entirely (leaving only | the copy cost), there wouldn't have been a significant | performance problem and this blog would never have been | written. | xdavidliu wrote: | > except for plain data types where copying is literally | memcpy | | what do you mean by this? If I say `let x = 5; let y = x;` | in Rust, that's a "plain data type copy" of a stack value, | but memcpy is usually used to copy heap memory. What | connection between copying of primitive simple stack values | and memcpy are you suggesting here? | kccqzy wrote: | The compiler can optimize memcpy with a known size into a | small number of move instructions so they are identical | to copying stack values. | | Try playing with memcpy on Godbolt and you'll find that | the compiler will compile the memcpy to a single mov | instruction when the size is small, and some | movdqu/movups when the size is slightly larger, and only a | function call when the size is huge. | | > memcpy is usually used to copy heap memory | | memcpy is often used in low-level serialization / | deserialization code since you can't just cast a buffer | pointer to a uint32_t pointer and dereference that; the | solution is memcpy between variables that are often both | on the stack. | orra wrote: | > What connection between copying of primitive simple | stack values and memcpy are you suggesting here? | | They're just using 'memcpy' as a shorthand for saying the | bitpattern is blitted. Semantically, that's like a | memcpy.
The point is, there are no expensive allocations, | nor do any embedded pointer fields need adjusting, etc. | dimitrios1 wrote: | As a current Go programmer, hunting down escaping pointers via | escape analysis is a common part of the routine now. | AtNightWeCode wrote: | So, this is very basic Go design and you could write something | about how it works in C and Go and why an older lang like C | doesn't have this prob but then at the end of the day the Go | fanclub will down vote the hell out of you no matter what. | AtNightWeCode wrote: | The Go compiler is garbage by design. A 20 year old C compiler | does not have this prob. This is also why Go has declined so | much during the last couple of years. The benefits of Go have | not increased and most of the quirks are still there. Like the | error handling, the naive compiler and the syntax sugar that | somewhat hides the diff between pointers and direct heap | allocs. | | -1 | kosherhurricane wrote: | I work on a code base that is a mixture of Go and C. | | It's IO, CPU and Memory hungry, and it's distributed. | | C is fast because it's close to how CPU and memory actually | work. Go gives you 95+% of that plus an easy to learn, easy to | use language. A new person could start contributing useful | features and bug fixes immediately. A senior person could get | C-level performance. | | More and more of our code is moved from C to Go, with very | little performance penalty, but with a lot more safety and | ease of use. | | Our customers benefit, and our company makes more money. | | In the end, that's what software is about. | dist1ll wrote: | > C is fast because it's close to how CPU and memory | actually work. | | Out-of-order execution, cache hierarchies, branch | prediction, virtual memory, pipelining, vector | instructions, ILP, NUMA are all pretty transparent to the C | spec. | | Trying to accommodate hardware quirks with C feels like | blackbox engineering.
It's certainly better than with | managed languages but still.... | fear91 wrote: | Can you show any examples of the "go compiler being garbage"? In | my experience, it often generates much smarter code than C# | or Java. | asim wrote: | If you want to have a solid understanding and need to do it in | just a few hours here are a few things to review. | | - The Go programming language spec https://go.dev/ref/spec | | - Effective Go https://go.dev/doc/effective_go | | - Advanced Go concurrency patterns | https://go.dev/talks/2013/advconc.slide#1 | | - Plus many more talks/slides https://go.dev/talks/ | coder543 wrote: | There is potentially another option: use the midstack inliner to | move the allocation from the heap to the stack of the calling | function: https://words.filippo.io/efficient-go-apis-with-the- | inliner/ | | As long as the global slice is never mutated, the current | approach is probably fine, but it is definitely a semantic change | to the code. | ploxiln wrote: | That seems like overkill for this particular case, but it's a | very interesting technique, thanks for the link! | morelisp wrote: | Packages should be exposing an API with destination slices more | often to begin with. The stdlib is pretty good about this | (there are a few missing, though 1.19 closed the most obvious | absences), but most third-party code is awful. Or worse, it | only takes strings. | throwaway232iuu wrote: | Why doesn't Go use RVO like C++ and Rust? | | https://en.wikipedia.org/wiki/Copy_elision#Background | coder543 wrote: | I don't think we're on the same page about what midstack | inlining is being used for in my suggestion. This discussion | is about eliminating a heap allocation, which as far as I | understand, RVO _never_ does. Please read the article I | linked if you want to discuss this further. I don't want to | repeat the article pointlessly.
| | I'm also fairly sure Go uses RVO here too, which cuts down on | the number of times the object is copied around, but again, | it's irrelevant to the discussion of heap allocations. | Copying the object isn't the performance problem here, | needlessly allocating a very short-lived object on the heap | over and over is. | amluto wrote: | Somewhat off topic, but I find a different part of this to be | quite ugly:
            if match || err != nil {
                return rule, err
            }
| | Translating this code to actual logic takes too much thought and | is too fragile. Is that an error path or a success path? It's | both! The logic is "if we found a rule _or_ if there was an error | then return a tuple that hopefully indicates the outcome". If any | further code were to be added in this block, it would have to be | validated for the success and the error case. | | But this only makes any sense at all if one is okay with reading | Go result returns in their full generality. A canonical Go | function returns either Success(value) or Error(err not nil, | meaningless auxiliary value). And this code has "meaningless | auxiliary value" != nil! In fact, it's a pointer that likely | escapes further into unrelated error handling code and thus | complicates any kind of lifetime or escape analysis. | | I don't use Go, but if I did, I think this part of the language | would be my biggest peeve. Go has very little explicit error | handling; fine, that's a reasonable design decision. But Go's | error handling is incorrectly typed, and that is IMO _not_ a | reasonable design. | ericbarrett wrote: | I write a lot of Go, and I agree that this is a big wart in its | error handling that would be served by a proper Result type. | | Nevertheless, the convention is that if a function returns | (value, err), and err != nil, the value is discarded (I think | of it as "undefined"). So the code is conventional.
| za3faran wrote: | What I found frustrating is that there is nothing stopping a | function from returning both a non-nil value and a non-nil | error. I've seen weird code that did that and it could have | easily resulted in incorrect behavior. | klodolph wrote: | The one place where you'll actually see this is in Read(). | | If you try to read 1000 bytes, you might get "800 bytes | read, EOF" as a result. The worst part is that os.File | won't ever do this! | ok_dad wrote: | I really hate when people use "errors" for signals. Or, | alternately, use the signals mechanisms for actual | errors. An error should be "something went wrong", and | reading a file to the EOF is not "going wrong", that's | just what the "read()" should do if you tell it so! | | I like Go's multiple returns and error checking by | default, but it definitely should have been implemented | with some sort of "Result/Error" type union instead. Go | made so many good decisions that I am frankly amazed | they made this one really bad one. | robotresearcher wrote: | If you take the position that the semantics of read() are | 'read data from this endless stream', then EOF is an | error indicating that this model no longer applies. | | Doesn't bother me at all. | nemo1618 wrote: | Presumably this is for performance reasons: if a `Read` | call involves, say, a network request, then returning | "800, EOF" in one roundtrip is certainly nicer than "800, | nil" -> "0, EOF" in two roundtrips. In practice... I | suspect this happens rarely enough that it's not worth | the cost. The performance edge case could be handled with | an alternative API, e.g. an optional EOF() method. | amluto wrote: | In C, "discarding" a pointer in a way that leaves the value | visible is quite common. At least if one doesn't accidentally | use the pointer, it's harmless. (In the way that all manner | of unsafeness is harmless in C as long as no actual UB | occurs, which is to say it's not great.)
| | But Go is a garbage collected language, and there is no such | thing as "discarding" a pointer. Either it's there or it | isn't, and this kind of leak has side effects. I find it | baffling that the language designers and the community | consider this acceptable. | | (One thing I really like about Rust is that you can't play | fast and loose with lifetimes like this. If you have a function | taking &'a Vec<T> and you return &'a T, you can't arbitrarily | "discard" and leak that reference up the call chain. You need | to genuinely get rid of it by the time the vector is gone.) | slcjordan wrote: | They should probably replace that with 2 if statements to make | the error path and the non-error path obvious:
            if err != nil {
                return nil, err
            }
            if match {
                return rule, nil
            }
| foldr wrote: | The regular return value doesn't have to be meaningless just | because the error is non-nil. In this case the function returns | the rule that triggered the error, which is potentially useful | information. I don't think it is an official Go convention that | err being non-nil entails that the other return value should be | meaningless. | za3faran wrote: | It's quite error prone, most golang code is if err != nil { | return nil, err } | | Now of course it's important to read the documentation, but a | language with sum types (or exceptions) would have used a | separate type to indicate an error condition plus useful | information on that error. | chubot wrote: | FWIW, to prevent the bug where a = b is slow for big types, | Google's C++ style guide used to mandate DISALLOW_COPY_AND_ASSIGN | (which used to be DISALLOW_EVIL_CONSTRUCTORS I think) on all | types (most types?) | | Looks like that's been gone for awhile in favor of C++ 11 stuff, | which I don't really like: | | https://google.github.io/styleguide/cppguide.html#Copyable_M...
| | A lot of good software was written in that style, but it has | grown bureaucratic over time as the C++ language evolved. | renewiltord wrote: | Clear tutorial of how to go about identifying this. Good blog | post. Since the problem was constrained and real, it helps | someone know when to use these tools. Thank you for sharing. | karmakaze wrote: | > I did consider two other approaches: Changing | Ruleset from being []Rule to []*Rule, which would mean we no | longer need to explicitly take a reference to the rule. | Returning a Rule rather than a *Rule. This would still copy the | Rule, but it should stay on the stack instead of moving to the | heap. | | > However, both of these would have resulted in a breaking change | as this method is part of the public API. | | The problem with heap-allocated objects could be due to the | incorrect public API. | | The change that improves performance also gives out pointers to | the actual elements of Ruleset itself, permitting the caller to | change the contents of Ruleset, which wasn't possible before the | speed-up. Perhaps you're already aware, since the change to | []*Rule was being considered. | ok_dad wrote: | Sometimes it doesn't matter if a public API is incorrect, | because it's set in stone for whatever reason, and you just | need to fix the problem internally. | spockz wrote: | This is why I like https://github.com/openrewrite so much. | One gets to tell users how to rewrite code automatically. It | makes refactoring almost as easy as in a mono repo. | bspammer wrote: | This looks super interesting, but as far as I can tell they | don't go into how to inform downstream users to add your | recipes to their maven/gradle configuration when you make a | breaking change. | | In my head, the ideal flow would be an annotation on your | library function/class which triggers an IntelliJ | suggestion for downstream users affected by your breaking | change to run the automated refactor for them.
Kinda like a | more helpful @Deprecated. | tuetuopay wrote: | Aaaaaand that's why I love Rust's decision to make copies | explicit with `.clone()`. Annoying as hell when you're not used | to it but overall worth it. | lanstin wrote: | That seems like a potential for compiler optimization. It should | already know that the rule value is only used one time, as the | target of a &, and this must be somewhat common when managing | return values. | ithkuil wrote: | The semantics change. You're now returning a pointer to the | actual Rule in the Ruleset, while before you'd be returning a | pointer to a copy of the Rule. | | The optimization would only work if you had a way to tell the | compiler that some values are constant/immutable. | ithkuil wrote: | BTW; I'm using both Go and Rust lately. | | In Rust you can write a function that returns the pointer of | one element of a slice. You can also write a function that | returns the pointer to a heap-allocated copy of an element of | the slice. The two functions would have different signatures. | | The compiler would also prevent mutation of the slice as long | as there are any references to individual elements of the | slice being passed around. | ok123456 wrote: | >In Rust you can write a function that returns the pointer | of one element of a slice. | | Have fun fighting the borrow checker on that one. | [deleted] | oconnor663 wrote: | It's not always so bad :) https://play.rust- | lang.org/?version=stable&mode=debug&editio... | ok123456 wrote: | Now try to actually do that in a larger program without | static lifetimes... | piperswe wrote: | It shouldn't be too bad as long as you keep in mind that | it is a reference to an item in that slice, so whatever | that slice is pointing to needs to stick around as long | as you're using elements from it.
I don't often encounter | borrow checker issues anymore, because once you program | Rust long enough you know what things need to live for | what lifetimes, and you architect your "larger programs" | around that. | lanstin wrote: | Oh yeah. Too bad. Just ran the escape to heap analysis on my | current project, not looking too promising. Mostly it is | allocating structures and saving them in a huge in-memory | hash. | [deleted] | kgeist wrote: | I don't think it can be optimized without altering semantics. | If it's a pointer to a value in the slice, changing the slice's | values (ruleset[i] = ...) will be reflected in all Rule values | returned from the function, because they all point to the same | memory. In the same way, changing the returned value's fields | will change the data in the original slice. The author's code | is prone to this behavior after the change. | | When it's a pointer to a copy, no such implicit dependencies | occur. | derefr wrote: | You're not wrong in general, but one interesting thing about | Go as an ecosystem (rather than as a language) is that golang | programs are mostly statically compiled -- all sources, one | pass, one code unit, one object-code output -- so they're | (theoretically) very amenable to compile-time (rather than | link-time) Whole-Program Optimization techniques. | | In this specific case, that technique would be whole-program | dataflow analysis. Given a Golang function that passes out | references-to-copies-of owned data, you _could_ actually | determine for certain -- at least in the default static- | binary linkage mode -- whether these two properties hold | universally within the resulting binary: | | 1. whether _no_ caller of the function _will ever try_ to do | anything that would cause data within their copy of the | struct to be modified; | | 2.
whether the owner of the data _will never_ modify the data | of the original struct in such a way that, if the copy were | elided, the changes would be "seen" by any reads done in any | of the callers. (The owner could still modify internal | metadata within the struct for its own use, as long as such | internal metadata is 1. in private fields, 2. where all | callers live outside the package defining the struct, making | those fields inaccessible; and 3. the fields are never | accessed by any struct methods called by borrowers of the | struct -- keeping in mind that such methods can be defined | outside the package by caller code.) | | If you could prove both of these properties (using dataflow | analysis), then you could safely elide the copy within the | function, turning the return of a reference-to-a-copy-of-X | into a return of a reference-to-X. | | (And, in fact, if you can only prove the second property | universally, and the first property in specific instances, | then you can still elide the copy from the function itself; | but you'd also generate a wrapper function that calls said | function [receiving a reference-to-X], copies, and so returns | a reference-to-a-copy-of-X; and then, for any call-site where | the first property _doesn't_ hold -- i.e. callers whose | transitive call-graph will ever modify the data -- you'd | replace the call to the original function with a call to the | wrapper. So "safe" connected caller sub-graphs would receive | references, while "unsafe" connected caller sub-graphs would | receive copies.) | oconnor663 wrote: | I think the optimization is only valid if we know that nothing | is ever going to use the returned pointer to do mutation. | gp wrote: | I was trying to debug and improve the performance of some | parallelized C++ code over the weekend for parsing CSV files.
| What would happen was parsing each file (~24k lines, 8 columns) | would take 100ms with one execution context, but when split | across many threads, the execution time of each thread would slow | down proportionally and the throughput of the whole program would | strictly decrease as thread count increased. | | I tried all of the obvious things, but the offender ended up | being a call to allocate and fill a `struct tm` object from a | string representation of a date. This doesn't have any obvious | reason (to me) to cause cache invalidation, etc., so I'm a little | in the dark. | | Still, replacing this four-line block improved single-threaded | performance by 5x, and fixed the threaded behavior, so on the | whole it is now ~70x faster and parses about 400 MB of CSV per | second. | Quekid5 wrote: | False sharing, maybe? | gp wrote: | Not a bad suggestion - thanks for the idea | infamousclyde wrote: | Thank you for sharing. I'm curious if you would recommend any | good resources for profiling with Go. I enjoyed your code | snippets and methodology. | assbuttbuttass wrote: | Returning a pointer to a local variable is convenient, but can be | a source of hidden allocations. | | It's best to treat each struct as a "value" or "pointer" type, | and use one or the other consistently for each type. This mostly | avoids the need to use & in the first place | hoosieree wrote: | > You can see these decisions being made by passing -gcflags=-m | to go build: | | That's a very nice feature! I wonder if compilers for other | languages have something similar. | throwaway894345 wrote: | Does anyone with VS Code or vim plugin development experience | know how hard it would be to call this behind the scenes and | highlight allocation points in the editor? | _old_dude_ wrote: | In Java:
            -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
| and:
            -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
| | The generated logs can be seen with JITWatch [1].
| | [1] https://github.com/AdoptOpenJDK/jitwatch | masklinn wrote: | You'd have to see if all compilers support it, but LLVM has a | "remarks" system, which should provide similar information | (though likely a lot more of it) for optimization passes which | are traced: https://llvm.org/docs/Remarks.html#introduction-to- | the-llvm-... | | The frontend may or may not have its own optimizations and logs | tho e.g. rustc now has MIR optimizations (https://rustc-dev- | guide.rust-lang.org/mir/optimizations.html) but while you can | dump MIR data (https://rustc-dev-guide.rust- | lang.org/mir/debugging.html) I don't remember seeing an | optimisation log. | | At the end of the day, I think it's more likely that you take a | look at the assembly and infer problems from there if the | profiler doesn't tell you straight. Another difference is the | _kind_ of decisions the compiler makes e.g. while a compiler | can optimize away allocations in "manual allocation" languages | (https://godbolt.org/z/5nEo7xjEr) the allocations are plainly | visible, so if they're trivially avoidable... you'll just avoid | them. | | Using Rust as an example, you'd have something like this:
            pub fn match_(&self, path: &str) -> Result<&Rule, Error> {
                for rule in self.0.iter() {
                    if rule.match_(path)? {
                        return Ok(rule);
                    }
                }
                Err(Error)
            }
| | You couldn't miss an allocation, because the return type would | have to change, and you'd need to perform the copy out:
            pub fn match_(&self, path: &str) -> Result<Box<Rule>, Error> {
                for rule in self.0.iter() {
                    if rule.match_(path)? {
                        return Ok(Box::new(rule.clone()));
                    }
                }
                Err(Error)
            }
| amtamt wrote: | It falls in those 3% of code lines one should think of while not | optimizing prematurely.
| [deleted]
| is_taken wrote:
| Would be interesting to see the performance difference if you
| undo that move-&-change and change the function signature from:
|
|     func (r Ruleset) Match(path string) (*Rule, error)
|
| to:
|
|     func (r *Ruleset) Match(path string) (*Rule, error)
| masklinn wrote:
| Likely none: Ruleset is
|
|     type Ruleset []Rule
|
| The original code creates a local copy of a rule and explicitly
| returns a pointer to _that_. Taking the ruleset by address
| wouldn't change that issue.
| Beltalowda wrote:
| The deeper lesson here is "don't use pointers unless you're sure
| you need them". I've seen quite a few people use pointers for no
| reason in particular, or there's simply the _assumption_ it's
| faster (and have done this myself, too), but it puts a lot more
| pressure on the GC than simple local stack variables.
|
| Of course sometimes pointers are faster, or much more convenient.
| But as a rule of thumb: don't use pointers unless you've got a
| specific reason for them. This applies even more so if you're
| creating a lot of pointers (like in a loop, or a function that
| gets called very frequently).
| throwaway894345 wrote:
| Eh, I've waffled a couple of times between "pass values by
| default" and "pass pointers by default". Ultimately, I don't
| think there's a really good answer except to understand escape
| analysis and get comfortable with the escape analyzer. Notably,
| "using pointers" doesn't inherently put pressure on the GC, but
| rather _allocations_ put pressure on the GC and there are
| plenty of instances where pointers don't put pressure on the
| GC at all (notably, if you're passing data _into_ a function by
| pointer, it's not going to allocate, but it may if you're
| returning a pointer to "a stack-allocated value").
|
| Notably, if you're defaulting to values, you may still have a
| bunch of allocations when you try to implement interfaces,
| which usually (always?)
requires implicitly boxing values;
| however, if you pass a pointer into a function that takes an
| interface, I _don't think_ it gets moved to the heap (but I'm
| not sure, which is why Go programmers need to be comfortable
| with profiling the escape analyzer and also why it would be
| great if Go actually had explicit semantics for allocations).
| bitwise101 wrote:
| Most of the time I use a pointer just because I want to return
| nil in case of error. Is this a valid reason?
| kccqzy wrote:
| That's not a deeper lesson. A deeper lesson is to understand
| _where_ the pointer points to and then decide accordingly.
| Beltalowda wrote:
| That is pretty much what I said, except phrased differently.
| throwaway894345 wrote:
| It sounded to me like you were saying "avoid pointers by
| default" (bad advice IMO) rather than "if it matters,
| verify whether the pointer escapes" (good advice IMO).
| dist1ll wrote:
| > but it puts a lot more pressure on the GC than simple local
| stack variables.
|
| Do you have evidence for this claim? AFAIK the Go compiler does
| escape analysis, and allocates pointers that don't escape on
| the stack.
| throwaway894345 wrote:
| This is true, but it's hard to tell if a pointer escapes or
| not without actually profiling. That said, I don't think the
| answer is to avoid pointers, but rather to get comfortable
| with profiling the escape analyzer. By default, I just stick
| to the subset of Go which I know _won't_ escape--functions
| can take pointer parameters, but I'm very careful about
| returning pointers to data that would otherwise be
| stack-allocated (even though it's not especially idiomatic,
| I'll often prefer mutating an `out *T` parameter rather than
| returning a `*T` because I _know_ the former will not
| allocate).
| tonymet wrote:
| Overall a good review of profiling tactics. But there's nothing
| egregious about Golang here. Pass by value vs reference is a
| common performance issue.
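The back-and-forth above about whether pointers actually pressure the GC can be checked empirically with `testing.AllocsPerRun` from the standard library. A minimal sketch (hypothetical `Rule` type; the global `sink` exists only to force the returned pointer to escape):

```go
package main

import (
	"fmt"
	"testing"
)

// Rule is a hypothetical small struct, like the one in the article.
type Rule struct{ Pattern string }

// sink is a package-level variable: storing a pointer here
// guarantees the pointed-to value escapes to the heap.
var sink *Rule

// byValue returns a copy; no heap allocation is needed.
func byValue() Rule { return Rule{Pattern: "a"} }

// byPointer returns the address of a local; when that pointer
// escapes the caller, the value must be heap-allocated.
func byPointer() *Rule {
	r := Rule{Pattern: "a"}
	return &r
}

func main() {
	v := testing.AllocsPerRun(100, func() { _ = byValue() })
	p := testing.AllocsPerRun(100, func() { sink = byPointer() })
	fmt.Println(v, p) // typically prints 0 and 1
}
```

If `sink` is removed and the pointer never escapes the closure, the compiler is free to keep the value on the stack and the measured count drops to zero, which is exactly the point dist1ll makes about escape analysis: the pointer itself isn't the cost, the escape is.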
| masklinn wrote:
| > But there's nothing egregious about Golang here. Pass by value
| vs reference is a common performance issue.
|
| The trap here is that everything is passed by reference
| (pointer), but the intermediate _local_ value is, well, a value
| (a copy).
|
| Rule is not a gigantic monster struct (it's 72 bytes), chances
| are returning it by value would not have been an issue.
|
| Anyway I would say there _is_ an issue with Go here: it's way
| too easy to copy out of a slice.
| sendfoods wrote:
| 1 character, in 2 places ;) I did not know profiling support for
| Go was so seamless, thank you!
|
| May I ask, is that theme custom or available somewhere? I really
| enjoyed it
| blowski wrote:
| > is that theme custom or available somewhere
|
| Looks a bit like https://newcss.net/ or Water CSS
| hcm wrote:
| Thanks! It's just a few dozen lines of CSS. The body font is
| Inter and the monospaced font is JetBrains Mono.
| gwd wrote:
| > 1 character, in 2 places ;)
|
| Moving a single character from one place to another. :-)
|
| A good explanation of why "fire the developers with the lowest
| 50% of lines added" is an idiotic thing to do: this sort of
| deep analysis takes a lot of time and expertise, and frequently
| results in tiny changes.
| jimsmart wrote:
| From the headline alone, I guessed this was to do with
| pointers/references to values vs values themselves.
|
| Yep, with values that take a lot of memory, it's faster to pass
| pointers/references around than to pass the values themselves,
| because there are fewer bytes to copy.
|
| Of course there is more to such a decision than just performance,
| because if the code makes changes to the value which are not
| meant to be persisted, then one wants to be working with a copy
| of the value, not a pointer to the value. So one should take care
| if simply switching some code from values to pointers-to-values.
|
| All of these are things that coders with more experience of
| languages that use such semantics kinda know already, almost as
| second nature, since the first day they got caught out by them.
| But everyone is learning, to various degrees, and we all have to
| start somewhere (i.e. knowing little to nothing).
___________________________________________________________________
(page generated 2022-11-14 23:00 UTC)