[HN Gopher] Making a Go program faster with a one-character change
       ___________________________________________________________________
        
       Making a Go program faster with a one-character change
        
       Author : hcm
       Score  : 358 points
       Date   : 2022-11-14 14:47 UTC (8 hours ago)
        
 (HTM) web link (hmarr.com)
 (TXT) w3m dump (hmarr.com)
        
       | sakras wrote:
       | A while ago at my company we switched from GCC to Clang, and
       | noticed a couple of massive regressions (on the order of 50%?) in
       | performance having to do with floating point.
       | 
       | After profiling for a bit, I discovered that suddenly a lot of
       | time was spent in isinf on Clang and no time in GCC... Clang was
       | emitting a function call where GCC wasn't. I happened to randomly
       | change isinf to std::isinf (it's a random habit of mine to put
       | std:: in front of these C functions). Suddenly the regression
       | disappeared! I guess on Clang only std::isinf was a compiler
       | intrinsic while GCC recognized both? Anyway, that's my small-
       | change optimization story.
        
         | 10000truths wrote:
         | C defines isinf as a macro, whereas C++'s std::isinf is a
         | function. Perhaps the discrepancy has to do with differences in
         | how they're evaluated?
        
       | ludiludi wrote:
       | > If you read the title and thought "well, you were probably just
       | doing something silly beforehand", you're right!
       | 
       | Don't feel too silly. Russ Cox, one of the technical leads on the
       | Go language, made the same mistake in the regexp package of the
       | standard library.
       | 
       | https://go-review.googlesource.com/c/go/+/355789
        
       | lxe wrote:
        | This is the kind of stuff that the compiler needs to really
        | understand. If all this de-referencing and referencing magic is
        | under the control of the user, it needs to have a meaningful
        | effect on what the code does. Otherwise we might as well just
        | write C.
        
         | silisili wrote:
         | The compiler does understand it and did what was asked - it was
         | just written rather poorly.
         | 
         | There are valid use cases for wanting to take a copy, and then
         | pass along a pointer of the copy. Perhaps to go through a
         | series of modification methods that don't touch the original.
         | I'd sure hate it if the compiler tried to outsmart me on that
         | and changed the behavior away from what I'd written.
        
       | enedil wrote:
       | Went from 4.139s to 2.413s. I fail to see how it is 70%. I think
       | it is explained as 4.139/2.413 = 1.7 which of course doesn't make
       | sense here.
        
         | CodesInChaos wrote:
         | I do think saying it's 71% faster makes sense here, since "x%
         | faster" and "speed increased by x%" mean the same thing. This
         | reduces the runtime by 42%, but that doesn't mean it's just 42%
         | faster.
        
         | bmicraft wrote:
         | It would probably be more accurate to say it can do 70% more
          | stuff in the same time. Or that it takes 42% less runtime.
        
           | xnorswap wrote:
           | But 70% more stuff in the same time is 70% faster.
        
             | OJFord wrote:
             | Let's say you do 10 things in 100 time (100/10 =10 time per
             | thing) to start with:
             | 
             | 70% more stuff in the same time is 17 things in 100 time
             | (100/17 =5.9 time per thing);
             | 
             | 70% faster is 10 things in 30 time (30/10 =3 time per
             | thing) - or, I would argue incorrectly, perhaps said to
             | mean 10 things in 70 time (70/10 =7 time per thing).
        
         | morelisp wrote:
         | This is an extremely common mistake in reporting performance
         | numbers. That the old version is 70% slower does not make the
         | new version 70% faster.
        
           | wizofaus wrote:
           | 70% slower is a bit ambiguous though - it could mean 70%
           | extra runtime or it could mean 30% of the new speed. Whereas
           | 70% faster would always suggest to me that it can do 70% more
           | work in the same amount of time, i.e. a 1.7x increase in
           | speed.
        
             | enedil wrote:
              | I do not agree. When you benchmark, you usually measure the
              | difference between the times needed to complete. This is
              | because it is highly non-obvious that doubling the workload
              | also doubles the time. Perhaps the algorithm is not linear.
              | Perhaps if you have more data, you suddenly need to swap
              | memory. Perhaps something (like disk access happening in
              | parallel) means that it actually takes less than 2x the
              | time. This means that a concept of speed per unit of work is
              | undefined. So the only reasonable interpretation of "70%
              | faster" is "takes 30% of the original time".
        
               | yongjik wrote:
               | But under your definition, if something becomes "three
               | times as fast" (i.e., 200% faster), it will have to
               | finish its task in negative time!
        
               | wizofaus wrote:
               | I can categorically state I've never thought of or
                | understood 70% faster as meaning that, and certainly not
               | 100% faster as meaning "completes instantly".
               | 
               | I see the OP has solved the problem by removing any
               | references to how much faster from the article title!
               | 
               | You're right about non-linear algorithms though. If an
               | O(n^2) algorithm is 2x / 100% faster, it can't process
               | 100% more items in the same time, but I'd understand it
               | to mean taking half the time for the same n.
        
         | barbegal wrote:
         | It's gone from 14.5 operations per minute to 24.9 operations
         | per minute so a 70% speedup.
        
         | hcm wrote:
         | Doh! Thanks for pointing out another silly mistake - I'll fix
         | that.
        
           | xnorswap wrote:
           | You had it right the first time, 1.7x speed _is_ 70% faster.
           | 
            | If something that previously took 4s now takes 2s, then it's
            | 100% faster.
           | 
           | Think of driving 10miles. If you drive at 20mph then it takes
           | 30 minutes. If you drive twice as fast, 40mph, it takes 15
           | minutes.
           | 
           | 40mph is 100% faster than 20mph.
           | 
           | Half the time is twice as fast!
        
             | [deleted]
        
             | wizofaus wrote:
             | Glad it wasn't just me thinking this!
        
           | markoman wrote:
           | @hcm: Would have loved to see the 'after' flamegraph just for
            | comparison purposes! I'm still trying to get used to grokking
           | flamegraphs when optimizing. They're a somewhat new tool,
           | IMO.
        
           | wizofaus wrote:
           | I disagree, I think the 70% is right, and matches what you
           | still describe as a 1.7x speed increase. If it originally
           | took 4 seconds and now takes 2, I'd call that a 100% speed
           | increase, i.e. twice as fast.
        
       | cbsmith wrote:
       | As an old C/C++ programmer, I'm always surprised by how often
       | software developers are surprised by the performance costs of
        | inopportune value semantics (C and C++, even more so, punish you
        | severely for using value semantics when you shouldn't). I
       | increasingly see the wisdom of languages with implicit reference
       | semantics.
       | 
       | It's not that value semantics can't be better (they most
       | assuredly can be), or that reference semantics don't cause their
       | own complexity problems, but rather that so often we
       | thoughtlessly imply/impose value semantics through interfaces in
       | ways that negatively impact performance; getting interfaces wrong
       | is a much tougher bell to unring.
       | 
       | The vast majority of my mental energy when I define an interface
       | in C++ is carefully thinking through a combination of ownership
       | contracts and value vs. reference semantics that I can mostly
       | ignore in languages with implicit reference semantics. While
       | occasionally ignoring those contracts while developing in
       | Java/Python/whatever comes back to bite me, the problem isn't
       | nearly as common or problematic as when I unintentionally impose
       | value semantics in a language that allows me to.
        
         | dleslie wrote:
         | I would also lay a chunk of blame on the use of type inference
         | which causes the information about the behaviour to be hidden
         | from view.
         | 
         | Had the author been looking at the type information within the
         | syntax of the code the profile output may not have been a
         | surprise. Perhaps the problem would never have existed in the
         | first place.
        
           | adrianmonk wrote:
           | Yeah. Often, writing out the type explicitly is just
           | busywork. But it seems it would have paid off here.
           | 
           | If you were forced to stop and think what type to declare, I
           | bet you'd write "var rule *Rule". Even if you don't think
           | deeply and just look at the return type.
           | 
           | And then if you assigned "r[i]" to "rule", you'd get a type
           | error.
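            | 
            | A tiny Go sketch of that idea (hypothetical names, not the
            | article's actual code):
            | 
            |     type Rule struct{ pattern string }
            | 
            |     func pick(r []Rule, i int) *Rule {
            |         var rule *Rule
            |         rule = &r[i]  // fine: pointer into the slice, no copy
            |         // rule = r[i] // compile error: cannot use r[i] (Rule) as *Rule
            |         return rule
            |     }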
        
         | bitexploder wrote:
         | Becoming good at Go is mostly knowing all the sharp edges with
         | the built in types IMO. Of course go routines and concurrency
         | primitives are difficult to master, but that is a different
         | beast and if you understand concurrency from some other
         | languages Go just makes that easy. But knowing all the
         | behaviors of slices, and really intuiting how they work makes
         | your life a lot easier and a lot less prone to bugs. And slice
         | behavior almost all comes down to what is a slice internally
          | and how and where am I copying things as I write my code.
         | Generally the copying is fine, but in some cases it is not or
         | it is in a tight performance critical section where you need to
         | be thinking about it.
         | 
          | Edit: Also, pointers into slices will probably leave you sad.
          | You get a pointer into the slice's storage, not a pointer to
          | the thing. And if the slice resizes, your pointer now references
          | a dead backing array. Basically, pointers and slices are not
          | friends. Unless you have a slice of pointers, which idiomatic Go
          | avoids unless there is a decent reason for it :)
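          | 
          | A small sketch of that gotcha (assumed example, not from the
          | article):
          | 
          |     s := make([]int, 1, 1)
          |     p := &s[0]
          |     s = append(s, 2) // over capacity: a new backing array is allocated
          |     s[0] = 42
          |     fmt.Println(*p)  // prints 0: p still points into the old array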
        
         | [deleted]
        
         | tylerhou wrote:
         | The performance issue here is not value semantics, it's the
         | overhead of automatic lifetime management. The copy is cheap.
         | The lifetime tracking is not because it forces a heap
         | allocation and creates additional GC pressure. In fact,
          | assuming Rule is small, if Match returned by _value_, the code
         | would be similarly as fast.
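          | 
          | Roughly, as a hedged sketch (hypothetical names, not the
          | article's code):
          | 
          |     type Rule struct{ a, b, c int64 }
          | 
          |     // Returning a pointer to a local makes it escape: one heap
          |     // allocation (plus GC work) per call.
          |     func newRulePtr() *Rule {
          |         r := Rule{1, 2, 3}
          |         return &r
          |     }
          | 
          |     // Returning by value copies ~24 bytes and can usually stay
          |     // on the caller's stack.
          |     func newRuleVal() Rule {
          |         return Rule{1, 2, 3}
          |     }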
        
           | cbsmith wrote:
           | > The performance issue here is not value semantics
           | 
           | There's no performance cost to value semantics, so of course
           | not.
           | 
           | > The copy is cheap. The lifetime tracking is not because it
           | forces a heap allocation and creates additional GC pressure.
           | In fact, assuming Rule is small, if Match returned by value,
           | the code would be similarly as fast.
           | 
           | I'm referring more to how this stuff seeps in without the
           | programmer realizing it. It's the implicit nature of all this
           | behaviour that is the problem.
        
             | tylerhou wrote:
             | Sorry, _because_ of value semantics. (Also, you wrote
             | "performance costs of inopportune value semantics". So the
             | correction is annoying; you know what I meant.)
             | 
             | This stuff seeps in without the programmer realizing it
             | because Go made a deliberate design decision to have
             | automatic lifetime management. In other words, this is a
             | feature of the language and not a bug. The only way for
             | this to not seep in is if Go forced programmers to specify
             | most lifetimes, which would make the language much more
             | cumbersome to use.
             | 
             | I.e., this is not a value-vs-reference semantics issue.
             | It's a manual-lifetime-management vs automatic-lifetime-
             | management issue. The solution is to either 1) write in a
             | language with more explicit lifetimes if performance is
             | that important, or 2) profile your code and reduce heap
             | allocations, which is what the person who wrote the article
             | did.
        
               | cbsmith wrote:
               | Sorry for being annoying. I wasn't trying to be.
               | 
               | I would disagree about this stuff seeping in because of
               | automatic lifetime management. The equivalent code in
               | Java, Smalltalk, etc., would either not have the
               | performance problem or it would be much more obvious that
               | the code wasn't as efficient as it could be.
        
               | tylerhou wrote:
               | > Sorry for being annoying. I wasn't trying to be.
               | 
               | Thanks, no worries.
               | 
               | You're right that Java doesn't have this performance
               | problem because it (apart from primitives) has no
               | implicit copies. (So the code wouldn't copy, it would
               | return a reference, hence no allocation.)
               | 
                | I think what you're trying to say is: programs written in
                | languages with value semantics can implicitly create more
               | objects, which (in some cases) have additional
               | allocation/lifetime tracking in languages with automatic
               | lifetime management. So maybe one solution is to prevent
               | copies in most circumstances, and when you do need to
               | copy, make them explicit.
               | 
               | But I think this conflicts with Go's philosophy as well.
               | First, in highly concurrent programs, copies are
               | necessary. A mutex/atomic will become a bottleneck;
               | sharing is bad for performance. This means that copies
               | should be part of most types in the language. (If they
               | aren't, you'll run into issues like not being able to
               | pickle when using multiprocessing in Python [1].)
               | 
               | OK, so we need copies. But clearly, I shouldn't have to
               | write `a = b.Copy()` if b is an integer. That's too
               | verbose. And if everything has a `.Copy()` call, that
               | leads to visual noise and it's not clear when copies are
               | actually expensive. So what set of "blessed" types should
               | be implicitly copyable? Java picks primitives, but this
               | means it's basically impossible to create an int-like
               | object with value semantics. I think this is bad. C++ is
               | at the other extreme with "(almost) all objects are
               | implicitly copyable" but some people dislike how
               | forgetting a & on a std::vector type might lead to a huge
               | copy.
               | 
               | Go has chosen a reasonable middle ground -- dynamically
               | sized types like arrays and maps are not implicitly
               | copyable, but "cheap" objects like structs are implicitly
               | copyable. This means that when you see a Copy call, it's
               | meaningful. But this means you run into these situations
               | when an implicit copy interacting with lifetimes/garbage
               | collection causes a slowdown. But I don't see any other
               | alternative. Getting rid of implicit copies for cheap
               | types will cause visual noise. Getting rid of GC makes
               | your language harder to use. Both of these alternatives
               | seem worse to me.
               | 
               | To be clear, I'm not a huge fan of Go. Especially around
               | the lack of monadic error handling. But I think they have
               | done a decent job wrt. their design goals.
               | 
               | [1] https://stackoverflow.com/questions/8804830/python-
               | multiproc...
        
         | [deleted]
        
         | Jemaclus wrote:
         | I'm a dumb dumb. Can you define "implicit reference semantics"
         | and "value semantics"? You use the phrases several times in
         | your post, but I don't really understand what you mean. If it
         | helps, I'm not a C++ programmer, but I am familiar with higher
         | level languages like Go, Python, Ruby, PHP and Javascript.
        
           | scubbo wrote:
           | Thank you for asking - I, too, was a dumb dumb.
        
           | gdwatson wrote:
           | "Implicit reference semantics" means that variables
           | ordinarily refer to objects rather than containing them.
           | "Value semantics" means that variables contain values rather
           | than references, though there are pointer types that let you
           | explicitly store references when you want them. (Often this
           | is discussed in terms of parameter passing, for historical
           | reasons, but the same ideas apply to variables more
           | generally.)
           | 
           | If your language has implicit reference semantics, "x = y"
           | will cause x and y to refer to the same object. If it has
           | value semantics, x will be a copy of y.
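            | 
            | In Go terms, a rough sketch of the difference (assumed
            | example):
            | 
            |     type point struct{ x, y int }
            | 
            |     a := point{1, 2}
            |     b := a   // value semantics: b is an independent copy
            |     b.x = 99 // a.x is still 1
            | 
            |     p := &a
            |     q := p   // pointer (reference) semantics: q aliases a
            |     q.x = 99 // a.x is now 99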
        
             | Jemaclus wrote:
             | Thank you
        
           | masklinn wrote:
           | "Implicit reference semantics" = everything (or near enough)
           | is implicitly a reference. That's what you get in Python,
           | Ruby, or Javascript: when you pass something around, you're
           | really passing a reference to it, a pointer. Everyone ends up
           | referring to the same thing, and sharing it.
           | 
           | "Value semantics": you pass the value itself around, which
           | means you're shallow-copying it all the time. That's what you
           | get when you pass or return a bare non-interface type in Go.
           | 
            | PHP is a bit of a mess: objects are passed by reference (=
           | implicit reference semantics) but arrays are passed by value,
           | and you can opt them into pass-by-reference. You can also
           | pass non-arrays by reference, though I don't think that's
           | very common.
        
           | strifey wrote:
           | If you're familiar with Python and Go, you'll likely be able
           | to quickly spot the differences in how they handle parameter
            | passing. Python uses references and Go uses values.
        
             | erik_seaberg wrote:
             | Go maps and channels have reference semantics. Slices are
             | passed and returned by value, but backing arrays are often
             | shared (not copied) in an error-prone way, so it's safest
             | to pretend they have move semantics and treat only one copy
             | as valid.
             | 
             | IMHO they should have used pointers everywhere they didn't
             | want value semantics with deep copies.
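              | 
              | A quick sketch of why that's error-prone (assumed example):
              | 
              |     s := []int{1, 2, 3}
              |     t := s           // copies the slice header, not the elements
              |     t[0] = 99        // visible through both: s[0] is now 99
              |     t = append(t, 4) // may reallocate, silently ending the sharing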
        
           | klodolph wrote:
           | Languages like Python, Java, and C# have implicit reference
           | semantics. When you create an object, you get a pointer to
           | that object (usually, or commonly).
           | 
           | In languages like C, C++, Go, Rust... references are more
           | explicit. If you want a pointer to an object, you have to &,
           | or something similar.
           | 
           | It gets a bit fuzzy.
        
             | Jemaclus wrote:
             | Thank you
        
           | TheRealPomax wrote:
           | It's the difference between assigning/passing around "copies
           | of the data" vs. assigning/passing around "the memory address
           | for that data" under the hood.
           | 
           | PHP, for example, has explicit references. If you have an
           | `$arr1=array(1,2,3)` and an `$arr2 = $arr1`, that second
           | array is a full copy of the first array, and updating $arr1
           | does nothing to $arr2. Similarly, `function
           | update_array($arr) { $arr[0] = 'cake'; }` called with `$arr1`
           | will create a _copy_ of $arr1 for use inside that function.
           | Any changes to `$arr` inside the function only apply to that
           | copy, not to $arr1. _Unless_ you explicitly tell PHP you want
           | to work with references, by using `function update_array(
           | &$arr) { ... }` instead. PHP uses value semantics.
           | 
            | JS, on the other hand, uses implicit reference semantics. If
            | you have a `const arr1 = [1,2,3];` then `arr1` is an _alias_,
            | simply pointing to a memory location, and declaring a
            | `const arr2 = arr1` makes arr2 _also_ be an alias for that
            | same memory location. They both reference the same thing, but
            | neither "is" the thing. Similarly, if you have a `function
            | updateArray(arr) { arr[0] = 'cake'; }` then that `arr` is
            | _also_ just an alias pointing to the memory location of
            | whatever you passed in, and changes to `arr` change the
            | underlying data, so whether you try to access that through
            | `arr1`, `arr2` and `arr`, it's the same thing. JS uses
            | implicit reference semantics.
           | 
           | (But note that many languages with implicit reference
           | semantics make exceptions for immutable primitives like
           | numbers or strings)
        
             | datalopers wrote:
             | That's not technically correct with regards to PHP. Your
             | statement that any changes to $arr1 or $arr2 only impact
             | the one in question, however, is accurate. If no changes
             | are made they still refer to the same data in memory. It's
             | copy-on-write semantics.
             | 
             | $arr1 = [1,2,3]; // $arr1 is a pointer to a zval array
             | [1,2,3] and refcount:1
             | 
             | $arr2 = $arr1; // $arr1 and $arr2 are pointers to the same
             | zval array but incremented refcount to 2.
             | 
             | $arr2[] = 4; // at this point, $arr2 creates a new zval
             | array, copies over the data from the first, sets the
              | refcount to 1, and appends int(4). The [1,2,3] array has its
             | refcount decremented to 1 as it's still referenced by
             | $arr1.
             | 
             | [1] http://hengrui-li.blogspot.com/2011/08/php-copy-on-
             | write-how...
        
               | TheRealPomax wrote:
               | true, good point.
        
               | morelisp wrote:
               | This isn't semantics, though, but implementation. As far
               | as I'm aware there's no way to "observe" the copy short
               | of digging into runtime debugging tools, so I think it's
               | still fair to say the language has pass-by-value
               | semantics. In C++ we still refer to things returned-by-
               | value as returned by value despite the reality of NRVO,
               | etc.
        
               | johannes1234321 wrote:
               | Actually that is outdated since at least PHP 7. In modern
               | PHP the engine uses copy on write only for "larger"
               | things, but "small" things like integers or booleans live
               | on the stack and are copied around, avoiding heap
               | allocations etc.
        
             | Jemaclus wrote:
             | Thank you
        
             | unfunco wrote:
             | PHP has implicit references too, objects are passed by
             | reference by default (primitives by value, like your note)
        
           | marcosdumay wrote:
           | a = [1, 2]
           | 
           | b = a
           | 
           | b[0] = 3
           | 
           | print(a)
           | 
           | What does the above print? If the language implements
           | reference semantics, it prints [3, 2]. If it implements value
           | semantics, it prints [1, 2].
        
             | Jemaclus wrote:
             | Thank you
        
         | brundolf wrote:
         | Having used C++ relatively little, and a long time ago (and
         | never used Go), I don't think I ever realized that value
         | semantics could be used to implicitly clone _heap_ objects
         | 
         | Yeesh
        
           | masklinn wrote:
           | Normally they're not, that's really specific to C++ nonsense.
           | 
           | In a normal value-semantics system if you have a pointer you
            | just copy the pointer. Obviously if the language doesn't have
           | your back it also means you've now fucked up your ownership,
           | but you're in C so that was to be expected.
        
             | brundolf wrote:
             | Yeah, this was my assumption when I read the original
             | comment, so I was confused about what it was saying (I
             | thought it was suggesting you should pass references
             | instead of doing a bitwise-copy most of the time, which is
             | just plausible enough that I believed it for a minute)
             | until I read further down in the thread and realized they
             | meant implicit heap-cloning
        
         | titzer wrote:
         | I think the main issue in C++ isn't value semantics, it's deep
         | copy semantics. E.g. in a functional language ADTs are
         | immutable and don't have identity. They can be freely copied,
         | or not, passed by reference or by value, but they are never
         | _deep copied_. Comparison may be deep, but not passing them.
         | 
         | That is to say, I think I mostly am agreeing with you. In Java,
         | objects are always passed by reference, never by value, and
         | never implicitly copied. But Java doesn't have _any_ value
         | types other than primitives. When I added ADTs to Virgil, I
         | wanted to get the best of both worlds; objects are pass by
         | reference, like Java, and ADTs are immutable, identity-less, so
         | they are deep compared but never deep copied. (ADTs in Virgil
          | can also sometimes be flattened, so they don't necessarily
         | result in heap allocation).
        
           | nyanpasu64 wrote:
           | I still don't see why value structs need to be immutable;
           | ints are mutable in all languages, and structs are mutable in
           | C, C++, and Rust (if you `let mut`) and it's a feature of the
           | language.
        
           | cbsmith wrote:
           | In a functional language, you don't have to worry about bits
           | of code mutating your data. ;-) On the flip side, there's a
           | lot of cognitive load that comes with functional languages,
           | so while they do address the problem neatly...
           | 
           | I'd have to take a look at Virgil to appreciate your
           | approach, but I'm always leery of implicit value vs.
           | reference semantics tied to types (aside from the whole array
           | fiasco, easily the ugliest part of Java's type system). So
           | often the particular semantics you want are driven by the
            | context of what you're doing, rather than what you're
           | doing it with.
        
         | thomascgalvin wrote:
         | > I increasingly see the wisdom of languages with implicit
         | reference semantics.
         | 
         | I spend most of my time in a JVM language of one flavor or
         | another, and when I was learning Go, the first thing that stuck
         | out at me was, "why would I _ever_ want the compiler to
          | invisibly copy a data structure for me?"
         | 
         | I suppose the primary reason is to prevent the callee from
         | modifying the caller's data out from under them; unless you
         | pass a reference value, you _know_ the callee cannot modify
         | your data.
         | 
         | But, as someone who leans heavily into "everything should be as
         | immutable as possible," the second thing that stuck out at me
         | was "wait, a struct can't have const fields?"
         | 
         | When I write code, it's common to have references to immutable
         | classes thrown around with wild abandon, heedless of ownership,
         | threads, or good taste, because the data just can't change. But
         | that's a paradigm that Go simply doesn't support.
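          | 
          | The closest thing in Go (a sketch, assuming read-only access
          | from outside the package is enough) is unexported fields plus
          | getters, which is much weaker than real const fields:
          | 
          |     type Config struct {
          |         name string // unexported: other packages can't touch it
          |     }
          | 
          |     func NewConfig(name string) Config { return Config{name: name} }
          |     func (c Config) Name() string      { return c.name }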
        
           | cbsmith wrote:
           | > When I write code, it's common to have references to
           | immutable classes thrown around with wild abandon, heedless
           | of ownership, threads, or good taste, because the data just
           | can't change.
           | 
           | If there's anything I wish languages with implicit reference
           | semantics would adopt, it's implicit immutability. I wish
            | Java would be so much nicer with a keyword that is halfway
            | between "final" and "volatile" that means, "yes, you can
            | actually mutate this", and then make final semantics the
            | default for fields & variables.
        
             | cogman10 wrote:
             | Agreed.
             | 
              | Could you imagine a Java where you have a `Map` and a
              | `MutableMap` and that's what you put in your API? It'd make
              | it SO much clearer how safe any individual API is to call.
        
               | spockz wrote:
               | Scala has had this for ages. You can have it today. Even
               | in Java either through the Google collection library or
               | through a library that mimics fp style programming. The
               | name eludes me for the moment.
        
               | gowld wrote:
               | Guava, but it's not fully typesafe.
               | 
               | Mutable is not part of the Java meta-language used for
               | typing.
               | 
               | ImmutableMap implements Map
               | 
               | https://guava.dev/releases/23.0/api/docs/com/google/commo
               | n/c...
               | 
               | and throws exceptions on mutation.
               | 
               | https://guava.dev/releases/23.0/api/docs/src-
               | html/com/google...
        
               | foverzar wrote:
               | Kotlin has this
        
               | thomascgalvin wrote:
               | Kotlin has this, but the Map is (usually) a MutableMap
               | under the covers, because it's Java bytecode at the lower
               | levels. You have to go out of your way to footgun
               | yourself, but it's still possible.
        
               | morelisp wrote:
               | In general this doesn't work, the history rule says
               | mutable types are not proper subtypes of immutable ones
               | (and the converse is obvious). If you want to capture
               | mutability in your type system, it needs to be orthogonal
               | to subtyping (like C/C++ const).
        
               | cogman10 wrote:
               | > the history rule
               | 
               | I'm unfamiliar with this rule (and not finding anything
               | good to google). Can you elaborate?
               | 
               | I can't really think of a scenario where an immutable
               | datastructure isn't a subset of actions against a mutable
               | datastructure.
        
               | kapep wrote:
               | I had to look it up too, it apparently is a constraint
               | for subtypes defined in Liskovs substitution principle
               | [1]. From Wikipedia:
               | 
               | > History constraint (the "history rule"). Objects are
               | regarded as being modifiable only through their methods
               | (encapsulation). Because subtypes may introduce methods
               | that are not present in the supertype, the introduction
               | of these methods may allow state changes in the subtype
               | that are not permissible in the supertype. The history
               | constraint prohibits this. It was the novel element
               | introduced by Liskov and Wing. A violation of this
               | constraint can be exemplified by defining a mutable point
               | as a subtype of an immutable point. This is a violation
               | of the history constraint, because in the history of the
               | immutable point, the state is always the same after
               | creation, so it cannot include the history of a mutable
               | point in general. Fields added to the subtype may however
               | be safely modified because they are not observable
               | through the supertype methods. Thus, one can define a
               | circle with immutable center and mutable radius as a
               | subtype of an immutable point without violating the
               | history constraint.
               | 
               | [1]: https://en.wikipedia.org/wiki/Liskov_substitution_pr
               | inciple#...
        
               | gowld wrote:
        
               | [deleted]
        
               | a_t48 wrote:
               | C++ still has this problem -
                | `std::unordered_map<std::string, std::string>` and
               | `std::unordered_map<std::string, const std::string>` are
               | basically unrelated types - you can't const-cast the
               | templated const away. (I may be misunderstanding here)
        
               | masklinn wrote:
               | > you can't const-cast the templated const away.
               | 
               | That seems like a good thing. If you're handed a map to
               | const values you can't just go "imma gunna mutate them
               | anyway".
        
               | a_t48 wrote:
               | Yup, it's definitely a bit of a code smell if you do. The
               | issue is more the reverse, though - I can't make a
               | mutable map, then hand it by pointer/reference to
               | something that says it wants an immutable map later.
        
               | masklinn wrote:
               | Not necessarily a bad thing either, things can get odd if
               | you're not the only owner and it's mutated under you
               | unexpectedly.
        
               | [deleted]
        
               | a_t48 wrote:
               | That only matters if you're storing it. The big issue is
               | 99% of code out there uses mutable template types for
               | containers, and if you ever declare a container that
               | doesn't have a mutable template type, you stub your toe
               | as your new container isn't compatible with anyone else's
               | code. You can't even easily copy your way out.
                |     std::unordered_map<std::string, const std::string> m;
                |     m["foo"] = "bar";
                |     std::unordered_map<std::string, std::string> m2 = m;
               | 
               | doesn't compile.
        
               | morelisp wrote:
               | Normally (when containers are not involved) this is
               | exactly the point of a const cast.
        
           | titzer wrote:
           | > When I write code, it's common to have references to
           | immutable classes thrown around with wild abandon, heedless
           | of ownership, threads, or good taste, because the data just
           | can't change. But that's a paradigm that Go simply doesn't
           | support.
           | 
           | You might get a kick out of Virgil. It's easy (and terse!) to
           | define immutable classes and you can have immutable ADTs too.
           | (plus tuples, generics with separate typechecking, etc).
        
         | throwaway894345 wrote:
         | I also have a background in C/C++, etc and I've only ever found
         | myself missing value semantics when I use languages with
         | implicit reference semantics. I guess I always figured the
         | solution was "value semantics with better education / tooling".
         | Education: people should understand value semantics. Tooling:
         | imagine an IDE that highlights allocation points automatically
         | (or perhaps the problem is implicit allocations rather than
         | value semantics?).
        
           | cbsmith wrote:
           | > I've only ever found myself missing value semantics when I
           | use languages with implicit reference semantics.
           | 
           | Oh, I miss it every time. ;-)
           | 
           | I will say though that some newer languages seem to have a
           | confused idea about how to offer mixed semantics. A bunch of
           | them tie semantics to types. The ideal interface can vary by
           | usage context. It's hard enough getting the semantics right
           | as the callee (as opposed to caller), let alone when you're
           | defining a type that will be used by who knows how many
           | interfaces.
           | 
           | > I guess I always figured the solution was "value semantics
           | with better education / tooling".
           | 
           | I've always thought much the same, but I have slowly come to
           | appreciate that it's more than just education & tooling. Even
           | with good education & tooling, there's a cognitive load that
           | comes with getting interfaces right that for the general case
           | is just not worth it.
        
             | adgjlsfhk1 wrote:
             | I think this is half right. For anything 64 bits or
             | smaller, value semantics are pretty much always going to be
             | better. That said, being able to choose between value and
             | reference semantics for larger objects per object is a
             | pretty useful feature.
        
               | cbsmith wrote:
               | > For anything 64 bits or smaller, value semantics are
               | pretty much always going to be better.
               | 
               | That's assuming a 64-bit CPU (which admittedly seems like
                | a reasonable assumption). The nice thing about the
               | abstraction though is that there's nothing preventing the
               | runtime from applying value semantics for those trivial
               | small-object cases where they're obviously more
               | efficient.
        
             | throwaway894345 wrote:
             | > I will say though that some newer languages seem to have
             | a confused idea about how to offer mixed semantics. A bunch
             | of them tie semantics to types.
             | 
             | Curious about what you mean here. This sounds like C#'s
             | class/struct distinction to me.
        
               | cbsmith wrote:
               | That's exactly the example I was thinking of.
        
           | Serow225 wrote:
           | The JetBrains IDEs can do this, at least for .NET
        
           | ok123456 wrote:
           | > Tooling: imagine an IDE that highlights allocation points
           | automatically
           | 
           | Rider does this already for C#.
        
           | TremendousJudge wrote:
           | >or perhaps the problem is implicit allocations rather than
           | value semantics
           | 
           | To me, this sounds like this is it. _Explicit is better than
           | implicit_ is a very useful truism
        
             | cbsmith wrote:
             | The counter argument to the "explicit is better than
             | implicit" is that abstraction & encapsulation are such
             | significant force multipliers. If done properly, implicit
             | is good. It's just that in case of copying, doing it
             | "properly" is well nigh impossible.
        
               | kazinator wrote:
                |             explicit    implicit
                |     good       *      <     *
                |                v      v     v
                |     bad        *      >     *
               | 
               | Good implicit is better than good explicit. (If all is
               | good, go for implicit.)
               | 
               | Bad explicit is better than bad implicit. (If all is bad,
               | go for explicit; don't hide bad explicit with bad
               | implicit.)
               | 
               | Good explicit or implicit is better than bad explicit or
               | implicit.
        
           | codeflo wrote:
           | > perhaps the problem is implicit allocations rather than
           | value semantics?
           | 
           | I think that's true. Expensive copies should never have been
           | implicit. There was a story some time ago about a single
           | keypress in the address bar of Chrome causing thousands of
           | memory allocations. The culprit: lots of std::string
           | arguments up and down the call stack.
           | 
           | Rust gets this right, with the hindsight of C++'s example: "a
           | = b" is a move operation by default and clone() is always
           | explicit, except for plain data types where copying is
           | literally memcpy -- and those are clearly marked as such by
           | the type system.
        
             | cbsmith wrote:
              | IMHO, implicit allocations are a bit of a red herring. Yes,
              | in C/C++ heap allocations are proportionately pretty
              | expensive, but I've seen Java programs have just ridiculous
              | amounts of implicit allocations and yet there really isn't
              | much of a problem.
             | 
              | But _allocations_ aren't the same as copies, and the
             | argument for reference semantics has always been that
             | implicit copies are problematic. In your std::string
             | example, having that many String copies in a Java program
             | would be similarly terrible (and this sometimes happens by
             | accident because of abstraction layers that hide all the
             | copying going on under the covers).
             | 
             | I do think Rust gets a lot of stuff right, but Rust's
             | cognitive load is broadly recognized. I tend to see it as
             | C++ with a lot fewer foot guns. ;-)
        
               | codeflo wrote:
               | I'm not sure I get what you mean. You wouldn't have that
               | many String copies in Java by passing an unchanged String
               | down the call stack. My point is that it's too easy to
               | make this mistake in C++.
        
               | cbsmith wrote:
               | In Java, the mistake happens only when there's an
               | abstraction that hides the copying from you, so it isn't
               | implicit in the same way, but it's still implicit.
        
               | morelisp wrote:
               | Allocations are as damaging as your free function is
               | slow.
               | 
               | Java has a tremendously good GC, so can cope with lots of
               | allocations. Go has an OK one, so needs some help (but
               | mollifying it often pays dividends elsewhere in locality
               | and memory usage too). C++ has your default system heap,
               | good luck.
        
               | throwaway894345 wrote:
               | Historically Java has traded long pause times for fast
               | allocations, although I'm of the impression that it has
               | recently found a way to have its cake and eat it.
        
               | klodolph wrote:
               | Java has been tunable for a long time. Periodically, the
               | recommended tuning changes, or new GC algorithms become
               | available, etc. But it has long been possible to get
               | short pause times with various combinations of choosing
               | the right algorithm and writing your program the right
               | way.
               | 
               | I think what _really_ throws people off here is that
               | getting good performance out of a Java application
               | involves some skills which are alien to C++ programmers,
               | and vice versa. You take an experienced C++ programmer
               | and drop them into a Java codebase, they may have a very
               | poor sense of what is expensive and what is cheap. Vice
               | versa... experienced Java programmers don't do well in
               | C++ either.
               | 
               | The result is that you have precious few people with any
               | significant, real-world experience fixing performance
               | issues in both languages.
        
               | throwaway894345 wrote:
               | Agreed, but usually tuning for short pause times involves
               | trading off throughput or allocation performance. But at
               | the end of the day, if you aren't allocating a bunch of
               | garbage in the first place, then you don't need to be as
               | concerned about the costs of allocating or cleaning up
               | the garbage. I wish Go did more to make allocations
               | explicit so they could be more easily recognized and
               | avoided; I dislike Java's approach of making allocations
               | even more implicit/idiomatic while trying to fix the cost
               | problem in the runtime (although I admire the ambition).
        
               | lazide wrote:
               | What do you consider a 'long' pause time?
               | 
               | I've had no issues with Java 17+ under heavy
               | allocation/garbage collection (data encryption pipeline I
                | haven't tuned to reuse buffers yet), and its pause times
               | are on the order of a handful of milliseconds, without
               | meaningful tuning. I think it's doing something like a
               | GB/s of garbage collection.
               | 
                | And the JVM in question is doing a LOT more than just
               | this, so it's coping with millions of allocated objects
               | at the same time.
        
               | throwaway894345 wrote:
               | > Yes, in C/C++ heap allocations are proportionately
               | pretty expensive, but I've seen Java programs have just
               | ridiculous amounts of implicit allocations but there
               | really isn't much of a problem.
               | 
               | Java programs make "ridiculous amounts of implicit
               | allocations" because allocations are cheap in Java. And
               | they need to be cheap because Java doesn't have value
               | semantics so it leans hard on escape analysis + cheap
               | allocations.
               | 
               | I agree with the rest of your comment, although I think
               | most of Rust's "cognitive load" amounts to borrow-
               | checker-vs-garbage-collection. You could envision a Rust
               | with explicit allocations and a GC, and that language
               | would have a "cognitive load" approaching that of Go
               | while also being a fair bit more performant insofar as
               | people can much more easily reason about allocations and
               | thus performance.
        
               | jhoechtl wrote:
               | A long long time ago Rust was a GC language.
        
               | cbsmith wrote:
               | > Java programs make "ridiculous amounts of implicit
               | allocations" because allocations are cheap in Java. And
               | they need to be cheap because Java doesn't have value
               | semantics so it leans hard on escape analysis + cheap
               | allocations.
               | 
               | Yes, but that's kind of the point, right? Implicit
               | allocation isn't really a problem because a runtime that
               | optimizes the allocations magically for you is a lot
               | easier to build than a runtime that optimizes whether you
               | really need to be copying objects as much as you do.
        
               | throwaway894345 wrote:
               | > Implicit allocation isn't really a problem because a
               | runtime that optimizes the allocations magically for you
               | is a lot easier to build
               | 
               | As far as I know, Java's (default) runtime gives cheap
               | allocations at the cost of long GC pause times.
               | 
               | > than a runtime that optimizes whether you really need
               | to be copying objects as much as you do
               | 
               | It's not "copying", it's "allocating", and avoiding
               | allocations isn't that much work (and frankly I'm
               | surprised it's such a minor problem that no one has
               | bothered to build an IDE plugin that highlights these
               | allocation points automatically--or at least I haven't
               | heard of such a thing). Anyway, "a runtime that minimizes
               | allocations" is just an escape analyzer and Java has one
               | of these too, and IIRC it's a lot more sophisticated than
               | Go's (but it's also a lot harder to reason about as a
               | consequence).
        
               | cbsmith wrote:
               | > As far as I know, Java's (default) runtime gives cheap
               | allocations at the cost of long GC pause times.
               | 
               | "long GC pause times" is kind of vague, so I guess you
               | could be correct, but in practice there's a LOT of
               | different ways the memory management can be handled, many
               | of which are deemed "pauseless GC" (though the term is
               | somewhat misleading).
               | 
               | My statement was considering that reality though. While
               | not true for some use cases, in the vast majority of
               | cases, the runtime optimizes the allocations more than
               | sufficiently.
               | 
               | > It's not "copying", it's "allocating"
               | 
               | Allocators can do a pretty good job of minimizing the
               | overhead of allocation, to the point the amortized cost
               | isn't much more than a single machine instruction.
               | Allocating gigabytes of memory quickly is possible.
               | Copying the data can be a lot more work, and often
               | objects have copy semantics that add a lot more
               | additional work.
               | 
               | > Anyway, "a runtime that minimizes allocations" is just
               | an escape analyzer and Java has one of these too, and
               | IIRC it's a lot more sophisticated than Go's (but it's
               | also a lot harder to reason about as a consequence).
               | 
               | I think you're implicitly saying "a runtime that
                | minimizes _heap_ allocations" there, in which case I'd
               | agree.
        
               | throwaway894345 wrote:
               | > in practice there's a LOT of different ways the memory
               | management can be handled, many of which are deemed
               | "pauseless GC" (though the term is somewhat misleading).
               | 
               | Yes, but I'm pretty sure those "pauseless GC" schemes
               | impose other tradeoffs.
               | 
               | > My statement was considering that reality though. While
               | not true for some use cases, in the vast majority of
               | cases, the runtime optimizes the allocations more than
               | sufficiently.
               | 
               | I'm not sure I follow. The same could be said for Go--in
               | the vast majority of cases, Go's tradeoffs (slow
               | allocations, low latency / non-moving GC) are also
               | suitable.
               | 
               | > Allocators can do a pretty good job of minimizing the
               | overhead of allocation, to the point the amortized cost
               | isn't much more than a single machine instruction.
               | 
               | As far as I know, speeding up allocations to this degree
               | _requires_ a moving GC which imposes a bunch of other
               | constraints (including copying a bunch of memory around).
               | 
               | > Allocating gigabytes of memory quickly is possible.
               | Copying the data can be a lot more work, and often
               | objects have copy semantics that add a lot more
               | additional work.
               | 
               | Yes, but the bottleneck here wasn't the copying, it was
               | the allocations. And if you optimized away allocation
               | cost entirely such that only the copy cost remained, that
               | cost would be so small that the OP would never have
               | bothered to profile because copying small objects like
               | this is so cheap compared to everything else (even if it
               | is expensive compared to bump allocating).
               | 
               | > I think you're implicitly saying "a runtime that
               | minimizes heap allocations" there, in which case I'd
               | agree.
               | 
               | Yes, the allocator and GC are concerned with heap
               | allocations and not stack allocations. I'm using
               | "allocations" as a shorthand for "heap allocations".
        
               | cbsmith wrote:
               | In hindsight, I think I chose how to present this poorly,
               | because yes, in this case, the allocation is what is
               | killing the performance. I look at it, and I just see
               | unnecessary implied behaviour creating a performance
               | problem. Usually it isn't the allocations themselves that
               | kill you, but it certainly is the case here.
        
             | throwaway894345 wrote:
             | Allocating isn't "an expensive copy"; it's not analogous to
             | clone() in Rust. The copy isn't the problem, it's the
             | allocation.
        
               | cbsmith wrote:
               | I'd argue quite the reverse. Allocation can be quite
               | efficient if done properly, but copying involves a lot of
               | other work.
        
               | throwaway894345 wrote:
               | I disagree--the bottleneck here is entirely the
               | allocation. The copying is just a memcpy and it's very
               | fast for small structs like this; like I said, it's not
               | the same as a clone() in Rust, which is a deep copy. If
               | you optimized the allocation away entirely (leaving only
               | the copy cost), there wouldn't have been a significant
               | performance problem and this blog would never have been
               | written.
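                | 
                | A rough way to see this is a pair of benchmarks, one
                | returning a pointer to a fresh copy (which escapes to
                | the heap) and one returning the value. This is only a
                | sketch in a _test.go file; the Rule stand-in and names
                | are made up, not taken from the post:
                | 
                |     package rules
                |     
                |     import "testing"
                |     
                |     type Rule struct {
                |         Pattern string
                |         Prio    int
                |     }
                |     
                |     var rules = make([]Rule, 1024)
                |     
                |     //go:noinline
                |     func byPointer(i int) *Rule {
                |         r := rules[i] // copy of the element...
                |         return &r     // ...which escapes, so it is heap-allocated
                |     }
                |     
                |     //go:noinline
                |     func byValue(i int) Rule {
                |         return rules[i] // copy that stays on the stack
                |     }
                |     
                |     func BenchmarkPointer(b *testing.B) {
                |         for i := 0; i < b.N; i++ {
                |             _ = byPointer(i % len(rules))
                |         }
                |     }
                |     
                |     func BenchmarkValue(b *testing.B) {
                |         for i := 0; i < b.N; i++ {
                |             _ = byValue(i % len(rules))
                |         }
                |     }
                | 
                | Running go test -bench . -benchmem should show roughly
                | one allocation per op for the pointer version and none
                | for the value version, even though both copy the same
                | number of bytes.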
        
             | xdavidliu wrote:
             | > except for plain data types where copying is literally
             | memcpy
             | 
             | what do you mean by this? If I say `let x = 5; let y = x;`
             | in rust, that's a "plain data type copy" of a stack value,
             | but memcpy is usually used to copy heap memory. What
             | connection between copying of primitive simple stack values
             | and memcpy are you suggesting here?
        
               | kccqzy wrote:
               | The compiler can optimize memcpy with a known size into a
               | small number of move instructions so they are identical
               | to copying stack values.
               | 
               | Try playing with memcpy on Godbolt and you'll find that
               | the compiler will compile the memcpy to a single mov
               | instruction when the size is small, and some
               | movdqu/movups when the size is slightly large, and only a
               | function call when the size is huge.
               | 
               | > memcpy is usually used to copy heap memory
               | 
               | memcpy is often used in low-level serialization /
               | deserialization code since you can't just cast a buffer
               | pointer to a uint32_t pointer and dereference that; the
               | solution is memcpy between variables that are often both
               | on the stack.
        
               | orra wrote:
               | > What connection between copying of primitive simple
               | stack values and memcpy are you suggesting here?
               | 
               | They're just using 'memcpy' as a shorthand for saying the
               | bitpattern is blitted. Semantically, that's like a
               | memcpy. The point is, there are no expensive allocations,
                | nor do any embedded pointer fields need adjusting, etc.
        
         | dimitrios1 wrote:
         | As a current Go programmer, hunting down escaping pointers via
         | escape analysis is a common part of the routine now.
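          | 
          | The loop usually looks something like this (function and
          | type names are invented for the example):
          | 
          |     package main
          |     
          |     type item struct{ id, size int }
          |     
          |     // The pointer outlives leaky's frame, so the compiler
          |     // moves the local value to the heap.
          |     func leaky() *item {
          |         it := item{id: 1, size: 2}
          |         return &it
          |     }
          |     
          |     func main() { _ = leaky() }
          | 
          | then building with go build -gcflags=-m and grepping the
          | output for messages like "moved to heap" or "escapes to
          | heap" to find the culprits.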
        
       | AtNightWeCode wrote:
        | So, this is very basic Go design. You could write something
        | about how it works in C and Go, and why an older language like
        | C doesn't have this problem, but at the end of the day the Go
        | fanclub will downvote the hell out of you no matter what.
        
         | AtNightWeCode wrote:
          | The Go compiler is garbage by design. A 20-year-old C
          | compiler does not have this problem. This is also why Go has
          | declined so much during the last couple of years. The
          | benefits of Go have not increased and most of the quirks are
          | still there, like the error handling, the naive compiler,
          | and the syntax sugar that somewhat hides the difference
          | between pointers and direct heap allocations.
         | 
         | -1
        
           | kosherhurricane wrote:
           | I work on a code base that is a mixture of Go and C.
           | 
           | It's IO, CPU and Memory hungry, and it's distributed.
           | 
           | C is fast because it's close to how CPU and memory actually
            | work. Go gives you 95+% of that, plus an easy-to-learn,
            | easy-to-use language. A new person could start contributing
            | useful features and bug fixes immediately. A senior person
            | could get C-level performance.
           | 
           | More and more of our code is moved from C to Go, with very
           | little performance penalty, but with a lot more safety and
           | ease of use.
           | 
           | Our customers benefit, and our company makes more money.
           | 
           | In the end, that's what software is about.
        
             | dist1ll wrote:
             | > C is fast because it's close to how CPU and memory
             | actually work.
             | 
             | Out-of-order execution, cache hierarchies, branch
             | prediction, virtual memory, pipelining, vector
             | instructions, ILP, NUMA are all pretty transparent to the C
             | spec.
             | 
              | Trying to accommodate hardware quirks with C feels like
              | blackbox engineering. It's certainly better than with
              | managed languages, but still...
        
           | fear91 wrote:
           | Can you show any examples of "go compiler being garbage"? In
           | my experience, it often generates much smarter code than C#
           | or Java.
        
       | asim wrote:
        | If you want to have a solid understanding and need to get it
        | in just a few hours, here are a few things to review.
       | 
       | - The Go programming language spec https://go.dev/ref/spec
       | 
       | - Effective Go https://go.dev/doc/effective_go
       | 
       | - Advanced Go concurrency patterns
       | https://go.dev/talks/2013/advconc.slide#1
       | 
       | - Plus many more talks/slides https://go.dev/talks/
        
       | coder543 wrote:
       | There is potentially another option: use the midstack inliner to
       | move the allocation from the heap to the stack of the calling
       | function: https://words.filippo.io/efficient-go-apis-with-the-
       | inliner/
       | 
       | As long as the global slice is never mutated, the current
       | approach is probably fine, but it is definitely a semantic change
       | to the code.
        
         | ploxiln wrote:
         | That seems like overkill for this particular case, but it's a
         | very interesting technique, thanks for the link!
        
         | morelisp wrote:
         | Packages should be exposing an API with destination slices more
         | often to begin with. The stdlib is pretty good about this
          | (there are a few missing, though 1.19 closed the most obvious
          | absences), but most third-party code is awful. Or worse, it
         | only takes strings.
        
         | throwaway232iuu wrote:
         | Why doesn't go use RVO like C++ and Rust?
         | 
         | https://en.wikipedia.org/wiki/Copy_elision#Background
        
           | coder543 wrote:
           | I don't think we're on the same page about what midstack
           | inlining is being used for in my suggestion. This discussion
           | is about eliminating a heap allocation, which as far as I
           | understand, RVO _never_ does. Please read the article I
           | linked if you want to discuss this further. I don't want to
           | repeat the article pointlessly.
           | 
           | I'm also fairly sure Go uses RVO here too, which cuts down on
           | the number of times the object is copied around, but again,
           | it's irrelevant to the discussion of heap allocations.
           | Copying the object isn't the performance problem here,
           | needlessly allocating a very short-lived object on the heap
           | over and over is.
        
       | amluto wrote:
       | Somewhat off topic, but I find a different part of this to be
        | quite ugly:
        | 
        |     if match || err != nil {
        |         return rule, err
        |     }
       | 
       | Translating this code to actual logic takes too much thought and
       | is too fragile. Is that an error path or a success path? It's
       | both! The logic is "if we found a rule _or_ if there was an error
       | then return a tuple that hopefully indicates the outcome". If any
       | further code were to be added in this block, it would have to be
       | validated for the success and the error case.
       | 
       | But this only makes any sense at all if one is okay with reading
       | Go result returns in their full generality. A canonical Go
       | function returns either Success(value) or Error(err not nil,
       | meaningless auxiliary value). And this code has "meaningless
       | auxiliary value" != nil! In fact, it's a pointer that likely
       | escapes further into unrelated error handling code and thus
        | complicates any kind of lifetime or escape analysis.
       | 
       | I don't use Go, but if I did, I think this part of the language
       | would be my biggest peeve. Go has very little explicit error
       | handling; fine, that's a reasonable design decision. But Go's
        | error handling is incorrectly typed, and that is IMO _not_ a
       | reasonable design.
        
         | ericbarrett wrote:
         | I write a lot of Go, and I agree that this is a big wart in its
          | error handling, one that would be better served by a proper
          | Result type.
         | 
         | Nevertheless, the convention is that if a function returns
         | (value, err), and err != nil, the value is discarded (I think
         | of it as "undefined"). So the code is conventional.
        
           | za3faran wrote:
            | What I found frustrating is that there is nothing stopping
            | a function from returning both a non-nil value and a
            | non-nil error. I've seen weird code that did that, and it
            | could easily have resulted in incorrect behavior.
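            | 
            | Nothing in the type system rules out something like this
            | (a contrived sketch; the type, map, and function names are
            | all invented):
            | 
            |     package store
            |     
            |     import "errors"
            |     
            |     type User struct{ Deactivated bool }
            |     
            |     var db = map[int]*User{1: {Deactivated: true}}
            |     
            |     func find(id int) (*User, error) {
            |         u, ok := db[id]
            |         if !ok {
            |             return nil, errors.New("not found")
            |         }
            |         if u.Deactivated {
            |             // Non-nil *User AND non-nil error: a caller
            |             // that only checks err still holds a usable,
            |             // but suspect, u.
            |             return u, errors.New("user is deactivated")
            |         }
            |         return u, nil
            |     }
            | 
            | Whether callers may look at the value when err != nil is
            | purely a matter of convention.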
        
             | klodolph wrote:
             | The one place where you'll actually see this is in Read().
             | 
             | If you try to read 1000 bytes, you might get "800 bytes
             | read, EOF" as a result. The worst part is that os.File
             | won't ever do this!
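              | 
              | Which is why the io.Reader contract says to handle the
              | n > 0 bytes before looking at the error. A sketch of the
              | usual loop (drain and process are invented stand-ins,
              | and "io" is assumed to be imported):
              | 
              |     func drain(r io.Reader, process func([]byte)) error {
              |         buf := make([]byte, 4096)
              |         for {
              |             n, err := r.Read(buf)
              |             if n > 0 {
              |                 // Handle data even when err == io.EOF.
              |                 process(buf[:n])
              |             }
              |             if err == io.EOF {
              |                 return nil
              |             }
              |             if err != nil {
              |                 return err
              |             }
              |         }
              |     }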
        
               | ok_dad wrote:
               | I really hate when people use "errors" for signals. Or,
               | alternately, use the signals mechanisms for actual
               | errors. An error should be "something went wrong", and
               | reading a file to the EOF is not "going wrong", that's
               | just what the "read()" should do if you tell it so!
               | 
               | I like Go's multiple returns and error checking by
               | default, but it definitely should have been implemented
               | with some sort of "Result/Error" type union instead. Go
                | made so many good decisions that I am frankly amazed
               | they made this one really bad one.
        
               | robotresearcher wrote:
               | If you take the position that the semantics of read() are
               | 'read data from this endless stream', then EOF is an
               | error indicating that this model no longer applies.
               | 
               | Doesn't bother me at all.
        
               | nemo1618 wrote:
               | Presumably this is for performance reasons: if a `Read`
               | call involves, say, a network request, then returning
               | "800, EOF" in one roundtrip is certainly nicer than "800,
               | nil" -> "0, EOF" in two roundtrips. In practice... I
               | suspect this happens rarely enough that it's not worth
               | the cost. The performance edge case could be handled with
               | an alternative API, e.g. an optional EOF() method.
        
           | amluto wrote:
           | In C, "discarding" a pointer in a way that leaves the value
           | visible is quite common. At least if one doesn't accidentally
           | use the pointer, it's harmless. (In the way that all manner
           | of unsafeness is harmless in C as long as no actual UB
           | occurs, which is to say it's not great.)
           | 
            | But Go is a garbage collected language, and there is no such
           | thing as "discarding" a pointer. Either it's there or it
           | isn't, and this kind of leak has side effects. I find it
           | baffling that the language designers and the community
           | consider this acceptable.
           | 
           | (One thing I really like about Rust is that you can't play
           | fast and loose with lifetimes like this. If you have function
           | taking &'a Vec<T> and you return &'a T, you can't arbitrarily
           | "discard" and leak that reference up the call chain. You need
           | to genuinely get rid of it by the time the vector is gone.)
        
         | slcjordan wrote:
         | They should probably replace that with 2 if statements to make
         | the error path and the non-error path obvious:
          |     if err != nil {
          |         return nil, err
          |     }
          |     if match {
          |         return rule, nil
          |     }
        
         | foldr wrote:
         | The regular return value doesn't have to be meaningless just
         | because the error is non-nil. In this case the function returns
         | the rule that triggered the error, which is potentially useful
         | information. I don't think it is an official Go convention that
         | err being non-nil entails that the other return value should be
         | meaningless.
        
           | za3faran wrote:
            | It's quite error-prone; most Go code is just
            | 
            |     if err != nil { return nil, err }
           | 
           | Now of course it's important to read the documentation, but a
           | language with sum types (or exceptions) would have used a
           | separate type to indicate an error condition plus useful
           | information on that error.
        
       | chubot wrote:
       | FWIW, to prevent the bug where a = b is slow for big types,
       | Google's C++ style guide used to mandate DISALLOW_COPY_AND_ASSIGN
       | (which used to be DISALLOW_EVIL_CONSTRUCTORS I think) on all
       | types (most types?)
       | 
        | Looks like that's been gone for a while in favor of C++11 stuff,
       | which I don't really like:
       | 
       | https://google.github.io/styleguide/cppguide.html#Copyable_M...
       | 
        | A lot of good software was written in that style, but it grew
        | bureaucratic over time, and the guidance changed as the C++
        | language evolved.
        
       | renewiltord wrote:
       | Clear tutorial of how to go about identifying this. Good blog
       | post. Since the problem was constrained and real, it helps
       | someone know when to use these tools. Thank you for sharing.
        
       | karmakaze wrote:
        | > I did consider two other approaches:
        | 
        | > - Changing Ruleset from being []Rule to []*Rule, which would
        | mean we no longer need to explicitly take a reference to the
        | rule.
        | 
        | > - Returning a Rule rather than a *Rule. This would still
        | copy the Rule, but it should stay on the stack instead of
        | moving to the heap.
       | 
       | > However, both of these would have resulted in a breaking change
       | as this method is part of the public API.
       | 
       | The problem with heap allocated objects could be due to the
       | incorrect public API.
       | 
        | The change that improves performance also gives out pointers
        | to the actual elements of Ruleset itself, permitting the
        | caller to change the contents of Ruleset, which wasn't
        | possible before the speed-up. Perhaps you're already aware of
        | this, since a change to []*Rule was being considered.
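        | 
        | Concretely, something like this becomes possible after the
        | change (a sketch; Rule and Ruleset are the post's types, but
        | the field name and values are invented):
        | 
        |     rules := Ruleset{{Name: "a"}, {Name: "b"}}
        |     r, err := rules.Match("a")
        |     if err == nil {
        |         // Writes through to an element of rules itself,
        |         // not to a private copy.
        |         r.Name = "mutated"
        |     }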
        
         | ok_dad wrote:
         | Sometimes it doesn't matter if a public API is incorrect,
         | because it's set in stone for whatever reason, and you just
         | need to fix the problem internally.
        
           | spockz wrote:
           | This is why I like https://github.com/openrewrite so much.
           | One gets to tell users how to rewrite code automatically. It
           | makes refactoring almost as easy as in a mono repo.
        
             | bspammer wrote:
             | This looks super interesting, but as far as I can tell they
             | don't go into how to inform downstream users to add your
             | recipes to their maven/gradle configuration when you make a
             | breaking change.
             | 
             | In my head, the ideal flow would be an annotation on your
             | library function/class which triggers an IntelliJ
             | suggestion for downstream users affected by your breaking
             | change to run the automated refactor for them. Kinda like a
             | more helpful @Deprecated.
        
       | tuetuopay wrote:
       | Aaaaaand that's why I love Rust's decision to make copies
       | explicit with `.clone()`. Annoying as hell when you're not used
       | to it but overall worth it.
        
       | lanstin wrote:
        | That seems like a potential compiler optimization. It should
        | already know that the rule value is only used once, as the
        | target of a &, and this must be a fairly common pattern in
        | managing return values.
        
         | ithkuil wrote:
         | The semantics change. You're now returning a pointer to the
         | actual Rule in the Ruleset, while before you'd be returning a
            | pointer to a copy of the Rule.
         | 
         | The optimization would only work if you had a way to tell the
         | compiler that some values are constant/immutable.
        
           | ithkuil wrote:
           | BTW; I'm using both Go and Rust lately.
           | 
           | In Rust you can write a function that returns the pointer of
           | one element of a slice. You can also write a function that
           | returns the pointer to a heap-allocated copy of an element of
           | the slice. The two functions would have different signatures.
           | 
           | The compiler would also prevent mutation of the slice as long
           | as there are any references to individual elements of the
           | slice being passed around.
        
             | ok123456 wrote:
             | >In Rust you can write a function that returns the pointer
             | of one element of a slice.
             | 
             | Have fun fighting the borrow checker on that one.
        
               | [deleted]
        
               | oconnor663 wrote:
               | It's not always so bad :) https://play.rust-
               | lang.org/?version=stable&mode=debug&editio...
        
               | ok123456 wrote:
               | Now try to actually do that in a larger program without
               | static lifetimes...
        
               | piperswe wrote:
               | It shouldn't be too bad as long as you keep in mind that
               | it is a reference to an item in that slice, so whatever
               | that slice is pointing to needs to stick around as long
               | as you're using elements from it. I don't often encounter
               | borrow checker issues anymore, because once you program
               | Rust long enough you know what things need to live for
               | what lifetimes, and you architect your "larger programs"
               | around that.
        
           | lanstin wrote:
            | Oh yeah. Too bad. Just ran the escape-to-heap analysis on
            | my current project; not looking too promising. Mostly it is
            | allocating structures and saving them in a huge in-memory
            | hash.
        
         | [deleted]
        
         | kgeist wrote:
         | I don't think it can be optimized without altering semantics.
         | If it's a pointer to a value in the slice, changing the slice's
         | values (ruleset[i] = ...) will be reflected in all Rule values
         | returned from the function, because they all point to the same
         | memory. In the same way, changing the returned value's fields
         | will change the data in the original slice. The author's code
         | is prone to this behavior after the change.
         | 
         | When it's a pointer to a copy, no such implicit dependencies
         | occur.
        
           | derefr wrote:
           | You're not wrong in general, but one interesting thing about
           | Go as an ecosystem (rather than as a language) is that golang
           | programs are mostly statically compiled -- all sources, one
           | pass, one code unit, one object-code output -- so they're
           | (theoretically) very amenable to compile-time (rather than
           | link-time) Whole-Program Optimization techniques.
           | 
           | In this specific case, that technique would be whole-program
           | dataflow analysis. Given a Golang function that passes out
           | references-to-copies-of owned data, you _could_ actually
           | determine for certain -- at least in the default static-
           | binary linkage mode -- whether these two properties hold
           | universally within the resulting binary:
           | 
           | 1. whether _no_ caller of the function _will ever try_ to do
           | anything that would cause data within their copy of the
           | struct to be modified;
           | 
           | 2. whether the owner of the data _will never_ modify the data
           | of the original struct in such a way that, if the copy were
           | elided, the changes would be  "seen" by any reads done in any
           | of the callers. (The owner could still modify internal
           | metadata within the struct for its own use, as long as such
           | internal metadata is 1. in private fields, 2. where all
           | callers live outside the package defining the struct, making
           | those fields inaccessible; and 3. the fields are never
           | accessed by any struct methods called by borrowers of the
           | struct -- keeping in mind that such methods can be defined
           | outside the package by caller code.)
           | 
           | If you could prove both of these properties (using dataflow
           | analysis), then you could safely elide the copy within the
           | function, turning the return of a reference-to-a-copy-of-X
           | into a return of a reference-to-X.
           | 
           | (And, in fact, if you can only prove the second property
           | universally, and the first property in specific instances,
           | then you can still elide the copy from the function itself;
           | but you'd also generate a wrapper function that calls said
           | function [receiving a reference-to-X], copies, and so returns
           | a reference-to-a-copy-of-X; and then, for any call-site where
            | the first property _doesn't_ hold -- i.e. callers whose
           | transitive call-graph will ever modify the data -- you'd
           | replace the call to the original function with a call to the
           | wrapper. So "safe" connected caller sub-graphs would receive
           | references, while "unsafe" connected caller sub-graphs would
           | receive copies.)
        
         | oconnor663 wrote:
         | I think the optimization is only valid if we know that nothing
          | is ever going to use the returned pointer to do mutation.
        
       | gp wrote:
       | I was trying to debug and improve the performance of some
       | parallelized C++ code over the weekend for parsing CSV files.
       | What would happen was parsing each file (~24k lines, 8 columns)
       | would take 100ms with one execution context, but when split
       | across many threads, the execution time of each thread would slow
       | down proportionally and the throughput of the whole program would
       | strictly decrease as thread count increased.
       | 
       | I tried all of the obvious things, but the offender ended up
       | being a call to allocate and fill a `struct tm` object from a
        | string representation of a date. There's no obvious reason
        | (to me) why this would cause cache invalidation, etc., so I'm
        | a little in the dark.
       | 
       | Still, replacing this four line block improved single threaded
       | performance by 5x, and fixed the threaded behavior, so on the
        | whole it is now ~70x faster and parses about 400 MB of CSV per
       | second.
        
         | Quekid5 wrote:
         | False sharing, maybe?
        
           | gp wrote:
           | Not a bad suggestion - thanks for the idea
        
       | infamousclyde wrote:
       | Thank you for sharing. I'm curious if you would recommend any
       | good resources for profiling with Go. I enjoyed your code
       | snippets and methodology.
        
       | assbuttbuttass wrote:
       | Returning a pointer to a local variable is convenient, but can be
       | a source of hidden allocations.
       | 
       | It's best to treat each struct as a "value" or "pointer" type,
       | and use one or the other consistently for each type. This mostly
       | avoids the need to use & in the first place
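        | 
        | A sketch of that convention (the types here are invented):
        | small value-ish structs get value receivers and are passed and
        | returned by value; stateful types are only ever handled
        | through a pointer.
        | 
        |     package srv
        |     
        |     import "sync"
        |     
        |     // "Value" type: small, no identity; always copied.
        |     type Point struct{ X, Y float64 }
        |     
        |     func (p Point) Add(q Point) Point {
        |         return Point{p.X + q.X, p.Y + q.Y}
        |     }
        |     
        |     // "Pointer" type: has identity and internal state; only
        |     // ever constructed and handled as *Server.
        |     type Server struct {
        |         mu    sync.Mutex
        |         conns int
        |     }
        |     
        |     func NewServer() *Server { return &Server{} }
        |     
        |     func (s *Server) Track() {
        |         s.mu.Lock()
        |         defer s.mu.Unlock()
        |         s.conns++
        |     }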
        
       | hoosieree wrote:
       | > You can see these decisions being made by passing -gcflags=-m
       | to go build:
       | 
       | That's a very nice feature! I wonder if compilers for other
       | languages have something similar.
        
         | throwaway894345 wrote:
         | Does anyone with VS Code or vim plugin development experience
         | know how hard it would be to call this behind the scenes and
         | highlight allocation points in the editor?
        
         | _old_dude_ wrote:
         | In Java                   -XX:+PrintCompilation
         | -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
         | 
         | and                   -XX:+UnlockDiagnosticVMOptions
         | -XX:+LogCompilation
         | 
         | The generated logs can be seen with JITWatch [1].
         | 
         | [1] https://github.com/AdoptOpenJDK/jitwatch
        
         | masklinn wrote:
         | You'd have to see if all compilers support it, but LLVM has a
         | "remarks" system, which should provide similar information
         | (though likely a lot more of it) for optimization passes which
         | are traced: https://llvm.org/docs/Remarks.html#introduction-to-
         | the-llvm-...
         | 
         | The frontend may or may not have its own optimizations and logs
          | though, e.g. rustc now has MIR optimizations (https://rustc-dev-
         | guide.rust-lang.org/mir/optimizations.html) but while you can
         | dump MIR data (https://rustc-dev-guide.rust-
         | lang.org/mir/debugging.html) I don't remember seeing an
         | optimisation log.
         | 
         | At the end of the day, I think it's more likely that you take a
         | look at the assembly and infer problems from there if the
          | profiler doesn't tell you straight. Another difference is the
         | _kind_ of decisions the compiler makes e.g. while a compiler
         | can optimize away allocations in  "manual allocation" languages
         | (https://godbolt.org/z/5nEo7xjEr) the allocations are plainly
         | visible, so if they're trivially avoidable... you'll just avoid
         | them.
         | 
         | Using Rust as an example, you'd have something like this:
          | 
          |     pub fn match_(&self, path: &str) -> Result<&Rule, Error> {
          |         for rule in self.0.iter() {
          |             if rule.match_(path)? {
          |                 return Ok(rule);
          |             }
          |         }
          |         Err(Error)
          |     }
          | 
          | You couldn't miss an allocation, because the return type would
          | have to change, and you'd need to perform the copy out:
          | 
          |     pub fn match_(&self, path: &str) -> Result<Box<Rule>, Error> {
          |         for rule in self.0.iter() {
          |             if rule.match_(path)? {
          |                 return Ok(Box::new(rule.clone()));
          |             }
          |         }
          |         Err(Error)
          |     }
        
       | amtamt wrote:
       | It falls in those 3% of code lines one should think of while not
       | optimizing prematurely.
        
       | [deleted]
        
       | is_taken wrote:
       | Would be interesting to see the performance difference if you
        | undo that move-&-change and change the function signature from:
        | 
        |     func (r Ruleset) Match(path string) (*Rule, error)
        | 
        | to:
        | 
        |     func (r *Ruleset) Match(path string) (*Rule, error)
        
         | masklinn wrote:
          | Likely none: Ruleset is
          | 
          |     type Ruleset []Rule
         | 
         | The original code creates a local copy of a rule and explicitly
         | returns a pointer to _that_. Taking the ruleset by address
          | wouldn't change that issue.
        
       | Beltalowda wrote:
       | The deeper lesson here is "don't use pointers unless you're sure
       | you need them". I've seen quite a few people use pointers for no
        | reason in particular, or there's simply the _assumption_ it's
       | faster (and have done this myself, too), but it puts a lot more
       | pressure on the GC than simple local stack variables.
       | 
       | Of course sometimes pointers are faster, or much more convenient.
       | But as a rule of thumb: don't use pointers unless you've got a
       | specific reason for them. This applies even more so if you're
       | creating a lot of pointers (like in a loop, or a function that
       | gets called very frequently).
        
         | throwaway894345 wrote:
         | Eh, I've waffled a couple of times between "pass values by
         | default" and "pass pointers by default". Ultimately, I don't
         | think there's a really good answer except to understand escape
         | analysis and get comfortable with the escape analyzer. Notably,
         | "using pointers" doesn't inherently put pressure on the GC, but
         | rather _allocations_ put pressure on the GC and there are
          | plenty of instances where pointers don't put pressure on the
          | GC at all (notably, if you're passing data _into_ a function
          | by pointer, it's not going to allocate, but it may if you're
          | returning a pointer to "a stack-allocated value").
         | 
         | Notably, if you're defaulting to values, you may still have a
         | bunch of allocations when you try to implement interfaces,
         | which usually (always?) requires implicitly boxing values;
         | however, if you pass a pointer into a function that takes an
          | interface, I _don't think_ it gets moved to the heap (but I'm
         | not sure, which is why Go programmers need to be comfortable
         | with profiling the escape analyzer and also why it would be
         | great if Go actually had explicit semantics for allocations).
        
         | bitwise101 wrote:
         | Most of the times I use a pointer just because I want to return
         | nil in case of error. Is this a valid reason?
        
         | kccqzy wrote:
         | That's not a deeper lesson. A deeper lesson is to understand
         | _where_ the pointer points to and then decide accordingly.
        
           | Beltalowda wrote:
           | That is pretty much what I said, except phrased different.
        
             | throwaway894345 wrote:
             | It sounded to me like you were saying "avoid pointers by
             | default" (bad advice IMO) rather than "if it matters,
             | verify whether the pointer escapes" (good advice IMO).
        
         | dist1ll wrote:
         | > but it puts a lot more pressure on the GC than simple local
         | stack variables.
         | 
         | Do you have evidence for this claim? AFAIK the Go compiler does
         | escape analysis, and allocates pointers that don't escape on
         | the stack.
        
           | throwaway894345 wrote:
           | This is true, but it's hard to tell if a pointer escapes or
           | not without actually profiling. That said, I don't think the
           | answer is to avoid pointers, but rather to get comfortable
           | with profiling the escape analyzer. By default, I just stick
            | to the subset of Go which I know _won't_ escape--functions
           | can take pointer parameters, but I'm very careful about
           | returning pointers to data that would otherwise be stack-
           | allocated (even though it's not especially idiomatic, I'll
            | often prefer mutating an `out *T` parameter rather than
            | returning a `*T` because I _know_ the former will not
           | allocate).
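            | 
            | Something along these lines, for example (the names are
            | invented): the out-parameter version keeps the value in
            | the caller's frame, whereas returning a pointer from the
            | callee would typically force it to the heap.
            | 
            |     type Stats struct{ Sum, Count int }
            |     
            |     // fill writes into caller-owned memory; it does not
            |     // retain the pointer, so nothing has to escape.
            |     func fill(out *Stats, xs []int) {
            |         out.Count = len(xs)
            |         for _, x := range xs {
            |             out.Sum += x
            |         }
            |     }
            |     
            |     func caller(xs []int) int {
            |         var s Stats // can stay on the caller's stack
            |         fill(&s, xs)
            |         return s.Sum
            |     }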
        
       | tonymet wrote:
        | Overall a good review of profiling tactics. But there's nothing
        | egregious about Golang here. Pass by value vs. reference is a
        | common performance issue.
        
         | masklinn wrote:
          | > But there's nothing egregious about Golang here. Pass by
          | value vs. reference is a common performance issue.
         | 
         | The trap here is that everything is passed by reference
         | (pointer), but the intermediate _local_ value is, well, a value
         | (a copy).
         | 
         | Rule is not a gigantic monster struct (it's 72 bytes), chances
         | are returning it by value would not have been an issue.
         | 
          | Anyway I would say there _is_ an issue with Go here: it's way
          | too easy to copy out of a slice.
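          | 
          | The footgun in one function (a sketch, reusing the post's
          | Rule type):
          | 
          |     func pick(rules []Rule, i int) (*Rule, *Rule) {
          |         r := rules[i]        // copies the whole 72-byte Rule
          |         viaCopy := &r        // pointer to the copy; escapes,
          |                              // so it is heap-allocated
          |         inPlace := &rules[i] // pointer into the backing
          |                              // array; no allocation
          |         return viaCopy, inPlace
          |     }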
        
       | sendfoods wrote:
       | 1 character, in 2 places ;) I did not know profiling support for
       | go was so seamless, thank you!
       | 
       | May I ask, is that theme custom or available somewhere? I really
       | enjoyed it
        
         | blowski wrote:
         | > is that theme custom or available somewhere
         | 
         | Looks a bit like https://newcss.net/ or Water CSS
        
         | hcm wrote:
         | Thanks! It's just a few dozen lines of CSS. The body font is
         | Inter and the monospaced font is JetBrains Mono.
        
         | gwd wrote:
         | > 1 character, in 2 places ;)
         | 
         | Moving a single character from one place to another. :-)
         | 
         | A good explanation of why "fire the developers with the lowest
         | 50% of lines added" is an idiotic thing to do: this sort of
         | deep analysis takes a lot of time and expertise, and frequently
         | results in tiny changes.
        
       | jimsmart wrote:
       | From the headline alone, I guessed this was to do with
       | pointers/references to values vs values themselves.
       | 
       | Yep, with values that take a lot of memory, it's faster to pass
       | pointers/references around than it is to pass the values around,
        | because there are fewer bytes to copy.
       | 
       | Of course there is more to such a decision than just performance,
       | because if the code makes changes to the value which are not
       | meant to be persisted, then one wants to be working with a copy
       | of the value, not a pointer to the value. So one should take care
       | if simply switching some code from values to pointers-to-values.
       | 
        | All of these are things that coders with more experience of
        | languages that use such semantics kinda know already, almost
        | as second nature, since the first day they got caught out by
        | them. But everyone is learning, to various degrees, and we all
        | have to start somewhere (i.e. knowing little to nothing).
        
       ___________________________________________________________________
       (page generated 2022-11-14 23:00 UTC)