[HN Gopher] No sane compiler would optimize atomics (2105)
       ___________________________________________________________________
        
       No sane compiler would optimize atomics (2105)
        
       Author : wheresvic4
       Score  : 51 points
       Date   : 2021-07-11 19:58 UTC (3 hours ago)
        
 (HTM) web link (www.open-std.org)
 (TXT) w3m dump (www.open-std.org)
        
       | andi999 wrote:
        | Why is it surprising that atomics can be optimized?
        
       | rurban wrote:
        | Likewise, no sane compiler would optimize memset.
        | Unfortunately, we don't have sane compilers.
        
         | rowanG077 wrote:
         | Why not? This makes little sense to me. Why shouldn't a
         | compiler optimize it?
        
           | 1ris wrote:
            | Because, when a programmer writes a call to memset, she
            | expects (if you, as a compiler, take her seriously at
            | all) the memory to actually be set, not for the call to
            | be a mere suggestion 1). They should have added a
            | separate function called
            | memsetOrDontIfYouThinkINeverReadThisAgain. Does this
            | make sense?
           | 
           | 1) Yes, I know there are rules and all that are precise and
           | allow that.
        
           | [deleted]
        
           | aqrit wrote:
           | https://media.ccc.de/v/35c3-9788-memsad
        
         | halayli wrote:
         | Compilers are sane and written by very smart people. They do
          | have bugs just like any other software, but in my 20+
          | years of coding, I can count the number of compiler bugs
          | I've encountered on one hand.
         | 
          | memset calls tend to be optimized out in the dead-code-
          | elimination pass, which relies heavily on SSA form.
          | Compilers adhere to the abstract machine spec. If you want
          | a memset not to be optimized out, then use memset_s,
          | because the spec explicitly doesn't allow it to be
          | removed.
         | 
          | If an optimization relies on UB and surprises the
          | programmer, then the programmer was also relying on UB in
          | the first place.
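          | 
          | A minimal sketch of the pattern (assuming GCC or Clang at
          | -O2; use_key is a hypothetical consumer of the buffer):
          | 
          | #include <cstddef>
          | #include <cstring>
          | 
          | void use_key(char *key, std::size_t len);
          | 
          | void handle_secret() {
          |     char key[32];
          |     use_key(key, sizeof key);
          |     // key is never read again, so under the as-if rule
          |     // the compiler may delete this call as a dead store:
          |     std::memset(key, 0, sizeof key);
          |     // memset_s(key, sizeof key, 0, sizeof key), where
          |     // available, would be required to survive
          | }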
        
           | userbinator wrote:
           | If anything I think they are written by people who are _too_
           | smart --- theoreticians who care only about the standard and
            | can't be bothered exercising common sense when the standard
           | doesn't specify what to do.
           | 
           | http://blog.metaobject.com/2014/04/cc-osmartass.html
        
             | beached_whale wrote:
              | The example about signed overflow misses the point: on
              | good code it results in a performance improvement.
              | Similarly for dereference of null pointers; it allows
              | the compiler to remove redundant successive checks
              | (e.g. once *ptr has been evaluated, all further null
              | checks on ptr can be dropped until ptr is written to).
              | This makes it possible to express preconditions while
              | letting the compiler make safe code faster.
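              | 
              | A rough sketch of that check elimination (load is a
              | hypothetical function; GCC or Clang at -O2 assumed):
              | 
              | int load(int *p) {
              |     int a = *p;        // after this dereference the
              |                        // compiler assumes p != nullptr
              |     if (p == nullptr)  // provably false under that
              |         return -1;     // assumption; branch deleted
              |     return a + *p;     // reload needs no new check
              | }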
        
             | halayli wrote:
              | Your definition of what's considered common sense is
              | based solely on your own understanding of computers.
              | It's more like your intuition.
             | 
             | Just because you don't understand something fully doesn't
             | mean it's not common sense.
             | 
             | It's very common that programmers don't bother reading the
             | standard or attempt to understand the concept of an
             | abstract machine.
             | 
             | Programmers that know what they are doing tend not to be
             | surprised by such nuances.
        
           | 1ris wrote:
           | The problem with programming is almost never the lack of
           | smartness. It's almost always the opposite: People being too
           | clever.
           | 
            | The sane thing in a realistic scenario is to ban all
            | usage of memset and add a diagnostic suggesting
            | memset_s, like any responsible programmer should do with
            | strcpy. But then, was it really wise to make machinery
            | necessary that the people who need it the most are the
            | least likely to have? I really, really don't think so.
            | 
            | Edit: Here is some code written by a very smart person,
            | Niels Ferguson. I think (not sure) it tries to wipe
            | sensitive material from memory and gets it wrong. Really
            | not sure, though.
           | 
           | https://github.com/MacPass/KeePassKit/blob/master/TwoFish/tw.
           | ..
        
             | ncmncm wrote:
             | Yes, this code gets it wrong.
             | 
             | It demonstrates that there are many different meanings for
             | "smart". Nobody qualifies for all of them.
        
           | secondcoming wrote:
           | Isn't memset different in that it's an external function that
           | compiler writers have chosen to intercept and perform
           | themselves? Why is saying 'use memset_s' acceptable when they
           | could have stopped treating memset as a special case?
        
             | chrisseaton wrote:
             | > Isn't memset different in that it's an external function
             | that compiler writers have chosen to intercept and perform
             | themselves?
             | 
              | The semantics of memset and memset_s are defined by a
              | standard. They're not unknown functions that could do
              | anything, and the compiler isn't breaking any rule by
              | handling them itself, as you seem to be suggesting.
        
             | Someone wrote:
              | If you include a standard header (using <> brackets)
              | and call a function that the standard says gets
              | declared when doing that, the compiler is allowed to
              | assume you want the standard behavior, and inline that
              | call/replace it by a specialized version/etc.
              | 
              | Optimizing _memset_, in particular, can reap huge
              | benefits, for example for sizes of 4 or 8 bytes in
              | tight loops or for zeroing cache-line-sized blocks.
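              | 
              | A sketch of that (typical x86-64 codegen from GCC or
              | Clang at -O2 is assumed):
              | 
              | #include <cstring>
              | 
              | void clear8(unsigned char *p) {
              |     // A known 8-byte memset is usually inlined to a
              |     // single 64-bit store, not a call into libc:
              |     //     mov qword ptr [rdi], 0
              |     std::memset(p, 0, 8);
              | }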
        
             | halayli wrote:
             | memset is not a special case. Compiler designers are very
             | much against special casing, and rightfully so.
             | 
             | Compilers do have an intrinsic equivalent but that has
             | nothing to do with whether it gets optimized out or kept.
             | 
              | Following SSA rules, if you write to non-volatile
              | memory and never read from it for the remaining
              | duration of its lifetime, then that write is dead
              | code.
              | 
              | Using memset_s was a suggestion for when your goal is
              | to scrub memory, because it guarantees the write will
              | happen. The guarantee comes from the C standard:
             | 
             | > Unlike memset, any call to the memset_s function shall be
             | evaluated strictly according to the rules of the abstract
             | machine as described in (5.1.2.3). That is, any call to the
             | memset_s function shall assume that the memory indicated by
             | s and n may be accessible in the future and thus must
             | contain the values indicated by c
             | 
             | Which essentially means treat the memory as-if it's
             | volatile.
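              | 
              | In practice memset_s comes from C11's optional Annex
              | K and many toolchains don't ship it; a volatile write
              | loop is a common stand-in with the same "must happen"
              | effect. A sketch, not an exact equivalent:
              | 
              | #include <cstddef>
              | 
              | void scrub(void *p, std::size_t n) {
              |     volatile unsigned char *vp =
              |         static_cast<volatile unsigned char *>(p);
              |     // each store is a volatile access, so the
              |     // compiler must perform it even though the
              |     // memory may be dead afterwards
              |     for (std::size_t i = 0; i < n; ++i)
              |         vp[i] = 0;
              | }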
        
               | 1ris wrote:
                | Are you allowed to cast the void * to volatile void
                | *, pass that to memset and expect the same result,
                | or is the void *, volatile void *, void * conversion
                | eliminated as well?
        
               | ncmncm wrote:
               | You can wave whichever dead chickens you like over the
               | keyboard, all to the same effect. The compiler ensures
               | only semantics that are observable.
               | 
               | If you pass a pointer to a volatile or atomic object to
               | another function the compiler cannot see into, the
               | compiler knows that any specified operations _might be_
               | observable, so must act accordingly. Then, operations on
               | that object will not be optimized away. Or, anyway, not
               | until someone builds the program using Link Time
                | Optimization ("-flto") and the called function body
               | becomes visible.
        
               | 1ris wrote:
               | I'm confused. Is the memset to the temporary volatile
               | void* observable or not, and is the temporary possibly
               | eliminated?
        
               | ncmncm wrote:
               | Treating "the memory as-if it's volatile" would _not_
               | block such an optimization. Specifically, zeroing a
               | volatile or atomic that lives on the stack immediately
               | before returning from a function whose scope it lives in
               | is freely optimized away, because it is not observable.
               | 
               | Many people mentally model atomic as akin to volatile,
               | about which they also frequently harbor superstitions.
               | 
               | I knew a FreeBSD core committer who insisted (and
               | convinced our boss!) that volatile was entirely
               | sufficient to serialize interactions between threads.
               | 
               | Someone else expected a line with a volatile or atomic
               | operation never to be optimized away, and was appalled
               | that it happened. Assignment to a volatile stack object
               | whose address has not escaped and is not used will, in
               | fact, be optimized away, regardless of the "clear
               | language of the Standard".
               | 
               | Often, not performing what would appear to be a pointless
               | optimization would block a more substantial optimization.
               | Compiler writers optimize aggressively in order to break
               | through to the substantial ones, which are most usually
               | dead-code elimination that can then propagate backward
               | and upward.
               | 
               | If you really need for a volatile object and operations
               | on it not to be optimized away, you need to allow a
               | pointer to it to escape to somewhere the compiler cannot
               | see. Have a function in a separate ".o" file, and pass
               | the pointer to that.
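                | 
                | A sketch of that escape trick (sink lives in a
                | separate translation unit; LTO assumed off):
                | 
                | // sink.cpp, compiled to its own .o:
                | //     void sink(void *p) { (void)p; }
                | 
                | extern void sink(void *p);  // opaque to the
                |                             // optimizer here
                | 
                | void wipe_local() {
                |     char secret[16];
                |     // ... fill and use secret ...
                |     for (char &c : secret) c = 0;
                |     sink(secret);  // the pointer escapes, so the
                |                    // zeroing stores above must be
                |                    // done by the time sink() runs
                | }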
        
       | chrisseaton wrote:
       | A similar thing is that I often come across people who are well-
       | informed but still surprised that compilers will combine two
       | observable operations into one, and complain that some other
       | thread somehow 'should' be able to observe the intermediate
       | state. But I don't understand how they think they would be able
       | to tell the difference between an interleaving that _never
        | happens to happen_, and one that _will never happen_.
        
         | voidnullnil wrote:
         | So what's the motivation for the article? Presumably someone
         | complained that the compiler optimizing atomics broke their
         | code.
        
           | chrisseaton wrote:
           | If they need some interleaving to happen at least n% of the
           | time for their program to work... what am I supposed to do?
           | Because it was never guaranteed to be interleaved at least n%
           | of the time, even when the atomics were not merged. Do they
           | want me to bypass the system scheduler and run threads
           | concurrently with forced interleavings against a statistical
           | distribution? I doubt they want that. So what do they want?
        
         | historyloop wrote:
          | It can be tricky in that fuzzing your program to discover
          | race conditions may appear to work perfectly on one
          | compilation target, where these atomics are detected and
          | merged, while you later discover that on another platform
          | your app surprisingly crashes 20% of the time.
         | 
         | Ideally we test our applications on all hardware, with all
         | configuration permutations etc. etc. In practice we do rely on
         | our compilers translating our intent accurately and sometimes
         | such edge cases matter.
         | 
          | Compatibility is a tricky thing. It's kind of like the
          | argument about whether adding optional arguments to a
          | function breaks BC. It
         | doesn't break BC if you don't pass those parameters. But if for
         | some reason you were passing extra parameters hoping they'd be
         | ignored (for example as a handler some other place) then adding
         | optional parameters WILL break BC and cause your software's
         | behavior to be undefined.
        
           | gizmo686 wrote:
            | It is not about translating intent accurately. What you
            | need is for compilation to be fully specified. Anywhere
            | there are multiple correct ways to compile something,
            | there is a risk of bugs that only show up with some
            | compilers.
           | 
            | If you are a compiler or library writer, one solution is
            | to avoid having useful properties that are not part of
            | the spec. For instance, Go does not guarantee any
            | particular iteration order for hashmaps, so it goes out
            | of its way to randomize the iteration order, thereby
            | preventing developers from writing code that depends on
            | a deterministic order.
           | 
            | In the case of threading, what you would need to do is
            | essentially have a compiler/runtime that goes out of its
            | way to order and time operations in a random/adversarial
            | manner.
            | 
            | I've seen research that looks into doing this in a VM
            | environment, which would be inhibited by the type of
            | compiler optimizations being discussed, and other work
            | that modifies the compiler itself to replace the
            | concurrency primitives with runtime functions that can
            | then execute them in a fuzzed order.
           | 
           | Ultimately, fuzzing and testing can only give you confidence
           | that what is being tested is mostly correct. It can never
           | give you confidence that what is written is entirely correct.
           | If you want confidence in the latter, you need to invest in
           | some form of static analysis (which could either be built
           | into the language, such as a type system, or be an external
            | analysis tool). Ultimately, writing even a moderately
           | complicated program (by modern standards) with full
           | confidence in its correctness would involve advancing the
           | state of the art of the field by decades (if not longer).
           | 
            | For the most part, the field just doesn't care about
            | programs being fully correct, and accepts it as a fact
            | of life that going onto new/untested platforms and
            | configurations will introduce/expose bugs.
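            | 
            | A toy version of that fuzzed-ordering idea (names are
            | assumptions; real tools such as rr's chaos mode or the
            | Relacy race detector are far more thorough):
            | 
            | #include <random>
            | #include <thread>
            | 
            | // call at every point where threads interact, to
            | // perturb the schedule; repeated fuzzed runs then
            | // explore more interleavings
            | void chaos_point() {
            |     thread_local std::mt19937 rng{std::random_device{}()};
            |     if (rng() % 4 == 0)
            |         std::this_thread::yield();
            | }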
        
             | b3morales wrote:
             | Very interesting. I would love to see tools emerge out of
             | that research. In my experience one of the basic problems
             | with testing/verifying concurrent code is that it is often
             | difficult-to-impossible to perturb a particular part of the
             | system, to exercise different orderings, and thus find
             | corners.
        
         | kccqzy wrote:
         | The article has this very nice example here:
         | x.fetch_add(1, std::memory_order_relaxed);
         | y.fetch_add(1, std::memory_order_relaxed);
         | x.fetch_add(1, std::memory_order_relaxed);
         | y.fetch_add(1, std::memory_order_relaxed);
         | 
         | Can a different thread observe x and y only incremented once?
         | Before the optimization yes, it's possible. After the
         | optimization, no, it's not possible. The compiler removes a
         | possible observable state in the course of optimization, and
         | that occasionally surprises people.
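          | 
          | Concretely, the merged form the optimizer may emit (as in
          | the paper) is:
          | x.fetch_add(2, std::memory_order_relaxed);
          | y.fetch_add(2, std::memory_order_relaxed);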
        
           | chrisseaton wrote:
           | > and that occasionally surprises people
           | 
           | But what did they want?
           | 
           | They tell me they're not happy with 0% chance of some
           | interleaving they need. Are they happy with 0.000000000001%?
           | I'm not sure I understand the practical difference between
           | that and 0%? Can I tell them it is possible, but rely on the
           | death of the universe happening before they have a right to
           | complain it didn't turn up yet? If they have some hard
           | minimum they require, then what do they expect me to do?
           | Completely abstract from the system scheduler to make it
           | happen?
           | 
            | That doesn't seem a reasonable expectation for
            | programmers to place on compiler writers, and it likely
            | wouldn't really be what they wanted even if I made it
            | happen.
        
             | dataflow wrote:
             | It might be 0.000000000001% but it need not be, right?
             | That's kind of the point. I'm pretty sure I could write you
             | a program where the probability of observing that would be
             | (say) 50% without the optimization, and with the
             | optimization that would go to 0%, defeating the program.
             | Should that be allowed?
             | 
             | This is basically a statistical/probabilistic version of
             | another problem: should a compiler be allowed to change the
             | time complexity of an algorithm? If I write linear search,
             | and the compiler proves the list is sorted, should it be
             | allowed to switch to binary search--or vice-versa?
             | Traditionally the answer might be 'yes', but it's not
             | exactly hard to argue an answer of 'no' might also be
             | desirable in a number of situations. Now in this case, it's
             | not the time complexity that's changing, but the
             | statistical properties of the program... and while in some
             | cases it might be negligible (1E-10 or what have you),
             | that's not necessarily the case, at which point a
             | difference in degree really can become a difference in
             | kind.
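              | 
              | A sketch of such a program (seeing the intermediate
              | value 1 is possible while the two adds stay separate,
              | impossible once they are fused into one fetch_add(2)):
              | 
              | #include <atomic>
              | #include <cstdio>
              | #include <thread>
              | 
              | std::atomic<int> x{0};
              | 
              | void writer() {
              |     x.fetch_add(1, std::memory_order_relaxed);
              |     x.fetch_add(1, std::memory_order_relaxed);
              | }
              | 
              | int main() {
              |     std::thread t(writer);
              |     bool saw_one = false;
              |     int v;
              |     while ((v = x.load(std::memory_order_relaxed)) != 2)
              |         if (v == 1) saw_one = true;  // only unmerged
              |     t.join();
              |     std::printf("saw intermediate: %d\n", saw_one);
              | }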
        
               | chrisseaton wrote:
               | But no mechanism was ever guaranteeing the 50% in the
               | first place. So their program could already stop working
               | correctly without any changes and with the same compiler.
               | What I'm saying is I don't know how they'd even tell the
               | difference between my compiler being the problem, or them
               | just being unlucky and not getting the interleavings they
               | want even though they're statistically likely.
        
               | dataflow wrote:
                | I'm saying if you made a compiler that reduced that
                | expected 50% to roughly 0%, it wouldn't matter to
                | the user whether it was 1E-10 or exactly 0%. _Both_
                | could be undesirable for exactly the same reason.
                | Again: you can complain that the 1E-10 compiler is
                | still "correct" because there was never a 50%
                | guarantee, but in doing that you're completely
                | missing the point. Here's an exaggerated example.
                | Imagine your Tesla autopilot's accuracy suddenly
                | dropped from >99.9999% to <0.00001% overnight
                | because of some stupid optimization (maybe this one)
                | that was _technically_ correct. Assume there was
                | never a _guarantee_ it would stay this accurate. Do
                | you _really_ think you could shift the blame to the
                | user because you were "technically" not guaranteeing
                | anything?
        
         | overgard wrote:
         | > A similar thing is that I often come across people who are
         | well-informed but still surprised that compilers will combine
         | two observable operations into one
         | 
          | Well, think of it this way: if you're not into compilers,
          | wouldn't you expect your code to run as written?
          | Especially if you put a breakpoint in there and can see
          | each step happening sequentially. I don't really think
          | people who don't know these things should be viewed as
          | ignorant, because the compiler tends to be doing some
          | very unintuitive things.
        
       | dang wrote:
       | One past thread:
       | 
       |  _No Sane Compiler Would Optimize Atomics_ -
       | https://news.ycombinator.com/item?id=11694325 - May 2016 (56
       | comments)
        
         | MauranKilom wrote:
         | Could you fix the title to not be almost a century in the
         | future please? :)
        
       | elliotpage wrote:
       | 2105? I love posts from the future
        
       | tester756 wrote:
       | >For Compiler Writers
       | 
       | >Get back to work, there's so much more to optimize... and so
       | much code to break! Help users write good code: the compiler
       | should provide diagnostics when it detects anti-patterns or
       | misuses of atomics.
       | 
        | Is it worth putting effort into compiler optimizations
        | instead of putting more effort into providing "diagnostics"
        | and information for the programmer?
        | 
        | In the context of this:
       | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29....
        
         | kccqzy wrote:
         | The very paragraph you are quoting asks compiler writers to
         | provide better diagnostics.
        
           | tester756 wrote:
           | but there's also
           | 
           | >Get back to work, there's so much more to optimize...
           | 
           | my point is that maybe code optimization research time could
           | be spent better
        
       | overgard wrote:
       | I'd love to have a compiler that just tells me what optimizations
        | could be made, but lets me either do it myself (if syntactically
       | available in the language), or explicitly mark the optimization
       | as ok through a #pragma or something like that. I just think
       | having to read the disassembly to figure out what happened isn't
       | a great user experience.
        
       ___________________________________________________________________
       (page generated 2021-07-11 23:00 UTC)