[HN Gopher] No sane compiler would optimize atomics (2105)
___________________________________________________________________

No sane compiler would optimize atomics (2105)

Author : wheresvic4
Score  : 51 points
Date   : 2021-07-11 19:58 UTC (3 hours ago)

(HTM) web link (www.open-std.org)
(TXT) w3m dump (www.open-std.org)

| andi999 wrote:
| Why is it surprising that atomics can be optimized?

| rurban wrote:
| Likewise, no sane compiler would optimize memset. Unluckily we
| don't have sane compilers.

| rowanG077 wrote:
| Why not? This makes little sense to me. Why shouldn't a
| compiler optimize it?

| 1ris wrote:
| Because, when a programmer writes a call to memset, she
| expects, if you, as a compiler, take her seriously at all, the
| memory to actually be set, not treated as a mere suggestion 1).
| They should have added a separate function called
| memsetOrDontIfYouThinkINeverReadThisAgain instead. Does this
| make sense?
| |
| 1) Yes, I know there are rules, and all of that is precise and
| allows this.

| [deleted]

| aqrit wrote:
| https://media.ccc.de/v/35c3-9788-memsad

| halayli wrote:
| Compilers are sane and written by very smart people. They do
| have bugs just like any other software. In my 20+ yrs of
| coding, I can count the number of compiler bugs I've
| encountered on one hand.
| |
| memset calls tend to be optimized out in the dead-code
| elimination pass, which relies heavily on SSA. Compilers adhere
| to the abstract machine spec. If you want a memset not to be
| optimized out, then use memset_s, because the spec explicitly
| doesn't allow removing that.
| |
| If an optimization relies on UB and surprises the programmer,
| then the programmer was relying on UB in the first place.

| userbinator wrote:
| If anything I think they are written by people who are _too_
| smart --- theoreticians who care only about the standard and
| can't be bothered exercising common sense when the standard
| doesn't specify what to do.
| |
| http://blog.metaobject.com/2014/04/cc-osmartass.html

| beached_whale wrote:
| The example about signed overflow misses the point: in good
| code it results in a performance improvement. Similarly for
| dereferences of null pointers; it allows the compiler to remove
| visibly redundant checks (e.g. if you dereference *ptr, the
| compiler may assume ptr is non-null for all further uses until
| it is written to). This makes it cheaper to express
| preconditions and lets the compiler make safe code faster.

| halayli wrote:
| Your definition of what's considered common sense is based on
| your own understanding of computers. It's more like your
| intuition.
| |
| Just because you don't fully understand something doesn't mean
| it isn't common sense.
| |
| It's very common that programmers don't bother reading the
| standard or attempting to understand the concept of an abstract
| machine.
| |
| Programmers who know what they are doing tend not to be
| surprised by such nuances.

| 1ris wrote:
| The problem with programming is almost never a lack of
| smartness. It's almost always the opposite: people being too
| clever.
| |
| The sane thing in a realistic scenario is to ban all usage of
| memset and add a diagnostic suggesting memset_s, as any
| responsible programmer should do with strcpy. But then, was it
| really wise to make this machinery necessary, when the people
| who need it the most are the most likely not to have it? I
| really, really don't think so.
| |
| Edit: Here is some code written by a very smart person, Niels
| Ferguson. I think (not sure) it tries to wipe sensitive
| material from memory and gets it wrong. Really not sure, tho.
| |
| https://github.com/MacPass/KeePassKit/blob/master/TwoFish/tw...
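To make the dead-store point above concrete, here is a minimal sketch
(function and buffer names are illustrative); whether the final store
survives depends entirely on the compiler and optimization level:

    #include <cstring>

    void handle_secret() {
        char key[32];
        // ... fill key with sensitive data and use it ...

        // Dead store: key is never read after this call and its
        // lifetime ends at the closing brace, so under the as-if
        // rule an optimizer may legally delete the memset.
        std::memset(key, 0, sizeof key);
    }

C11's optional Annex K memset_s (and C23's memset_explicit) are
specified so the write must actually happen, but neither is
universally available, which is part of the complaint above.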
| ncmncm wrote:
| Yes, this code gets it wrong.
| |
| It demonstrates that there are many different meanings for
| "smart". Nobody qualifies for all of them.

| secondcoming wrote:
| Isn't memset different in that it's an external function that
| compiler writers have chosen to intercept and perform
| themselves? Why is saying 'use memset_s' acceptable when they
| could have stopped treating memset as a special case?

| chrisseaton wrote:
| > Isn't memset different in that it's an external function that
| compiler writers have chosen to intercept and perform
| themselves?
| |
| The semantics of memset and memset_s are defined by a standard.
| They're not unknown functions that could do anything, and
| compilers aren't breaking a rule by treating them specially, as
| you're suggesting they are.

| Someone wrote:
| If you include a standard header (using <> brackets) and call a
| function that the standard says gets declared when doing that,
| the compiler is allowed to assume you want the standard
| behavior, and to inline that call/replace it with a specialized
| version/etc.
| |
| Optimizing _memset_, in particular, can reap huge benefits, for
| example for sizes of 4 or 8 bytes in tight loops, or for
| zeroing cache-line-sized blocks.

| halayli wrote:
| memset is not a special case. Compiler designers are very much
| against special-casing, and rightfully so.
| |
| Compilers do have an intrinsic equivalent, but that has nothing
| to do with whether a call gets optimized out or kept.
| |
| Following SSA rules, if you are writing to non-volatile memory
| and not reading from it for the remaining duration of its
| lifetime, then it's dead code.
| |
| Using memset_s was a suggestion for when your goal is to scrub
| memory, because it guarantees the write will happen. The
| guarantee comes from the C standard:
| |
| > Unlike memset, any call to the memset_s function shall be
| evaluated strictly according to the rules of the abstract
| machine as described in (5.1.2.3). That is, any call to the
| memset_s function shall assume that the memory indicated by s
| and n may be accessible in the future and thus must contain the
| values indicated by c
| |
| Which essentially means: treat the memory as if it were
| volatile.

| 1ris wrote:
| Are you allowed to cast the void * to volatile void *, pass
| that to memset, and expect the same result, or is the
| void * -> volatile void * -> void * conversion eliminated as
| well?

| ncmncm wrote:
| You can wave whichever dead chickens you like over the
| keyboard, all to the same effect. The compiler ensures only
| semantics that are observable.
| |
| If you pass a pointer to a volatile or atomic object to another
| function the compiler cannot see into, the compiler knows that
| any specified operations _might be_ observable, so it must act
| accordingly. Then, operations on that object will not be
| optimized away. Or, anyway, not until someone builds the
| program using Link Time Optimization ("-flto") and the called
| function body becomes visible.

| 1ris wrote:
| I'm confused. Is the memset to the temporary volatile void*
| observable or not, and is the temporary possibly eliminated?

| ncmncm wrote:
| Treating "the memory as-if it's volatile" would _not_ block
| such an optimization. Specifically, zeroing a volatile or
| atomic that lives on the stack immediately before returning
| from the function whose scope it lives in is freely optimized
| away, because it is not observable.
| |
| Many people mentally model atomic as akin to volatile, about
| which they also frequently harbor superstitions.
| |
| I knew a FreeBSD core committer who insisted (and convinced our
| boss!) that volatile was entirely sufficient to serialize
| interactions between threads.
| |
| Someone else expected a line with a volatile or atomic
| operation never to be optimized away, and was appalled when it
| happened. Assignment to a volatile stack object whose address
| has not escaped and is not used will, in fact, be optimized
| away, regardless of the "clear language of the Standard".
| |
| Often, not performing what would appear to be a pointless
| optimization would block a more substantial optimization.
| Compiler writers optimize aggressively in order to break
| through to the substantial ones, which are most usually
| dead-code eliminations that can then propagate backward and
| upward.
| |
| If you really need a volatile object and operations on it not
| to be optimized away, you need to allow a pointer to it to
| escape to somewhere the compiler cannot see. Have a function in
| a separate ".o" file, and pass the pointer to that.
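A minimal sketch of that last suggestion; the function name and
signature are hypothetical, and the technique only holds as long as
the two translation units are not merged by LTO:

    // secure_wipe.cpp -- compiled to its own object file. At call
    // sites the optimizer cannot see this body (absent LTO), so it
    // must assume the stores are observable and keep the call.
    extern "C" void secure_wipe(void *p, unsigned long n) {
        // The volatile-qualified pointer forces each store to be
        // emitted within this translation unit as well.
        volatile unsigned char *b =
            static_cast<volatile unsigned char *>(p);
        while (n--) *b++ = 0;
    }

    // Usage sketch at the call site:
    //     char key[32];
    //     ... use key ...
    //     secure_wipe(key, sizeof key);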
| chrisseaton wrote:
| A similar thing is that I often come across people who are
| well-informed but still surprised that compilers will combine
| two observable operations into one, and who complain that some
| other thread somehow 'should' be able to observe the
| intermediate state. But I don't understand how they think they
| would be able to tell the difference between an interleaving
| that _never happens to happen_ and one that _will never
| happen_.

| voidnullnil wrote:
| So what's the motivation for the article? Presumably someone
| complained that the compiler optimizing atomics broke their
| code.

| chrisseaton wrote:
| If they need some interleaving to happen at least n% of the
| time for their program to work... what am I supposed to do? It
| was never guaranteed to be interleaved at least n% of the time,
| even when the atomics were not merged. Do they want me to
| bypass the system scheduler and run threads concurrently with
| forced interleavings drawn from a statistical distribution? I
| doubt they want that. So what do they want?

| historyloop wrote:
| It can be tricky in that fuzzing your program for race
| conditions may behave perfectly on one compilation target,
| where these atomics are detected and reified, while you later
| discover that on another platform your app surprisingly crashes
| 20% of the time.
| |
| Ideally we'd test our applications on all hardware, with all
| configuration permutations, etc. In practice we rely on our
| compilers translating our intent accurately, and sometimes such
| edge cases matter.
| |
| Compatibility is a tricky thing. It's kind of like the argument
| over whether adding optional arguments to a function breaks BC.
| It doesn't break BC if you don't pass those parameters. But if
| for some reason you were passing extra parameters hoping they'd
| be ignored (for example, as a handler some other place), then
| adding optional parameters WILL break BC and cause your
| software's behavior to be undefined.

| gizmo686 wrote:
| It is not about translating intent accurately. What you need is
| for compilation to be fully specified. Anywhere there are
| multiple correct ways to compile something introduces a risk of
| bugs that only show up with some compilers.
| |
| If you are a compiler or library writer, one solution is to
| avoid having useful properties that are not part of the spec.
| For instance, Go does not guarantee any particular iteration
| order for hashmaps; so it goes out of its way to randomize the
| iteration order, thereby preventing developers from writing
| code that depends on a deterministic order.
| |
| In the case of threading, what you would need is essentially a
| compiler/runtime that goes out of its way to order and time
| operations in a random/adversarial manner.
| |
| I've seen research that looks into doing this in a VM
| environment, which would be inhibited by the type of compiler
| optimizations being discussed; and other work that modifies the
| compiler itself to replace the concurrency primitives with
| runtime functions, which can then execute them in a fuzzed
| order.
| |
| Ultimately, fuzzing and testing can only give you confidence
| that what is being tested is mostly correct. It can never give
| you confidence that what is written is entirely correct. If you
| want confidence in the latter, you need to invest in some form
| of static analysis (which could either be built into the
| language, such as a type system, or be an external analysis
| tool). Writing even a moderately complicated program (by modern
| standards) with full confidence in its correctness would
| involve advancing the state of the art of the field by decades
| (if not longer).
| |
| For the most part, the field just doesn't care about programs
| being fully correct, and accepts it as a fact of life that
| going onto new/untested platforms and configurations will
| introduce/expose bugs.
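A crude user-space approximation of the adversarial scheduling
described above, assuming hand-instrumented test code (the helper
name is hypothetical; the research tools mentioned hook
synchronization points automatically):

    #include <chrono>
    #include <random>
    #include <thread>

    // Call at interesting points in threaded code under test, so a
    // stress run explores more schedules than the native scheduler
    // happens to produce.
    void fuzz_yield() {
        thread_local std::mt19937 rng{std::random_device{}()};
        switch (rng() % 3) {
        case 0: break;                             // keep running
        case 1: std::this_thread::yield(); break;  // invite a reschedule
        default:
            std::this_thread::sleep_for(
                std::chrono::microseconds(rng() % 100));
        }
    }

Note that, exactly as gizmo686 says, a compiler that merges adjacent
atomic operations removes some of the interleavings this kind of
fuzzing is trying to reach.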
| b3morales wrote:
| Very interesting. I would love to see tools emerge out of that
| research. In my experience, one of the basic problems with
| testing/verifying concurrent code is that it is often
| difficult-to-impossible to perturb a particular part of the
| system to exercise different orderings, and thus find corners.

| kccqzy wrote:
| The article has this very nice example here:
|         x.fetch_add(1, std::memory_order_relaxed);
|         y.fetch_add(1, std::memory_order_relaxed);
|         x.fetch_add(1, std::memory_order_relaxed);
|         y.fetch_add(1, std::memory_order_relaxed);
| |
| Can a different thread observe x and y each incremented only
| once? Before the optimization, yes, it's possible. After the
| optimization, no, it's not possible. The compiler removes a
| possible observable state in the course of optimization, and
| that occasionally surprises people.

| chrisseaton wrote:
| > and that occasionally surprises people
| |
| But what did they want?
| |
| They tell me they're not happy with a 0% chance of some
| interleaving they need. Are they happy with 0.000000000001%?
| I'm not sure I understand the practical difference between that
| and 0%. Can I tell them it is possible, but rely on the death
| of the universe happening before they have a right to complain
| it didn't turn up yet? If they have some hard minimum they
| require, then what do they expect me to do? Completely abstract
| away the system scheduler to make it happen?
| |
| That doesn't seem a reasonable expectation for programmers to
| place on compiler writers, and it likely wouldn't really be
| what they wanted even if I made it happen.
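To make the merged version concrete, here is a sketch of what an
optimizer may do to kccqzy's example; the coalesced form is what the
paper argues a compiler is allowed to emit, though whether a given
compiler actually does so varies:

    #include <atomic>

    std::atomic<int> x{0}, y{0};

    // As written: some interleaving can let another thread
    // observe x == 1 && y == 1.
    void writer() {
        x.fetch_add(1, std::memory_order_relaxed);
        y.fetch_add(1, std::memory_order_relaxed);
        x.fetch_add(1, std::memory_order_relaxed);
        y.fetch_add(1, std::memory_order_relaxed);
    }

    // What the optimizer may turn it into: each counter now goes
    // from 0 to 2 in a single step, so the state x == 1 (or
    // y == 1) is never visible to any other thread.
    void writer_coalesced() {
        x.fetch_add(2, std::memory_order_relaxed);
        y.fetch_add(2, std::memory_order_relaxed);
    }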
| dataflow wrote:
| It might be 0.000000000001%, but it need not be, right? That's
| kind of the point. I'm pretty sure I could write you a program
| where the probability of observing that state would be (say)
| 50% without the optimization, and with the optimization it
| would go to 0%, defeating the program. Should that be allowed?
| |
| This is basically a statistical/probabilistic version of
| another problem: should a compiler be allowed to change the
| time complexity of an algorithm? If I write a linear search,
| and the compiler proves the list is sorted, should it be
| allowed to switch to binary search, or vice versa?
| Traditionally the answer might be 'yes', but it's not exactly
| hard to argue that an answer of 'no' might also be desirable in
| a number of situations. Now in this case, it's not the time
| complexity that's changing, but the statistical properties of
| the program... and while in some cases the change might be
| negligible (1E-10 or what have you), that's not necessarily the
| case, at which point a difference in degree really can become a
| difference in kind.

| chrisseaton wrote:
| But no mechanism was ever guaranteeing the 50% in the first
| place. So their program could already stop working correctly
| without any changes and with the same compiler. What I'm saying
| is, I don't know how they'd even tell the difference between my
| compiler being the problem and them just being unlucky and not
| getting the interleavings they want even though they're
| statistically likely.

| dataflow wrote:
| I'm saying that if you made a compiler that reduced that
| expected 50% to roughly 0%, it wouldn't matter to the user
| whether it was 1E-10 or exactly 0%. _Both_ could be undesirable
| for exactly the same reason. Again: you can complain that the
| 1E-10 compiler is still "correct" because there was never a 50%
| guarantee, but in doing that you're completely missing the
| point. Here's an exaggerated example. Imagine your Tesla
| autopilot's accuracy suddenly dropped from >99.9999% to
| <0.00001% overnight because of some stupid optimization (maybe
| this one) that was _technically_ correct. Assume there was
| never a _guarantee_ it would stay that accurate. Do you
| _really_ think you could shift the blame to the user because
| you were "technically" not guaranteeing anything?

| overgard wrote:
| > A similar thing is that I often come across people who are
| well-informed but still surprised that compilers will combine
| two observable operations into one
| |
| Well, think of it this way: if you're not into compilers,
| wouldn't you expect your code to run as written? Especially
| when you put a breakpoint in there and can see each step
| happening sequentially. I don't really think people who don't
| know these things should be viewed as ignorant, because the
| compiler tends to be doing some very unintuitive things.

| dang wrote:
| One past thread:
| |
| _No Sane Compiler Would Optimize Atomics_ -
| https://news.ycombinator.com/item?id=11694325 - May 2016 (56
| comments)

| MauranKilom wrote:
| Could you fix the title to not be almost a century in the
| future, please? :)

| elliotpage wrote:
| 2105? I love posts from the future

| tester756 wrote:
| >For Compiler Writers
| |
| >Get back to work, there's so much more to optimize... and so
| much code to break! Help users write good code: the compiler
| should provide diagnostics when it detects anti-patterns or
| misuses of atomics.
| |
| Is it worth putting effort into compiler optimizations instead
| of putting more effort into providing "diagnostics" and
| information for the programmer?
| |
| in the context of this:
| http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29....

| kccqzy wrote:
| The very paragraph you are quoting asks compiler writers to
| provide better diagnostics.
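Some of that diagnostic machinery already exists in rough form in
mainstream compilers, e.g. Clang's optimization remarks (the flags
below are real; the sample remark text is paraphrased):

    // Build with: clang++ -O2 -Rpass=inline -Rpass-missed=inline demo.cpp
    // Clang then emits remarks along the lines of:
    //   demo.cpp:8:20: remark: 'twice' inlined into 'main' [-Rpass=inline]
    // GCC offers similar reporting via -fopt-info-missed.
    static int twice(int v) { return v * 2; }

    int main() { return twice(21); }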
| tester756 wrote:
| But there's also:
| |
| >Get back to work, there's so much more to optimize...
| |
| My point is that maybe code-optimization research time could be
| better spent.

| overgard wrote:
| I'd love to have a compiler that just tells me what
| optimizations could be made, but lets me either do them myself
| (if syntactically expressible in the language) or explicitly
| mark the optimization as OK through a #pragma or something like
| that. I just think having to read the disassembly to figure out
| what happened isn't a great user experience.
___________________________________________________________________
(page generated 2021-07-11 23:00 UTC)