[HN Gopher] A Guide to Undefined Behavior in C and C++ (2010)
___________________________________________________________________
A Guide to Undefined Behavior in C and C++ (2010)
Author : tmalsburg2
Score : 52 points
Date : 2023-08-17 17:18 UTC (5 hours ago)
(HTM) web link (blog.regehr.org)
(TXT) w3m dump (blog.regehr.org)
| Joker_vD wrote:
| > Case 2: (b == 0) || ((a == INT32_MIN) && (b == -1))
|
| > A Java compiler, in contrast, has obligations in Case 2 and
| must deal with it (though in this particular case, it is likely
| that there won't be runtime overhead since processors can usually
| provide trapping behavior for integer divide by zero).
|
| Actually, there _will_ be runtime overhead on x86/x64: Java
| mandates that Integer.MIN_VALUE / (-1) evaluates to
| Integer.MIN_VALUE (see 15.17.2, "Division Operator /", of the Java
| Language Specification), but the IDIV instruction raises #DE in
| that circumstance. So the JITter actually emits the
|
|         cmp   eax, 0x80000000
|         jne   .normalCase
|         xor   edx, edx
|         cmp   $reg, -1
|         je    .specialCase
|     .normalCase:
|         cdq
|         idiv  $reg
|     .specialCase:
|
| code sequence, as you can see in its source ([0][1]), instead of a
| simplistic "cdq; idiv $reg", because it _does not_ want trapping
| behaviour in this particular case; but e.g. AArch64 traps on
| neither division by zero nor INT_MIN / -1. That's why accurately
| implementing your language's semantics on different platforms is
| so annoying, and why the C standard left itself a nice shortcut.
|
| [0]
| https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
|
| [1]
| https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
| fluoridation wrote:
| On the other hand, C left the burden of implementing portable
| semantics to its users.
| Joker_vD wrote:
| Yes, but when C was being made, the application-level
| programmers knew the quirks of the platforms they used just
| as well as the compiler writers because they were almost
| precisely the same people.
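The two undefined cases quoted above, (b == 0) and (a == INT32_MIN && b == -1), can also be guarded in portable C before the division is attempted. The following is a minimal sketch, not taken from the article or from the JVM source; the helper names checked_div_i32 and java_style_div_i32 are hypothetical. The second helper assumes the Java rule described in the comment above (the overflowing quotient is defined to be the minimum value) and reports division by zero as a failure instead of throwing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: performs a / b only when C defines the result.
 * Returns false (and leaves *quotient untouched) for the two cases
 * that are undefined behavior for 32-bit signed division. */
bool checked_div_i32(int32_t a, int32_t b, int32_t *quotient)
{
    if (b == 0) {
        return false;               /* division by zero */
    }
    if (a == INT32_MIN && b == -1) {
        return false;               /* true result 2^31 does not fit */
    }
    *quotient = a / b;
    return true;
}

/* Hypothetical helper mimicking the Java rule from the comment above:
 * INT32_MIN / -1 is defined to yield INT32_MIN. Division by zero is
 * still reported as a failure. */
bool java_style_div_i32(int32_t a, int32_t b, int32_t *quotient)
{
    if (b == 0) {
        return false;
    }
    if (a == INT32_MIN && b == -1) {
        *quotient = INT32_MIN;      /* special case, no divide executed */
        return true;
    }
    *quotient = a / b;
    return true;
}
```

Either helper costs a compare or two per division, which is the kind of overhead the comment above describes for the JIT-emitted sequence.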
| Animats wrote:
| The three big questions:
|
| 1. How big is it?
|
| 2. Who owns it?
|
| 3. Who locks it?
|
| Most undefined behavior in C/C++ involves those three questions.
|
| #1 is historically the most troublesome. And the most
| inexcusable. Pascal, which predates C, didn't have that problem,
| because arrays carried size info. Nor did Algol, Modula I, Modula
| II, and Modula III. Modula I was a very low level language -
| device registers were a language concept.
|
| Something I wrote on this back in 2012.[1] There was some
| consensus at the time that this would work and would be backwards
| compatible with C. But it would be a tough sell, and I didn't
| want to spend my life selling it.
|
| [1] http://animats.com/papers/languages/safearraysforc43.pdf
| jll29 wrote:
| ...and Ada, too. I like the idea of attributes of data objects:
| to access the size of x, simply write x'Size (also for types,
| e.g. Natural'Size).
|
| The Wirth languages (from which Ada is also a descendant) were
| so much more readable than C, yet relatively capable for
| systems programming, as demonstrated by systems like TeX,
| MacOS, Wirth's Modula compilers and the OS for the Lilith
| workstation he co-designed from scratch.
| Gibbon1 wrote:
| Never used Ada but I think you can define range types, so int
| range 0...11. Which I feel is something that you really want
| in embedded and applications level programming.
| thesuperbigfrog wrote:
| >> Never used Ada but I think you can define range types so
| int range 0...11.
|
| Yes. Ada supports integral types with custom ranges:
|
| https://learn.adacore.com/courses/intro-to-ada/chapters/stro...
| tialaramex wrote:
| In the medium-long term I want to do this for Rust as
| "Pattern types" because the thing I actually want (custom
| types with niches) is gated on Pattern types, as the way to
| explain to the type system where the niche goes is a
| Pattern.
| I was persuaded that we can't/shouldn't just say
| we'll half-ass it; we must do it properly if we're doing
| it.
|
| e.g. I don't necessarily have a use for an integer from 0
| to 11, but I _do_ see a use for BalancedI8, a one byte type
| with values -127 to +127 via 0, thus omitting -128. I
| reckon lots of people don't need -128, whereas a niche is
| very useful. Rust provides NonZeroI8, which has -128
| through +127 but no zero, but I find that's less often what
| you want, and it's not today possible to make your own in
| stable Rust (and in nightly Rust you need a not-for-mortals
| perma-unstable attribute today).
| winrid wrote:
| #4 which is partly #2 - what thread is this callback being
| invoked in? The calling thread? A thread pool in the library?
|
| Mostly a problem I have in Java libraries, though.
| JonChesterfield wrote:
| I think a C implementation with overhead instead of UB is
| implementable. I'd like to know what the fundamental
| performance delta we get from UB is. Likewise not sure it's the
| right choice for my life's work.
| Quekid5 wrote:
| The MINIMUM baseline is probably somewhere around
| ASAN/UBSAN/etc. and those aren't exactly cheap... and they
| don't even promise to catch _all_ the problems. The problem
| is that almost every single little thing you can do in C has
| _potential_ for UB, even just the + operator.
|
| So it would absolutely come at a HUGE performance cost,
| unfortunately.
|
| More esoteric stuff is: If you do pointer arithmetic that
| technically goes out of bounds and then _in_ bounds again...
| that's technically UB (can't remember if this is C++ only or
| both), so you can't rely on knowing where everything is +
| bounds checks.
| matt3210 wrote:
| What behaviors are undefined in Rust? Oh wait nobody knows, since
| it has no standard or language spec.
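As an illustration of the remark above that even the + operator has potential for UB: adding two signed int values overflows into undefined behavior, so a defined alternative has to test the operands before adding. This is a minimal sketch in standard C with the hypothetical name checked_add_int. GCC and Clang also provide __builtin_add_overflow, which is not used here so that the example stays portable.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes a + b only when the mathematical sum
 * fits in an int. The range tests themselves never overflow because
 * INT_MAX - b is evaluated only for b > 0 and INT_MIN - b only for
 * b < 0. */
bool checked_add_int(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b) {
        return false;               /* would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;               /* would fall below INT_MIN */
    }
    *sum = a + b;
    return true;
}
```

The two extra comparisons per addition give a rough sense of why replacing UB with checks everywhere is expected to cost something at run time.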
| jcranmer wrote:
| * Reading uninitialized memory
|
| * Violating pointer provenance
|
| * Out-of-bounds pointer accesses (though unlike C, I think,
| it's legal to make a pointer go out-of-bounds and bring it back
| in-bounds and use it)
|
| * Use-after-lifetime
|
| * Storing trap representations in variables
|
| * Having two mutable references to the same memory location
|
| * Data races
|
| Not an exhaustive list, and C has most of these (even the
| second-to-last one, although change "two mutable references" to
| "two restrict pointers"). Of course, C itself doesn't have an
| exhaustive list (J.2 is not, in fact, an exhaustive list).
| JonChesterfield wrote:
| Pointer provenance is a nice example. A block of memory
| cannot be read as an array of simd types sometimes and scalar
| types otherwise. It can't contain atomic values which are
| operated on using non-atomic operations during program
| startup before you spawn any threads.
|
| There were proposals to let one mmap existing structures but
| I don't know if any landed. Usually done with reinterpret
| cast and hoping that rule violation doesn't break you.
|
| Pointer provenance does make most application code faster but
| other times it opens a performance gap that you have to step
| outside of C++ to close. Compiler extensions, switching off
| the analysis, changing language.
| angiosperm wrote:
| Use of mmap itself is undefined in the language.
|
| POSIX provides a definition that programs rely on, instead.
| Implementers are allowed to define literally anything the
| union of all standards leaves undefined.
| JonChesterfield wrote:
| Mmap itself is alright. You've got a void* from
| somewhere, that's OK. You can placement new into it to
| make objects.
|
| What isn't allowed is casting it to a hashtable type and
| then using it as such. Because there is no hashtable
| instance anywhere, and specifically not there, so you've
| violated the pointer aliasing rules.
|
| The obvious fix is to guarantee that placement new
| doesn't change the bytes, perhaps only for trivially
| copyable types or similar constraint. I didn't see the
| proposals in that direction land but also didn't see them
| fail, so maybe the newer standard permits it.
| LegionMammal978 wrote:
| As I understand it, that's precisely what
| std::start_lifetime_as<T>() does: it effectively performs
| a placement new to create a T object, except that it
| retains the existing bytes at the address. It only works
| with implicit-lifetime types (i.e., scalars, or classes
| with a trivial constructor), though, so it probably
| wouldn't work with your hash table example, except
| perhaps for an inline hash table.
| JonChesterfield wrote:
| Superb! Looking through
| https://en.cppreference.com/w/cpp/memory/start_lifetime_as,
| this appears to be the right thing. It also has volatile
| overloads (which it looks like placement new still does
| not). This doesn't appear to be implemented in libc++ yet
| but that seems fixable, it'll go down the same object
| construction logic placement new does. Thank you for the
| reference, that'll fix some ugly edge cases in one of my
| libraries.
| agalunar wrote:
| > A block of memory cannot be read as an array of simd
| types sometimes and scalar types otherwise.
|
| As far as I can tell, it is _currently_ the case that,
| _using raw pointers,_ this is not actually undefined
| behavior (but I never entirely trust my conclusions on
| these matters).
|
| "&mut T and &T follow LLVM's scoped noalias model"
| [1][referring to 2 and 3] but I am fairly sure this does
| not currently apply to raw pointers, and "provenance is
| implicitly shared with all pointers transitively derived
| from the original pointer through operations like offset,
| borrowing, and pointer casts." [4]
|
| [1] https://doc.rust-lang.org/reference/behavior-considered-unde...
|
| [2] https://llvm.org/docs/LangRef.html#pointeraliasing
|
| [3] "noalias" under
| https://llvm.org/docs/LangRef.html#parameter-attributes
|
| [4] https://doc.rust-lang.org/core/ptr/index.html
|
| Also excellent are
|
| https://faultlore.com/blah/fix-rust-pointers
|
| https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
|
| https://www.ralfj.de/blog/2020/12/14/provenance.html
|
| https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html
|
| It seems likely you'd already be familiar with these; I'm
| just putting them out there for anyone interested.
| JonChesterfield wrote:
| LLVM can represent various aliasing relationships, modulo
| some risk of C++ inspired bugs in some passes. They might
| all be stamped out now. I remember a bug report about one
| that was open for many years.
|
| I'm happy to hear Rust can (probably) represent the same
| relationships LLVM can. C++ cannot, at least as of about
| two years ago when I last looked through the
| corresponding papers. All it can express is that different
| types do not alias, where atomic_int and int are different
| types.
| proto_lambda wrote:
| There is no undefined behaviour in Safe Rust. You're right
| about Unsafe Rust of course.
| lionkor wrote:
| The ultimate "the code is the documentation" is "the compiler
| is the language spec".
| thesuperbigfrog wrote:
| >> The ultimate "the code is the documentation" is "the
| compiler is the language spec".
|
| Rust has a great potential to become a replacement for C and
| C++, but the lack of a language specification is a
| shortcoming that needs to be addressed for it to see wider
| adoption, especially for safety-critical systems.
|
| If the Rust compiler does something surprising, people will
| ask, "Is this a bug?" and without a spec the answer becomes
| the language developers or the community asking, "What should
| the compiler do in this situation?".
|
| It makes sense because the correct behavior (whatever that
| is) has not been defined, but it has a feeling of "we are
| making this up as we go along" because there is no formalized
| answer defined. While this approach is fine for running your
| website or building a command line tool, it is not acceptable
| for safety-critical software. If the software breaks and
| people die, the "we are making this up as we go along"
| approach is not acceptable because it has too much risk.
| lionkor wrote:
| I fully agree, and it's definitely a strange feeling coming
| from C++ to not have a single, complete and extensive spec
| to read up on if all else fails.
|
| I want to like Rust, but it's already such a kitchen sink,
| on par with C++ in complexity and misused quirks (not to
| mention macros, which hide complexity just like C macros
| did), that the lack of a committee and spec makes it very
| difficult to trust that it won't get more and more features
| as time goes on (becoming like C++, in only the bad ways).
|
| I understand they have an RFC process, but that's not enough
| for a language which is now so commonplace in discussion
| (usually in the form of "if you did it in Rust, this
| problem wouldn't exist", which is often even true).
| iknowstuff wrote:
| Rust macros don't hide anything. They're hygienic and
| clearly annotated when used.
| mike_hock wrote:
| Rust macros are a crutch to work around the language's
| shortcomings. It's just a better crutch than C's.
| iknowstuff wrote:
| > a shortcoming that needs to be addressed for it to see
| wider adoption, especially for safety-critical systems.
|
| This seems like just a hunch of yours that does not seem to
| be reflected by the real world.
| thesuperbigfrog wrote:
| >> This seems like just a hunch of yours that does not
| seem to be reflected by the real world.
|
| What safety-critical systems are written in Rust?
|
| Where can I buy a validated Rust toolchain for
| safety-critical work?
|
| Ferrocene is an effort to build a safety-critical Rust,
| but it is not done yet:
|
| https://ferrous-systems.com/blog/ferrocene-update/
| mjw1007 wrote:
| The good news is that the Rust project has recently agreed
| to write a specification, and has a budget to hire an
| editor for it.
|
| The less good news is that it's likely to take a long time
| before anything resembling a complete description gets
| written.
|
| You can follow its status at
| https://github.com/rust-lang/rust/issues/113527
| thesuperbigfrog wrote:
| >> The good news is that the Rust project has recently
| agreed to write a specification, and has a budget to hire
| an editor for it.
|
| This is awesome to hear. Following that issue . . .
| zer8k wrote:
| > In the long run, unsafe programming languages will not be used
| by mainstream developers, but rather reserved for situations
| where high performance and a low resource footprint are critical.
|
| I see no world where so-called "unsafe" languages would not be
| used. Most graduates of Computer Science programs can, perhaps
| with some trouble, implement a half decent C compiler in a
| weekend or two. This is not a footnote. This fact alone means
| that for any given piece of hardware you're more likely to find a
| random C compiler you can use than anything else. Rust, being the
| most likely contender to replace it, still cannot self-host and
| the grammar is exponentially more complicated than C. It is more
| likely that C + <whatever> will co-exist peacefully than that
| something like C gets replaced (even ignoring the millions of
| lines of code that already exist). Not for performance reasons
| but more that you can churn out a C compiler quickly for almost
| anything given a spec of the hardware.
|
| On topic, I find a desk reference for this is very useful. The
| CERT C standard is pretty good to thumb through even if you don't
| adhere to every suggestion.
| pjmlp wrote:
| Just wait until CVEs become a liability like handling hazardous
| chemicals.
| ladberg wrote:
| Eh, I don't disagree that unsafe languages will continue to be
| used, but I disagree with ease of compiler design as the
| reason.
|
| You are comparing one of the easier languages to write a
| compiler for (C) with one of the hardest (Rust), and that's not
| due to UB but due to other facets of the languages. I could
| make up a new language that's equivalent to C in every way
| except that all UB is replaced with defined behavior, and it
| wouldn't make the naive compiler any different.
|
| Additionally, writing a compiler for a language should really
| be a thing that happens only a handful of times while executing
| the code happens trillions of times so I hope we don't
| sacrifice safety to save compiler authors some work.
| dralley wrote:
| > Rust, being the most likely contender to replace it, still
| cannot self-host
|
| What do you mean, "still cannot self-host?"
|
| You say that like it's a critical failure of the Rust project
| that they need and are attempting to address rather than a
| trivia item. Rust is perfectly happy relying on LLVM just like
| (checks notes) _half the other languages in existence_.
|
| Libraries like LLVM are precisely what the comment you quote is
| talking about.
|
| I'm not even sure that's true, anyway, with the cranelift
| backend. Someone can chime in on whether it's good enough for
| bootstrapping.
| merlincorey wrote:
| Self Hosting your own compiler traditionally was the
| "end-game" of making a compile-able language. It's a sort of
| proof of fitness that the language can literally stand on its
| own.
|
| This article about Zig achieving self-hosted status in
| 2022[0] points out that they gained many advantages at the
| cost of a lot of time and effort through this process.
| Incidentally, they decided to self-host while also supporting
| LLVM because of deficiencies in LLVM (mainly speed and target
| limitations).
| This flexibility includes a separate "C"
| backend to compile Zig to C in order to target, for example,
| game consoles that require a specific C compiler be used.
|
| > You say that like it's a critical goal of the Rust project
| rather than a trivia item.
|
| In my opinion, you are overly minimizing the potential
| benefits to Rust and the Rust community for Rust to be
| self-hosted.
|
| Of course, practically, right now it doesn't matter because
| most people are more than happy to use the already working
| system.
|
| [0] https://kristoff.it/blog/zig-self-hosted-now-what/
| dralley wrote:
| As I said, the cranelift backend exists, and it provides
| many of the same benefits such as improved compilation
| speed. And it's written in Rust.
|
| But it still feels like a trivia item. C compilers written
| in C exist, but almost nobody actually uses them. They use
| GCC, Clang, and MSVC, written in C++. Everybody knows that
| it's possible to self-host C, so the benefit of actually
| doing so in practice is minimal.
|
| It's obviously possible to write a Rust compiler in Rust
| end-to-end. Acting like it's a second tier language because
| actively doing so is not a top focus of the community is
| gatekeep-y and ridiculous.
| merlincorey wrote:
| > Acting like it's a second tier language because
| actively doing so is not a top focus of the community is
| gatekeep-y and ridiculous.
|
| Here's where I think you are quite a bit off target,
| personally.
|
| I certainly was not and I don't believe the GP you
| originally responded to was saying that "Rust is a second
| tier language due to [lack of self-hosted compiler]", so
| hopefully we can set that statement aside and ignore it
| now.
|
| Let's instead focus on your first statement, which is
| directly related to what GP and I were arguing:
|
| > It's obviously possible to write a Rust compiler in
| Rust end-to-end.
|
| It is certainly possible but actually doing so is
| completely non-obvious because the grammar for Rust is
| much more complicated than C, and Rust has no formal
| language specification (let alone an international
| standard).
|
| While Python does not have an international standard, it
| does have a formal language specification, which is what
| allows for things like PyPy to exist.
|
| Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| It seems like, practically, knowing C and being able to
| write compilers in C is quite useful if you want to make
| an impact in Rust or maybe try your hand at making some
| future Rust replacement (hopefully with a language
| specification that others can follow).
| dralley wrote:
| > It is certainly possible but actually doing so is
| completely non-obvious because the grammar for Rust is
| much more complicated than C, and Rust has no formal
| language specification (let alone an international
| standard).
|
| The Rust compiler frontend is written in Rust. It doesn't
| matter how non-trivial writing a Rust frontend is if you
| can restrict the problem domain to writing a new backend
| for the existing compiler frontend.
|
| And you can. As it stands there is the LLVM backend that
| everyone is familiar with, the GCC backend which is
| nearing completion, and the Cranelift backend which is
| written in Rust.
|
| Zig is similar. Yes, they are going to replace LLVM by
| default, but they're not getting rid of their LLVM
| backend entirely. The main difference between Rust and
| Zig here is a matter of defaults, where Rust defaults to
| using LLVM while Zig will default to their self-hosted
| compiler.
|
| > Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| Are you under the impression that the "rustc" codebase is
| written in C/C++? It is not... It uses LLVM, yes, but
| it's written in Rust.
|
| > I certainly was not and I don't believe the GP you
| originally responded to was saying that "Rust is a second
| tier language due to [lack of self-hosted compiler]", so
| hopefully we can set that statement aside and ignore it
| now.
|
| The discussion started with the statement that Rust will
| never replace unsafe languages without the ability to
| self-host, and then continued with the statement that
| "Self Hosting your own compiler traditionally was the
| "end-game" of making a compile-able language. It's a sort
| of proof of fitness that the language can literally stand
| on its own."
|
| I don't think that was a completely unfair reading of
| these statements. The implication is that Rust is "not a
| fit language" because it "cannot stand on its own" and
| therefore "will never replace unsafe languages".
| zer8k wrote:
| > I don't think that was a completely unfair reading of
| these statements. The implication is that Rust is "not a
| fit language" because it "cannot stand on its own" and
| therefore "will never replace unsafe languages".
|
| I didn't intend this. The primary gripe I had was the
| grammar being complicated (and to be fair...not really
| available in an easy way). That means the places we are
| most likely to see such bare metal shenanigans may not
| adopt it because they can't draft a XYZ Co. Compiler. This
| is a semi-common pattern with chip manufacturers.
|
| The conversation diverged after that. Self-hosting is
| simply a signal that a language is "strong enough to
| stand on its own". That doesn't mean non-self hosted
| languages are bad. It just means you still need something
| else to bootstrap it. In the land of bare metal, stuff
| like this matters.
| merlincorey wrote:
| > Zig is similar. Yes, they are going to replace LLVM by
| default, but they're not getting rid of their LLVM
| backend entirely.
|
| In the article I linked, they did not say they were
| replacing LLVM by default, but they did say the self-hosted
| backend would become the default for DEBUG builds due to
| the faster speed of compilation, to be clear.
|
| > > Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| > Are you under the impression that the "rustc" codebase
| is written in C/C++? It is not... It uses LLVM, yes, but
| it's written in Rust.
|
| I am not under that impression, but I can see how my
| phrasing leads to that conclusion.
|
| After reviewing Rust's Bootstrap on Github[0] I can now
| more precisely state that one's understanding of low-level
| Rust will be enhanced by knowing C/C++ (for the LLVM
| portions) as well as Python (for the "Rust does not exist
| on this system" case, where the stage0 binary Cargo and
| Rust compilers are downloaded from somewhere else).
|
| > Cranelift backend which is written in Rust
|
| When this happens, it seems like it'll be possible to get
| the LLVM bits out of the bootstrap process and lead to a
| fully self-hosted Rust.
|
| So while you may not personally value that, it seems like
| some people in the Rust community do.
|
| [0] https://github.com/rust-lang/rust/tree/master/src/bootstrap
| LegionMammal978 wrote:
| > When this happens, it seems like it'll be possible to
| get the LLVM bits out of the bootstrap process and lead
| to a fully self-hosted Rust.
|
| What do you mean by "when this happens"? GP's point is
| that this has _already_ happened: the Cranelift backend
| is feature-complete from the perspective of the language
| [0], except for inline assembly and unwinding on panic.
| It was merged into the upstream compiler in 2020 [1], and
| a Cranelift-based Rust compiler is perfectly capable of
| building another Rust compiler (with some config
| changes).
|
| [0] https://github.com/bjorn3/rustc_codegen_cranelift
|
| [1] https://github.com/rust-lang/rust/pull/77975
| zer8k wrote:
| Except gluing yourself to LLVM has its own problems.
| Like, for example, any platform that LLVM doesn't support
| you can't support either. LLVM is great. The monoculture
| and smug elitism it produces is not.
|
| > Acting like it's a second tier language because
| actively doing so is not a top focus of the community is
| gatekeep-y and ridiculous.
|
| It is probably one of the major reasons we won't see a
| Rust compiler shipped with an operating system for a very
| long time. That doesn't make it second tier. However,
| Rust fans seem to want to stick their head in the sand
| when their baby is criticized. I am a Rust (language) fan
| myself. I am just willing to criticize the language. I do
| not understand why the Rust community has such a volatile
| response to honest, valid criticism.
| learn-forever wrote:
| it's a ridiculous criticism, and the insult doesn't make
| it less ridiculous
| dralley wrote:
| > It is probably one of the major reasons we won't see a
| Rust compiler shipped with an operating system for a very
| long time.
|
| Even most Linux distros don't ship with GCC out of the
| box... much less MacOS and Windows with their respective
| compilers.
|
| If your standard is "Gentoo and FreeBSD will never ship
| it out of the box" then I'm going to 100% stand by my
| statement that this is weird and gatekeep-y.
|
| Especially when the Windows kernel and userspace system
| libraries both have Rust in them.
|
| https://www.bleepingcomputer.com/news/microsoft/new-windows-...
|
| https://www.thurrott.com/windows/282471/microsoft-is-rewriti...
| tialaramex wrote:
| > we won't see a Rust compiler shipped with an operating
| system for a very long time.
|
| I can't figure out what this constraint means.
|
| My Windows laptop doesn't seem to have provided a C
| compiler, so, maybe that's a problem for Windows?
|
| Huh, well I guess I can buy or download a third party
| compiler, that's easy enough, but then, I can do that for
| Rust too, so, doesn't seem like a difference.
|
| Meanwhile on this Fedora machine, the Rust compiler came
| with the OS. So, is this not an operating system? Maybe
| the stuff it comes with isn't "shipped with" it somehow?
| And so there's no C compiler "shipped with" this
| operating system either, although GCC was installed too?
| I just don't know what to make of such a criticism.
| patrec wrote:
| > Most graduates of Computer Science programs can, perhaps with
| some trouble, implement a half decent C compiler in a weekend
| or two.
|
| Where "most" of course means < 0.1%.
| badsectoracula wrote:
| > Most graduates of Computer Science programs can, perhaps with
| some trouble, implement a half decent C compiler in a weekend
| or two. This is not a footnote. This fact alone means that for
| any given piece of hardware you're more likely to find a random
| C compiler you can use than anything else.
|
| I think C being (relatively) very simple is indeed a feature
| it has - however not so much because you can make a compiler
| for it easily (not that it isn't a pro, but it isn't that
| important in practice) but because it means it is easier to
| learn and easier to write tools for.
| dale_glass wrote:
| I don't see how that reasoning is supposed to work in modern
| times.
|
| Who out there is seriously using a compiler churned out in a
| weekend? The fact that you can do it doesn't mean anybody
| seriously would use that.
|
| We're also not really creating architectures anymore. There's
| RISC-V, and Rust already supports that.
| zer8k wrote:
| > Who out there is seriously using a compiler churned out in
| a weekend?
|
| Someone at a chip manufacturer writing something for a brand
| new chipset, for example. It takes a long time to get stuff
| shoved into GCC. It's only in recent history that life has
| settled on one or two "big" compilers.
| There are still plenty of
| other places where you will find bespoke compilers. Perhaps
| not commonly, but they do exist (especially in embedded).
| zabzonk wrote:
| perhaps it is just me, but i have never experienced any of the
| problems outlined in the comments here, despite writing a
| shedload of C and C++ code (and fortran, assembler and other
| stuff). and i don't think i am a coding god.
| AnimalMuppet wrote:
| (2010)
| dang wrote:
| Added. Thanks!
| dang wrote:
| Related:
|
| _A Guide to Undefined Behavior in C and C++ (2010)_ -
| https://news.ycombinator.com/item?id=18372613 - Nov 2018 (103
| comments)
|
| _A Guide to Undefined Behavior in C and C++ (2010)_ -
| https://news.ycombinator.com/item?id=9884074 - July 2015 (10
| comments)
|
| _A Guide to Undefined Behavior in C and C++, Part 1_ -
| https://news.ycombinator.com/item?id=2544159 - May 2011 (2
| comments)
| jbandela1 wrote:
| If you want some nice examples of how undefined behavior results
| in weirdness, see
| https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...
|
| An interesting example from there is how the compiler can turn
|
|     int table[4];
|     bool exists_in_table(int v)
|     {
|         for (int i = 0; i <= 4; i++) {
|             if (table[i] == v) return true;
|         }
|         return false;
|     }
|
| Into:
|
|     bool exists_in_table(int v)
|     {
|         return true;
|     }
| fluoridation wrote:
| What's odd about that example is that the optimization is only
| valid if the loop in fact overflows the array every time. So
| the compiler is proving that the array is being overflowed and
| rather than emitting a warning to that effect, it generates
| absurd code.
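For comparison, here is a sketch of the same lookup without the out-of-bounds read. The loop in the quoted example runs while i <= 4, so its last iteration reads table[4], one element past the end of a four-element array. With the bound derived from the array length there is no undefined access for the compiler to reason from, and the function can genuinely return false. TABLE_LEN is a name introduced here for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_LEN 4                 /* illustrative name, not in the original */

int table[TABLE_LEN];               /* zero-initialized at program start */

/* Returns true if v occurs in table. The index stays in
 * [0, TABLE_LEN), so every table[i] is a valid element. */
bool exists_in_table(int v)
{
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (table[i] == v) {
            return true;
        }
    }
    return false;                   /* reachable: loop can finish normally */
}
```

Tying the loop bound to the same constant as the array declaration keeps the two from drifting apart if the table size changes later.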
| kllrnohj wrote:
| > So the compiler is proving that the array is being
| overflowed and rather than emitting a warning to that effect
|
|     <source>:5:13: warning: unsafe buffer access [-Wunsafe-buffer-usage]
|         5 |         if (table[i] == v) return true;
|           |             ^~~~~
|
| https://godbolt.org/z/zGxnKxvz6
|
| This one is weirdly hard to get a compiler warning out of,
| which is a fair critique, but so many of the "Look what the
| compiler did to my UB silently!" issues are not at all silent
| and would have been stopped dead with "-Wall -Wextra -Werror"
| iainmerrick wrote:
| As noted elsewhere in this thread, GCC by default does the
| "optimization" and doesn't warn. No doubt there are other
| examples where Clang is the one that misbehaves.
|
| How are we supposed to know whether our code is being
| compiled sensibly or not, without poring over the
| disassembly? Just set all the warning flags and hope for
| the best?
| UncleMeat wrote:
| I think that a big problem is that for every compile that
| seems "not sensible" and is actually not sensible, there
| are 100s or 1000s of compiles that would look absolutely
| insane to a human but are actually exactly what you want
| when you sit down and think about it for a long time.
|
| Almost all of the "don't do the overly clever stuff!"
| proposals would throw away a huge amount of actually
| productive clever stuff.
| fluoridation wrote:
| I think what the GP means by "not sensible" is that
| proving that the code is broken in order to silently
| optimize it more aggressively is not sensible. If your
| theorem prover can find a class of bugs then have it emit
| diagnostics. Don't _only_ use those bugs to make the code
| run faster. Yes, make the code run faster, but let me
| know I may be doing something nonsensical, since chances
| are that it is nonsensical and it doesn't cost anything
| at run time.
| UncleMeat wrote:
| Right and the next part is the hard part: defining this
| clearly.
What I'm saying is that there is a surprising | amount of "wait, actually I do want that" when you dig | into this proposal. | mike_hock wrote: | A warning is only useful if it prescribes a code | transformation that affirms the programmer's intent and | silences the warning (unless the warning was a true | positive and caught a bug). You cannot simply emit a | warning every time you optimize based on UB. | | There is no `if(obvious out-of-bound access) silently | emit nonsense har har har` in the compiler's source code. | The compiler doesn't understand intent or the program as | a whole. It applies micro transformations that all make | sense in isolation. And while the compiler also tries to | detect erroneous programming patterns and warn about | those, that's exceedingly more difficult. | moefh wrote: | > whether our code is being compiled sensibly or not | | I'm failing to see what's not sensible about how that | code is compiled. | | The only possible way that function could return false is | if you read past the end of the array and the value there | happens to be different from `v`. Is it really the more | sensible to rely on that, rather than fixing a known | behavior in case of array overflow? | robinsonb5 wrote: | If the compiler's going to interpret undefined behaviour | as license to do something that runs counter to the | programmer's expectations, the most sensible course of | action is for the compiler to yell very loudly about it | instead of near-silently producing (differently!) broken | code. | | Currently that piece of code doesn't trigger a warning | with -Wall. It's not even flagged with -Wextra - it needs | -Weverything. | moefh wrote: | One man's "broken code produced by the compiler" is | another man's "excellently optimized code by the | compiler". | | Where to draw the line is not always clear, but here's a | very clear-cut example[1] where emitting a warning would | be bad. 
If you don't want to watch the video, it's | basically this: | | - the code technically contains undefined behavior, but | it will never be actually triggered by the program | | - changing the code to remove undefined behavior forces | the compiler to emit terrible code | | Making the compiler yell at the programmer in this case | would be terrible, but it's clearly a consequence of what | you're asking. | | [1] https://youtu.be/yG1OZ69H_-o?t=2358 | fanf2 wrote: | No, the logic for the optimization is: | | - a correct program does not access table[4] | | - therefore the loop must always exit early | | - the only way to exit early is to return true | tedunangst wrote: | No, the compiler knows the array isn't overflowed, because C | programs don't contain overflows. Therefore the loop must | exit via one of the return true statements. | JonChesterfield wrote: | The amazing part about examples like that is people read them, | check that the compiler really does work on that basis, and | then continue writing things in C++ anyway. Wild. | | Suppose I should expand on this. The idea seems to be either | 1/disbelief - compilers wouldn't really do this or 2/ | infallibility - my code contains no UB. | | Neither of those positions bears up well under reality. | Programming C++ is working with an adversary that will make | your code faster wherever it can, regardless of whether you | like the resulting behaviour of the binary. | | I suspect rust has inherited this perspective in the compiler | and guards against it with more aggressive semantic checks in | the front end. | Gibbon1 wrote: | What's amazing is programmers haven't tarred and feathered the | standards committee and compiler writers for allowing crap | like that. | ninepoints wrote: | It's just as "amazing" to read these takes from techno | purists. You use software written in C++ daily, and it can be | a pragmatic choice regardless of your sensibilities.
| erik_seaberg wrote: | And we have the core dumps to prove it. | | When any Costco sells a desktop _ten thousand_ times faster | than the one I started on, we can afford runtime sanity | checks. We don't have to keep living like this, with stacks | that randomly explode. | johnbellone wrote: | But it isn't Rust. | jacquesm wrote: | Lots of things 'aren't Rust'. In fact almost everything | isn't Rust. For now. That may change in due course but | right now I would guesstimate the amount of Rust code | running on my daily drivers to be pretty close to zero%. The | bulk is C or C++. | angiosperm wrote: | Hardly anything is. Literally none of the programs on my | machine are coded in Rust. (Firefox is reputed to have a | bit in it.) | jacquesm wrote: | About FF and Rust: | | https://news.ycombinator.com/item?id=30743577 | JonChesterfield wrote: | Definitely. There's loads of value delivered by C++ | implementations, including implementations of C++ and other | languages. The language design of speed over safety mostly | imposes a cost in developer / debugging time and fear of | upgrading the compiler toolchain. Occasionally it shows up | in real world disasters. | | I think we've got the balance wrong, partly because some | engineering considerations derive directly from separate | compilation. ODR no diagnostic required doesn't have to be | a thing any more. | peppermint_gum wrote: | >The amazing part about examples like that is people read | them, check that the compiler really does work on that basis, | and then continue writing things in C++ anyway. Wild. | | Well, in modern C++ this code would look like this: | std::array<int, 4> table; bool exists_in_table(int v) | { for (auto &elem : table) { if | (elem == v) return true; } return | false; } | | Or even simpler: std::array<int, 4> table; | bool exists_in_table(int v) { return | std::ranges::contains(table, v); } | | There's no shortage of footguns in C++, but nonetheless, | modern C++ is safer than C.
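One caveat on the second snippet above: `std::ranges::contains` is a C++23 addition, so it may not be available on older toolchains. A minimal sketch of the same check using `std::find`, which works with any standard from C++11 on, might look like the following. The names `modern_table` and `exists_in_modern_table` are hypothetical, chosen only to keep the sketch separate from the quoted code.

```cpp
#include <algorithm>
#include <array>

// Hypothetical names for illustration; `{}` value-initialises all four
// elements to 0.
std::array<int, 4> modern_table{};

bool exists_in_modern_table(int v) {
    // The iterator pair comes from the container itself, so there is no
    // hand-written index or bound to get wrong.
    return std::find(modern_table.begin(), modern_table.end(), v)
           != modern_table.end();
}
```

As with the range-based `for` version, the point is that the bound is never spelled out by hand, which removes the off-by-one that made the C example undefined.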
| mike_hock wrote: | Weirdly, GCC fails to optimize this, but Clang does (if you | make the table static as in the original example). | gizmo686 wrote: | I actually would prefer to get the second output. The result | is wrong, but consistently and deterministically so. The | naive implementation of the broken code is a heisenbug. | Sometimes it will work, and sometimes it won't, and any | attempt to debug it would likely perturb the system enough to | make the issue not surface. | | It wouldn't surprise me if I have run into the latter | situation without realizing it. When I got to the problem, I | would have just (incorrectly) assumed that the memory right | after the array happened to have the relevant value. I would | be counting my blessings that it happened consistently enough | to be debuggable. | jll29 wrote: | I agree that it is better to get deterministic and | predictable behavior. | | Reminds me of when for a while, I worked on HP 9000s under | HP-UX and in parallel on an Intel 80486-based Linux box, | and what I noticed is that the Unix workstations crashed | sooner and more predictably with segmentation faults than | Linux on the PC (not sure if this has changed since the | early 1990s - probably had to do with the MMU); so | developing on HP under Unix and then finally compiling | under Linux led to better code quality. | _gabe_ wrote: | > check that the compiler really does work on that basis, and | then continue writing things in C++ anyway. Wild. | | My compiler (MSVC) doesn't do that[0]. Clang also doesn't do | this[1]. It's wild to me that GCC does this optimization[2]. | It's very subtle, but Raymond Chen and OP both say a compiler | _can_ create this optimization, not that it _will_.
| | [0]: https://godbolt.org/z/bdx4EMzxe | | [1]: https://godbolt.org/z/z833Wa391 | | [2]: https://godbolt.org/z/6b8aq59M9 | jandrewrogers wrote: | > The amazing part about examples like that is people read | them, check that the compiler really does work on that basis, | and then continue writing things in C++ anyway. | | That isn't idiomatic C++ and hasn't been for a long time. | Sure, it's _possible_ to do it retro C-style, because | backward compatibility, but you generally don't see that in | a modern code base. | JonChesterfield wrote: | The modern codebase has grown from a legacy one. The legacy | one with parts of the codebase that were C, then got | partially turned into object oriented C++, then partially | turned into template abstractions. The parts least likely | to have comprehensive test coverage. _That_ place is indeed | where a compiler upgrade is most likely to change the | behaviour of your application. ___________________________________________________________________ (page generated 2023-08-17 23:00 UTC)