[HN Gopher] A Guide to Undefined Behavior in C and C++ (2010)
       A Guide to Undefined Behavior in C and C++ (2010)
       Author : tmalsburg2
       Score  : 52 points
       Date   : 2023-08-17 17:18 UTC (5 hours ago)
 (HTM) web link (blog.regehr.org)
 (TXT) w3m dump (blog.regehr.org)
       | Joker_vD wrote:
       | > Case 2: (b == 0) || ((a == INT32_MIN) && (b == -1))
       | > A Java compiler, in contrast, has obligations in Case 2 and
       | must deal with it (though in this particular case, it is likely
       | that there won't be runtime overhead since processors can usually
       | provide trapping behavior for integer divide by zero).
       | Actually, there _will_ be runtime overhead on x86/x64: Java
       | mandates that Integer.MIN_VALUE / (-1) evaluates to
       | Integer.MIN_VALUE (see 15.17.2. "Division Operator /" of the Java
       | Language Specification), but the IDIV instruction raises #DE in
       | that case. So the JITter actually emits
       |     cmp  eax, 0x80000000
       |     jne  .normalCase
       |     xor  edx, edx
       |     cmp  $reg, -1
       |     je   .specialCase
       | .normalCase:
       |     cdq
       |     idiv $reg
       | .specialCase:
       | code sequence, as you can see in its source ([0][1]), instead of
       | the simplistic "cdq; idiv $reg", because it _does not_ want
       | trapping behaviour in this particular case. AArch64, by contrast,
       | traps on neither division by zero nor INT_MIN / -1. That's why
       | accurately implementing your language's semantics on different
       | platforms is so annoying and why the C standard left itself a
       | nice shortcut.
       | [0]
       | https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
       | [1]
       | https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
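       | A minimal C sketch of what "dealing with Case 2" costs, assuming
       | a hypothetical helper java_style_div (the name and the bool
       | return convention are illustrative, not taken from HotSpot): the
       | two problematic inputs are filtered out before the division is
       | executed, which mirrors the extra compare-and-branch in the
       | assembly above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: divide with Java-like results instead of UB.
 * Returns false for b == 0 (where Java would throw ArithmeticException).
 * INT32_MIN / -1 yields INT32_MIN, as the JLS requires. */
bool java_style_div(int32_t a, int32_t b, int32_t *out)
{
    if (b == 0) {
        return false;            /* caller decides how to report this */
    }
    if (a == INT32_MIN && b == -1) {
        *out = INT32_MIN;        /* skip the division that would overflow */
        return true;
    }
    *out = a / b;                /* every remaining case is well defined */
    return true;
}
```

       | Calling java_style_div(INT32_MIN, -1, &q) stores INT32_MIN in q
       | rather than executing the overflowing division; a zero divisor is
       | reported through the return value.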
         | fluoridation wrote:
         | On the other hand, C left the burden of implementing portable
         | semantics to its users.
           | Joker_vD wrote:
           | Yes, but when C was being made, the application-level
           | programmers knew the quirks of the platforms they used just
           | as well as the compiler writers because they were almost
           | precisely the same people.
       | Animats wrote:
       | The three big questions:
       | 1. How big is it?
       | 2. Who owns it?
       | 3. Who locks it?
       | Most undefined behavior in C/C++ involves those three questions.
       | #1 is historically the most troublesome. And the most
       | inexcusable. Pascal, which predates C, didn't have that problem,
       | because arrays carried size info. Nor did Algol, Modula, Modula-2,
       | and Modula-3. Modula was a very low level language -
       | device registers were a language concept.
       | Something I wrote on this back in 2012.[1] There was some
       | consensus at the time that this would work and would be backwards
       | compatible with C. But it would be a tough sell, and I didn't
       | want to spend my life selling it.
       | [1] http://animats.com/papers/languages/safearraysforc43.pdf
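       | A minimal sketch of one answer to "how big is it?" in today's C,
       | assuming a hypothetical int_slice struct. This is not the syntax
       | proposed in the linked paper, only an illustration of a pointer
       | that travels together with its length so every access can be
       | checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the data pointer carries its length. */
typedef struct {
    int   *data;
    size_t len;
} int_slice;

/* Bounds-checked read: returns false instead of reading out of range. */
bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing out of range. */
bool slice_set(int_slice s, size_t i, int value)
{
    if (i >= s.len) {
        return false;
    }
    s.data[i] = value;
    return true;
}
```

       | The price is one comparison per access plus a wider argument;
       | the benefit is that an out-of-range index becomes a reportable
       | error instead of undefined behavior.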
         | jll29 wrote:
         | ...and Ada, too. I like the idea of attributes of data objects:
         | to get the size of x, simply write x'Size (this also works for
         | types, e.g. Natural'Size).
         | The Wirth languages (from which Ada is also a descendant) were
         | so much more readable than C, yet relatively capable for
         | systems programming, as demonstrated by systems like TeX,
         | MacOS, Wirth's Modula compilers and the OS for the Lilith
         | workstation he co-designed from scratch.
           | Gibbon1 wrote:
           | Never used Ada but I think you can define range types so int
           | range 0...11. Which I feel is something that you really want
           | in embedded and applications level programming.
             | thesuperbigfrog wrote:
             | >> Never used Ada but I think you can define range types so
             | int range 0...11.
             | Yes. Ada supports integral types with custom ranges:
             | https://learn.adacore.com/courses/intro-to-ada/chapters/stro...
             | tialaramex wrote:
             | In the medium-long term I want to do this for Rust as
             | "Pattern types" because the thing I actually want (custom
             | types with niches) is gated on Pattern types, as the way to
             | explain to the type system where the niche goes is a
             | Pattern. I was persuaded that we can't/shouldn't just say
             | we'll half ass it, we must do it properly if we're doing
             | it.
             | e.g. I don't necessarily have a use for an integer from 0
             | to 11, but I _do_ see a use for BalancedI8, a one byte type
             | with values -127 to +127 via 0, thus omitting -128. I
             | reckon lots of people don't need -128, whereas a niche is
             | very useful. Rust provides NonZeroI8, which has -128
             | through +127 but no zero, but I find that's less often what
             | you want, and it's not today possible to make your own in
             | stable Rust (and in nightly Rust you need a not-for-mortals
             | perma-unstable attribute today).
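             | A rough C analogue of the niche idea, assuming a
             | hypothetical opt_balanced_i8 type: the unused bit pattern
             | -128 stands for "no value", so an optional value needs no
             | separate tag. Rust tracks this in the type system; here it
             | is only a convention enforced by the helper functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional "balanced" byte: INT8_MIN (-128) is the niche
 * that encodes "no value", so valid payloads are -127..127. */
typedef struct {
    int8_t raw;
} opt_balanced_i8;

opt_balanced_i8 balanced_none(void)
{
    opt_balanced_i8 o = { INT8_MIN };
    return o;
}

/* Returns "none" when v lies outside -127..127. */
opt_balanced_i8 balanced_some(int v)
{
    if (v < -127 || v > 127) {
        return balanced_none();
    }
    opt_balanced_i8 o = { (int8_t)v };
    return o;
}

bool balanced_has_value(opt_balanced_i8 o)
{
    return o.raw != INT8_MIN;
}
```

             | On common ABIs the struct occupies a single byte, which is
             | the point of the niche: "present or absent" is stored in
             | the value itself.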
         | winrid wrote:
         | #4 which is partly #2 - what thread is this callback being
         | invoked in? The calling thread? A thread pool in the library?
         | Mostly a problem I have in Java libraries, though.
         | JonChesterfield wrote:
         | I think a C implementation with overhead instead of UB is
         | implementable. I'd like to know what the fundamental
         | performance delta we get from UB is. Likewise not sure it's the
         | right choice for my life's work.
           | Quekid5 wrote:
           | The MINIMUM baseline is probably somewhere around
           | ASAN/UBSAN/etc. and those aren't exactly cheap... and they
           | don't even promise to catch _all_ the problems. The problem
           | is that almost every single little thing you can do in C has
           | _potential_ for UB, even just the + operator.
           | So it would absolutely come at a HUGE performance cost,
           | unfortunately.
           | More esoteric stuff is: If you do pointer arithmetic that
           | technically goes out of bounds and then _in_ bounds again...
           | that's technically UB (can't remember if this is C++ only or
           | both), so you can't rely on knowing where everything is +
           | bounds checks.
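           | A minimal sketch of why even + carries a cost once UB is
           | ruled out, assuming a hypothetical helper checked_add: the
           | range test has to run before the addition, because a signed
           | addition that overflows is already undefined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add two ints without ever executing an
 * overflowing '+'. The range check runs before the addition. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would not fit in an int */
    }
    *out = a + b;
    return true;
}
```

           | Every addition now pays for comparisons and a branch, which
           | gives a feel for the overhead a UB-free C would add to
           | ordinary arithmetic.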
       | matt3210 wrote:
       | What behaviors are undefined in Rust? Oh wait nobody knows, since
       | it has no standard or language spec.
         | jcranmer wrote:
         | * Reading uninitialized memory
         | * Violating pointer provenance
         | * Out-of-bounds pointer accesses (though unlike C, I think,
         | it's legal to make a pointer go out-of-bounds and bring it back
         | in-bounds and use it)
         | * Use-after-lifetime
         | * Storing trap representations in variables
         | * Having two mutable references to the same memory location
         | * Data races
         | Not an exhaustive list, and C has most of these (even the last
         | one, although change "two mutable references" to "two restrict
         | pointers"). Of course, C itself doesn't have an exhaustive list
         | (J.2 is not, in fact, an exhaustive list).
           | JonChesterfield wrote:
           | Pointer provenance is a nice example. A block of memory
           | cannot be read as an array of simd types sometimes and scalar
           | types otherwise. It can't contain atomic values which are
           | operated on using non-atomic operations during program
           | startup before you spawn any threads.
           | There were proposals to let one mmap existing structures but
           | I don't know if any landed. Usually done with reinterpret_cast
           | and hoping that the rule violation doesn't break you.
           | Pointer provenance does make most application code faster but
           | other times it opens a performance gap that you have to step
           | outside of C++ to close. Compiler extensions, switching off
           | the analysis, changing language.
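           | For the simple scalar case, the well-defined alternative to
           | casting a pointer is copying the bytes. A minimal C sketch,
           | assuming a 32-bit float (the function names are
           | illustrative); it does not address the mmap or atomic cases
           | discussed above.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Reads the object representation of a float without casting
 * float* to uint32_t*, so no aliasing rule is violated. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse direction: rebuild a float from its stored bits. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

           | Optimizing compilers typically turn such a fixed-size memcpy
           | into a plain register move, so the defined version need not
           | be slower than the cast.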
             | angiosperm wrote:
             | Use of mmap itself is undefined in the language.
             | Posix provides a definition that programs rely on, instead.
             | Implementers are allowed to define literally anything the
             | union of all standards leaves undefined.
               | JonChesterfield wrote:
               | Mmap itself is alright. You've got a void* from
               | somewhere, that's OK. You can placement new into it to
               | make objects.
               | What isn't allowed is casting it to a hashtable type and
               | then using it as such. Because there is no hashtable
               | instance anywhere, and specifically not there, so you've
               | violated the pointer aliasing rules.
               | The obvious fix is to guarantee that placement new
               | doesn't change the bytes, perhaps only for trivially
               | copyable types or similar constraint. I didn't see the
               | proposals in that direction land but also didn't see them
               | fail, so maybe the newer standard permits it.
               | LegionMammal978 wrote:
               | As I understand it, that's precisely what
               | std::start_lifetime_as<T>() does: it effectively performs
               | a placement new to create a T object, except that it
               | retains the existing bytes at the address. It only works
               | with implicit-lifetime types (i.e., scalars, or classes
               | with a trivial constructor), though, so it probably
               | wouldn't work with your hash table example, except
               | perhaps for an inline hash table.
               | JonChesterfield wrote:
               | Superb! Looking through
               | https://en.cppreference.com/w/cpp/memory/start_lifetime_as,
               | this appears to be the right thing. It also has volatile
               | overloads (which it looks like placement new still does
               | not). This doesn't appear to be implemented in libc++ yet
               | but that seems fixable,
               | it'll go down the same object construction logic
               | placement new does. Thank you for the reference, that'll
               | fix some ugly edge cases in one of my libraries.
             | agalunar wrote:
             | > A block of memory cannot be read as an array of simd
             | types sometimes and scalar types otherwise.
             | As far as I can tell, it is _currently_ the case that,
             | _using raw pointers,_ this is not actually undefined
             | behavior (but I never entirely trust my conclusions on
             | these matters).
             | "&mut T and &T follow LLVM's scoped noalias model"
             | [1][referring to 2 and 3] but I am fairly sure this does
             | not currently apply to raw pointers, and "provenance is
             | implicitly shared with all pointers transitively derived
             | from the original pointer through operations like offset,
             | borrowing, and pointer casts." [4]
             | [1] https://doc.rust-lang.org/reference/behavior-considered-unde...
             | [2] https://llvm.org/docs/LangRef.html#pointeraliasing
             | [3] "noalias" under
             | https://llvm.org/docs/LangRef.html#parameter-attributes
             | [4] https://doc.rust-lang.org/core/ptr/index.html
             | Also excellent are
             | https://faultlore.com/blah/fix-rust-pointers
             | https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
             | https://www.ralfj.de/blog/2020/12/14/provenance.html
             | https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html
             | It seems likely you'd already be familiar with these; I'm
             | just putting them out there for anyone interested.
               | JonChesterfield wrote:
               | LLVM can represent various aliasing relationships, modulo
               | some risk of C++ inspired bugs in some passes. They might
               | all be stamped out now. I remember a bug report about one
               | that was open for many years.
               | I'm happy to hear Rust can (probably) represent the same
               | relationships LLVM can. C++ cannot, at least as of about
               | two years ago when I last looked through the
               | corresponding papers. All it can express is that different
               | types do not alias, where atomic_int and int are different
               | types.
         | proto_lambda wrote:
         | There is no undefined behaviour in Safe Rust. You're right
         | about Unsafe Rust of course.
         | lionkor wrote:
         | The ultimate "the code is the documentation" is "the compiler
         | is the language spec".
           | thesuperbigfrog wrote:
           | >> The ultimate "the code is the documentation" is "the
           | compiler is the language spec".
           | Rust has a great potential to become a replacement for C and
           | C++, but the lack of a language specification is a
           | shortcoming that needs to be addressed for it to see wider
           | adoption, especially for safety-critical systems.
           | If the Rust compiler does something surprising, people will
           | ask, "Is this a bug?" and without a spec the answer becomes
           | the language developers or the community asking, "What should
           | the compiler do in this situation?".
           | It makes sense because the correct behavior (whatever that
           | is) has not been defined, but it has a feeling of "we are
           | making this up as we go along" because there is no formalized
           | answer defined. While this approach is fine for running your
           | website or building a command line tool, it is not acceptable
           | for safety-critical software. If the software breaks and
           | people die, the "we are making this up as we go along"
           | approach is not acceptable because it has too much risk.
             | lionkor wrote:
             | I fully agree, and it's definitely a strange feeling coming
             | from C++ to not have a single, complete and extensive spec
             | to read up on if all else fails.
             | I want to like Rust, but it's already a kitchen sink on par
             | with C++ in complexity and misused quirks, not to mention
             | macros which hide complexity just like C macros did, and
             | the lack of a committee and spec makes it very difficult to
             | trust that it won't get more and more features as time goes
             | on (becoming like C++, in only the bad ways).
             | I understand they have an RFC process, but that's not enough
             | for a language which is now so commonplace in discussion
             | (usually in the form of "if you did it in Rust, this
             | problem wouldn't exist", which is often even true).
               | iknowstuff wrote:
               | Rust macros don't hide anything. They're hygienic and
               | clearly annotated when used.
               | mike_hock wrote:
               | Rust macros are a crutch to work around the language's
               | shortcomings. It's just a better crutch than C's.
             | iknowstuff wrote:
             | >a shortcoming that needs to be addressed for it to see
             | wider adoption, especially for safety-critical systems.
             | This seems like just a hunch of yours that does not seem to
             | be reflected by the real world.
               | thesuperbigfrog wrote:
               | >> This seems like just a hunch of yours that does not
               | seem to be reflected by the real world.
               | What safety-critical systems are written in Rust?
               | Where can I buy a validated Rust toolchain for safety-
               | critical work?
               | Ferrocene is an effort to build a safety-critical Rust,
               | but it is not done yet:
               | https://ferrous-systems.com/blog/ferrocene-update/
             | mjw1007 wrote:
             | The good news is that the Rust project has recently agreed
             | to write a specification, and has a budget to hire an
             | editor for it.
             | The less good news is that it's likely to take a long time
             | before anything resembling a complete description gets
             | written.
             | You can follow its status at
             | https://github.com/rust-lang/rust/issues/113527
               | thesuperbigfrog wrote:
               | >> The good news is that the Rust project has recently
               | agreed to write a specification, and has a budget to hire
               | an editor for it.
               | This is awesome to hear. Following that issue . . .
       | zer8k wrote:
       | > In the long run, unsafe programming languages will not be used
       | by mainstream developers, but rather reserved for situations
       | where high performance and a low resource footprint are critical.
       | I see no world where so-called "unsafe" languages would not be
       | used. Most graduates of Computer Science programs can, perhaps
       | with some trouble, implement a half decent C compiler in a
       | weekend or two. This is not a footnote. This fact alone means
       | that for any given piece of hardware you're more likely to find a
       | random C compiler you can use than anything else. Rust, being the
       | most likely contender to replace it, still cannot self-host and
       | the grammar is exponentially more complicated than C. It is more
       | likely that C + <whatever> will co-exist peacefully than that
       | something like C gets replaced (even ignoring the millions of
       | lines of code that already exist), not for performance reasons
       | but because you can churn out a C compiler quickly for almost
       | anything given a spec of the hardware.
       | On topic, I find a desk reference for this is very useful. The
       | CERT C standard is pretty good to thumb through even if you don't
       | adhere to every suggestion.
         | pjmlp wrote:
         | Just wait until CVEs become a liability like handling hazardous
         | chemicals.
         | ladberg wrote:
         | Eh, I don't disagree that unsafe languages will continue to be
         | used, but I disagree with ease of compiler design as the
         | reason.
         | You are comparing one of the easier languages to write a
         | compiler for (C) with one of the hardest (Rust), and that's not
         | due to UB but due to other facets of the languages. I could
         | make up a new language that's equivalent to C in every way
         | except replace all UB with defined behavior and it wouldn't
         | make the naive compiler any different.
         | Additionally, writing a compiler for a language should really
         | be a thing that happens only a handful of times while executing
         | the code happens trillions of times so I hope we don't
         | sacrifice safety to save compiler authors some work.
         | dralley wrote:
         | > Rust, being the most likely contender to replace it, still
         | cannot self-host
         | What do you mean, "still cannot self-host?"
         | You say that like it's a critical failure of the Rust project
         | that they need and are attempting to address rather than a
         | trivia item. Rust is perfectly happy relying on LLVM just like
         | (checks notes) _half the other languages in existence_.
         | Libraries like LLVM are precisely what the comment you quote is
         | talking about.
         | I'm not even sure that's true, anyway, with the cranelift
         | backend. Someone can chime in on whether it's good enough for
         | bootstrapping.
           | merlincorey wrote:
           | Self Hosting your own compiler traditionally was the "end-
           | game" of making a compile-able language. It's a sort of proof
           | of fitness that the language can literally stand on its own.
           | This article about Zig achieving self-hosted status in
           | 2022[0] points out that they gained many advantages at the
           | cost of a lot of time and effort through this process.
           | Incidentally, they decided to self-host while also supporting
           | LLVM because of deficiencies in LLVM (mainly speed and target
           | limitations). This flexibility includes a separate "C"
           | backend to compile Zig to C in order to target for example
           | game consoles that require a specific C compiler be used.
           | > You say that like it's a critical goal of the Rust project
           | rather than a trivia item.
           | In my opinion, you are overly minimizing the potential
           | benefits to Rust and the Rust community for Rust to be self-
           | hosted.
           | Of course, practically, right now it doesn't matter because
           | most people are more than happy to use the already working
           | system.
           | [0] https://kristoff.it/blog/zig-self-hosted-now-what/
             | dralley wrote:
             | As I said, the cranelift backend exists, and it provides
             | many of the same benefits such as improved compilation
             | speed. And it's written in Rust.
             | But it still feels like a trivia item. C compilers written
             | in C exist, but almost nobody actually uses them. They use
             | GCC, Clang, and MSVC, written in C++. Everybody knows that
             | it's possible to self-host C, so the benefit of actually
             | doing so in practice is minimal.
             | It's obviously possible to write a Rust compiler in Rust
             | end-to-end. Acting like it's a second tier language because
             | actively doing so is not a top focus of the community is
             | gatekeep-y and ridiculous.
               | merlincorey wrote:
               | > Acting like it's a second tier language because
               | actively doing so is not a top focus of the community is
               | gatekeep-y and ridiculous.
               | Here's where I think you are quite a bit off target,
               | personally.
               | I certainly was not and I don't believe the GP you
               | originally responded to was saying that "Rust is a second
               | tier language due to [lack of self-hosted compiler]", so
               | hopefully we can set that statement aside and ignore it
               | now.
               | Let's instead focus on your first statement, which is
               | directly related to what GP and I were arguing:
               | > It's obviously possible to write a Rust compiler in
               | Rust end-to-end.
               | It is certainly possible but actually doing so is
               | completely non-obvious because the grammar for Rust is
               | much more complicated than C, and Rust has no formal
               | language specification (let alone an international
               | standard).
               | While Python does not have an international standard, it
               | does have a formal language specification, which is what
               | allows for things like PyPy to exist.
               | Meanwhile, to truly understand Rust, one must be an
               | expert in C and learn the `rustc` code base.
               | It seems like, practically, knowing C and being able to
               | write compilers in C is quite useful if you want to make
               | an impact in Rust or maybe try your hand at making some
               | future Rust replacement (hopefully with a language
               | specification that others can follow).
               | dralley wrote:
               | > It is certainly possible but actually doing so is
               | completely non-obvious because the grammar for Rust is
               | much more complicated than C, and Rust has no formal
               | language specification (let alone an international
               | standard).
               | The Rust compiler frontend is written in Rust. It doesn't
               | matter how non-trivial writing a Rust frontend is if you
               | can restrict the problem domain to writing a new backend
               | for the existing compiler frontend.
               | And you can. As it stands there is the LLVM backend that
               | everyone is familiar with, the GCC backend which is
               | nearing completion, and the Cranelift backend which is
               | written in Rust.
               | Zig is similar. Yes, they are going to replace LLVM by
               | default, but they're not getting rid of their LLVM
               | backend entirely. The main difference between Rust and
               | Zig here is a matter of defaults, where Rust defaults to
               | using LLVM while Zig will default to their self-hosted
               | compiler.
               | > Meanwhile, to truly understand Rust, one must be an
               | expert in C and learn the `rustc` code base.
               | Are you under the impression that the "rustc" codebase is
               | written in C/C++? It is not... It uses LLVM, yes, but
               | it's written in Rust.
               | > I certainly was not and I don't believe the GP you
               | originally responded to was saying that "Rust is a second
               | tier language due to [lack of self-hosted compiler]", so
               | hopefully we can set that statement aside and ignore it
               | now.
               | The discussion started with the statement that Rust will
               | never replace unsafe languages without the ability to
               | self-host, and then continued with the statement that
               | "Self Hosting your own compiler traditionally was the
               | "end-game" of making a compile-able language. It's a sort
               | of proof of fitness that the language can literally stand
               | on its own."
               | I don't think that was a completely unfair reading of
               | these statements. The implication is that Rust is "not a
               | fit language" because it "cannot stand on its own" and
               | therefore "will never replace unsafe languages".
               | zer8k wrote:
               | > I don't think that was a completely unfair reading of
               | these statements. The implication is that Rust is "not a
               | fit language" because it "cannot stand on its own" and
               | therefore "will never replace unsafe languages".
               | I didn't intend this. The primary gripe I had was the
               | grammar being complicated (and to be fair...not really
               | available in an easy way). That means the places we are
               | most likely to see such bare metal shenanigans may not
               | adopt it because they can't draft an XYZ Co. compiler.
               | This is a semi-common pattern with chip manufacturers.
               | The conversation diverged after that. Self-hosting is
               | simply a signal that a language is "strong enough to
               | stand on its own". That doesn't mean non-self hosted
               | languages are bad. It just means you still need something
               | else to bootstrap it. In the land of bare metal, stuff
               | like this matters.
               | merlincorey wrote:
               | > Zig is similar. Yes, they are going to replace LLVM by
               | default, but they're not getting rid of their LLVM
               | backend entirely.
               | In the article I linked, they did not say they were
               | replacing LLVM by default, but they did say it would
               | become the default for DEBUG builds due to the faster
               | speed of compilation, to be clear.
               | > > Meanwhile, to truly understand Rust, one must be an
               | expert in C and learn the `rustc` code base.
               | > Are you under the impression that the "rustc" codebase
               | is written in C/C++? It is not... It uses LLVM, yes, but
               | it's written in Rust.
               | I am not under that impression, but I can see how my
               | phrasing leads to that conclusion.
               | After reviewing Rust's Bootstrap on GitHub[0] I can now
               | more precisely state that one's understanding of
               | low-level Rust will be enhanced by knowing C/C++ (for the
               | LLVM portions) as well as Python (for the case where Rust
               | does not yet exist on the system and the stage0 Cargo and
               | Rust compiler binaries are downloaded from somewhere else).
               | > Cranelift backend which is written in Rust
               | When this happens, it seems like it'll be possible to get
               | the LLVM bits out of the bootstrap process and lead to a
               | fully self-hosted Rust.
               | So while you may not personally value that, it seems like
               | some people in the Rust community do.
               | [0] https://github.com/rust-lang/rust/tree/master/src/bootstrap
               | LegionMammal978 wrote:
               | > When this happens, it seems like it'll be possible to
               | get the LLVM bits out of the bootstrap process and lead
               | to a fully self-hosted Rust.
               | What do you mean by "when this happens"? GP's point is
               | that this has _already_ happened: the Cranelift backend
               | is feature-complete from the perspective of the language
               | [0], except for inline assembly and unwinding on panic.
               | It was merged into the upstream compiler in 2020 [1], and
               | a Cranelift-based Rust compiler is perfectly capable of
               | building another Rust compiler (with some config
               | changes).
               | [0] https://github.com/bjorn3/rustc_codegen_cranelift
               | [1] https://github.com/rust-lang/rust/pull/77975
               | zer8k wrote:
               | Except gluing yourself to LLVM has its own problems.
               | Like, for example, any platform that LLVM doesn't support
               | you can't support either. LLVM is great. The monoculture
               | and smug elitism it produces are not.
               | > Acting like it's a second tier language because
               | actively doing so is not a top focus of the community is
               | gatekeep-y and ridiculous.
               | It is probably one of the major reasons we won't see a
               | Rust compiler shipped with an operating system for a very
               | long time. That doesn't make it second tier. However,
               | Rust fans seem to want to stick their head in the sand
               | when their baby is criticized. I am a Rust (language) fan
               | myself. I am just willing to criticize the language. I do
               | not understand why the Rust community has such a volatile
               | response to honest, valid, criticism.
               | learn-forever wrote:
               | it's a ridiculous criticism, and the insult doesn't make
               | it less ridiculous
               | dralley wrote:
               | >It is probably one of the major reasons we won't see a
               | Rust compiler shipped with an operating system for a very
               | long time.
               | Even most linux distros don't ship with GCC out of the
               | box... much less MacOS and Windows with their respective
               | compilers.
               | If your standard is "Gentoo and FreeBSD will never ship
               | it out of the box" then I'm going to 100% stand by my
               | statement that this is weird and gatekeep-y.
               | Especially when the Windows kernel and userspace system
               | libraries both have Rust in them.
               | https://www.bleepingcomputer.com/news/microsoft/new-windows-...
               | https://www.thurrott.com/windows/282471/microsoft-is-rewriti...
               | tialaramex wrote:
               | > we won't see a Rust compiler shipped with an operating
               | system for a very long time.
               | I can't figure out what this constraint means.
               | My Windows laptop doesn't seem to have provided a C
               | compiler, so, maybe that's a problem for Windows?
               | Huh, well I guess I can buy or download a third party
               | compiler, that's easy enough, but then, I can do that for
               | Rust too, so, doesn't seem like a difference.
               | Meanwhile on this Fedora machine, the Rust compiler came
               | with the OS. So, is this not an operating system? Maybe
               | the stuff it comes with isn't "shipped with" it somehow?
               | And so there's no C compiler "shipped with" this
               | operating system either, although GCC was installed too?
               | I just don't know what to make of such a criticism.
         | patrec wrote:
         | > Most graduates of Computer Science programs can, perhaps with
         | some trouble, implement a half decent C compiler in a weekend
         | or two.
         | Where "most" of course means < 0.1%.
         | badsectoracula wrote:
         | > Most graduates of Computer Science programs can, perhaps with
         | some trouble, implement a half decent C compiler in a weekend
         | or two. This is not a footnote. This fact alone means that for
         | any given piece of hardware you're more likely to find a random
         | C compiler you can use than anything else.
         | I think C being a (relatively) very simple language is indeed a feature
         | it has - however not so much because you can make a compiler
         | for it easily (not that it isn't a pro, but it isn't that
         | important in practice) but because it means it is easier to
         | learn and easier to write tools for.
         | dale_glass wrote:
         | I don't see how that reasoning is supposed to work in modern
         | times.
         | Who out there is seriously using a compiler churned out in a
         | weekend? The fact that you can do it doesn't mean anybody
         | seriously would use that.
         | We're also not really creating architectures anymore. There's
         | RISC-V, and Rust already supports that.
           | zer8k wrote:
           | > Who out there is seriously using a compiler churned out in
           | a weekend?
           | Someone at a chip manufacturer writing something for a brand
           | new chipset, for example. It takes a long time to get stuff
           | shoved into GCC. Only in recent history has life settled
           | on one or two "big" compilers. There are still plenty of
           | other places where you will find bespoke compilers. Perhaps
           | not commonly, but they do exist (especially in embedded).
       | zabzonk wrote:
       | perhaps it is just me, but i have never experienced any of the
       | problems outlined in the comments here, despite writing a
       | shedload of C and C++ code (and fortran, assembler and other
       | stuff). and i don't think i am a coding god.
       | AnimalMuppet wrote:
       | (2010)
         | dang wrote:
         | Added. Thanks!
       | dang wrote:
       | Related:
       |  _A Guide to Undefined Behavior in C and C++ (2010)_ -
       | https://news.ycombinator.com/item?id=18372613 - Nov 2018 (103
       | comments)
       |  _A Guide to Undefined Behavior in C and C++ (2010)_ -
       | https://news.ycombinator.com/item?id=9884074 - July 2015 (10
       | comments)
       |  _A Guide to Undefined Behavior in C and C++, Part 1_ -
       | https://news.ycombinator.com/item?id=2544159 - May 2011 (2
       | comments)
       | jbandela1 wrote:
       | If you want some nice examples of how undefined behavior results
       | in weirdness, see
       | https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...
       | An interesting example from there is how the compiler can turn
       |     int table[4];
       |     bool exists_in_table(int v)
       |     {
       |         for (int i = 0; i <= 4; i++) {
       |             if (table[i] == v) return true;
       |         }
       |         return false;
       |     }
       | Into:
       |     bool exists_in_table(int v)
       |     {
       |         return true;
       |     }
         | fluoridation wrote:
         | What's odd about that example is that the optimization is only
         | valid if the loop in fact overflows the array every time. So
         | the compiler is proving that the array is being overflowed and
         | rather than emitting a warning to that effect, it generates
         | absurd code.
           | kllrnohj wrote:
           | > So the compiler is proving that the array is being
           | overflowed and rather than emitting a warning to that effect
           | <source>:5:13: warning: unsafe buffer access [-Wunsafe-buffer-usage]
           |     5 |         if (table[i] == v) return true;
           |       |             ^~~~~
           | https://godbolt.org/z/zGxnKxvz6
           | This one is weirdly hard to get a compiler warning out of
           | which is a fair critique, but so many of the "Look what the
           | compiler did to my UB silently!" issues are not at all silent
           | and would have been stopped dead with "-Wall -Wextra -Werror"
             | iainmerrick wrote:
             | As noted elsewhere in this thread, GCC by default does the
             | "optimization" and doesn't warn. No doubt there are other
             | examples where Clang is the one that misbehaves.
             | How are we supposed to know whether our code is being
             | compiled sensibly or not, without poring over the
             | disassembly? Just set all the warning flags and hope for
             | the best?
               | UncleMeat wrote:
               | I think that a big problem is that for every compile that
               | seems "not sensible" and is actually not sensible, there
               | are 100s or 1000s of compiles that would look absolutely
               | insane to a human but are actually exactly what you want
               | when you sit down and think about it for a long time.
               | Almost all of the "don't do the overly clever stuff!"
               | proposals would throw away a huge amount of actually
               | productive clever stuff.
               | fluoridation wrote:
               | I think what the GP means by "not sensible" is that
               | proving that the code is broken in order to silently
               | optimize it more aggressively is not sensible. If your
               | theorem prover can find a class of bugs then have it emit
               | diagnostics. Don't _only_ use those bugs to make the code
               | run faster. Yes, make the code run faster, but let me
               | know I may be doing something nonsensical, since chances
               | are that it is nonsensical and it doesn't cost anything
               | at run time.
               | UncleMeat wrote:
               | Right and the next part is the hard part: defining this
               | clearly. What I'm saying is that there is a surprising
               | amount of "wait, actually I do want that" when you dig
               | into this proposal.
               | mike_hock wrote:
               | A warning is only useful if it prescribes a code
               | transformation that affirms the programmer's intent and
               | silences the warning (unless the warning was a true
               | positive and caught a bug). You cannot simply emit a
               | warning every time you optimize based on UB.
               | There is no `if(obvious out-of-bound access) silently
               | emit nonsense har har har` in the compiler's source code.
               | The compiler doesn't understand intent or the program as
               | a whole. It applies micro transformations that all make
               | sense in isolation. And while the compiler also tries to
               | detect erroneous programming patterns and warn about
               | those, that's exceedingly more difficult.
               | moefh wrote:
               | > whether our code is being compiled sensibly or not
               | I'm failing to see what's not sensible about how that
               | code is compiled.
               | The only possible way that function could return false is
               | if you read past the end of the array and the value there
               | happens to be different from `v`. Is it really more
               | sensible to rely on that, rather than fixing a known
               | behavior in case of array overflow?
               | robinsonb5 wrote:
               | If the compiler's going to interpret undefined behaviour
               | as license to do something that runs counter to the
               | programmer's expectations, the most sensible course of
               | action is for the compiler to yell very loudly about it
               | instead of near-silently producing (differently!) broken
               | code.
               | Currently that piece of code doesn't trigger a warning
               | with -Wall. It's not even flagged with -Wextra - it needs
               | -Weverything.
               | moefh wrote:
               | One man's "broken code produced by the compiler" is
               | another man's "excellently optimized code by the
               | compiler".
               | Where to draw the line is not always clear, but here's a
               | very clear-cut example[1] where emitting a warning would
               | be bad. If you don't want to watch the video, it's
               | basically this:
               | - the code technically contains undefined behavior, but
               | it will never be actually triggered by the program
               | - changing the code to remove undefined behavior forces
               | the compiler to emit terrible code
               | Making the compiler yell at the programmer in this case
               | would be terrible, but it's clearly a consequence of what
               | you're asking.
               | [1] https://youtu.be/yG1OZ69H_-o?t=2358
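               | A rough illustration of that kind of trade-off. This is a
               | commonly cited pattern and an assumption here, not
               | necessarily the exact code from the video, and the
               | function names are hypothetical. The signed index could in
               | theory overflow (undefined behavior); the unsigned index
               | has defined wraparound that the compiler must preserve.

```cpp
#include <cstdint>

// Signed index: overflow of `i` would be undefined, so the compiler may
// assume it never wraps and is free to widen it to a 64-bit index.
int sum_signed(const int* data, std::int32_t start, std::int32_t count)
{
    int total = 0;
    for (std::int32_t i = start; i < start + count; ++i) total += data[i];
    return total;
}

// Unsigned index: wraparound is well defined (modulo 2^32), so the compiler
// has to preserve that behaviour even though this caller never relies on it.
int sum_unsigned(const int* data, std::uint32_t start, std::uint32_t count)
{
    int total = 0;
    for (std::uint32_t i = start; i < start + count; ++i) total += data[i];
    return total;
}
```

               | Both return the same result for in-range arguments.
               | Whether the unsigned version actually compiles to worse
               | code depends on the compiler and the target.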
           | fanf2 wrote:
           | No, the logic for the optimization is:
           | - a correct program does not access table[4]
           | - therefore the loop must always exit early
           | - the only way to exit early is to return true
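           | A minimal sketch of the fix this reasoning implies (deriving
           | the bound with sizeof is an assumption for illustration, not
           | something from the linked post): once the loop stops before
           | index 4, finishing the loop is a legal path again, so the
           | compiler has to keep `return false`.

```cpp
#include <cstddef>

// Same table as in the example; file-scope arrays are zero-initialized.
int table[4];

// Corrected bound: the index stays within table[0..3], so the loop can
// finish normally and `return false` is reachable in a correct program.
bool exists_in_table(int v)
{
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```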
           | tedunangst wrote:
           | No, the compiler knows the array isn't overflowed, because C
           | programs don't contain overflows. Therefore the loop must
           | exit via one of the return true statements.
         | JonChesterfield wrote:
         | The amazing part about examples like that is people read them,
         | check that the compiler really does work on that basis, and
         | then continue writing things in C++ anyway. Wild.
         | Suppose I should expand on this. The idea seems to be either
         | 1/disbelief - compilers wouldn't really do this or 2/
         | infallibility - my code contains no UB.
         | Neither of those positions bears up well under reality.
         | Programming C++ is working with an adversary that will make
         | your code faster wherever it can, regardless of whether you
         | like the resulting behaviour of the binary.
         | I suspect rust has inherited this perspective in the compiler
         | and guards against it with more aggressive semantic checks in
         | the front end.
           | Gibbon1 wrote:
           | What's amazing is programmers haven't tarred and feathered the
           | standards committee and compiler writers for allowing crap
           | like that.
           | ninepoints wrote:
           | It's just as "amazing" to read these takes from techno
           | purists. You use software written in C++ daily, and it can be
           | a pragmatic choice regardless of your sensibilities.
             | erik_seaberg wrote:
             | And we have the core dumps to prove it.
             | When any Costco sells a desktop _ten thousand_ times faster
             | than the one I started on, we can afford runtime sanity
             | checks. We don't have to keep living like this, with stacks
             | that randomly explode.
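             | One sketch of what such a runtime sanity check can look
             | like in standard C++ (the function name is hypothetical):
             | std::array::at() checks the index and throws
             | std::out_of_range, so the off-by-one loop fails loudly
             | instead of being undefined.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

std::array<int, 4> table{};  // value-initialized: all elements are 0

// Same off-by-one loop as the original example, but .at() checks the index
// at run time and throws std::out_of_range for i == 4.
bool exists_in_table_checked(int v)
{
    for (std::size_t i = 0; i <= 4; i++) {
        if (table.at(i) == v) return true;
    }
    return false;
}
```

             | The check costs a compare and branch per access, which is
             | the overhead being argued about here.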
             | johnbellone wrote:
             | But it isn't Rust.
               | jacquesm wrote:
               | Lots of things 'aren't Rust'. In fact almost everything
               | isn't Rust. For now. That may change in due course but
               | right now I would guesstimate the amount of Rust code
               | running on my daily drivers to be pretty close to zero%. The
               | bulk is C or C++.
               | angiosperm wrote:
               | Hardly anything is. Literally none of the programs on my
               | machine are coded in Rust. (Firefox is reputed to have a
               | bit in it.)
               | jacquesm wrote:
               | About FF and Rust:
               | https://news.ycombinator.com/item?id=30743577
             | JonChesterfield wrote:
             | Definitely. There's loads of value delivered by C++
             | implementations, including implementations of C++ and other
             | languages. The language design of speed over safety mostly
             | imposes a cost in developer / debugging time and fear of
             | upgrading the compiler toolchain. Occasionally it shows up
             | in real world disasters.
             | I think we've got the balance wrong, partly because some
             | engineering considerations derive directly from separate
             | compilation. ODR no diagnostic required doesn't have to be
             | a thing any more.
           | peppermint_gum wrote:
           | >The amazing part about examples like that is people read
           | them, check that the compiler really does work on that basis,
           | and then continue writing things in C++ anyway. Wild.
           | Well, in modern C++ this code would look like this:
           |     std::array<int, 4> table;
           |     bool exists_in_table(int v)
           |     {
           |         for (auto &elem : table) {
           |             if (elem == v) return true;
           |         }
           |         return false;
           |     }
           | Or even simpler:
           |     std::array<int, 4> table;
           |     bool exists_in_table(int v)
           |     {
           |         return std::ranges::contains(table, v);
           |     }
           | There's no shortage of footguns in C++, but nonetheless,
           | modern C++ is safer than C.
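           | Note that std::ranges::contains is a C++23 addition. For
           | older toolchains, a minimal sketch of the same idea with
           | std::find, which also never computes an index by hand:

```cpp
#include <algorithm>
#include <array>

std::array<int, 4> table{};

// Pre-C++23 equivalent of std::ranges::contains: std::find returns end()
// when no element compares equal to v.
bool exists_in_table(int v)
{
    return std::find(table.begin(), table.end(), v) != table.end();
}
```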
             | mike_hock wrote:
             | Weirdly, GCC fails to optimize this, but Clang does (if you
             | make the table static as in the original example).
           | gizmo686 wrote:
           | I actually would prefer to get the second output. The result
           | is wrong, but consistently and deterministically so. The
           | naive implementation of the broken code is a heisenbug.
           | Sometimes it will work, and sometimes it won't, and any
           | attempt to debug it would likely perturb the system enough to
           | make the issue not surface.
           | It wouldn't surprise me if I have run into the latter
           | situation without realizing it. When I got to the problem, I
           | would have just (incorrectly) assumed that the memory right
           | after the array happened to have the relevant value. I would
           | be counting my blessings that it happened consistently enough
           | to be debuggable.
             | jll29 wrote:
             | I agree that it is better to get deterministic and
             | predictable behavior.
             | Reminds me of when for a while, I worked on HP 9000s under
             | HP-UX and in parallel on an Intel 80486-based Linux box,
             | and what I noticed is that the Unix workstations crashed
             | sooner and more predictably with segmentation faults than
             | Linux on the PC (not sure if this has changed since the
             | early 1990s - probably had to do with the MMU); so
             | developing on HP under Unix and then finally compiling
             | under Linux led to better code quality.
           | _gabe_ wrote:
           | > check that the compiler really does work on that basis, and
           | then continue writing things in C++ anyway. Wild.
           | My compiler (MSVC) doesn't do that[0]. Clang also doesn't do
           | this[1]. It's wild to me that GCC does this optimization[2].
           | It's very subtle, but Raymond Chen and OP both say a compiler
           | _can_ create this optimization, not that it _will_.
           | [0]: https://godbolt.org/z/bdx4EMzxe
           | [1]: https://godbolt.org/z/z833Wa391
           | [2]: https://godbolt.org/z/6b8aq59M9
           | jandrewrogers wrote:
           | > The amazing part about examples like that is people read
           | them, check that the compiler really does work on that basis,
           | and then continue writing things in C++ anyway.
           | That isn't idiomatic C++ and hasn't been for a long time.
           | Sure, it's _possible_ to do it retro C-style, because
           | backward compatibility, but you generally don't see that in
           | a modern code base.
             | JonChesterfield wrote:
             | The modern codebase has grown from a legacy one. The legacy
             | one with parts of the codebase that were C, then got
             | partially turned into object oriented C++, then partially
             | turned into template abstractions. The parts least likely
             | to have comprehensive test coverage. _That_ place is indeed
             | where a compiler upgrade is most likely to change the
             | behaviour of your application.
       (page generated 2023-08-17 23:00 UTC)