[HN Gopher] C Integer Quiz
       ___________________________________________________________________
        
       C Integer Quiz
        
       Author : rwmj
       Score  : 247 points
       Date   : 2022-09-04 11:45 UTC (11 hours ago)
        
 (HTM) web link (www.acepace.net)
 (TXT) w3m dump (www.acepace.net)
        
       | nayuki wrote:
       | This tool I recently made can help:
       | https://www.nayuki.io/page/summary-of-c-cpp-integer-rules#ar...
       | 
       | For example, you can see what (signed long) > (unsigned int)
       | would convert to, under different environment bit width
       | assumptions.
       | 
       | Also, there's a discussion on Reddit:
       | https://www.reddit.com/r/cpp/comments/x4x01f/cc_arithmetic_c...
        
       | jart wrote:
       | Another interesting thing about C integers that the quiz doesn't
       | cover is that remainder is not modulus. For example, in Python:
        |         >>> 2 % -5
        |         -3
        | 
        | But in C:
        |         2 % -5 == 2
       | 
        | If you want Python-style (floored) modulus in C rather than C's
        | truncated remainder, then you have to use a function like this,
        | which does what Python does:
        |         long mod(long x, long y) {
        |             if (y == -1) return 0;
        |             return x - y * (x / y - (x % y && (x ^ y) < 0));
        |         }
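        | 
        | A quick sanity check (a sketch, using the mod() above):
        |         #include <stdio.h>
        |         /* long mod(long x, long y) as defined above */
        |         int main(void) {
        |             printf("%ld\n", 2 % -5L);    // prints 2, C remainder
        |             printf("%ld\n", mod(2, -5)); // prints -3, like Python
        |             return 0;
        |         }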
        
         | cozzyd wrote:
         | One of my favorite tables:
         | https://en.m.wikipedia.org/wiki/Modulo_operation#In_programm...
        
         | marshallward wrote:
         | Fortran supports both of these, with `mod` as the C-like
         | truncated modulo and `modulo` as the floored modulo. Having
         | both is convenient, but you do get errors from people who don't
          | realize the difference.
        
           | mtreis86 wrote:
           | Common Lisp as well, Rem and Mod
        
         | qsort wrote:
         | Mathematically speaking, the mod is usually taken to be
         | positive. Any reasonable definition would involve taking the
         | quotient of Z over the congruence relation, so I can see both
         | -3 and 2 being reasonable conventions to represent partitions.
         | 
         | OTOH, the largest "wat" of C-like languages is the following:
          |         -8 % 5   // evaluates to -3
         | 
         | why
         | 
          | Python here is once again correct:
          |         >>> -8 % 5
          |         2
        
           | veltas wrote:
            | Mathematically speaking, if you ask for the modulus of -5 you
           | may as well get a negative number, and I think people may
           | validly have their own interpretations of what's "correct" at
           | this point.
        
             | qsort wrote:
             | I didn't say it's wrong.
             | 
             | You may as well use {white, blue, black, red, green} to
             | represent congruences mod 5, mathematically speaking it's
             | not wrong, as long as they respect the axioms of a field.
             | 
             | What's "wat" (in the same sense as the js "wat" talk) is
             | that the answer changes if the argument becomes negative.
             | By all reasonable definitions of modular arithmetic, -8 and
             | 7 are in the same class mod 5. Why is then:
              |         (-8 % 5) == (7 % 5)   // false
             | 
             | in C-like languages? I'm pretty sure it's perfectly
             | consistent with all the relevant language standards, but
             | it's a "wat" nonetheless.
        
               | masklinn wrote:
               | > What's "wat" (in the same sense as the js "wat" talk)
               | is that the answer changes if the argument becomes
               | negative.
               | 
                | It's the same in Python though? You just prefer the way
               | it behaves (though to be fair it's generally more useful
               | and less troublesome): Python's remainder follows the
               | sign of the divisor (because it uses floored division),
               | while C's follows the sign of the dividend (because it
               | uses truncated division).
               | 
                | Strictly speaking, neither is a Euclidean modulo.
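                | 
                | A quick illustration of truncated division in C:
                |         printf("%d %d\n", -8 / 5, -8 % 5);  // -1 -3
                |         printf("%d %d\n", 8 / -5, 8 % -5);  // -1 3
                | 
                | whereas Python gives -8 % 5 == 2 and 8 % -5 == -2,
                | following the divisor's sign.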
        
               | owl57 wrote:
               | It's not symmetric. I believe most people with strong
               | mathematical background (like Guido:)) expect lambda x: x
               | % m to be some "computer approximation" to the standard
               | mapping from Z to Z/mZ, while not having deep
               | expectations about lambda m: x % m.
        
         | veltas wrote:
          | The really useful fact of C's remainder operator is that
          |         x == x / y * y + x % y
          | provided nothing overflowed. This is usually what you want
          | when doing accurate calculations with integer division.
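          | 
          | A minimal check of that identity (a quick sketch; values
          | chosen to avoid overflow and division by zero):
          |         #include <assert.h>
          |         int main(void) {
          |             int x = -7, y = 2;
          |             assert(x == x / y * y + x % y);  // -7 == -3*2 + -1
          |             return 0;
          |         }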
        
           | xigoi wrote:
           | The modulo operation together with floor division also has
           | this property. And unlike C's operators, it also has the
           | useful property that (x + y) % y == x % y.
        
       | temac wrote:
       | Undefined behavior is probably the worst way we can imagine to
       | define constraints usable by optimizers. Sad that major languages
       | and implementations went this way.
        
         | Karellen wrote:
         | Yeah, I think a lot of C's `undefined behaviour` semantics
         | around assignments to ints should be reconsidered and changed
         | to `unspecified` or `implementation defined` behaviour, and
         | compilers can just do "whatever the hardware does". If that
         | includes traps on one arch, fine, let it trap.
         | 
         | I think `undefined behaviour` still has its place in C -
         | dereferencing a freed pointer comes to mind as an obvious
         | example - but I think a good proportion of the really
         | unintuitive UB conditions could be made saner without
         | sacrificing portability or optimisation opportunities.
        
           | owl57 wrote:
           | If you extend "unspecified" to allow traps, reading a bogus
            | pointer can also be unspecified, with only writes remaining
            | undefined in the current sense.
        
             | Karellen wrote:
             | I don't think so. With "unspecified" and "implementation-
              | defined" behaviours, the implementation has to pick a
             | behaviour and be consistent with it. The difference is
             | whether they have to document that behaviour or not.
             | 
             | If the hardware traps on bogus pointers, then reading a
             | bogus pointer may trap. But if you read a recently-freed
             | pointer, it may still be valid according to the hardware
              | (e.g. will have valid PTEs into the process's address
             | space) so won't trap. Therefore you won't be able to
             | guarantee any particular behaviour on an invalid read, so I
             | don't think you'd be able to get away with "unspecified" or
             | "implementation defined" behaviour on most hardware.
        
               | owl57 wrote:
               | AFAIR reading uninitialized int, for example, is
               | "unspecified" (any value could be there). If we consider
               | adding "implementation defined with possible trap" for
               | overflow, we might as well add "unspecified with possible
               | trap" for reading an invalid pointer (any value could be
               | there, or it could trap, but no nasal demons).
        
               | temac wrote:
                | Using uninitialized values is UB. Going back to naive
               | pointers is a lost cause because compilers have started
               | to do crazy optims (not even currently allowed by the
               | standards...) like origin analysis. You can't steal that
               | toy from the people implementing optims.
        
           | UncleMeat wrote:
           | "Implementation defined" is worse in a lot of ways. Now the
           | compiler has way less authority to tell you to stop! And we
           | still have the problem of the application probably not doing
           | what you want.
        
             | temac wrote:
             | Current compilers don't "tell you to stop". They silently
             | transform your program into complete garbage without even
             | causality restraints. The sanitizers can help but they are
                | merely dynamic when the dangerous transformations are static
             | in the first place.
        
               | UncleMeat wrote:
               | It is true that the language has failed to provide tools
               | that help developers prevent UB and that this is a very
               | bad thing for the ecosystem. It _can_ change, though.
               | 
               | In practice, compilers aren't actually adversarial. A lot
               | of the discussion around UB is catastrophizing and talks
               | about how the compiler will order you pizza or delete
               | your disk. Some problems are real and I know some people
               | whose graduate school work was very specifically on the
               | problems that this causes for security-related checks but
               | compilers really really do not transform your program
               | into "complete garbage." They transform your program into
               | a program with a bug, which is a true statement about
               | that program.
               | 
               | I'm reminded of the apocryphal story about being asked
               | how a computer knows how to do the right thing if given
               | the wrong inputs. This feels similar.
        
         | UncleMeat wrote:
         | I do agree that the story around UB in C and C++ sucks. But UB
         | doesn't exist because the compiler engineers want to be able to
         | stick in optimizations. Once they are in the spec it makes
         | sense to follow the rules but most UB comes from a desire for
         | portability and to not privilege one platform over another.
         | 
         | And further, defining a lot of UB won't actually improve
         | things. Imagine we define signed integer overflowing behavior.
          | Hooray. Now your program just has a _different_ bug. If you've
         | accidentally got signed integer overflow in your application
         | then "did some weird things because the compiler assumed it
         | would never overflow" is going to cause exactly the same amount
         | of havoc as "integer overflowed and now your algorithm is
         | almost certainly wrong."
        
           | nayuki wrote:
           | > And further, defining a lot of UB won't actually improve
           | things. Imagine we define signed integer overflowing
           | behavior. Hooray. Now your program just has a different bug.
           | 
           | This is exactly what JF Bastien argues in his hourlong talk:
           | https://youtu.be/JhUxIVf1qok?t=2284
           | 
           | So yeah, defining signed integer overflow isn't going to fix
           | bugs in the vast majority of existing programs.
           | 
           | That being said, enforcing signed overflow wraparound at
           | least makes debugging easier because it's reproducible. This
           | is how it is in Java land - int overflow wraps around, and
           | integer type widths are fixed, so if you trigger an overflow
           | while testing then it is reliably reproducible across all
           | conforming Java compiler and VM versions and platforms.
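            | 
            | In C, a common way to get that kind of reproducible
            | wraparound (a sketch; the conversion back to int is
            | implementation-defined by the standard, though it wraps on
            | GCC and Clang) is to do the arithmetic in unsigned:
            |         int wrap_add(int a, int b) {
            |             // unsigned addition wraps mod 2^N by definition
            |             return (int)((unsigned)a + (unsigned)b);
            |         }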
        
             | UncleMeat wrote:
             | It also makes tools less able to loudly yell at you for
             | having it in your application. Yes, you can turn on
             | optional warnings if you want but if you've got one or two
             | intended uses for this behavior then now you need
             | suppressions and all sorts of mess.
        
       | gary_0 wrote:
       | I got most of them right. My god, what have I become?
        
         | [deleted]
        
       | BirAdam wrote:
       | I have used C quite a bit, and I'm amazed at how many I got
       | wrong. Nice example of why C is hated by so many I suppose.
       | 
       | Personally, I love C but I don't use it in anything serious.
       | Probably for the best given I apparently cannot remember how C
       | ints work.
        
       | [deleted]
        
       | [deleted]
        
       | antirez wrote:
       | Surprised I got all the questions right because I tend to code
       | _around_ those limits, sticking to defining always safe types for
        | the problem at hand, in all the cases I'm not sure about ranges,
        | possible overflows and so forth. What I mean is that if you have
       | to remember the rules in a piece of code you are writing, it is
       | better to rewrite the code with new types that are obviously
       | safe.
        
         | bornfreddy wrote:
         | Admittedly my C is (very) rusty, but I was surprised at how few
         | of the answers I knew, so your comment made me feel even worse.
         | But then I saw your nickname. Yeah. :) (redis rocks btw)
         | 
         | Completely agree with you, and not just in C. Using esoteric
         | features of any language is equivalent to putting landmines in
         | front of your teammates - bad idea.
        
           | tomxor wrote:
           | > Using esoteric features of any language is equivalent to
           | putting landmines in front of your teammates
           | 
           | Also, outside of production code, using esoteric features is
           | a good way to get familiar with every corner of that language
            | - which is useful so that you can defuse those landmines
            | when they are accidentally created, i.e. know them to avoid
           | them.
        
       | jherskovic wrote:
       | Haven't written C in a long time. What a nightmare this is. Got
       | most of them predictably wrong.
        
       | simias wrote:
       | I did fairly well in the test (I didn't remember that you
       | couldn't shift into the sign bit), but it really helps highlight
       | how stupid the C promotion rules are. In C as soon as I have to
        | do arithmetic with anything but unsigned ints all sorts of alarm
       | bells start going off and I end up casting and recasting
       | everything to make sure it does what I think it does, coupled
       | with heavy testing. Now add floats and doubles into the mix and
       | all bets are off.
       | 
       | Rust not having a default "int" type and forcing you to
       | explicitly cast everything is such an improvement. Truly a poster
       | child for "less is more". Yeah it makes the code more verbose,
       | but at least I don't have to worry about "lmao you have a signed
       | int overflow in there you absolute nincompoop, this is going to
       | break with -O3 when GCC 16 releases 8 years from now, but only on
       | ARM32 when compiled in Thumb mode!"
        
         | WalterBright wrote:
         | The trouble with explicit casting is if the code is refactored
         | to change the underlying integer type, the explicit casts may
         | silently truncate the integer, introducing bugs.
         | 
         | D follows the C integral promotion rules, with a couple crucial
         | modifications:
         | 
         | 1. No implicit conversions are done that throw away information
         | - those will require an explicit cast. For example:
          |         int i = 1999;
          |         char c = i;           // not allowed
          |         char d = cast(char)i; // explicit cast
         | 
         | 2. The compiler keeps track of the range of values an
         | expression could have, and allows narrowing conversions when
         | they can be proven to not lose information. For example:
          |         int i = 1999;
          |         char c = i & 0xFF;  // allowed
         | 
         | The idea is to safely avoid needing casts, in order to avoid
         | the bugs that silently creep in with refactoring.
         | 
          | Continuing with the notion that casts should be avoided where
          | practical, the cast expression has its own keyword. This
          | makes casting greppable, so code reviewers can find casts.
          | Finding C casts requires a C parser with lookahead.
         | 
         | One other difference: D's integer types have fixed sizes. A
         | char is 8 bits, a short is 16, an int is 32, and a long is 64.
         | This is based on my experience that a vast amount of C
         | programming time is spent trying to account for the
         | implementation-defined sizes of the integer types. As a result,
         | D code out of the box tends to be far more portable than C.
         | 
         | D also defines integer math as 2's complement arithmetic. All
         | that 1's complement stuff belongs in the dustbin of history.
        
           | scrame wrote:
           | I never really clicked with D, but I always like reading your
           | discussions on these details.
        
           | simias wrote:
           | Yeah for sure, having more expressive casts and
           | differentiating between "upcasts" and "downcasts" is
            | definitely better. My point is that even "lossy" casts are
           | better than C's weird arcane implicit promotion rules by
           | quite a margin IMO.
           | 
            | Rust's current handling of this issue is by no means perfect,
           | although it's been steadily improving and I definitely don't
           | use `as` as much as I used to.
           | 
           | >D also defines integer math as 2's complement arithmetic.
           | 
            | I think modern C standards do so as well, but I _think_
            | that signed overflow is still UB, so it's mostly about
           | defining signed-to-unsigned conversions. There are flags on
           | many compilers to tell them to assume wrapping arithmetic but
           | obviously that's not standard...
        
           | WalterBright wrote:
           | It's not perfect. People do complain that:
            |         char a, b, c;
            |         a = b + c;             // error
            |         a = cast(char)(b + c); // ok
           | 
           | produces a truncation error, and requires a cast. But char
           | types have a very small range, and overflow may be
           | unexpected. So D makes the right choice here to promote to
           | int, and require a cast.
        
           | loeg wrote:
           | The most recent C standards also explicitly define integer
           | math as 2's complement. Admittedly, much later than D.
        
             | WalterBright wrote:
             | The last 1's complement machine I encountered was, never.
             | Not in nearly 50 years of programming. I think those
             | machines left for the gray havens before I was born.
             | 
             | C should also make char unsigned (D does). Optionally
             | signed chars are an abomination.
        
               | xeeeeeeeeeeenu wrote:
               | >The last 1's complement machine I encountered was,
               | never. Not in nearly 50 years of programming. I think
               | those machines left for the gray havens before I was
               | born.
               | 
               | Unisys OS 2200 uses one's complement[1] and Unisys MCP
               | uses signed magnitude[2]. Both are still around.
               | 
               | [1] - https://public.support.unisys.com/2200/docs/CP19.0/
               | 78310422-... (page 108) - "UCS C represents an integer in
               | 36-bit ones complement form (or 72-bit ones complement
               | form, if the long long type attribute is specified)."
               | 
               | [2] -
               | https://public.support.unisys.com/aseries/docs/ClearPath-
               | MCP... (page 304) - "ClearPath MCP C uses a signed-
               | magnitude representation for integers instead of
               | two's-complement representation. Furthermore, ClearPath
               | MCP C integers use only 40 of the 48 bits in the word: a
               | separate sign bit and the low order 39 bits for the
               | absolute value."
        
           | Measter wrote:
           | > The trouble with explicit casting is if the code is
           | refactored to change the underlying integer type, the
           | explicit casts may silently truncate the integer, introducing
           | bugs.
           | 
           | That depends on how the casting is provided. For the C-style
           | casting, or Rust's `as` casting, yes that is a problem.
           | However, another way casting could be provided is through
           | conversion functions that are only infallible if information
           | isn't lost. For example, let's say we have the functions
           | `to_u16` and `to_i16`. For an `i8` the first function could
           | return `Option<u16>`, the second `i16`, while for a `u8` they
           | would return `u16` and `i16`. That way, any change to the
           | types that could cause it to now silently truncate would
           | instead cause a compiler error because of the type mismatch.
           | 
           | Rust almost gets there with its `Into` and `TryInto` traits
           | which do provide that functionality, but trying to use them
           | in an expression causes type inference to fail, which just
           | makes them a pain in the ass to use.
        
             | nayuki wrote:
             | You can use From and TryFrom, like u32::try_from(5i32).
             | https://doc.rust-lang.org/std/convert/trait.From.html ,
             | https://doc.rust-lang.org/std/convert/trait.TryFrom.html
        
               | tialaramex wrote:
               | And, notably in this context, Rust's traits are auto-
               | implemented in a chain, so if From<Foo> for Bar, then
               | Into<Bar> for Foo, and thus TryFrom<Foo> for Bar, and
               | thus in turn TryInto<Bar> for Foo.
               | 
               | This means that if first_thing used to have a non-
               | overlapping value space so that converting it to
               | other_thing might fail, so you wrote
                |         other_thing = first_thing.try_into().blahblahblah;
               | 
               | ... if you later refactor and now first_thing is a subset
               | of other_thing so that the conversion can never fail, the
               | previous code still works fine, the try_into() call just
                | never fails. In fact, the compiler even _knows_ it can't
               | fail, because its error type is now Infallible, a sum
               | type with nothing in it, so the compiler can see this
               | never happens, and optimise accordingly.
        
             | WalterBright wrote:
             | Yeah, having two different cast operations can help here.
             | But I like the simpler approach.
        
         | tialaramex wrote:
         | Still, I would rather not have Rust's 'as' cast. I should like
         | to see all, or at least most uses of 'as' deprecated in a later
         | Rust edition.
         | 
         | Rust's 'as' will silently throw away data to achieve what you
         | asked. Sometimes you wanted that, sometimes you didn't realise,
         | and requiring into() and try_into() instead helps fix that.
         | 
         | For example suppose I have a variable named beans, I forgot
         | what type it is, but it's got the count of beans in it, there
         | should definitely be a non-negative value because we were
         | counting beans, and it shouldn't be more than a few thousand,
          | so we're going to put that in a u16 variable named enough.
          |         let enough = beans as u16;
         | 
         | This works just fine, even though beans is i64 (ie a signed
         | 64-bit integer). Until one day beans is -1 due to an arithmetic
         | error elsewhere, and now enough is 65535, which is pretty
         | surprising. It's not _undefined_ but it is surprising.
         | 
         | If we instead write let enough = beans.into(); we get a
         | compiler error, the compiler can't see any way to turn i64 into
         | u16 without risk of loss, so this won't work. We can write
         | beans.try_into().unwrap() and get a panic if actually beans was
         | out of range, or we can actually write the try_into() handler
         | properly if we realise, seeing it can fail, what we're actually
         | dealing with here.
        
           | jcranmer wrote:
           | If I can get away with using .into() over as, I will.
           | However, there are so many cases where you can't, and
           | .try_into().unwrap() is just way too unwieldy. In particular,
           | there's no way for me to go "I know I'm only ever going to
           | run on 64-bit machines where sizeof(usize) == sizeof(u64), so
           | usize should implement From<u64>" (and even more annoying, I
           | might be carefully making sure this works on 32-bit and
           | 64-bit machines and get screwed because Rust thinks it could
           | be portable to a 16-bit usize despite the fact I'm allocating
           | hundreds of MB of memory).
           | 
           | And of course there are times I do want to bitcast negative
           | values to large positive unsigned values or vice versa,
           | without error. So while I do understand why maybe you
           | shouldn't use as, at the end of the day, it just ends up
           | being easier to use it than not use it.
        
             | gspr wrote:
             | > try_into().unwrap() is just way too unwieldy.
             | 
             | I always carry with me a tiu() alias method for exactly
             | that reason (in a TryIntoUnwrap trait implemented for every
             | pair of types which implements TryInto).
        
               | arcticbull wrote:
                | You can also do try_from()? if you impl
                | From<TryFromIntError> on your local error type.
               | on your local error type.
        
           | ridiculous_fish wrote:
           | A flip side is that some safe conversions also produce
           | compiler errors. On a 64-bit system, usize and u64 are the
           | same width, but they are not convertible: neither is Into the
           | other, so you cannot lean on the compiler in the way you
           | describe.
           | 
           | You might use try_into(), but then you risk panicking on a
           | 32-bit system, instead of getting a compile-time error.
        
       | svnpenn wrote:
       | Exactly why I don't use C anymore. Other languages like Go don't
       | allow this stuff.
       | 
       | https://go.dev/play/p/D2J6Y9ol41W
        
       | ummonk wrote:
       | Wait so does an unsigned int promote to signed int for shift?
        
       | Jimajesty wrote:
       | I felt pretty smart getting that first question, then proceeded
       | to fall down the stairs.
        
       | Turing_Machine wrote:
       | The correct answer for many of these is "don't do that". :-)
        
         | IncRnd wrote:
         | That is the correct answer to the questions in this quiz!
         | 
          | Most professionals don't worry about these effects, instead
          | choosing to cast and use parentheses explicitly. Those
          | instruct the compiler directly instead of relying upon
          | warnings from it.
        
       | jstimpfle wrote:
        | It's important to notice the disclaimer that this applies to
        | x86/x86-64 GCC-like platforms; in particular, int is assumed to
        | be 32 bits.
       | 
       | As antirez said as well, I don't pride myself in understanding
       | the intricacies of integer arithmetic and promotion well. I try
       | to write clear code by writing around the less commonly
       | understood rules. Nevertheless I wanted to test myself. There are
       | two questions that surprised me somewhat.
       | 
       | Apparently you can't left-shift a negative value even by a zero
       | amount, as in (-1 << 0).
       | 
       | And is it true that the value of "-1L > 1U" is platform
       | dependent? I had assumed that 1U would be promoted to 1L in this
       | expression, even on x86 where unsigned int and long have the same
       | number of bits. According to the following document, long int has
       | a "higher rank" than unsigned int.
       | https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Under... .
       | (Edit: according to rules 4 and 5 of "Usual arithmetic
       | conversions", it's not only the rank but also about "the values
       | that can be represented")
        
       | Dwedit wrote:
       | On Windows, `long` and `int` are the same size, even when
       | building for x64.
        
       | einpoklum wrote:
       | I taught first-semester C for several years, and still got a
       | couple of these wrong - although, to be honest, the way we taught
       | the class at my alma mater we steered well clear of these
        | situations. We told students that C performs type promotions and
        | automatic conversions, but either had them use a uniform type to
        | begin with - int typically - or told them to convert explicitly
        | and avoid mixing types.
       | 
       | Still, the most important suggestion I would make here is: Always
       | compile with warnings enabled, in particular:
       | 
       | * -Wstrict-overflow=1 and maybe even -Wstrict-overflow=3
       | 
       | * -Wsign-compare
       | 
       | * -Wsign-conversion
       | 
       | * -Wfloat-conversion
       | 
       | * -Wconversion
       | 
       | * -Wshift-negative-value
       | 
       | (or just `-Wall -Wextra`)
       | 
       | And maybe also:
       | 
       | * -fsanitize=signed-integer-overflow
       | 
       | * -fsanitize=float-cast-overflow
       | 
       | (These are GCC flags, should work for clang as well.)
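        | 
        | For example, a made-up snippet that those warnings catch:
        |         unsigned n = 10;
        |         for (int i = 0; i < n; i++)  // -Wsign-compare
        |             ;
        |         short s = n;                 // -Wconversion
        |         int x = -1 << 3;             // -Wshift-negative-value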
        
       | Cola2265 wrote:
        
       | pif wrote:
       | In the end, it's very simple and intuitive.
       | 
        | 1- If you need arithmetic, use signed; if you need a bitmask, use
        | unsigned. C is not assembly, and bit shift is neither
        | multiplication nor division.
       | 
        | 2- Make sure you stay within bounds, as in: don't even think you
        | can approach the boundaries. C is not assembly, and the overflow
        | flag does not exist.
        
       | codeflo wrote:
       | I thought I did well until the bit-shifting questions. Who in
       | their right mind designs a language where shifting the bits of a
       | u16 silently converts into an i32? Doubly so since that fact
       | alone directly causes UB -- keeping the u16 would have been
       | perfectly fine.
       | 
       | (Edit, since some respondents seem to miss this, explanations
       | about efficiency or ISAs might justify promoting to u32 (though
       | even that's debatable), but not i32. A design that auto-promotes
       | an unsigned type, where every operation is nicely defined and
       | total, into a signed type, where you run into all kinds of
       | undefined behavior on overflow, is simply crazy.)
        
         | jcranmer wrote:
         | If you have a 32-bit architecture, you don't necessarily have
         | 8-bit and 16-bit hardware operations outside of memory
         | operations (load/store).
         | 
         | Now, I don't find this reasoning persuasive--it's not _that_
         | hard to emulate an 8-bit or 16-bit operation--and judging from
         | the history of post-C languages, most other language designers
         | are equally unmoved by this reasoning, but I can see someone in
         | their right mind designing a language that acts like this.
          | Especially if the first architecture they're developing on is
          | precisely such an architecture (the PDP-11 doesn't have
          | byte-sized add/sub/mul/div).
        
           | simias wrote:
           | I think the prevailing philosophy for C (and later C++) was
           | that code should map closely to the hardware and not expand
           | to complicated "microcode". A left shift in Cshould just be a
           | left shift in the underlying ISA, within reason. That's where
           | most of the undefined behaviours come from. Having to add
           | masking and other operations to emulate a 16bit shift on a
           | 32bit architecture feels un-C-like, for better or worse.
           | 
            | IMO the real issue is not so much the fact that any shift of
            | a type < int is treated as if it were an int, it's that the
           | language doesn't force you to acknowledge that in the code.
           | If you got a compilation error when trying to shift a short
           | and had to explicitly promote to int in order to make it
           | through, at the very least it can't lead to an oversight from
           | a careless programmer.
           | 
           | C is trying to be clever but only goes half way, resulting in
           | the worst of both worlds IMO.
        
             | codeflo wrote:
             | The ISA might force someone to extend a value to 32 bits
             | (debatable, but let's go with it). It never forces you to
             | treat an unsigned int as signed. It also doesn't require
             | inserting UB into the process.
        
               | simias wrote:
               | I agree, the whole signed vs. unsigned generally feels
               | like an afterthought in C (probably because, to a certain
               | extent, it was). `char`'s sign being implementation-
               | defined is a pretty wild design choice that wasted many
               | hours of my life while porting code between ARM and x86.
               | 
               | UBs are not required, but you need them if you want C to
               | behave as a macro-assembler as well as allowing for
               | aggressive optimizations. For instance `a << b` if b is
                | at least a's width is genuinely UB if you write
               | portable code, different CPUs will do different things in
               | this situation. Defining the behaviour means that the
               | compiler would have to insert additional opcodes to make
               | the behaviour identical on all platforms.
               | 
               | You may argue that it's still better than having UB but
               | that's just not C's design philosophy, for better or
               | worse.
        
               | gsliepen wrote:
               | It's worse than that. `char`'s signedness being
               | implementation defined is one thing, but then having the
               | standard library provide a function called `getchar()`
               | that returns not a `char` but an `unsigned char` cast to
               | an `int` is diabolical.
        
               | tsimionescu wrote:
               | > For instance `a << b` if b is greater than a's width is
               | genuinely UB if you write portable code, different CPUs
               | will do different things in this situation. Defining the
               | behaviour means that the compiler would have to insert
               | additional opcodes to make the behaviour identical on all
               | platforms.
               | 
                | You seem to be mixing up implementation-defined behavior
                | and undefined behavior. It would have been perfectly
                | reasonable to make this choice if signed integer overflow
               | were implementation-defined, but it is unfortunately not
               | - it is undefined behavior instead. This means that a
               | program containing this instruction is not valid C and
               | may "legally" have any effect whatsoever.
        
               | codeflo wrote:
               | That's true, I'd add something more. Your reasoning about
               | hardware differences would only justify implementation-
               | defined behavior, not undefined behavior. The distinction
               | is important here: undefined behavior is when the
               | compiler can make surprising optimizations in other parts
               | of the code assuming something doesn't happen.
        
               | jcranmer wrote:
               | I imagine the reason oversized shifts are UB is because
               | some 40-year-old computer hardware trapped on oversized
               | shifts, as traps are always UB in C.
        
         | nayuki wrote:
         | > Who in their right mind designs a language where shifting the
         | bits of a u16 silently converts into an i32?
         | 
         | Yeah, hence why I asked this question years ago:
         | https://stackoverflow.com/questions/39964651/is-masking-befo...
        
         | veltas wrote:
         | Because everything smaller than an int is usually promoted to
         | int. int is the 'word' in C that things are calculated in. Even
         | character constants are ints, all enum constants are ints, the
         | default type was int when default types were still a thing.
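          | 
          | An easy way to see this (in C, not C++):
          |         printf("%zu\n", sizeof('a'));  // sizeof(int), e.g. 4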
        
           | edflsafoiewq wrote:
            | Yes. The model C has is that the CPU has an ALU that
           | operates as (word,word)->word with int being the smallest
           | word size. This explains many of C's conversion rules: to
           | operate on a single integer, it first has to be promoted to a
           | word size; to operate on two integers, they first have to be
           | converted to a common word size, etc.
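            | 
            | A small illustration of those rules (assuming 32-bit int):
            |         unsigned char a = 200, b = 100;
            |         int sum = a + b;  // both promoted to int first: 300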
        
           | lifthrasiir wrote:
            | And that makes writing correct _and_ portable C futile.
           | Yes, defined-size types are a thing and I almost exclusively
           | use them, but since the size of `int` itself is unknown and
           | integer promotion depends on that size, the meaning of the
           | code using only defined-size types can still vary across
           | platforms. intN_t etc. are only typedefs and do not form
           | their own type hierarchy, which in my opinion is a huge
           | mistake.
        
           | jefftk wrote:
           | But why not turn u16 into u32? Why switch it to being signed
           | on promotion?
        
             | rwmj wrote:
             | I'm sure the answer is going to be along the lines of
             | "because PCC did that and they standardized it" :-/
             | 
             | Here's a fun standardization problem I came across recently
             | (nothing to do with C):
             | http://mywiki.wooledge.org/BashFAQ/105
        
             | ynfnehf wrote:
             | ANSI describes why in their rationale:
              | https://www.lysator.liu.se/c/rat/c2.html#3-2-1 :
              | 
              | > The unsigned preserving rules greatly increase the
              | > number of situations where unsigned int confronts signed
              | > int to yield a questionably signed result, whereas the
              | > value preserving rules minimize such confrontations.
              | > Thus, the value preserving rules were considered to be
              | > safer for the novice, or unwary, programmer. After much
              | > discussion, the Committee decided in favor of value
              | > preserving rules, despite the fact that the UNIX C
              | > compilers had evolved in the direction of unsigned
              | > preserving.
        
             | veltas wrote:
             | Believe it or not, this makes the behavior more like what
             | you'd expect in many cases. For example:
              |         uint8_t x = 4;
              |         extern volatile uint64_t *reg;
              |         *reg &= ~x;
             | 
             | In the last statement x is promoted to an int, and then
              | when the bitwise NOT occurs every bit except bit 2 is set
              | to 1, including the high bit. When it's converted to a
              | uint64_t for the AND, the high bits are also set to 1. So
              | the result is that the final statement clears only bit 2
              | in *reg.
             | 
             | If it promoted to unsigned int, then it would also clear
             | bits 32-63.
        
               | codeflo wrote:
               | I don't find that very convincing. It's simply ambiguous,
                | I might want this:
                |         *reg &= ~(uint64_t)x;
               | 
               | or, and there's no elegant way to even write this in C, I
                | might want:
                |         *reg &= (uint64_t)(uint8_t)~x;
               | 
               | The fact that I have to write two casts here to undo the
               | damage of the auto-promotion is evidence of how broken
               | this is.
        
               | veltas wrote:
                | That second line can be written:
                |         *reg &= (uint8_t)~x;
                | 
                | Or:
                |         *reg &= ~x & 0xFF;
        
               | tialaramex wrote:
                |         extern volatile uint64_t *reg;
                |         *reg &= ~x;
               | 
               | People should stop doing this. What this _means_ is:
                |         extern volatile uint64_t *reg;
                |         uint64_t tmp = *reg;
                |         tmp &= ~x;
                |         *reg = tmp;
               | 
               | But of course when you write _that_ chances are somebody
                | will point out that you're running in interruptible
               | context sometimes in this function, so that's actually
               | introducing a race condition. Why didn't they say so when
               | you wrote it your way? Because that looked like a single
               | operation and so it wasn't obvious it might get
               | interrupted.
        
         | scatters wrote:
         | Because sub-integer types are for storage, not computation.
         | 
         | Yes, it'd be better if you had to explicitly cast to int or
         | unsigned to perform arithmetic, but that ship has sailed.
        
       | leni536 wrote:
       | Doesn't list my favorite footgun.
       | 
       | x and y are unsigned short. The expression `x*y` has defined
       | behavior for...
       | 
       | a) all values of x and y.
       | 
       | b) some values of x and y.
       | 
       | c) no values of x and y.
        
         | nayuki wrote:
         | If short = 16 bits and int = 16 bits, then x and y will be
         | promoted to unsigned int. Unsigned multiplication has
         | wraparound behavior, so x*y will be defined for all values.
         | 
         | If short = 16 bits and int = 32 bits (or heck even 17 bits),
         | then x and y will be promoted to signed int. Signed
         | multiplication overflow is undefined behavior, so x*y will be
         | undefined for some values when x*y is too large. In particular,
         | 0xFFFF * 0xFFFF = 0xFFFE_0001, which is larger than INT_MAX =
         | 0x7FFF_FFFF.
         | 
         | If short = 16 bits and int = 64 bits (or even 33 bits), then x
         | and y will be promoted to signed int. The range of x*y will
         | always fit int, so no overflow occurs, and the expression is
         | defined for all input values.
         | 
         | Isn't C fun?
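          | 
          | (A defensive idiom, for what it's worth: force unsigned
          | arithmetic before the multiply, so the wraparound case is
          | defined even on 32-bit-int platforms.)
          |         unsigned short x = 0xFFFF, y = 0xFFFF;
          |         unsigned r = (unsigned)x * y;  // defined: 0xFFFE0001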
        
         | loeg wrote:
         | (B), assuming int is 32-bit and short is 16-bit. The
         | multiplication promotes both operands and result to signed int,
         | right? So if both x and y are uint16_max, the result overflows
         | signed int and is UB, I think.
         | 
         | But if int were larger (eg 64 bit) and short remained 16 bits,
         | there's no overflow and the answer is (A). I think.
        
         | jstimpfle wrote:
         | There was definitely a question that covered the auto-promotion
         | to int, maybe with slightly different types.
        
         | [deleted]
        
         | shultays wrote:
          | I am gonna say A but I guess I don't get the footgun here
        
           | Veliladon wrote:
           | The shorts are not promoted to ints (if you even declared the
           | result as that type) until after the multiplication. The
           | result will first be put into a short and then promoted to
           | int. It's basically asking for an overflow error. Given that
           | 256^2 is 65536 you don't have to be multiplying large numbers
           | before hitting that overflow.
        
             | loeg wrote:
             | Unsigned short has defined overflow, though.
        
             | shultays wrote:
             | I don't follow, assuming short is half the size of int it
             | would only overflow if most significant bits of both values
             | are 1. Where did that 256^2 come from? It wouldn't overflow
             | if a and b were 256
             | 
              | I missed that the values would be promoted to signed int,
              | which overflows only when the most significant bits are
              | set
        
               | [deleted]
        
             | LegionMammal978 wrote:
             | All operands are always promoted to int (or unsigned int,
             | long, unsigned long, etc.) _before_ any operation. The
              | footgun in this example is that even though you're using
             | unsigned integers and would expect guaranteed wrapping
             | semantics, the promotion to signed int makes it UB for
             | USHORT_MAX*USHORT_MAX.
        
           | jandrese wrote:
            | 65535 * 65535 would overflow an int.
        
             | shultays wrote:
             | Ah I didn't know it would be promoted to signed int.
             | Another thread here explains it as well. Thanks
        
       | synergy20 wrote:
        | Typically you should avoid shifts on signed integers, and
        | especially NEVER left-shift a signed char|short|int|whatever.
        | 
        | I limit shifts strictly to unsigned integers.
        
         | nayuki wrote:
         | Following this rule strictly can be tricky.
          |         uint16_t x = (...);
          |         uint16_t y = x << 15;
         | 
         | Any arithmetic involving uint16_t will be promoted to some kind
         | of int. If int is 16 bits wide, then uint16_t will be promoted
         | to unsigned int before the shift, and all values of x are safe.
         | Otherwise int is at least 17 bits wide, then uint16_t will be
         | promoted to signed int before the shift.
         | 
         | On a weird but legal platform where int is 24 bits wide, the
         | expression (uint16_t)0xFFFF << 15 will cause undefined
         | behavior.
         | 
          | My workaround for this is to force promotion to unsigned int:
          |         (0U + x) << 15
         | https://stackoverflow.com/questions/39964651/is-masking-befo...
        
       | siggen wrote:
       | Got all these correct. I was expecting something more convoluted
       | or exotic undefined behavior.
        
       | dinom wrote:
       | Seems like a good example of why interview quizzes aren't a
       | panacea.
        
       | flykespice wrote:
       | This quiz only reminded me how _little_ I know about C
        | (thankfully those cases are corner-cases of incompetent
        | programming, so others and I can ignore their existence).
       | 
        | Good grief, thanks Quiz for reminding me to stay away from that
        | mess of a language.
        
       | greesil wrote:
       | -Wall -Werror
       | 
       | #include <stdint.h>
       | 
       | and don't use primitive types, and you will avoid many of these
       | issues.
        
       | ghoward wrote:
        | I'm a heavy user of C, and most of what I got wrong was around
        | integer promotion.
       | 
       | I'm glad I run clang with -Weverything and use ASan and UBSan.
        
       | jcranmer wrote:
        | The answer to the final question (INT_MIN % -1) is wrong in a way
       | that's somewhat dangerous.
       | 
       | If you read the text of C99 carefully, yes, it's implied that
       | INT_MIN % -1 should be well-defined to be 0. However, the %
       | operator is usually implemented in hardware as part of the same
       | instruction that does division, which means that on hardware
       | where INT_MIN / -1 traps (thereby causing undefined behavior),
       | INT_MIN % -1 will also trap. The wording was changed in C11 (and
       | C++11) to make INT_MIN % -1 explicitly undefined behavior, and
       | given the reasoning for why the wording was changed, users should
       | expect that it retains its undefined behavior even in C89 and C99
       | modes, even on 20-year-old compilers that predate C11.
        
         | greaterthan3 wrote:
         | >The wording was changed in C11
         | 
         | And here's an exact quote from the C11 standard:
         | 
         | >If the quotient a/b is representable, the expression (a/b)*b +
         | a%b shall equal a; otherwise, the behavior of both a/b and a%b
         | is undefined.
         | 
         | http://port70.net/~nsz/c/c11/n1570.html#6.5.5p6
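          | 
          | For the curious, a snippet that typically traps (on x86 the
          | idiv instruction raises #DE here, delivered as SIGFPE on
          | Linux); the volatile keeps the compiler from folding it away:
          |         #include <limits.h>
          |         #include <stdio.h>
          |         int main(void) {
          |             volatile int a = INT_MIN, b = -1;
          |             printf("%d\n", a % b);  // UB since C11
          |             return 0;
          |         }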
        
       | lultimouomo wrote:
       | Beware that this is very much not a "C Integer quiz", but a
       | "ILP32/LP64 Iinteger quiz". As sono as you move not to some weird
       | exotic architecture, but simply to 64bit Windows(!!) some quiz
       | answers will not hold.
       | 
       | For a website meant to educate programmers on C language gotchas,
       | this is a pretty lackluster effort.
       | 
        | Even the initial disclaimer, "assume x86/x86-64 GCC/Clang", is
       | wrong, as the compiler does not have anything to do with integer
       | widths.
        
         | flykespice wrote:
          | My impression is this wasn't so much to educate programmers on
          | C language gotchas as to remind you just how messy and fragile
          | this language is (so you can stay far away).
        
           | lultimouomo wrote:
           | One more reason not to give the impression that you can
            | assume that long is wider than int which is wider than short!
        
         | ok123456 wrote:
          | The quiz is wrong because some questions assume that short is
          | narrower than int. A short can be the same width as an int; it
          | just can't be wider.
        
       | analog31 wrote:
       | One of my friends likes to scold me about using a programming
       | language (Python) that doesn't enforce type declarations. I
       | gently remind him that his language (C) has eight different types
       | of integers.
        
         | einpoklum wrote:
         | { char, short, int, long, long long } x { signed, unsigned } =
         | 10 types, and then there's bool and maybe int128_t, so maybe
         | 12.
        
         | Aardwolf wrote:
         | Heh, it has _way_ more than 8.
         | 
         | For char, you have 3: signed char, unsigned char and char. It's
         | not specified if char without keyword is signed or unsigned.
         | 
         | You have integer types such as size_t, ssize_t and ptrdiff_t.
         | They may, under the hood, match one of the other standard int
         | types, however this differs per platform, so you can't e.g.
         | just easily print size_t using the standard printf formatters,
          | you really have to treat it as its own type. Also wchar_t and
         | such of course.
         | 
         | Then you have all the integers in stdint.h and inttypes.h. Same
         | here applies as for size_t. At least you know how many bits you
         | get from several of them, unlike from something like "long".
         | 
         | Then your compiler may also provide additional types such as
         | __int128 and __uint128_t.
        
           | spc476 wrote:
           | > so you can't e.g. just easily print size_t using the
           | standard printf formatters
           | 
           | This has been fixed in C99. For size_t, it's "%zu", for
           | ptrdiff_t it's "%td", for ssize_t it's "&zd" and for wchar_t,
           | it's "&lc".
        
       | veltas wrote:
       | Third question is wrong. It's implementation defined, because
       | unsigned short might be the same rank as unsigned int in some
       | implementations, in which case it remains unsigned when promotion
       | occurs.
        
         | jwilk wrote:
         | They may have the same width, but not the same rank.
         | 
          | From C99 §6.3.1.1:
         | 
         | > _-- The rank of long long int shall be greater than the rank
         | of long int, which shall be greater than the rank of int, which
         | shall be greater than the rank of short int,_ [...]
         | 
         | > _-- The rank of any unsigned integer type shall equal the
         | rank of the corresponding signed integer type, if any._
        
           | [deleted]
        
           | moefh wrote:
           | That's true, but it doesn't matter to the point being made.
           | 
           | According to the standard[1], if short and int have the same
           | size (even if not the same rank) both numbers are converted
           | to unsigned int (that is, the unsigned integer type
           | corresponding to int) because int can't represent all values
           | of unsigned short.
           | 
           | The usual arithmetic conversions never "promote" an unsigned
           | to signed if doing so would change the value.
           | 
           | [1] http://port70.net/~nsz/c/c99/n1256.html#6.3.1.8
        
         | rwmj wrote:
         | He does say on the first page: _All other things being equal,
          | assume GCC/LLVM x86/x64 implementation-defined behaviors._ Are
         | there any normal, modern machines where short and int are the
         | same size? I think the last machine where that was true was the
         | 8086.
        
           | codeflo wrote:
           | Define "machine". All kinds of microcontrollers are
           | programmed with C.
        
           | moefh wrote:
           | That's still a bad question: the behavior is implementation-
           | defined according to the standard, so having "implementation-
           | defined" as a wrong option is ambiguous and confusing.
           | 
           | The question would be fine (given that note) if
           | "impementation-defined" was not an option, like for example
           | the question about "SCHAR_MAX == CHAR_MAX".
        
           | fuckstick wrote:
           | > I think the last machine where that was true was the 8086.
           | 
           | The last machine with a word size of 16 bits? The 286 was as
           | well. There were others like the WDC 65816 (the Apple IIgs
           | and SNES CPU).
           | 
            | It just so happens that there are simply far fewer 16-bit
            | CPUs than there were 8 or 32 (or "32/16" like the 68k). Also
            | 8-bit
           | CPUs are simply a poor fit for C by their nature and the
           | assumptions C makes. But the numerous ones still relevant
           | today will use 16 bit ints.
           | 
           | The use of Real or V86 mode on the x86 went on for many years
            | after the demise of the 8086. I think it is somewhat of a
           | joke at this point that they're teaching Turbo C in some
           | developing countries.
        
           | veltas wrote:
           | On x86 systems like the 8086, short and int were the same
           | size. And that appears to be a footnote they've added after
           | being called out for being wrong. The question gives
           | "implementation defined" as an option; other questions seem
           | to specify the ABI, and some assume it without saying so
           | again. It's very inconsistent; they really should fix the
           | quiz.
           | 
           | The last x86 processor I know of to have these sizes is the
           | 80286.
        
             | galangalalgol wrote:
             | AVR microcontrollers still have 16-bit ints, and probably
             | 8051 and PIC too, but I don't use those. Lots of people
             | do, though. Some TI DSPs use a 48-bit long, so don't count
             | on int and long being the same size either.
        
       | MauranKilom wrote:
       | _> What does the expression SCHAR_MAX == CHAR_MAX evaluate to?_
       | 
       |  _> Sorry about that -- I didn't give you enough information to
       | answer this one. The signedness of the char type is
       | implementation-defined_
       | 
       | ...why not have an "implementation-defined" answer button then,
       | because that's what people should know (instead of knowing all
       | ABIs) and what the question is about anyway?
       | 
       |  _> If these operators were right-associative, the expression [x
       | - 1 + 1] would be defined for all values of x._
       | 
       | That's just wrong, no? If + and - were right-associative, it
       | would be parsed as x - (1 + 1), which is decidedly not "defined
       | for all values of x".
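       | 
       | A minimal sketch of both parses (assuming the quiz's 32-bit
       | int): the real left-associative parse overflows only for x ==
       | INT_MIN, while the hypothetical right-associative parse
       | overflows for INT_MIN and INT_MIN + 1. Neither is defined for
       | all x.
       |         int left_assoc(int x)  { return (x - 1) + 1; }
       |         /* UB for x == INT_MIN */
       |         int right_assoc(int x) { return x - (1 + 1); }
       |         /* UB for x == INT_MIN or INT_MIN + 1 */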
        
         | shultays wrote:
         | I raised an eyebrow at that as well. Some questions do have
         | "implementation defined" as an answer, so I assume the author
         | just wrongly assumed "undefined" covers that.
        
         | necovek wrote:
         | Other questions might also be "implementation defined", hence
         | the caveat at the start to assume GCC/LLVM implementations on
         | x86/amd64.
         | 
         | I.e. the C standard mandates undefined behavior very
         | sparingly, IIRC, and most of the corner cases are
         | implementation-defined.
         | 
         | I may also be misremembering things: it's been 20 years since
         | I've carefully read C99 (draft, really) for fun :))
        
           | kevin_thibedeau wrote:
           | Even the compiler assumption isn't enough. There is also an
           | implicit assumption in the answers that x86-64 is using LP64
           | when some platforms will use LLP64 (Windows) or ILP64 (Cray).
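           | 
           | A quick probe (a sketch; on LP64 this prints 4 8 8, on LLP64
           | it prints 4 4 8, on ILP64 8 8 8):
           |         #include <stdio.h>
           |         int main(void) {
           |             printf("%zu %zu %zu\n", sizeof(int),
           |                    sizeof(long), sizeof(void *));
           |             return 0;
           |         }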
        
       | 6a74 wrote:
       | Neat quiz. Reminds me that the absolute value of INT_MIN in C
       | (and many other languages) is undefined, but will generally still
       | return a negative value. This is a "gotcha" that a lot of people
       | are unaware of.
       | 
       | > abs(-2147483648) = -2147483648
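       | 
       | A minimal sketch of one way around the trap: compute the
       | magnitude in unsigned arithmetic, where negation is well defined
       | (the function name is mine):
       |         #include <limits.h>
       |         unsigned magnitude(int x) {
       |             /* converting to unsigned and negating is well
       |                defined, even for x == INT_MIN */
       |             return x < 0 ? 0u - (unsigned)x : (unsigned)x;
       |         }
       |         /* magnitude(INT_MIN) == 2147483648u, no UB */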
        
         | lizardactivist wrote:
         | A consequence of most, if not all, CPUs today using two's
         | complement integers.
         | 
         | I think one's complement is more sensible since it doesn't have
         | this problem, but it loses out because it requires a more
         | complex ISA and implementation.
        
           | [deleted]
        
           | nayuki wrote:
           | Ones' complement (correct spelling) has negative zero, which
           | I would argue is a far worse problem.
        
             | xigoi wrote:
             | Why in the fuck is it "ones' complement", but "two's
             | complement"?
        
               | tsimionescu wrote:
               | According to Wikipedia:
               | 
               | > The name "ones' complement" (note this is possessive of
               | the plural "ones", not of a singular "one") refers to the
               | fact that such an inverted value, if added to the
               | original, would always produce an 'all ones' number
        
           | veltas wrote:
           | It's annoying that negation, ABS, and division can overflow
           | with two's complement. But how I look at it: lots of
           | operations can already overflow, just a fact of signed
           | integers, and you need to guard against that overflow in
           | portable code already. It doesn't seem to be fundamentally
           | worse that those extra operations can overflow.
        
         | shultays wrote:
         | It is undefined since it involves integer overflow
        
       | Cu3PO42 wrote:
       | One thing to note is that long int is the same size as int on x64
       | Windows, at least in the MSVC ABI. clang also conforms to this.
       | 
       | This is relevant to the question asking if -1L > 1U.
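       | 
       | Concretely, a sketch of how the usual arithmetic conversions
       | play out under the two data models:
       |         #include <stdio.h>
       |         int main(void) {
       |             /* LP64 (Linux/macOS): long is 64-bit and holds
       |                every unsigned int value, so 1U converts to
       |                long; -1L > 1L is false, prints 0.
       |                LLP64 (Windows): long is 32-bit, so both
       |                operands convert to unsigned long; -1L becomes
       |                ULONG_MAX and this prints 1. */
       |             printf("%d\n", -1L > 1U);
       |             return 0;
       |         }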
        
         | auxym wrote:
         | Same with the ARM EABI (32-bit Cortex-M MCUs): int and long
         | int are both 32 bits.
        
           | MobiusHorizons wrote:
           | I think that is expected (it's what happens on 32-bit x86 as
           | well). What's surprising about the parent is that long is
           | apparently 32-bit on Windows x64; usually long is 64 bits on
           | a machine with 64-bit words.
        
       | boxfire wrote:
       | It's very worth pointing out, and in fact advocating for, the
       | compiler options -fwrapv and -ftrapv: the former makes signed
       | integer overflow wrap with the expected two's-complement
       | behavior, while the latter traps on signed overflow.
       | 
       | N.B. left-shifting into the sign bit is still undefined
       | behavior... On x86-64, GCC and Clang seem to perform the shift
       | as if the number were interpreted as unsigned, shifted, then
       | reinterpreted as signed.
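       | 
       | A small sketch of the difference between the two flags:
       |         #include <limits.h>
       |         #include <stdio.h>
       |         int main(void) {
       |             int x = INT_MAX;
       |             /* gcc -fwrapv: prints INT_MIN;
       |                gcc -ftrapv: aborts at runtime;
       |                neither flag: undefined behavior */
       |             printf("%d\n", x + 1);
       |             return 0;
       |         }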
        
       | forrestthewoods wrote:
       | In my opinion the three original sins of C are: nullptr, null-
       | terminated strings, and implicit conversions.
       | 
       | Abolish those three concepts and C is a significantly improved
       | language!
        
         | pjmlp wrote:
         | no bounds checking, enums are hardly better than #define,
         | pointer decay....
        
       | kazinator wrote:
       | Found a disappointing bug in the test:                 (unsigned
       | short)1 > -1
       | 
       | Correct answer is: implementation-defined.
       | 
       | The left operand of > has type _unsigned short_ , before
       | promotion. On C implementations for today's popular, mainstream
       | machines, _short_ is narrower than _int_ ; therefore, whether it
       | is signed or unsigned it goes to _int_.
       | 
       | In that common case, we are doing a 1 > -1 comparison in the
       | _int_ type.
       | 
       | However, _unsigned short_ may be exactly as wide as int, in which
       | case it cannot promote to _int_ , because its values do not fit
       | into that type. It promotes to _unsigned int_ in that case. Both
       | sides will go to _unsigned int_ {*}, and so we are comparing 1 >
       | UINT_MAX which is 0.
       | 
       | Maybe the author should name this "GCC integer quiz for 32 bit
       | x86", and drop the harder choices like "implementation-defined".
       | 
       | ---
       | 
       | {*} It's more nuanced here. If we have a signed and unsigned
       | integer operand of the same rank, it could go either way. If the
       | unsigned type has a limited range so that all its values are
       | representable in the signed type, then the unsigned type goes to
       | signed. My remark represents only the predominant situation
       | whereby signed and unsigned types of the same rank have
       | overlapping ranges that don't fit into each other: the unsigned
       | version of a type not only lacks negatives, but has extra
       | positives.
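       | 
       | A sketch of both cases (the second branch is hypothetical on
       | mainstream targets, where short is narrower than int):
       |         #include <stdio.h>
       |         int main(void) {
       |             /* short narrower than int (the mainstream case):
       |                both sides promote to int, comparing 1 > -1,
       |                prints 1.
       |                short as wide as int (hypothetical): the left
       |                side goes to unsigned int, -1 converts to
       |                UINT_MAX, prints 0. */
       |             printf("%d\n", (unsigned short)1 > -1);
       |             return 0;
       |         }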
        
         | mort96 wrote:
         | The site says:
         | 
         | > All other things being equal, assume GCC/LLVM x86/x64
         | implementation-defined behaviors.
        
         | jwilk wrote:
         | Discussed in this thread:
         | 
         | https://news.ycombinator.com/item?id=32712578
        
       | phist_mcgee wrote:
       | I know this is a tangential comment, but the fact that the site
       | author went to the effort of adding a CSS reset, and then
       | _doesn't_ go ahead and add any kind of margins, is pretty
       | bizarre.
       | 
       | The fact that the text butts up against the left of the page
       | with no margin is pretty incredible.
        
       | loeg wrote:
       | The quiz is a little broken. CHAR_MAX is implementation defined
       | but for some reason that question doesn't have that option, even
       | though other questions do.
        
       | ynfnehf wrote:
       | Another fun one:                 struct { long unsigned a : 31; }
       | t = { 1 };
       | 
       | What is t.a > -1? What about if a is instead a bit-field of width
       | 32? (Assuming the same platform as in the quiz.)
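       | 
       | (A sketch of what I'd expect on the quiz's GCC/x86-64 target: a
       | 31-bit unsigned bit-field's values all fit in int, so it
       | promotes to int and the comparison is signed; at width 32 they
       | no longer fit, the operand stays unsigned, and -1 converts to a
       | huge value.)
       |         #include <stdio.h>
       |         int main(void) {
       |             struct { long unsigned a : 31; } t31 = { 1 };
       |             struct { long unsigned a : 32; } t32 = { 1 };
       |             /* expected: 1 0 */
       |             printf("%d %d\n", t31.a > -1, t32.a > -1);
       |             return 0;
       |         }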
        
       | manaskarekar wrote:
       | Tangentially related, https://cppquiz.org/, for C++.
        
       | RedShift1 wrote:
       | I only have cursory experience with C, mostly by programming some
       | Arduino stuff, so I can only add this: good god this is a
       | minefield.
        
         | rwmj wrote:
         | I've been a professional C programmer for nearly 40 years and
         | didn't do especially well (I got about 2/3rds right). In my
         | view it doesn't really matter. Enable maximum warnings and fix
         | everything the compiler warns about. I also use Coverity once
         | in a while, and every test is run with and without valgrind. If
         | you do that you'll be fine.
        
           | adastra22 wrote:
           | Yeah, in practice these sort of gotchas ought to trigger
           | warnings. And for a C/C++ developer, warnings should never be
           | allowed to persist in a code base.
        
           | RedShift1 wrote:
           | I thought some deductive reasoning and thinking about how
           | things work at the byte level would save me. Computer said
           | no.
        
             | nayuki wrote:
             | When you write C programs, you are coding on the C abstract
             | machine as specified by the language standard. If your code
             | triggers signed integer overflow, then the C abstract
             | machine says that it's undefined behavior, and the compiler
             | and runtime can make your code behave however it wishes. It
             | doesn't matter that the underlying concrete machine (POSIX,
             | x86 ISA, etc.) has wraparound signed overflow, because your
             | code has to interact with both the abstract machine and the
             | concrete machine at the same time. See:
             | https://www.youtube.com/watch?v=ZAji7PkXaKY
        
               | petergeoghegan wrote:
               | > When you write C programs, you are coding on the C
               | abstract machine as specified by the language standard.
               | 
               | Are you really, though? I would argue that it's a matter
               | of perspective and/or semantics.
               | 
               | The Linux kernel is built with -fwrapv and with -fno-
               | strict-aliasing, and uses idioms that depend on them
               | directly. We can surmise that the kernel must be:
               | 
               | 1. Exhibiting undefined behavior (according to a literal
               | interpretation of the standard)
               | 
               | OR:
               | 
               | 2. Not written in C.
               | 
               | Either way, it's quite reasonable to wonder just how much
               | practical applicability your statement really has in any
               | given situation -- since you didn't have any caveats.
               | It's not as if the kernel is some esoteric, obscure case;
               | it's arguably the single most important C codebase in the
               | world. Plus there are plenty of other big C codebases
               | that take the same approach besides Linux.
               | 
               | Lots of compiler people seem to take the same hard line
               | on the issue -- "the C abstract machine" and whatnot. It
               | always surprises me, because it seems to presuppose that
               | the _only_ thing that matters is what the ISO standard
               | says. The actual experience of people working on large C
               | codebases doesn't seem to even get _acknowledged_. Nor
               | does the fact that the committee and the people that work
               | on compilers have significant overlap.
               | 
               | I'm not claiming that "low-level C hackers are right and
               | the compiler people are wrong". I'm merely pointing out
               | that there is a vast cultural chasm that just doesn't
               | seem to be acknowledged.
        
           | nayuki wrote:
           | > In my view it doesn't really matter.
           | 
           | One of these days, the compiler will do something surprising
           | to one of your expressions involving signed integer overflow,
           | like converting x < x + 1 to true. Or it'll delete a whole
           | loop because it noticed that your code is guaranteed to
           | trigger an out-of-bounds array read, e.g.:
           | https://devblogs.microsoft.com/oldnewthing/?p=633
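           | 
           | For example (a sketch; GCC and Clang at -O2 fold this to
           | "return 1", because signed overflow is assumed never to
           | happen):
           |         int always_less(int x) {
           |             /* UB if x == INT_MAX, so the compiler may
           |                treat this comparison as always true */
           |             return x < x + 1;
           |         }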
           | 
           | > If you do that you'll be fine.
           | 
           | I would not trust code written based on the methodology you
           | described. However, if you also add
           | -fsanitize=undefined,address (UBSan and ASan) and pass those
           | tests, then I would trust your code.
        
         | bluetomcat wrote:
         | Almost all of these examples would be sloppy programming in
         | real code. The "int", "short", "long", "long long" and "char"
         | types are generic labels which shouldn't be used where size
         | and signedness matter. For a guarantee of size and signedness,
         | use the "[u]intXX_t" types. For sizes and pointer arithmetic,
         | size_t. For subtracting pointers, ptrdiff_t. You simply
         | wouldn't have most of these issues if you stuck to the correct
         | types.
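         | 
         | A sketch of that mapping:
         |         #include <stdint.h>
         |         #include <stddef.h>
         |         uint32_t crc;      /* exact width and signedness */
         |         size_t n_bytes;    /* sizes, array indexing */
         |         ptrdiff_t gap;     /* pointer subtraction */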
        
           | rwmj wrote:
           | Absolutely. _int_ is a "code smell" for us, except in some
           | well-defined cases (storing file descriptors, for example).
           | If someone is using int to iterate a loop, the code is more
           | usually wrong than right; they should be using size_t.
        
             | fizzynut wrote:
             | I don't think a for loop using an int is bad or even "more
             | wrong than right". If anything, int is much better than
             | using size_t.
             | 
             | Using an int: of the 1000s of for loops I've written, none
             | get even remotely close to billions of iterations - it is
             | optimizing for a one-in-a-million case, and if I know
             | something can run into billions of iterations I'm going to
             | pay more attention anyway. I've seen zero occurrences of
             | bugs relating to this kind of overflow.
             | 
             | Using a size_t: it is effectively an unsigned integer that
             | risks underflowing, which can easily cause bugs like
             | infinite loops when decrementing, or other bugs when doing
             | any index arithmetic. I've seen many occurrences of these
             | kinds of bugs.
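             | 
             | The classic instance of the underflow hazard (a sketch):
             |         #include <stddef.h>
             |         void broken(size_t n) {
             |             /* i wraps past 0 to SIZE_MAX, so i >= 0 is
             |                always true and the loop never ends */
             |             for (size_t i = n - 1; i >= 0; i--) {
             |                 /* ... */
             |             }
             |         }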
        
               | rwmj wrote:
               | On Linux/x86-64 int has only 31 value bits, so you've
               | probably introduced "1000s" of security bugs where the
               | attacker only needs the persistence to add 2 billion
               | items to a network input, local file or similar, to
               | generate a negative offset of their choosing.
               | 
               | Any such code submitted to one of our projects would be
               | rejected or fixed to use the proper type.
        
               | fizzynut wrote:
               | Please don't make this adversarial.
               | 
               | I've not introduced a security bug in every for loop I've
               | written. What I've written shouldn't be controversial,
               | just take a look at Google's style guide:
               | 
               | "We use int very often, for integers we know are not
               | going to be too big, e.g., loop counters. Use plain old
               | int for such things. You should assume that an int is at
               | least 32 bits, but don't assume that it has more than 32
               | bits. If you need a 64-bit integer type, use int64_t or
               | uint64_t.
               | 
               | For integers we know can be "big", use int64_t.
               | 
               | You should not use the unsigned integer types such as
               | uint32_t, unless there is a valid reason such as
               | representing a bit pattern rather than a number, or you
               | need defined overflow modulo 2^N. In particular, do not
               | use unsigned types to say a number will never be
               | negative. Instead, use assertions for this.
               | 
               | If your code is a container that returns a size, be sure
               | to use a type that will accommodate any possible usage of
               | your container. When in doubt, use a larger type rather
               | than a smaller type.
               | 
               | Use care when converting integer types. Integer
               | conversions and promotions can cause undefined behavior,
               | leading to security bugs and other problems."
        
           | scatters wrote:
           | The native types (int, unsigned, etc.) are correct for local
           | variables, since they describe registers. The sized aliases
           | are correct for struct fields and for arrays in memory.
           | 
           | You should prefer signed types for computing (if not storing)
           | sizes and for pointer arithmetic, since they are more
           | forgiving with underflow.
        
           | mananaysiempre wrote:
           | Unfortunately, even if you're using uint16_t (which,
           | remember, the C standard _does not guarantee exists_ ) on a
           | platform with (say) 32-bit integers, you're still actually
           | computing in _signed int_ due to promotion.
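           | 
           | The standard illustration (a sketch, assuming a 32-bit int):
           |         #include <stdint.h>
           |         uint16_t mul(uint16_t a, uint16_t b) {
           |             /* both operands promote to signed int, so
           |                0xFFFF * 0xFFFF overflows int: UB before
           |                the result is narrowed back to uint16_t */
           |             return (uint16_t)(a * b);
           |         }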
        
       | dave84 wrote:
       | Didn't notice I got some wrong because the green color made me
       | think I got them right.
        
         | ape4 wrote:
         | Yeah that's a UX bug in the quiz
        
         | jbverschoor wrote:
         | That's to keep the obscurity vibe
        
         | gonzo41 wrote:
         | Just like programming in C. You think you got it right, but
         | then bugs...
        
           | pjmlp wrote:
           | Or thinking "how my compiler does it" == ISO C.
        
       | junon wrote:
         | Just a note for the author: the green background on wrong
         | answers was entirely confusing. Thought I was a C god at
         | first.
        
       | CamperBob2 wrote:
       | Correct answer to most of these questions: "I don't know, and I
       | don't care, because it would never even cross my mind to write
       | code in which most of these questions would come up. If forced to
       | review or debug code written by someone else who knew the answers
       | and leveraged that knowledge, I would either complain about it
       | loudly or rewrite it quietly."
       | 
       | C, like Rome, is a wilderness of tigers. Why ask for trouble?
        
       | up2isomorphism wrote:
       | These examples do look scary. However, in my 16 years of
       | experience mostly writing in C, I have never once needed the
       | expressions whose answers I got wrong.
       | 
       | Never mix unsigned and signed integer comparison unless you know
       | exactly what you are doing.
       | 
       | And never do arithmetic on boundary definitions like INT_MAX.
       | They are boundaries; why would you need to compute a derived
       | value from a boundary?
       | 
       | If you do not need arithmetic behavior, do not use a signed
       | integer; use unsigned. The computer does not understand sign, so
       | if you do not need a sign, do not use it.
       | 
       | You do not need to be a language committee member to write C,
       | you just need to understand the reason behind its design.
        
         | bee_rider wrote:
         | Is it your experience that there are many projects where you
         | can reliably say some variable will never need arithmetic
         | behavior?
         | 
         | I don't have 16 years of experience writing C, but
         | 
         | > If you do not need arithmetic behavior, do not use signed
         | integer, use unsigned
         | 
         | does seem in a roundabout way to match the advice that I've
         | gotten from other folks -- usually stick to signed (because you
         | never know if somebody is going to want to use an integer in,
         | say, a downwards loop). Your comment just seems to highlight
         | the less usual case, where you can be sure that nobody will
         | ever need that arithmetic behavior... maybe it depends on the
         | type of applications, though.
        
           | up2isomorphism wrote:
           | It always depends on the area of focus. You can also
           | consistently choose signed as your default int type, while
           | being fully aware that you lose half the range but gain
           | signed integer arithmetic. I mostly do systems and
           | networking, so I tend to use unsigned. The key is to choose
           | a consistent default while staying aware of the non-
           | intuitive zone that default puts you in.
        
         | tredre3 wrote:
         | > If you do not need arithmetic behavior, do not use signed
         | integer, use unsigned. Because computer does not understand
         | sign, so if you do not need a sign, do not use it.
         | 
         | That's interesting. In my career (embedded development) I've
         | learned to do the opposite. Always use signed unless you have a
         | reason not to. Even if a value can't naturally be negative, use
         | signed. Use unsigned only if you need the extra bit, or if
         | you're doing bitwise operations.
         | 
         | > Because computer does not understand sign
         | 
         | Computers understand signs just fine, we're long past the days
         | of the 6502 N-flag being a glorified bit 7 check. All CPUs have
         | signed instructions.
        
           | up2isomorphism wrote:
           | > That's interesting. In my career (embedded development)
           | I've learned to do the opposite. Always use signed unless you
           | have a reason not to. Even if a value can't naturally be
           | negative, use signed. Use unsigned only if you need the extra
           | bit, or if you're doing bitwise operations.
           | 
           | In this case, since you've decided the extra range gained by
           | unsigned is not important to you, you can also go with
           | signed by default. Basically it is a tradeoff between 1.
           | robustness in the majority of use cases, 2. the ability to
           | do signed arithmetic, and 3. the additional positive integer
           | range. As long as you make a consistent selection and are
           | mindful when you are in the danger zone, it can be handled.
        
           | WaffleIronMaker wrote:
           | > Always use signed unless you have a reason not to. Even if
           | a value can't naturally be negative, use signed.
           | 
           | Can you elaborate on what benefits this approach has? I would
           | feel that, especially when a number cannot be negative,
           | unsigned integers seem like a proper representation of the
           | data?
        
             | gugagore wrote:
             | Here is an example: https://wesmckinney.com/blog/avoid-
             | unsigned-integers/
        
         | Sirened wrote:
         | > And never do arithmetic on boundaries definitions like
         | INT_MAX, they are boundaries
         | 
         | I'd argue something stronger: if you care about boundaries
         | like INT_MAX, you should never be checking them with your
         | regular comparison tools. I.e., even though there are correct
         | ways to compute whether x + 1 will overflow, don't bother
         | trying; instead, always use __builtin_add_overflow, since you
         | can't fuck it up. These sorts of edge checks are incredibly
         | hard to get right, and they have led to numerous security
         | vulnerabilities due to checks being optimized out. The
         | builtins do exactly what they say, and you don't have to worry
         | about UB blowing your foot off.
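         | 
         | A sketch of the GCC/Clang builtin in use (the wrapper name is
         | mine):
         |         #include <stdbool.h>
         |         bool checked_inc(int x, int *out) {
         |             /* stores the (wrapped) sum in *out and returns
         |                true if overflow occurred, instead of
         |                invoking UB when x == INT_MAX */
         |             return __builtin_add_overflow(x, 1, out);
         |         }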
        
         | jcelerier wrote:
         | The problem is that for unsigned, one of the boundaries is 0,
         | which is an extremely common number - hardly a couple of
         | months go by without me finding a bug due to a size-1
         | somewhere.
        
           | 10000truths wrote:
           | Subtraction would presumably fall under the arithmetic
           | behavior that OP was talking about.
        
         | pantalaimon wrote:
         | The compiler will also warn about these cases, where you had
         | better be explicit to avoid undesired behavior.
        
       | Nokinside wrote:
       | The first rule of good C programming: treat warnings as errors.
       | Compile with all warnings enabled, and selectively ignore only
       | the warnings you know for sure are not important for the
       | program's semantics.
       | 
       | Simple MISRA C with all warnings on is already close to a
       | language with strict type checking. C compilers give you a
       | choice; use it. If you are programming for money, a good static
       | analyzer makes it possible to write safety-critical code.
        
         | pjmlp wrote:
         | Worth noting Dennis own words,
         | 
         | "Although the first edition of K&R described most of the rules
         | that brought C's type structure to its present form, many
         | programs written in the older, more relaxed style persisted,
         | and so did compilers that tolerated it. To encourage people to
         | pay more attention to the official language rules, to detect
         | legal but suspicious constructions, and to help find interface
         | mismatches undetectable with simple mechanisms for separate
         | compilation, Steve Johnson adapted his pcc compiler to produce
         | lint [Johnson 79b], which scanned a set of files and remarked
         | on dubious constructions."
         | 
         | Dennis M. Ritchie -- https://www.bell-
         | labs.com/usr/dmr/www/chist.html
         | 
         | Unfortunately too many think they know better than the language
         | authors themselves.
        
           | protomikron wrote:
           | What is the history of "undefined behavior" [in C compilers
           | and the standard] in general? I suppose originally it was
           | supposed to guide compiler engineers, but we all know that
           | backfired, as many compilers try to exploit undefined
           | behavior to optimize code, but that can be problematic in
           | security sensitive code (e.g. if uninitialized memory is
           | optimized away) - there have been discussions between
           | security engineers, kernel developers and GCC hackers about
           | how to implement/interpret the standard.
           | 
           | Would it be possible to have a standard, where undefined
           | behavior is just a compile error? What would we lose - apart
           | from legacy compatibility?
        
             | jcranmer wrote:
             | > Would it be possible to have a standard, where undefined
             | behavior is just a compile error?
             | 
             | No [if you're aiming for something in the same vein as C].
             | Undefined behavior is ultimately an inherently dynamic
             | property--certain values could make a statement execute
             | undefined behavior, and consequently, virtually every
             | statement could potentially cause undefined behavior. Note
             | that this remains true even in languages like Rust: Rust
             | has _loads_ of undefined behavior, but you do have to wrap
             | code in unsafe blocks to potentially cause undefined
             | behavior.
             | 
             | > What would we lose - apart from legacy compatibility?
             | 
             | In particular, it is clear at this point that if you want
             | to permit converting integers to pointers, you will either
             | have to live with undefined behavior (via pointer
             | provenance) or forgo basically _all_ optimization
             | whatsoever.
        
             | pjmlp wrote:
             | C sucked on 8- and 16-bit home computers. To be fair, all
             | high-level systems programming languages had their own set
             | of issues regarding optimal code generation, so Assembly
             | was the name of the game for ultimate performance.
             | 
             | UB started as means to not kick out computer architectures
             | that would otherwise not be able to be targeted by fully
             | compliant ISO C compilers.
             | 
             | Given that C prefers to be a kind of portable macro
             | assembler rather than care about security, it was only a
             | matter of time until those escape hatches started to be
             | taken advantage of for optimizations.
             | 
             | The same applies to other languages; however, since their
             | communities tend to prefer security over ultimate
             | performance, some optimization paths are not considered,
             | as they would hurt their safety goals.
             | 
             | As far as C, C++ and Objective-C are concerned, dropping
             | UB optimizations would mean going back to the 1990s in
             | terms of code quality.
        
             | robryk wrote:
             | Would                  int foo(int x, int y) { return x+y;
             | }
             | 
             | compile? After all, this function can be called in a way
             | that causes UB.
        
             | Veliladon wrote:
             | That's pretty much what Rust was created to do.
        
               | pjmlp wrote:
               | Like many others before it, hopefully it gets more
               | adoption this time.
        
         | wyldfire wrote:
         | Unfortunately some of this stuff just can't be detected
         | statically. So while warnings are an excellent starting point,
         | I recommend also building and testing with UBSan+ASan enabled.
        
         | veltas wrote:
         | I do agree about compiler warnings, but MISRA C imposes a lot
         | of unnecessary rules, a lot of unhelpful rules on how you
         | write expressions, and tries to act as if C's type system
         | worked differently than it does. In practice I have found it
         | to actually create bugs. Read the MISRA rules and appendices
         | on their effective type model alongside the C standard: they
         | have gaps where MISRA actually forces you to write code that
         | looks correct but doesn't work with C's model. I strongly
         | recommend against using MISRA, even in automotive or aviation
         | code (although it may unfortunately be a requirement on any
         | such project).
        
       | lizardactivist wrote:
       | I really hate C.
        
         | qznc wrote:
         | I made a similar test for floating point:
         | https://beza1e1.tuxen.de/no_real_numbers.html
         | 
         | Since most languages use the same IEEE-754, you can hate them
         | all.
        
           | lizardactivist wrote:
           | Unbiased, equal hate, fully in line with today's political
           | correctness? Fine, I really hate programming languages.
        
       | ananonymoususer wrote:
       | I have an issue with this one: Assume x has type int. Is the
       | expression x<<32 ...
       | 
       | Defined for all values of x
       | 
       | Defined for some values of x
       | 
       | Defined for no values of x
       | 
       | (I chose the second answer.) Wrong answer.
       | 
       | Shifting (in either direction) by an amount equalling or
       | exceeding the bitwidth of the promoted operand is an error in
       | C99.
       | 
       | So according to Wikipedia:
       | https://en.wikipedia.org/wiki/C_data_types
       | 
       | int / signed / signed int: Basic signed integer type. Capable
       | of containing at least the [-32,767, +32,767] range.
       | 
       | So a minimum of 16 bits is used for int, but no maximum is
       | specified. Thus, if my C compiler on my 64-bit architecture uses
       | 64 bits for int, this is perfectly allowed by the specification
       | and my answer is correct.
        
         | nayuki wrote:
         | The top of the page says:
         | 
         | > All other things being equal, assume GCC/LLVM x86/x64
         | implementation-defined behaviors.
         | 
         | However, you're right to point out that the quiz would be
         | better if it didn't require assumptions beyond what the
         | language standard guarantees.
        
       ___________________________________________________________________
       (page generated 2022-09-04 23:00 UTC)