[HN Gopher] C Integer Quiz
___________________________________________________________________
 
C Integer Quiz
 
Author : rwmj
Score  : 247 points
Date   : 2022-09-04 11:45 UTC (11 hours ago)
 
(HTM) web link (www.acepace.net)
(TXT) w3m dump (www.acepace.net)
 
| nayuki wrote:
| This tool I recently made can help:
| https://www.nayuki.io/page/summary-of-c-cpp-integer-rules#ar...
| 
| For example, you can see what (signed long) > (unsigned int)
| would convert to, under different environment bit width
| assumptions.
| 
| Also, there's a discussion on Reddit:
| https://www.reddit.com/r/cpp/comments/x4x01f/cc_arithmetic_c...
| jart wrote:
| Another interesting thing about C integers that the quiz doesn't
| cover is that remainder is not modulus. For example, in Python:
| 
|     >>> 2 % -5
|     -3
| 
| But in C:
| 
|     2 % -5 == 2
| 
| If you want floored modulus in C rather than its truncated
| remainder, then you have to use a function like this, which does
| what Python does:
| 
|     long mod(long x, long y) {
|         if (y == -1) return 0;
|         return x - y * (x / y - (x % y && (x ^ y) < 0));
|     }
| cozzyd wrote:
| One of my favorite tables:
| https://en.m.wikipedia.org/wiki/Modulo_operation#In_programm...
| marshallward wrote:
| Fortran supports both of these, with `mod` as the C-like
| truncated modulo and `modulo` as the floored modulo. Having
| both is convenient, but you do get errors from people who don't
| realize the difference.
| mtreis86 wrote:
| Common Lisp as well: rem and mod.
| qsort wrote:
| Mathematically speaking, the mod is usually taken to be
| positive. Any reasonable definition would involve taking the
| quotient of Z over the congruence relation, so I can see both
| -3 and 2 being reasonable conventions to represent partitions.
| 
| OTOH, the largest "wat" of C-like languages is the following:
| 
|     > -8 % 5
|     -3
| 
| why
| 
| Python here is once again correct:
| 
|     >>> -8 % 5
|     2
| veltas wrote:
| Mathematically speaking, if you ask for the modulus of -5 you
| may as well get a negative number, and I think people may
| validly have their own interpretations of what's "correct" at
| this point.
| qsort wrote:
| I didn't say it's wrong.
| 
| You may as well use {white, blue, black, red, green} to
| represent congruences mod 5; mathematically speaking it's
| not wrong, as long as they respect the axioms of a field.
| 
| What's "wat" (in the same sense as the JS "wat" talk) is
| that the answer changes if the argument becomes negative.
| By all reasonable definitions of modular arithmetic, -8 and
| 7 are in the same class mod 5. Why, then, is
| 
|     (-8 % 5) == (7 % 5)
| 
| false in C-like languages? I'm pretty sure it's perfectly
| consistent with all the relevant language standards, but
| it's a "wat" nonetheless.
| masklinn wrote:
| > What's "wat" (in the same sense as the js "wat" talk) is
| > that the answer changes if the argument becomes negative.
| 
| It's the same in Python though? You just prefer the way
| it behaves (though to be fair it's generally more useful
| and less troublesome): Python's remainder follows the
| sign of the divisor (because it uses floored division),
| while C's follows the sign of the dividend (because it
| uses truncated division).
| 
| Strictly speaking, neither is a Euclidean modulo.
| owl57 wrote:
| It's not symmetric. I believe most people with strong
| mathematical background (like Guido :)) expect lambda x: x
| % m to be some "computer approximation" to the standard
| mapping from Z to Z/mZ, while not having deep
| expectations about lambda m: x % m.
| veltas wrote:
| The really useful fact of C's remainder operator is that
| 
|     x == x / y * y + x % y
| 
| provided nothing overflowed. This is usually what you want when
| doing accurate calculations with integer division.
| xigoi wrote:
| The modulo operation together with floor division also has
| this property. And unlike C's operators, it also has the
| useful property that (x + y) % y == x % y.
| temac wrote:
| Undefined behavior is probably the worst way we can imagine to
| define constraints usable by optimizers. Sad that major languages
| and implementations went this way.
| Karellen wrote:
| Yeah, I think a lot of C's `undefined behaviour` semantics
| around assignments to ints should be reconsidered and changed
| to `unspecified` or `implementation defined` behaviour, and
| compilers can just do "whatever the hardware does". If that
| includes traps on one arch, fine, let it trap.
| 
| I think `undefined behaviour` still has its place in C -
| dereferencing a freed pointer comes to mind as an obvious
| example - but I think a good proportion of the really
| unintuitive UB conditions could be made saner without
| sacrificing portability or optimisation opportunities.
| owl57 wrote:
| If you extend "unspecified" to allow traps, reading a bogus
| pointer can also be unspecified, only writes undefined in the
| current sense.
| Karellen wrote:
| I don't think so. With "unspecified" and "implementation-
| defined" behaviours, the implementation has to pick a
| behaviour and be consistent with it. The difference is
| whether they have to document that behaviour or not.
| 
| If the hardware traps on bogus pointers, then reading a
| bogus pointer may trap. But if you read a recently-freed
| pointer, it may still be valid according to the hardware
| (e.g. will have valid PTEs into the process's address
| space) so won't trap. Therefore you won't be able to
| guarantee any particular behaviour on an invalid read, so I
| don't think you'd be able to get away with "unspecified" or
| "implementation defined" behaviour on most hardware.
| owl57 wrote:
| AFAIR reading an uninitialized int, for example, is
| "unspecified" (any value could be there).
If we consider
| adding "implementation defined with possible trap" for
| overflow, we might as well add "unspecified with possible
| trap" for reading an invalid pointer (any value could be
| there, or it could trap, but no nasal demons).
| temac wrote:
| Using uninitialized values is UB. Going back to naive
| pointers is a lost cause because compilers have started
| to do crazy optims (not even currently allowed by the
| standards...) like origin analysis. You can't steal that
| toy from the people implementing optims.
| UncleMeat wrote:
| "Implementation defined" is worse in a lot of ways. Now the
| compiler has way less authority to tell you to stop! And we
| still have the problem of the application probably not doing
| what you want.
| temac wrote:
| Current compilers don't "tell you to stop". They silently
| transform your program into complete garbage without even
| causality restraints. The sanitizers can help, but they are
| merely dynamic when the dangerous transformations are static
| in the first place.
| UncleMeat wrote:
| It is true that the language has failed to provide tools
| that help developers prevent UB and that this is a very
| bad thing for the ecosystem. It _can_ change, though.
| 
| In practice, compilers aren't actually adversarial. A lot
| of the discussion around UB is catastrophizing and talks
| about how the compiler will order you pizza or delete
| your disk. Some problems are real, and I know some people
| whose graduate school work was very specifically on the
| problems that this causes for security-related checks, but
| compilers really, really do not transform your program
| into "complete garbage." They transform your program into
| a program with a bug, which is a true statement about
| that program.
| 
| I'm reminded of the apocryphal story about being asked
| how a computer knows how to do the right thing if given
| the wrong inputs. This feels similar.
| UncleMeat wrote:
| I do agree that the story around UB in C and C++ sucks.
But UB
| doesn't exist because the compiler engineers want to be able to
| stick in optimizations. Once they are in the spec it makes
| sense to follow the rules, but most UB comes from a desire for
| portability and to not privilege one platform over another.
| 
| And further, defining a lot of UB won't actually improve
| things. Imagine we define signed integer overflowing behavior.
| Hooray. Now your program just has a _different_ bug. If you've
| accidentally got signed integer overflow in your application,
| then "did some weird things because the compiler assumed it
| would never overflow" is going to cause exactly the same amount
| of havoc as "integer overflowed and now your algorithm is
| almost certainly wrong."
| nayuki wrote:
| > And further, defining a lot of UB won't actually improve
| > things. Imagine we define signed integer overflowing
| > behavior. Hooray. Now your program just has a different bug.
| 
| This is exactly what JF Bastien argues in his hour-long talk:
| https://youtu.be/JhUxIVf1qok?t=2284
| 
| So yeah, defining signed integer overflow isn't going to fix
| bugs in the vast majority of existing programs.
| 
| That being said, enforcing signed overflow wraparound at
| least makes debugging easier, because it's reproducible. This
| is how it is in Java land - int overflow wraps around, and
| integer type widths are fixed, so if you trigger an overflow
| while testing then it is reliably reproducible across all
| conforming Java compiler and VM versions and platforms.
| UncleMeat wrote:
| It also makes tools less able to loudly yell at you for
| having it in your application. Yes, you can turn on
| optional warnings if you want, but if you've got one or two
| intended uses for this behavior then now you need
| suppressions and all sorts of mess.
| gary_0 wrote:
| I got most of them right. My god, what have I become?
| [deleted]
| BirAdam wrote:
| I have used C quite a bit, and I'm amazed at how many I got
| wrong.
Nice example of why C is hated by so many, I suppose.
| 
| Personally, I love C but I don't use it in anything serious.
| Probably for the best, given I apparently cannot remember how C
| ints work.
| [deleted]
| [deleted]
| antirez wrote:
| Surprised I got all the questions right, because I tend to code
| _around_ those limits, sticking to defining always-safe types for
| the problem at hand in all the cases where I'm not sure about
| ranges, possible overflows and so forth. What I mean is that if
| you have to remember the rules in a piece of code you are
| writing, it is better to rewrite the code with new types that
| are obviously safe.
| bornfreddy wrote:
| Admittedly my C is (very) rusty, but I was surprised at how few
| of the answers I knew, so your comment made me feel even worse.
| But then I saw your nickname. Yeah. :) (redis rocks btw)
| 
| Completely agree with you, and not just in C. Using esoteric
| features of any language is equivalent to putting landmines in
| front of your teammates - bad idea.
| tomxor wrote:
| > Using esoteric features of any language is equivalent to
| > putting landmines in front of your teammates
| 
| Also, outside of production code, using esoteric features is
| a good way to get familiar with every corner of that language
| - which is useful so that you can defuse those landmines
| when they are accidentally created, i.e. know them to avoid
| them.
| jherskovic wrote:
| Haven't written C in a long time. What a nightmare this is. Got
| most of them predictably wrong.
| simias wrote:
| I did fairly well in the test (I didn't remember that you
| couldn't shift into the sign bit), but it really helps highlight
| how stupid the C promotion rules are. In C, as soon as I have to
| do arithmetic with anything but unsigned ints, all sorts of
| alarm bells start going off and I end up casting and recasting
| everything to make sure it does what I think it does, coupled
| with heavy testing.
Now add floats and doubles into the mix and
| all bets are off.
| 
| Rust not having a default "int" type and forcing you to
| explicitly cast everything is such an improvement. Truly a poster
| child for "less is more". Yeah, it makes the code more verbose,
| but at least I don't have to worry about "lmao you have a signed
| int overflow in there you absolute nincompoop, this is going to
| break with -O3 when GCC 16 releases 8 years from now, but only on
| ARM32 when compiled in Thumb mode!"
| WalterBright wrote:
| The trouble with explicit casting is that if the code is
| refactored to change the underlying integer type, the explicit
| casts may silently truncate the integer, introducing bugs.
| 
| D follows the C integral promotion rules, with a couple of
| crucial modifications:
| 
| 1. No implicit conversions are done that throw away information -
| those require an explicit cast. For example:
| 
|     int i = 1999;
|     char c = i;            // not allowed
|     char d = cast(char)i;  // explicit cast
| 
| 2. The compiler keeps track of the range of values an
| expression could have, and allows narrowing conversions when
| they can be proven not to lose information. For example:
| 
|     int i = 1999;
|     char c = i & 0xFF;     // allowed
| 
| The idea is to safely avoid needing casts, in order to avoid
| the bugs that silently creep in with refactoring.
| 
| Continuing with the notion that casts should be avoided where
| practical, the cast expression has its own keyword. This
| makes casting greppable, so a code review can find them. C
| casts require a C parser with lookahead to find.
| 
| One other difference: D's integer types have fixed sizes. A
| char is 8 bits, a short is 16, an int is 32, and a long is 64.
| This is based on my experience that a vast amount of C
| programming time is spent trying to account for the
| implementation-defined sizes of the integer types. As a result,
| D code out of the box tends to be far more portable than C.
| 
| D also defines integer math as 2's complement arithmetic. All
| that 1's complement stuff belongs in the dustbin of history.
| scrame wrote:
| I never really clicked with D, but I always like reading your
| discussions on these details.
| simias wrote:
| Yeah, for sure, having more expressive casts and
| differentiating between "upcasts" and "downcasts" is
| definitely better. My point is that even "lossy" casts are
| better than C's weird arcane implicit promotion rules, by
| quite a margin IMO.
| 
| Rust's current handling of this issue is by no means perfect,
| although it's been steadily improving and I definitely don't
| use `as` as much as I used to.
| 
| > D also defines integer math as 2's complement arithmetic.
| 
| I think modern C standards do so as well, but I _think_
| that signed overflow is still UB, so it's mostly about
| defining signed-to-unsigned conversions. There are flags on
| many compilers to tell them to assume wrapping arithmetic, but
| obviously that's not standard...
| WalterBright wrote:
| It's not perfect. People do complain that:
| 
|     char a, b, c;
|     a = b + c;              // error
|     a = cast(char)(b + c);  // ok
| 
| produces a truncation error and requires a cast. But char
| types have a very small range, and overflow may be
| unexpected. So D makes the right choice here to promote to
| int and require a cast.
| loeg wrote:
| The most recent C standards also explicitly define integer
| math as 2's complement. Admittedly, much later than D.
| WalterBright wrote:
| The last 1's complement machine I encountered was, never.
| Not in nearly 50 years of programming. I think those
| machines left for the gray havens before I was born.
| 
| C should also make char unsigned (D does). Optionally
| signed chars are an abomination.
| xeeeeeeeeeeenu wrote:
| > The last 1's complement machine I encountered was, never.
| > Not in nearly 50 years of programming. I think those
| > machines left for the gray havens before I was born.
| | Unisys OS 2200 uses one's complement[1] and Unisys MCP | uses signed magnitude[2]. Both are still around. | | [1] - https://public.support.unisys.com/2200/docs/CP19.0/ | 78310422-... (page 108) - "UCS C represents an integer in | 36-bit ones complement form (or 72-bit ones complement | form, if the long long type attribute is specified)." | | [2] - | https://public.support.unisys.com/aseries/docs/ClearPath- | MCP... (page 304) - "ClearPath MCP C uses a signed- | magnitude representation for integers instead of | two's-complement representation. Furthermore, ClearPath | MCP C integers use only 40 of the 48 bits in the word: a | separate sign bit and the low order 39 bits for the | absolute value." | Measter wrote: | > The trouble with explicit casting is if the code is | refactored to change the underlying integer type, the | explicit casts may silently truncate the integer, introducing | bugs. | | That depends on how the casting is provided. For the C-style | casting, or Rust's `as` casting, yes that is a problem. | However, another way casting could be provided is through | conversion functions that are only infallible if information | isn't lost. For example, let's say we have the functions | `to_u16` and `to_i16`. For an `i8` the first function could | return `Option<u16>`, the second `i16`, while for a `u8` they | would return `u16` and `i16`. That way, any change to the | types that could cause it to now silently truncate would | instead cause a compiler error because of the type mismatch. | | Rust almost gets there with its `Into` and `TryInto` traits | which do provide that functionality, but trying to use them | in an expression causes type inference to fail, which just | makes them a pain in the ass to use. | nayuki wrote: | You can use From and TryFrom, like u32::try_from(5i32). 
| https://doc.rust-lang.org/std/convert/trait.From.html ,
| https://doc.rust-lang.org/std/convert/trait.TryFrom.html
| tialaramex wrote:
| And, notably in this context, Rust's traits are auto-
| implemented in a chain, so if From<Foo> for Bar, then
| Into<Bar> for Foo, and thus TryFrom<Foo> for Bar, and
| thus in turn TryInto<Bar> for Foo.
| 
| This means that if first_thing used to have a non-
| overlapping value space, so that converting it to
| other_thing might fail, and so you wrote
| 
|     other_thing = first_thing.try_into().blahblahblah;
| 
| ... if you later refactor and now first_thing is a subset
| of other_thing, so that the conversion can never fail, the
| previous code still works fine; the try_into() call just
| never fails. In fact, the compiler even _knows_ it can't
| fail, because its error type is now Infallible, a sum
| type with nothing in it, so the compiler can see this
| never happens, and optimise accordingly.
| WalterBright wrote:
| Yeah, having two different cast operations can help here.
| But I like the simpler approach.
| tialaramex wrote:
| Still, I would rather not have Rust's 'as' cast. I should like
| to see all, or at least most, uses of 'as' deprecated in a later
| Rust edition.
| 
| Rust's 'as' will silently throw away data to achieve what you
| asked. Sometimes you wanted that, sometimes you didn't realise,
| and requiring into() and try_into() instead helps fix that.
| 
| For example, suppose I have a variable named beans. I forgot
| what type it is, but it's got the count of beans in it; there
| should definitely be a non-negative value because we were
| counting beans, and it shouldn't be more than a few thousand,
| so we're going to put that in a u16 variable named enough:
| 
|     let enough = beans as u16;
| 
| This works just fine, even though beans is i64 (i.e. a signed
| 64-bit integer). Until one day beans is -1 due to an arithmetic
| error elsewhere, and now enough is 65535, which is pretty
| surprising.
It's not _undefined_, but it is surprising.
| 
| If we instead write
| 
|     let enough = beans.into();
| 
| we get a compiler error: the compiler can't see any way to turn
| i64 into u16 without risk of loss, so this won't work. We can
| write beans.try_into().unwrap() and get a panic if beans was
| actually out of range, or we can write the try_into() handler
| properly if we realise, seeing it can fail, what we're actually
| dealing with here.
| jcranmer wrote:
| If I can get away with using .into() over as, I will.
| However, there are so many cases where you can't, and
| .try_into().unwrap() is just way too unwieldy. In particular,
| there's no way for me to go "I know I'm only ever going to
| run on 64-bit machines where sizeof(usize) == sizeof(u64), so
| usize should implement From<u64>" (and even more annoying, I
| might be carefully making sure this works on 32-bit and
| 64-bit machines and get screwed because Rust thinks it could
| be portable to a 16-bit usize, despite the fact that I'm
| allocating hundreds of MB of memory).
| 
| And of course there are times I do want to bitcast negative
| values to large positive unsigned values or vice versa,
| without error. So while I do understand why maybe you
| shouldn't use as, at the end of the day, it just ends up
| being easier to use it than not use it.
| gspr wrote:
| > try_into().unwrap() is just way too unwieldy.
| 
| I always carry with me a tiu() alias method for exactly
| that reason (in a TryIntoUnwrap trait implemented for every
| pair of types which implements TryInto).
| arcticbull wrote:
| You can also do try_from()? if you impl From<TryFromIntError>
| on your local error type.
| ridiculous_fish wrote:
| A flip side is that some safe conversions also produce
| compiler errors. On a 64-bit system, usize and u64 are the
| same width, but they are not convertible: neither is Into the
| other, so you cannot lean on the compiler in the way you
| describe.
| 
| You might use try_into(), but then you risk panicking on a
| 32-bit system, instead of getting a compile-time error.
| svnpenn wrote:
| Exactly why I don't use C anymore. Other languages like Go don't
| allow this stuff.
| 
| https://go.dev/play/p/D2J6Y9ol41W
| ummonk wrote:
| Wait, so does an unsigned int promote to signed int for shift?
| Jimajesty wrote:
| I felt pretty smart getting that first question, then proceeded
| to fall down the stairs.
| Turing_Machine wrote:
| The correct answer for many of these is "don't do that". :-)
| IncRnd wrote:
| That is the correct answer to the questions in this quiz!
| 
| Most professionals don't worry about these effects, instead
| choosing to cast and use parentheses directly. Those instruct
| the compiler instead of relying upon warnings from the
| compiler.
| jstimpfle wrote:
| It's important to notice the disclaimer that this applies to
| x86/x86-64 GCC-like platforms; in particular, int is assumed to
| be 32 bits.
| 
| As antirez said as well, I don't pride myself on understanding
| the intricacies of integer arithmetic and promotion well. I try
| to write clear code by writing around the less commonly
| understood rules. Nevertheless, I wanted to test myself. There
| are two questions that surprised me somewhat.
| 
| Apparently you can't left-shift a negative value even by a zero
| amount, as in (-1 << 0).
| 
| And is it true that the value of "-1L > 1U" is platform
| dependent? I had assumed that 1U would be promoted to 1L in this
| expression, even on x86 where unsigned int and long have the same
| number of bits. According to the following document, long int has
| a "higher rank" than unsigned int.
| https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Under...
| (Edit: according to rules 4 and 5 of "Usual arithmetic
| conversions", it's not only about the rank but also about "the
| values that can be represented")
| Dwedit wrote:
| On Windows, `long` and `int` are the same size, even when
| building for x64.
| einpoklum wrote:
| I taught first-semester C for several years, and still got a
| couple of these wrong - although, to be honest, the way we taught
| the class at my alma mater, we steered well clear of these
| situations. We told students that C performs type promotions and
| automatic conversions, but either had them use a uniform type to
| begin with - int, typically - or told them to convert explicitly
| and avoid the issue.
| 
| Still, the most important suggestion I would make here is: Always
| compile with warnings enabled, in particular:
| 
| * -Wstrict-overflow=1 and maybe even -Wstrict-overflow=3
| 
| * -Wsign-compare
| 
| * -Wsign-conversion
| 
| * -Wfloat-conversion
| 
| * -Wconversion
| 
| * -Wshift-negative-value
| 
| (or just `-Wall -Wextra`)
| 
| And maybe also:
| 
| * -fsanitize=signed-integer-overflow
| 
| * -fsanitize=float-cast-overflow
| 
| (These are GCC flags; they should work for clang as well.)
| Cola2265 wrote:
| pif wrote:
| In the end, it's very simple and intuitive.
| 
| 1- If you need arithmetic, use signed; if you need a bitmask, use
| unsigned. C is not assembly, and bit shift is no multiplication
| nor division.
| 
| 2- Make sure you stay within bounds, as in: don't even think you
| can approach the boundaries. C is not assembly, and the overflow
| flag does not exist.
| codeflo wrote:
| I thought I did well until the bit-shifting questions. Who in
| their right mind designs a language where shifting the bits of a
| u16 silently converts it into an i32? Doubly so since that fact
| alone directly causes UB -- keeping the u16 would have been
| perfectly fine.
| 
| (Edit, since some respondents seem to miss this: explanations
| about efficiency or ISAs might justify promoting to u32 (though
| even that's debatable), but not i32. A design that auto-promotes
| an unsigned type, where every operation is nicely defined and
| total, into a signed type, where you run into all kinds of
| undefined behavior on overflow, is simply crazy.)
| jcranmer wrote:
| If you have a 32-bit architecture, you don't necessarily have
| 8-bit and 16-bit hardware operations outside of memory
| operations (load/store).
| 
| Now, I don't find this reasoning persuasive--it's not _that_
| hard to emulate an 8-bit or 16-bit operation--and judging from
| the history of post-C languages, most other language designers
| are equally unmoved by this reasoning, but I can see someone in
| their right mind designing a language that acts like this.
| Especially if the first architecture they're developing on is
| precisely such an architecture (the PDP-11 doesn't have
| byte-sized add/sub/mul/div).
| simias wrote:
| I think the prevailing philosophy for C (and later C++) was
| that code should map closely to the hardware and not expand
| to complicated "microcode". A left shift in C should just be a
| left shift in the underlying ISA, within reason. That's where
| most of the undefined behaviours come from. Having to add
| masking and other operations to emulate a 16-bit shift on a
| 32-bit architecture feels un-C-like, for better or worse.
| 
| IMO the real issue is not so much the fact that all shifts of
| any type smaller than int are treated as if they were ints; it's
| that the language doesn't force you to acknowledge that in the
| code. If you got a compilation error when trying to shift a
| short and had to explicitly promote to int in order to make it
| through, at the very least it couldn't lead to an oversight from
| a careless programmer.
| 
| C is trying to be clever but only goes half way, resulting in
| the worst of both worlds IMO.
| codeflo wrote:
| The ISA might force someone to extend a value to 32 bits
| (debatable, but let's go with it). It never forces you to
| treat an unsigned int as signed. It also doesn't require
| inserting UB into the process.
| simias wrote:
| I agree, the whole signed vs. unsigned thing generally feels
| like an afterthought in C (probably because, to a certain
| extent, it was).
`char`'s sign being implementation-
| defined is a pretty wild design choice that wasted many
| hours of my life while porting code between ARM and x86.
| 
| UBs are not required, but you need them if you want C to
| behave as a macro-assembler as well as allowing for
| aggressive optimizations. For instance, `a << b` if b is
| greater than a's width is genuinely UB if you write
| portable code; different CPUs will do different things in
| this situation. Defining the behaviour means that the
| compiler would have to insert additional opcodes to make
| the behaviour identical on all platforms.
| 
| You may argue that it's still better than having UB, but
| that's just not C's design philosophy, for better or
| worse.
| gsliepen wrote:
| It's worse than that. `char`'s signedness being
| implementation-defined is one thing, but then having the
| standard library provide a function called `getchar()`
| that returns not a `char` but an `unsigned char` cast to
| an `int` is diabolical.
| tsimionescu wrote:
| > For instance `a << b` if b is greater than a's width is
| > genuinely UB if you write portable code, different CPUs
| > will do different things in this situation. Defining the
| > behaviour means that the compiler would have to insert
| > additional opcodes to make the behaviour identical on all
| > platforms.
| 
| You seem to be mixing up implementation-defined behavior
| and undefined behavior. It would have been perfectly
| reasonable to make this choice if signed integer overflow
| were implementation-defined, but it is unfortunately not
| - it is undefined behavior instead. This means that a
| program containing this instruction is not valid C and
| may "legally" have any effect whatsoever.
| codeflo wrote:
| That's true, and I'd add something more. Your reasoning about
| hardware differences would only justify implementation-
| defined behavior, not undefined behavior.
The distinction
| is important here: undefined behavior is when the
| compiler can make surprising optimizations in other parts
| of the code assuming something doesn't happen.
| jcranmer wrote:
| I imagine the reason oversized shifts are UB is because
| some 40-year-old computer hardware trapped on oversized
| shifts, as traps are always UB in C.
| nayuki wrote:
| > Who in their right mind designs a language where shifting the
| > bits of a u16 silently converts into an i32?
| 
| Yeah, hence why I asked this question years ago:
| https://stackoverflow.com/questions/39964651/is-masking-befo...
| veltas wrote:
| Because everything smaller than an int is usually promoted to
| int. int is the 'word' in C that things are calculated in. Even
| character constants are ints, all enum constants are ints, and
| the default type was int back when default types were still a
| thing.
| edflsafoiewq wrote:
| Yes. The model C has is that the CPU has an ALU that
| operates as (word, word) -> word, with int being the smallest
| word size. This explains many of C's conversion rules: to
| operate on a single integer, it first has to be promoted to a
| word size; to operate on two integers, they first have to be
| converted to a common word size, etc.
| lifthrasiir wrote:
| And that makes writing correct _and_ portable C futile.
| Yes, defined-size types are a thing and I almost exclusively
| use them, but since the size of `int` itself is unknown and
| integer promotion depends on that size, the meaning of
| code using only defined-size types can still vary across
| platforms. intN_t etc. are only typedefs and do not form
| their own type hierarchy, which in my opinion is a huge
| mistake.
| jefftk wrote:
| But why not turn u16 into u32? Why switch it to being signed
| on promotion?
| rwmj wrote:
| I'm sure the answer is going to be along the lines of
| "because PCC did that and they standardized it" :-/
| 
| Here's a fun standardization problem I came across recently
| (nothing to do with C):
| http://mywiki.wooledge.org/BashFAQ/105
| ynfnehf wrote:
| ANSI describes why in their rationale:
| https://www.lysator.liu.se/c/rat/c2.html#3-2-1
| 
| > The unsigned preserving rules greatly increase the number
| > of situations where unsigned int confronts signed int to
| > yield a questionably signed result, whereas the value
| > preserving rules minimize such confrontations. Thus, the
| > value preserving rules were considered to be safer for the
| > novice, or unwary, programmer. After much discussion, the
| > Committee decided in favor of value preserving rules,
| > despite the fact that the UNIX C compilers had evolved in
| > the direction of unsigned preserving.
| veltas wrote:
| Believe it or not, this makes the behavior more like what
| you'd expect in many cases. For example:
| 
|     uint8_t x = 4;
|     extern volatile uint64_t *reg;
|     *reg &= ~x;
| 
| In the last statement x is promoted to an int, and then
| when the bitwise NOT occurs every bit except bit 2 is set to
| 1, including the high bit. When it's converted to a uint64_t
| for the AND, the high bits are also set to 1. So the result
| is that the final statement clears only bit 2 in *reg.
| 
| If it promoted to unsigned int, then it would also clear
| bits 32-63.
| codeflo wrote:
| I don't find that very convincing. It's simply ambiguous.
| I might want this:
| 
|     *reg &= ~(uint64_t)x;
| 
| or, and there's no elegant way to even write this in C, I
| might want:
| 
|     *reg &= (uint64_t)(uint8_t)~x;
| 
| The fact that I have to write two casts here to undo the
| damage of the auto-promotion is evidence of how broken
| this is.
| veltas wrote:
| That second line can be written:
| 
|     *reg &= (uint8_t)~x;
| 
| Or:
| 
|     *reg &= ~x & 0xFF;
| tialaramex wrote:
| 
|     extern volatile uint64_t *reg;
|     *reg &= ~x;
| 
| People should stop doing this.
What this _means_ is: | extern volatile uint64_t *reg; uint64_t tmp = *reg; | tmp &= ~x; *reg = tmp; | | But of course when you write _that_, chances are somebody | will point out that you're running in interruptible | context sometimes in this function, so that's actually | introducing a race condition. Why didn't they say so when | you wrote it your way? Because that looked like a single | operation and so it wasn't obvious it might get | interrupted. | scatters wrote: | Because sub-integer types are for storage, not computation. | | Yes, it'd be better if you had to explicitly cast to int or | unsigned to perform arithmetic, but that ship has sailed. | leni536 wrote: | Doesn't list my favorite footgun. | | x and y are unsigned short. The expression `x*y` has defined | behavior for... | | a) all values of x and y. | | b) some values of x and y. | | c) no values of x and y. | nayuki wrote: | If short = 16 bits and int = 16 bits, then x and y will be | promoted to unsigned int. Unsigned multiplication has | wraparound behavior, so x*y will be defined for all values. | | If short = 16 bits and int = 32 bits (or heck even 17 bits), | then x and y will be promoted to signed int. Signed | multiplication overflow is undefined behavior, so x*y will be | undefined for some values when x*y is too large. In particular, | 0xFFFF * 0xFFFF = 0xFFFE_0001, which is larger than INT_MAX = | 0x7FFF_FFFF. | | If short = 16 bits and int = 64 bits (or even 33 bits), then x | and y will be promoted to signed int. The range of x*y will | always fit int, so no overflow occurs, and the expression is | defined for all input values. | | Isn't C fun? | loeg wrote: | (B), assuming int is 32-bit and short is 16-bit. The | multiplication promotes both operands and result to signed int, | right? So if both x and y are uint16_max, the result overflows | signed int and is UB, I think. | | But if int were larger (eg 64 bit) and short remained 16 bits, | there's no overflow and the answer is (A).
I think. | jstimpfle wrote: | There was definitely a question that covered the auto-promotion | to int, maybe with slightly different types. | [deleted] | shultays wrote: | I am gonna say A but I guess I don't see the foot gun here | Veliladon wrote: | The shorts are not promoted to ints (if you even declared the | result as that type) until after the multiplication. The | result will first be put into a short and then promoted to | int. It's basically asking for an overflow error. Given that | 256^2 is 65536 you don't have to be multiplying large numbers | before hitting that overflow. | loeg wrote: | Unsigned short has defined overflow, though. | shultays wrote: | I don't follow, assuming short is half the size of int it | would only overflow if most significant bits of both values | are 1. Where did that 256^2 come from? It wouldn't overflow | if a and b were 256 | | I missed that the result would be promoted to signed int, | which overflows only when the most significant bits are set | [deleted] | LegionMammal978 wrote: | All operands are always promoted to int (or unsigned int, | long, unsigned long, etc.) _before_ any operation. The | footgun in this example is that even though you're using | unsigned integers and would expect guaranteed wrapping | semantics, the promotion to signed int makes it UB for | USHORT_MAX*USHORT_MAX. | jandrese wrote: | 65535 * 65535 would overflow an int. | shultays wrote: | Ah I didn't know it would be promoted to signed int. | Another thread here explains it as well. Thanks | synergy20 wrote: | typically you shall avoid shift on signed integers, especially | NEVER left shift on signed char|short|int|whatever. | | I limit shift strictly to unsigned integer numbers. | nayuki wrote: | Following this rule strictly can be tricky. | uint16_t x = (...); uint16_t y = x << 15; | | Any arithmetic involving uint16_t will be promoted to some kind | of int.
If int is 16 bits wide, then uint16_t will be promoted | to unsigned int before the shift, and all values of x are safe. | Otherwise, int is at least 17 bits wide, and uint16_t will be | promoted to signed int before the shift. | | On a weird but legal platform where int is 24 bits wide, the | expression (uint16_t)0xFFFF << 15 will cause undefined | behavior. | | My workaround for this is to force promotion to unsigned int: | (0U + x) << 15. | https://stackoverflow.com/questions/39964651/is-masking-befo... | siggen wrote: | Got all these correct. I was expecting something more convoluted | or exotic undefined behavior. | dinom wrote: | Seems like a good example of why interview quizzes aren't a | panacea. | flykespice wrote: | This quiz only reminded me how _little_ I know about C | (thankfully those cases are corner-cases of incompetent | programming so others and I can ignore their existence). | | Good grief, thanks Quiz for reminding me to stay away from that | mess of a language. | greesil wrote: | -Wall -Werror | | #include <stdint.h> | | and don't use primitive types, and you will avoid many of these | issues. | ghoward wrote: | I'm a heavy user of C, and most of what I got wrong were around | integer promotion. | | I'm glad I run clang with -Weverything and use ASan and UBSan. | jcranmer wrote: | The answer of the final question (INT_MIN % -1) is wrong in a way | that's somewhat dangerous. | | If you read the text of C99 carefully, yes, it's implied that | INT_MIN % -1 should be well-defined to be 0. However, the % | operator is usually implemented in hardware as part of the same | instruction that does division, which means that on hardware | where INT_MIN / -1 traps (thereby causing undefined behavior), | INT_MIN % -1 will also trap.
The wording was changed in C11 (and | C++11) to make INT_MIN % -1 explicitly undefined behavior, and | given the reasoning for why the wording was changed, users should | expect that it retains its undefined behavior even in C89 and C99 | modes, even on 20-year-old compilers that predate C11. | greaterthan3 wrote: | >The wording was changed in C11 | | And here's an exact quote from the C11 standard: | | >If the quotient a/b is representable, the expression (a/b)*b + | a%b shall equal a; otherwise, the behavior of both a/b and a%b | is undefined. | | http://port70.net/~nsz/c/c11/n1570.html#6.5.5p6 | lultimouomo wrote: | Beware that this is very much not a "C Integer quiz", but an | "ILP32/LP64 integer quiz". As soon as you move not to some weird | exotic architecture, but simply to 64bit Windows(!!) some quiz | answers will not hold. | | For a website meant to educate programmers on C language gotchas, | this is a pretty lackluster effort. | | Even the initial disclaimer, "assume x86/x86-64 GCC/CLang", is | wrong, as the compiler does not have anything to do with integer | widths. | flykespice wrote: | My impression is this wasn't so much to educate programmers on C | language gotchas as to remind you just how messy and fragile | this language is (so you can stay far away). | lultimouomo wrote: | One more reason not to give the impression that you can | assume that long is wider than int which is wider than short! | ok123456 wrote: | The quiz is wrong because they assume that the length of short | is less than int in some questions. A short can be the same | length as an int. It just can't be longer. | analog31 wrote: | One of my friends likes to scold me about using a programming | language (Python) that doesn't enforce type declarations. I | gently remind him that his language (C) has eight different types | of integers.
| einpoklum wrote: | { char, short, int, long, long long } x { signed, unsigned } = | 10 types, and then there's bool and maybe int128_t, so maybe | 12. | Aardwolf wrote: | Heh, it has _way_ more than 8. | | For char, you have 3: signed char, unsigned char and char. It's | not specified if char without keyword is signed or unsigned. | | You have integer types such as size_t, ssize_t and ptrdiff_t. | They may, under the hood, match one of the other standard int | types, however this differs per platform, so you can't e.g. | just easily print size_t using the standard printf formatters, | you really have to treat it as its own type. Also wchar_t and | such of course. | | Then you have all the integers in stdint.h and inttypes.h. Same | here applies as for size_t. At least you know how many bits you | get from several of them, unlike from something like "long". | | Then your compiler may also provide additional types such as | __int128 and __uint128_t. | spc476 wrote: | > so you can't e.g. just easily print size_t using the | standard printf formatters | | This has been fixed in C99. For size_t, it's "%zu", for | ptrdiff_t it's "%td", for ssize_t it's "%zd" and for wchar_t, | it's "%lc". | veltas wrote: | Third question is wrong. It's implementation defined, because | unsigned short might be the same rank as unsigned int in some | implementations, in which case it remains unsigned when promotion | occurs. | jwilk wrote: | They may have the same width, but not the same rank. | | From C99 SS6.3.1.1: | | > _-- The rank of long long int shall be greater than the rank | of long int, which shall be greater than the rank of int, which | shall be greater than the rank of short int,_ [...] | | > _-- The rank of any unsigned integer type shall equal the | rank of the corresponding signed integer type, if any._ | [deleted] | moefh wrote: | That's true, but it doesn't matter to the point being made.
| | According to the standard[1], if short and int have the same | size (even if not the same rank) both numbers are converted | to unsigned int (that is, the unsigned integer type | corresponding to int) because int can't represent all values | of unsigned short. | | The usual arithmetic conversions never "promote" an unsigned | to signed if doing so would change the value. | | [1] http://port70.net/~nsz/c/c99/n1256.html#6.3.1.8 | rwmj wrote: | He does say on the first page: _All other things being equal, | assume GCC/LLVM x86/x64 implementation-defined behaviors._ Are | there any normal, modern machines where short and int are the | same size? I think the last machine where that was true was the | 8086. | codeflo wrote: | Define "machine". All kinds of microcontrollers are | programmed with C. | moefh wrote: | That's still a bad question: the behavior is implementation- | defined according to the standard, so having "implementation- | defined" as a wrong option is ambiguous and confusing. | | The question would be fine (given that note) if | "implementation-defined" was not an option, like for example | the question about "SCHAR_MAX == CHAR_MAX". | fuckstick wrote: | > I think the last machine where that was true was the 8086. | | The last machine with a word size of 16 bits? The 286 was as | well. There were others like the WDC 65816 (the Apple IIgs | and SNES CPU). | | It just so happens that there are simply far fewer 16-bit CPUs | than there were 8 or 32 (or "32/16" like the 68k). Also 8 bit | CPUs are simply a poor fit for C by their nature and the | assumptions C makes. But the numerous ones still relevant | today will use 16 bit ints. | | The use of Real or V86 mode on the x86 went on for many years | after the demise of the 8086. I think it is somewhat of a | joke at this point that they're teaching Turbo C in some | developing countries. | veltas wrote: | On x86 systems like 8086 short and int were the same size.
| And that appears to be a footnote they've added after being | called out for being wrong. The question gives | "implementation defined" as an option and in other questions | seems to specify the ABI, and in some assumes it without | saying again. Very inconsistent, they should fix their quiz | really. | | The last x86 processor I know of to have these sizes is | 80286. | galangalalgol wrote: | AVR microcontrollers still have 16-bit ints, probably 8051 | and PIC too, but I don't use those. Lots of people do | though. TI DSPs use 48 bit long, so don't count on int and | long being the same either. | MauranKilom wrote: | _> What does the expression SCHAR_MAX == CHAR_MAX evaluate to?_ | | _> Sorry about that -- I didn't give you enough information to | answer this one. The signedness of the char type is | implementation-defined_ | | ...why not have an "implementation-defined" answer button then, | because that's what people should know (instead of knowing all | ABIs) and what the question is about anyway? | | _> If these operators were right-associative, the expression [x | - 1 + 1] would be defined for all values of x._ | | That's just wrong, no? If + and - were right-associative, it | would be parsed as x - (1 + 1), which is decidedly not "defined | for all values of x". | shultays wrote: | I raised an eyebrow on that as well, in some questions you have | implementation defined for an answer so I assumed the author just | wrongly assumed undefined covers that | necovek wrote: | Other questions might also be "implementation defined", thus a | caveat to assume GCC/LLVM implementations on x86/amd64 at the | start. | | I.e. C standard enforces undefined very sparingly iirc, and | most of the corner cases are implementation defined. | | I may also be misremembering things: it's been 20 years since | I've carefully read C99 (draft, really) for fun :)) | kevin_thibedeau wrote: | Even the compiler assumption isn't enough.
There is also an | implicit assumption in the answers that x86-64 is using LP64 | when some platforms will use LLP64 (Windows) or ILP64 (Cray). | 6a74 wrote: | Neat quiz. Reminds me that the absolute value of INT_MIN in C | (and many other languages) is undefined, but will generally still | return a negative value. This is a "gotcha" that a lot of people | are unaware of. | | > abs(-2147483648) = -2147483648 | lizardactivist wrote: | A consequence of most, if not all, CPUs today using two's | complement integers. | | I think one's complement is more sensible since it doesn't have | this problem, but it loses out because it requires a more | complex ISA and implementation. | [deleted] | nayuki wrote: | Ones' complement (correct spelling) has negative zero, which | I would argue is a far worse problem. | xigoi wrote: | Why in the fuck is it "ones' complement", but "two's | complement"? | tsimionescu wrote: | According to Wikipedia: | | > The name "ones' complement" (note this is possessive of | the plural "ones", not of a singular "one") refers to the | fact that such an inverted value, if added to the | original, would always produce an 'all ones' number | veltas wrote: | It's annoying that negation, ABS, and division can overflow | with two's complement. But how I look at it: lots of | operations can already overflow, just a fact of signed | integers, and you need to guard against that overflow in | portable code already. It doesn't seem to be fundamentally | worse that those extra operations can overflow. | shultays wrote: | It is undefined since it involves integer overflow | Cu3PO42 wrote: | One thing to note is that long int is the same size as int on x64 | Windows, at least in the MSVC ABI. clang also conforms to this. | | This is relevant to the question asking if -1L > 1U. | auxym wrote: | same with ARM-EABI (32bit cortex-M MCUs). Int and long int are | both 32 bits. 
| MobiusHorizons wrote: | I think that is expected (it's what happens for x86 as well) | what's surprising about the parent is that long is apparently | 32 bit on windows x64. Usually long should be 64 bit on a | machine with 64 bit words | boxfire wrote: | It's very worth pointing out and in fact advocating for the | compiler options -fwrapv and -ftrapv | | These make signed integer overflow wrap with the expected two's | complement behavior, or trap, respectively. | | N.B. left shifting into the sign bit is still undefined | behavior... On x86-64 gcc and clang seem to perform the shift as | if the number is interpreted unsigned, then shifted, then | interpreted signed. | forrestthewoods wrote: | In my opinion the three original sins of C are: nullptr, null- | terminated strings, and implicit conversions. | | Abolish those three concepts and C is a significantly improved | language! | pjmlp wrote: | no bounds checking, enums are hardly better than #define, | pointer decay.... | kazinator wrote: | Found a disappointing bug in the test: (unsigned | short)1 > -1 | | Correct answer is: implementation-defined. | | The left operand of > has type _unsigned short_, before | promotion. On C implementations for today's popular, mainstream | machines, _short_ is narrower than _int_; therefore, whether it | is signed or unsigned it goes to _int_. | | In that common case, we are doing a 1 > -1 comparison in the | _int_ type. | | However, _unsigned short_ may be exactly as wide as int, in which | case it cannot promote to _int_, because its values do not fit | into that type. It promotes to _unsigned int_ in that case. Both | sides will go to _unsigned int_ {*}, and so we are comparing 1 > | UINT_MAX which is 0. | | Maybe the author should name this "GCC integer quiz for 32 bit | x86", and drop the harder choices like "implementation-defined". | | --- | | {*} It's more nuanced here.
If we have a signed | and unsigned integer operand of the same rank, it could go either way. If the | unsigned type has a limited range so that all its values are | representable in the signed type, then the unsigned type goes to | signed. My remark represents only the predominant situation | whereby signed and unsigned types of the same rank have | overlapping ranges that don't fit into each other: the unsigned | version of a type not only lacks negatives, but has extra | positives. | mort96 wrote: | The site says: | | > All other things being equal, assume GCC/LLVM x86/x64 | implementation-defined behaviors. | jwilk wrote: | Discussed in this thread: | | https://news.ycombinator.com/item?id=32712578 | phist_mcgee wrote: | I know this is a tangential comment, but the fact that the site | author went to the effort to add a css reset, and then _doesn't_ | go ahead and add in any kind of margins is pretty bizarre. | | The fact that the text butts up to the left of the page with no | margin is pretty incredible. | loeg wrote: | The quiz is a little broken. CHAR_MAX is implementation defined | but for some reason that question doesn't have that option, even | though other questions do. | ynfnehf wrote: | Another fun one: struct { long unsigned a : 31; } | t = { 1 }; | | What is t.a > -1? What about if a is instead a bit-field of width | 32? (Assuming the same platform as in the quiz.) | sposeray wrote: | manaskarekar wrote: | Tangentially related, https://cppquiz.org/, for C++. | santimoller66 wrote: | RedShift1 wrote: | I only have cursory experience with C, mostly by programming some | Arduino stuff, so I can only add this: good god this is a | minefield.
I also use Coverity once | in a while, and every test is run with and without valgrind. If | you do that you'll be fine. | adastra22 wrote: | Yeah, in practice these sort of gotchas ought to trigger | warnings. And for a C/C++ developer, warnings should never be | allowed to persist in a code base. | RedShift1 wrote: | I thought some deductive reasoning and thinking about how | things work at the byte level would save me. Computer said | no. | nayuki wrote: | When you write C programs, you are coding on the C abstract | machine as specified by the language standard. If your code | triggers signed integer overflow, then the C abstract | machine says that it's undefined behavior, and the compiler | and runtime can make your code behave however it wishes. It | doesn't matter that the underlying concrete machine (POSIX, | x86 ISA, etc.) has wraparound signed overflow, because your | code has to interact with both the abstract machine and the | concrete machine at the same time. See: | https://www.youtube.com/watch?v=ZAji7PkXaKY | petergeoghegan wrote: | > When you write C programs, you are coding on the C | abstract machine as specified by the language standard. | | Are you really, though? I would argue that it's a matter | of perspective and/or semantics. | | The Linux kernel is built with -fwrapv and with -fno- | strict-aliasing, and uses idioms that depend on it | directly. We can surmise from that that the kernel must | be: | | 1. Exhibiting undefined behavior (according to a literal | interpretation of the standard) | | OR: | | 2. Not written in C. | | Either way, it's quite reasonable to wonder just how much | practical applicability your statement really has in any | given situation -- since you didn't have any caveats. | It's not as if the kernel is some esoteric, obscure case; | it's arguably the single most important C codebase in the | world. Plus there are plenty of other big C codebases | that take the same approach besides Linux. 
| | Lots of compiler people seem to take the same hard line | on the issue -- "the C abstract machine" and whatnot. It | always surprises me, because it seems to presuppose that | the _only_ thing that matters is what the ISO standard | says. The actual experience of people working on large C | codebases doesn't seem to even get _acknowledged_. Nor | does the fact that the committee and the people that work | on compilers have significant overlap. | | I'm not claiming that "low-level C hackers are right and | the compiler people are wrong". I'm merely pointing out | that there is a vast cultural chasm that just doesn't | seem to be acknowledged. | nayuki wrote: | > In my view it doesn't really matter. | | One of these days, the compiler will do something surprising | to one of your expressions involving signed integer overflow, | like converting x < x + 1 to true. Or it'll delete a whole | loop because it noticed that your code is guaranteed to | trigger an out-of-bounds array read, e.g.: | https://devblogs.microsoft.com/oldnewthing/?p=633 | | > If you do that you'll be fine. | | I would not trust code written based on the methodology you | described. However, if you also add | -fsanitize=undefined,address (UBSan and ASan) and pass those | tests, then I would trust your code. | bluetomcat wrote: | Almost all of these examples would be sloppy programming in | real code. The "int", "short", "long", "long long" and "char" | types are generic labels which shouldn't be used where size and | signedness matter. For a guarantee of size and signedness one | should use the "[u]intXX_t" types. For sizes and pointer | arithmetic - size_t. For subtracting pointers - ptrdiff_t. You | simply wouldn't have most of these issues if you stick to the | correct types. | rwmj wrote: | Absolutely. _int_ is a "code smell" for us, except in some | well-defined cases (storing file descriptors for example).
If | someone is using int to iterate a loop, then the code is more | usually wrong than right, they should be using size_t. | fizzynut wrote: | I don't think a for loop using an int is bad or even "more | wrong than right". If anything int is much better than | using size_t. | | Using an integer, in the 1000s of for loops I've written, | none get even remotely close to the billions - it is | optimizing for a 1 in a million case, and if I know | something can run into the billions of iterations I'm going | to pay more attention to anyway. I've seen 0 occurrences of | bugs relating to this kind of overflow. | | Using a size_t, it is effectively an unsigned integer that | risks underflowing which can easily cause bugs like | infinite loops if decrementing or other bugs if doing any | index arithmetic. I've seen many occurrences of these kind | of bugs. | rwmj wrote: | On Linux/x86-64 int is 31 bits, so you've probably | introduced "1000s" of security bugs where the attacker | only needs the persistence to add 2 billion items to a | network input, local file or similar, to generate a | negative pointer of their choosing. | | Any such code submitted to one of our projects would be | rejected or fixed to use the proper type. | fizzynut wrote: | Please don't make this adversarial. | | I've not introduced a security bug in every for loop I've | written. What I've written shouldn't be controversial, | just take a look at Googles style guide: | | "We use int very often, for integers we know are not | going to be too big, e.g., loop counters. Use plain old | int for such things. You should assume that an int is at | least 32 bits, but don't assume that it has more than 32 | bits. If you need a 64-bit integer type, use int64_t or | uint64_t. | | For integers we know can be "big", use int64_t. 
| | You should not use the unsigned integer types such as | uint32_t, unless there is a valid reason such as | representing a bit pattern rather than a number, or you | need defined overflow modulo 2^N. In particular, do not | use unsigned types to say a number will never be | negative. Instead, use assertions for this. | | If your code is a container that returns a size, be sure | to use a type that will accommodate any possible usage of | your container. When in doubt, use a larger type rather | than a smaller type. | | Use care when converting integer types. Integer | conversions and promotions can cause undefined behavior, | leading to security bugs and other problems." | scatters wrote: | The true types (int, unsigned etc) are correct for local | variables, since they describe registers. The sized aliases | are correct for struct fields and for arrays in memory. | | You should prefer signed types for computing (if not storing) | sizes and for pointer arithmetic, since they are more | forgiving with underflow. | mananaysiempre wrote: | Unfortunately, even if you're using uint16_t (which, | remember, the C standard _does not guarantee exists_ ) on a | platform with (say) 32-bit integers, you're still actually | computing in _signed int_ due to promotion. | dave84 wrote: | Didn't notice I got some wrong because the green color made me | think I got them right. | ape4 wrote: | Yeah that's a UX bug in the quiz | jbverschoor wrote: | That's to keep the obscurity vibe | gonzo41 wrote: | Just like programming in C. You think you got it right, but | then bugs... | pjmlp wrote: | Or thinking "how my compiler does it" == ISO C. | junon wrote: | Just a note for the author: The Green background on wrong answers | was entirely confusing. Thought I was a C god at first. | CamperBob2 wrote: | Correct answer to most of these questions: "I don't know, and I | don't care, because it would never even cross my mind to write | code in which most of these questions would come up. 
If forced to | review or debug code written by someone else who knew the answers | and leveraged that knowledge, I would either complain about it | loudly or rewrite it quietly." | | C, like Rome, is a wilderness of tigers. Why ask for trouble? | up2isomorphism wrote: | These examples do look scary. However, in my 16 years of | experience of mostly writing in C, I have never found a single | chance that I need to actually use the expressions that I got the | answers wrong. | | Never mix unsigned and signed integer comparison unless you know | exactly what you are doing. | | And never do arithmetic on boundary definitions like INT_MAX; | they are boundaries, so why would you need to compute a derived | value on boundaries? | | If you do not need arithmetic behavior, do not use signed | integer, use unsigned. Because the computer does not understand | sign, so if you do not need a sign, do not use it. | | You do not need to be a language committee member to write C, | you just need to understand the reason behind its design. | bee_rider wrote: | Is it your experience that there are many projects where you | can reliably say some variable will never need arithmetic | behavior? | | I don't have 16 years of experience writing C, but | | > If you do not need arithmetic behavior, do not use signed | integer, use unsigned | | does seem in a roundabout way to match the advice that I've | gotten from other folks -- usually stick to signed (because you | never know if somebody is going to want to use an integer in, | say, a downwards loop). Your comment just seems to highlight | the less usual case, where you can be sure that nobody will | ever need that arithmetic behavior... maybe it depends on the | type of applications, though. | up2isomorphism wrote: | It always depends on the area of focus. You can also | consistently choose the default int type to be signed while | being fully aware that you lose half the range but gain signed | integer arithmetic.
For me I am mostly doing system and | networking so I tend to go unsigned. The key is to choose a | consistent default while staying aware of the non-intuitive zone | that default type puts you in. | tredre3 wrote: | > If you do not need arithmetic behavior, do not use signed | integer, use unsigned. Because computer does not understand | sign, so if you do not need a sign, do not use it. | | That's interesting. In my career (embedded development) I've | learned to do the opposite. Always use signed unless you have a | reason not to. Even if a value can't naturally be negative, use | signed. Use unsigned only if you need the extra bit, or if | you're doing bitwise operations. | | > Because computer does not understand sign | | Computers understand signs just fine, we're long past the days | of the 6502 N-flag being a glorified bit 7 check. All CPUs have | signed instructions. | up2isomorphism wrote: | > That's interesting. In my career (embedded development) | I've learned to do the opposite. Always use signed unless you | have a reason not to. Even if a value can't naturally be | negative, use signed. Use unsigned only if you need the extra | bit, or if you're doing bitwise operations. | | In this case, since you have decided that the extra range gained | by unsigned is not important to you, you can also go | with signed by default. Basically it is the tradeoff between | 1. robustness in the majority of use cases 2. capability to do | signed arithmetic 3. additional positive integer range. As | long as you make a consistent selection and are mindful when | you are in the danger zone, it can be handled. | WaffleIronMaker wrote: | > Always use signed unless you have a reason not to. Even if | a value can't naturally be negative, use signed. | | Can you elaborate on what benefits this approach has? I would | feel that, especially when a number cannot be negative, | unsigned integers seem like a proper representation of the | data?
| gugagore wrote: | Here is an example: https://wesmckinney.com/blog/avoid-unsigned-integers/ | Sirened wrote: | > And never do arithmetic on boundaries definitions like | INT_MAX, they are boundaries | | I'd argue something stronger: if you care about boundaries like | INT_MAX, you should never be comparing them using your regular | comparison tools. I.e., even though there are correct ways to | compute whether x + 1 will overflow, don't bother trying to do | that and instead always use __builtin_add_overflow since you | can't fuck it up. Trying to do these sorts of edge checks is | incredibly hard and has led to numerous security | vulnerabilities due to checks being optimized out. The builtins | do exactly what they say and you don't have to worry about UB | blowing your foot off. | jcelerier wrote: | The problem is that for unsigned one of the boundaries is 0, | which is an extremely common number - there isn't a couple of | months where I don't find a bug due to a size-1 somewhere | 10000truths wrote: | Subtraction would presumably fall under the arithmetic | behavior that OP was talking about. | pantalaimon wrote: | The compiler will also warn about these cases where you better | be explicit to avoid undesired behavior. | Nokinside wrote: | The first rule of good C programming: "Treat warnings as errors". | Compile with all warnings enabled, selectively ignore only | warnings you know for sure are not important for program | semantics. | | Simple MISRA C with all warnings on is already close to a language | with strict type checking. C compilers give you a choice, use it. | If you are programming for money, a good static analyzer makes it | possible to write safety critical code. | pjmlp wrote: | Worth noting Dennis' own words, | | "Although the first edition of K&R described most of the rules | that brought C's type structure to its present form, many | programs written in the older, more relaxed style persisted, | and so did compilers that tolerated it.
To encourage people to | pay more attention to the official language rules, to detect | legal but suspicious constructions, and to help find interface | mismatches undetectable with simple mechanisms for separate | compilation, Steve Johnson adapted his pcc compiler to produce | lint [Johnson 79b], which scanned a set of files and remarked | on dubious constructions." | | Dennis M. Ritchie -- https://www.bell-labs.com/usr/dmr/www/chist.html | | Unfortunately too many people think they know better than the | language authors themselves. | protomikron wrote: | What is the history of "undefined behavior" [in C compilers | and the standard] in general? I suppose originally it was | supposed to guide compiler engineers, but we all know that | backfired: many compilers exploit undefined behavior to | optimize code, and that can be problematic in security- | sensitive code (e.g. if uninitialized memory is optimized | away) - there have been discussions between security | engineers, kernel developers and GCC hackers about how to | implement/interpret the standard. | | Would it be possible to have a standard, where undefined | behavior is just a compile error? What would we lose - apart | from legacy compatibility? | jcranmer wrote: | > Would it be possible to have a standard, where undefined | behavior is just a compile error? | | No [if you're aiming for something in the same vein as C]. | Undefined behavior is ultimately an inherently dynamic | property: certain values can make a statement execute | undefined behavior, and consequently virtually every | statement could potentially cause undefined behavior. Note | that this remains true even in languages like Rust: Rust | has _loads_ of undefined behavior, but you do have to wrap | code in unsafe blocks to be able to trigger it. | | > What would we lose - apart from legacy compatibility?
| | In particular, it is clear at this point that if you want | to permit converting integers to pointers, you will either | have to live with undefined behavior (via pointer | provenance) or forgo basically _all_ optimization | whatsoever. | pjmlp wrote: | C sucked on 8- and 16-bit home computers; to be fair, all | high-level systems programming languages had their own set | of issues regarding optimal code generation, thus Assembly | was the name of the game for ultimate performance. | | UB started as a means of not kicking out computer | architectures that could otherwise not be targeted by fully | compliant ISO C compilers. | | Given that C prefers to be a kind of portable macro | assembler rather than care about security, it was only a | matter of time until those escape hatches started to be | taken advantage of for optimizations. | | The same applies to other languages; however, since their | communities tend to prefer security over ultimate | performance, some optimization paths are not considered, as | that would hurt their safety goals. | | In what concerns C, C++ and Objective-C, dropping UB | optimizations would mean going back to the 1990s in terms of | code quality. | robryk wrote: | Would int foo(int x, int y) { return x + y; | } | | compile? After all, this function can be called in a way | that causes UB. | Veliladon wrote: | That's pretty much what Rust was created to do. | pjmlp wrote: | Like many others before it, hopefully it gets more | adoption this time. | wyldfire wrote: | Unfortunately some of this stuff just can't be detected | statically. So while warnings are an excellent starting point, | I recommend also building and testing with UBSan+ASan enabled. | veltas wrote: | I do agree about compiler warnings, but MISRA C imposes a lot | of unnecessary and unhelpful rules on how you write | expressions, and tries to act as if C's type system works in a | different way than it does. In practice I have found it to | actually create bugs.
Read the MISRA rules and appendices on | their effective type model and the C standard: they have gaps | where MISRA actually forces you to write code that looks | correct but doesn't work with C's model. I strongly recommend | against using MISRA, even in automotive or aviation code | (although it may unfortunately be a requirement on any such | project). | lizardactivist wrote: | I really hate C. | qznc wrote: | I made a similar test for floating point: | https://beza1e1.tuxen.de/no_real_numbers.html | | Since most languages use the same IEEE-754, you can hate them | all. | lizardactivist wrote: | Unbiased, equal hate, fully in line with today's political | correctness? Fine, I really hate programming languages. | ananonymoususer wrote: | I have an issue with this one: Assume x has type int. Is the | expression x<<32 ... | | Defined for all values of x | | Defined for some values of x | | Defined for no values of x | | (I chose the second answer.) Wrong answer. | | Shifting (in either direction) by an amount equal to or | exceeding the bit width of the promoted operand is undefined | behavior in C99. | | So according to Wikipedia: | https://en.wikipedia.org/wiki/C_data_types | | int / signed / signed int: basic signed integer type, capable | of containing at least the [-32,767, +32,767] range. | | So a minimum of 16 bits is used for int, but no maximum is | specified. Thus, if my C compiler on my 64-bit architecture | uses 64 bits for int, this is perfectly allowed by the | specification and my answer is correct. | nayuki wrote: | The top of the page says: | | > All other things being equal, assume GCC/LLVM x86/x64 | implementation-defined behaviors. | | However, you're right to point out that the quiz would be | better if you couldn't make any more assumptions than are | guaranteed by the base language standard. ___________________________________________________________________ (page generated 2022-09-04 23:00 UTC)