[HN Gopher] Mildly interesting quirks of C
___________________________________________________________________
Mildly interesting quirks of C

Author : goranmoomin
Score  : 206 points
Date   : 2022-11-20 11:54 UTC (11 hours ago)

(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)

| photochemsyn wrote:
| Related: "A primer on some C obfuscation tricks"
|
| https://news.ycombinator.com/item?id=22961054
|
| guenthert wrote:
| > UB is impossible
|
| What? UB is clearly _undesirable_, but assuming it is impossible
| and deducing other outcomes must be meant are clearly wrong
| assumptions by the compiler writer.
|
| More sensible compilers (including older versions of clang) do the
| right thing (TM) here and yield a compiler error.
|
| There were earlier attempts at do-what-i-mean programming
| languages. They are rightfully buried in history.
|
| MauranKilom wrote:
| > UB is clearly undesirable, but assuming it is impossible and
| deducing other outcomes must be meant are clearly wrong
| assumptions by the compiler writer.
|
| Compilers can and absolutely do assume that UB is impossible in
| this code (no integer overflow) and deduce other outcomes must
| be meant (the loop operates on contiguous memory):
|
|     void foo(char* arr, int32_t end) {
|         for (int32_t i = 0; i != end; ++i)
|             arr[i] = 0;
|     }
|
| (Based on code from the gist comments.)
|
| cvoss wrote:
| UB is not impossible; I think the author is being a little
| cheeky there. But the standard does grant compilers extreme
| liberties as far as how they deal with programs which can
| execute UB. LLVM's choice of what to do with that liberty, in
| this case, seems to be to assume the UB is unreachable and
| continue legally optimizing the program under that assumption.
| That's not a wrong assumption according to the definition of C.
|
| It's debatable whether it's a _good_ assumption. But not wrong.
|
| antirez wrote:
| The top comment in the gist looks like it's from "Hacker News
| Parody Thread".
|
| xigoi wrote:
| The parody thread should include a comment that references
| another comment by its position, not realizing that it might
| change.
|
| hoosieree wrote:
| 8. Modifiers to array sizes in parameter definitions
| [https://godbolt.org/z/FnwYUs]
|
|     void foo(int arr[static const restrict volatile 10]) {
|         // static: the array contains at least 10 elements
|         // const, volatile and restrict all apply to the array type.
|     }
|
| I imagine most of these depend on the C version, but this one
| specifically bit me because one tool only supported c99 and the
| other was c11 or something later.
|
| teddyh wrote:
| See also the _comp.lang.c Frequently Asked Questions_:
| https://c-faq.com/
|
| marcthe12 wrote:
| C quirks. This is interesting. I have used some of the tricks
| myself: #1, #2, #4, #5.
|
| #2 and #5 can be combined to make an interesting hack. When
| combined with memcpy you can do
|
|     int *a = memcpy(&(int){0}, b, sizeof *b);
|
| C23 typeof makes this even more interesting.
|
| If you want a challenge, here is some standard-compliant C code.
| Try to understand it. If you can, you are a master of C's type
| system.
|
|     static int* (*const *(*restrict x)[5])(
|         volatile union {struct{int a;int b;};}[static const restrict 5],
|         register enum{HELLO,WORLD} a) = {0};
|
| flohofwoe wrote:
| Another one, ad hoc struct declaration in the return type of a
| function:
|
|     struct bla_t { int a, b, c, d; } make_bla(void) {
|         return (struct bla_t){ .a=1, .b=2, .c=3, .d=4 };
|     }
|
| https://www.godbolt.org/z/Pha7dPzeq
|
| Also to be pedantic: "= {};" is not valid C (at least until C23)
| and fails to compile on MSVC - GCC and Clang accept it as a
| non-standard language extension though (the proper form would be
| "= {0};").
|
| sbf501 wrote:
| The switch/case anywhere looks equally useful and dangerous, and
| is so close to assembly that it really illustrates the low-level
| capabilities of C.
|
| Teknoman117 wrote:
| I consider the array pointer stuff a bit of a foot-gun in C. I've
| seen too many examples of people mixing up uint8_t[][] and
| uint8_t**.
|
| The "compound literals are lvalues" pattern I've seen many times
| for inline initializing a struct that's only going to be around
| as a parameter to a single function call.
|
| PointyFluff wrote:
| As someone who's moved on to Rust, I see this as one long list of
| nightmares.
|
| tmtvl wrote:
| I also see a fair few elements on that list as being
| problematic, to say the least. Can't stand Rust, though, so for
| those times I really need high performance I try and keep my C
| knowledge sharp-ish.
|
| Fortunately GCC has a whole bucket-list of warnings that can be
| enabled (I like compiling with -Wall -Wextra -pedantic, myself)
| which can, combined with proper tooling, catch many issues.
|
| rahen wrote:
| I don't think the "I use Rust btw" comments contribute much to
| the discussion.
|
| C and Rust don't perfectly overlap, especially since Rust is
| more a replacement for C++ than C.
|
| xcdzvyn wrote:
| While a few of these were interesting I'd love to see a short
| technical explanation of each quirk for the feeble high-level
| programmer (me). The first one for example, is foo initialised?
| How so?
|
| coliveira wrote:
| The reason is that a struct doesn't generate a new scope the
| way it does in C++. If you define something inside a struct it
| will also be available outside of the struct.
|
| veltas wrote:
| I think it's aimed at C programmers. foo is a struct, so it's a
| type, it's not a variable. The point is just that struct bar is
| also defined by the definition of struct foo.
|
| unnouinceput wrote:
| Quote: "4. Flexible array members ..... int elems[]; // <--
| flexible array member"
|
| TIL that a dynamic array is also called flexible. This
| generation, out of boredom, is trying to redefine well
| established paradigms? Because, for me, a 90's formed developer,
| "flexible" means maybe inheritance, or even better polymorphism.
| There is nothing flexible about a dynamic array. Its structure is
| well defined in the stack/heap, and with current compiler
| optimizations can even be demoted to a simple static array for
| faster access within CPU registers.
|
| Jorengarenar wrote:
| "Dynamic array" refers to a block of memory allocated via
| malloc() which you just happen to use as an array.
|
| "Flexible array member" [0] is when you have a _struct_ and its
| last member is an array with unspecified size.
|
| An example:
|
|     #include <stdio.h>
|     #include <stdlib.h>
|
|     struct Foo {
|         int len;
|         int* arr; // dynamic "array"
|     };
|
|     struct Bar {
|         int len;
|         int arr[]; // FAM
|     };
|
|     int main() {
|         const int n = 12;
|
|         // have to allocate myself; no guarantee it will be
|         // nearby the rest of struct
|         struct Foo* a = malloc(sizeof *a);
|         a->arr = malloc(n * sizeof *(a->arr));
|
|         // array is part of the memory allocated for struct
|         struct Bar* x = malloc((sizeof *x) + n*(sizeof *(x->arr)));
|
|         return 0;
|     }
|
| [0]: https://en.wikipedia.org/wiki/Flexible_array_member
|
| unnouinceput wrote:
| >"Dynamic array" refers to block of memory allocated via
| malloc() which you just happen to use as array.<
|
| No. A dynamic array is an array which can be expanded or
| shrunk during its runtime life. The fact that C/C++ uses
| malloc for that (and btw, it's not the only way to do it)
| is its problem. In other languages you have dynamic arrays
| that can be expanded/shrunk without using an extra line -
| main reason why nowadays Rust is a replacement for C/C++
|
| >[0]< From your own wiki reference: "the flexible array member
| must be last"
|
| LMAO, really? Well, that indeed is a bigger C quirk. In
| Pascal, as an example, I can have it anywhere inside the
| record (struct equivalent of C), and it can be just as
| "flexible".
|
| blep_ wrote:
| It has to be last because it's not a pointer to the array,
| it _is_ the array. The array elements are immediately after
| the struct in memory. You can't resize it without
| reallocating the whole struct.
|
| kazinator wrote:
| Regarding 12, alignment of bitfields, how I believe it works is
| that when the bitfield of type long is laid out, then the
| structure so far is considered to be a vector of storage cells
| whose size and alignment are those of long:
|
|     struct foo {
|         char a;
|         long b: 16;
|         char c;
|     };
|
| So, _a_ has been laid into the structure, so the current offset
| is 1 byte. This is considered to be occupying a portion of an
| existing _long_ type bitfield cell. In other words _a_ is
| essentially taken to be an 8-bit field in the first _long_-sized
| cell of the structure. That cell looks like it has 56 bits left
| in it (if we assume 64 bit long). Since 56 > 16, the new bitfield
| _b_ is placed into that cell. When that field is placed, the
| placement offset becomes 3. The type of _c_ being char, that
| offset is acceptable for _c_.
|
| I've painstakingly reverse engineered the rules when developing
| the FFI for TXR Lisp:
|
|     1> (sizeof (struct foo (a char) (b (bit 16 long)) (c char)))
|     8
|     2> (alignof (struct foo (a char) (b (bit 16 long)) (c char)))
|     8
|     3> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) a)
|     0
|     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) b)
|     ** ffi-offsetof: b is a bitfield in #<ffi-type (struct foo (a char) (b (bit 16 long)) (c char))>
|     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) c)
|     3
|
| I've summarized my empirically-obtained understanding for the
| benefit of users and anyone else doing similar work in a
| different project.
|
| https://www.nongnu.org/txr/txr-manpage.html#N-027D075C
|
| plq wrote:
| Whenever the subject of C/C++ quirks is brought up, I always like
| to point out the Deep C/C++ presentation:
|
| http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
|
| Source: https://freecomputerbooks.com/Deep-C-and-Cpp.html#downloadLi...
|
| Previous discussion: https://news.ycombinator.com/item?id=3093323
|
| It could be considered a bit dated at this point (It's before
| C++11) but I find it still both entertaining and educating.
|
| lanorienne wrote:
| [deleted]
|
| sureglymop wrote:
| That was very interesting!
|
| andrepd wrote:
| Loved that, read it start to finish. C is already a minefield
| and it looks positively tame when compared to C++!
|
| veltas wrote:
| "Flat Initializer Lists" is given as an example in K&R C I think,
| at least the first edition, when writing those extra braces to
| fill out an initializer must have felt very redundant.
|
| These days many compilers will warn if you do this, however, as
| it is rare people do this and usually indicates a
| misunderstanding of the type used.
|
| I think it's quite readable though, so it's a shame it causes
| warnings. What do you think?
|
|     struct {
|         const char *name;
|         int age;
|     } records[] = {
|         "John", 20,
|         "Bertha", 40,
|         "Andrew", 30,
|     };
|
| blep_ wrote:
| I find it slightly worse to read. It's C, so my brain is in
| "newlines don't matter" reading mode, so I see an array of 6
| things and then have to mentally split them back up.
|
| fb03 wrote:
| My favorite C "quirk": If you have an array and you want to
| access an item of it, you can swap the variable and the index
| number (put the variable name inside brackets and the number
| outside):
|
|     a[5]
|
| is the same as:
|
|     5[a]
|
| why? a[5] is actually sugar for *(a + 5), so by
| commutative property, you can also do *(5 + a) to access the same
| memory position :-)
|
| FartyMcFarter wrote:
| One funny variant is this expression:
|
|     "abcde"[4]
|
| whoopdedo wrote:
| You mean:
|
|     4["abcde"]
|
| FartyMcFarter wrote:
| Actually yes, oops :)
|
| jwilk wrote:
| That's #15 on the list.
|
| brookst wrote:
| *15#
|
| cptnapalm wrote:
| &(list + 15) = https://gist.github.com/fay59/5ccbe684e6e56a7df8815c3486568f...
|
| the_svd_doctor wrote:
| *(list + 14) ?
|
| yccs27 wrote:
| That's #list on the 15.
|
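The subscript identity discussed above can be checked directly. This is a small sketch; the helper names `same_element` and `fifth_char` are made up for illustration. It relies only on the language rule that `E1[E2]` means `*((E1) + (E2))` and that pointer-plus-integer addition commutes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when all spellings refer to the same element.
 * a[i] is *(a + i), and i[a] is *(i + a), which is the same address. */
static int same_element(const int *a, size_t i) {
    return &a[i] == &i[a]
        && &a[i] == a + i
        && &i[a] == i + a;
}

/* A string literal is an array too, so the index may sit on either side. */
static char fifth_char(void) {
    assert(4["abcde"] == "abcde"[4]);
    return 4["abcde"];
}
```

With `int arr[8] = {0, 10, 20, 30, 40, 50, 60, 70};`, both `arr[5]` and `5[arr]` read the value 50, and `fifth_char()` yields `'e'`. The reversed form is legal but mostly a curiosity; it buys nothing over the usual spelling.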
| [deleted]
|
| [deleted]
|
| camel-cdr wrote:
| Here are two of my favorite obscure quirks of C:
|
|     struct X { char x[8]; };
|     struct X awoo(void);
|     printf("%s\n", awoo().x);
|
| The above is UB in <= C99 and valid in >= C11. [0]
|
|     struct X { char b[8]; } foo();
|     char *b = foo().b;
|     printf("%s\n", b);
|
| The above is UB in >= C11 and valid in <= C99. [1]
|
| [0]
| https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...
|
| [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1285.htm
|
| Asooka wrote:
| I really wish both would be valid in C11. Or rather I wish I
| had "systems-C" where all the undefined behaviour added for
| high performance computing was filed off and defined as
| "whatever the platform does".
|
| masklinn wrote:
| > all the undefined behaviour added for high performance
| computing
|
| UBs were added for cross-incompatibilities, where operations
| were too "core" (and / or untestable) for IBs to be
| acceptable. The reason was not performance (aside from not
| imposing a runtime check where that would have been possible)
| but portability:
|
| > 3.4.3 undefined behavior: behavior, upon use of a
| nonportable or erroneous program construct or of erroneous
| data, for which this International Standard imposes no
| requirements
|
| Those UBs were leveraged later on by optimising compilers,
| because they provide constraints compensating for C's useless
| type system.
|
| So you can just use a non-optimising compiler or one which
| only does simple optimisations (e.g. tcc), and see what the
| compiler generates from your UBs.
|
| tsimionescu wrote:
| The standard also has implementation-defined behavior,
| doesn't it?
|
| rwmj wrote:
| Reminds me a bit of _"Who Says C is Simple?"_ written by the
| people who wrote a C parser & analyser in OCaml (CIL):
|
| https://cil-project.github.io/cil/doc/html/cil/cil016.html
|
| Also: https://cil-project.github.io/cil/doc/html/cil/cil012.html
|
| still_grokking wrote:
| > "Who Says C is Simple?"
|
| People who don't know what "simple" means and confuse it with
| "easy".
|
| https://www.entropywins.wtf/blog/2017/01/02/simple-is-not-ea...
|
| https://www.infoq.com/presentations/Simple-Made-Easy/
|
| "Easy" things almost always lead to astonishing complexity.
|
| Also it's easy to see just how complex C is: Have a look at a
| formal description of it! (And compare to a truly simple
| language like e.g. LISP).
|
| https://github.com/kframework/c-semantics/tree/master/semant...
|
| In contrast some basic Lambda calculus language semantics fit
| 0.5 of a page in K.
|
| https://www.youtube.com/watch?v=eSaIKHQOo4c
|
| https://www.youtube.com/watch?v=y5Tf1EZVj8E
|
| owl_vision wrote:
| +1 for simple is not easy, yet with enough thinking and
| ingenious ideas, it is achievable. Thanks for the links.
|
| "simplicity is the ultimate sophistication." -- Leonardo da
| Vinci
|
| 752963e64 wrote:
|
| still_grokking wrote:
| After learning about a few of these I started to understand why
| people coming from C always said that PHP is a well designed
| language...
|
| But OK, I understand that my mind is just not made for the
| complexity of C. Most likely I'm not a real programmer.
|
| I instantly get knots in my brain and start to bang my head
| against the wall when I need to look at C code for too long.
| Actually even C documentation is enough to trigger this. (I get
| mad every time I have to look at a Linux system man page).
|
| This is highly subjective of course. Other people seem to love C!
|
| I'm more of a grug brain [1], who mostly only understands plain
| pure functions.
|
| Input in, output out. No magic. Everything else's too taxing.
|
| [1] https://grugbrain.dev/
|
| [deleted]
|
| beyonddream wrote:
| Can someone explain how "A constant-expression macro that tells
| you if an expression is an integer constant" works?
|
| scatters wrote:
| If `x` is a constant, `(x) * 0l` is a zero constant, so
| `(void*)((x) * 0l)` is a null pointer. When a null void pointer
| is one branch of a ternary conditional, the expression takes
| the (pointer) type of the other branch.
|
| If `x` is not a constant, `(void*)((x) * 0l)` is a void pointer
| to address 0 (which may not even be a null pointer at runtime,
| since null may have a runtime address distinct from zero!). The
| ternary conditional then unifies the types of the branches,
| resulting in `void*`.
|
| beyonddream wrote:
| My understanding of how it works is, with constant value, the
| compiler replaces (x) with the constant 0 and converts (void *)
| into (int *) which makes the size equality return true. But
| I am not entirely sure :)
|
| ainar-g wrote:
| Are there any practical cases where you'd want "extern void foo"?
|
| veltas wrote:
| You could use it for getting an address that will be linked in
| later. On GCC I get a warning (which I don't think I can mask)
| for taking the address of such an object, because its
| expression is type void. A better way of achieving this is
| usually to declare something like extern unsigned char foo[]
| instead, but that has a type other than void*.
___________________________________________________________________
(page generated 2022-11-20 23:00 UTC)