[HN Gopher] Mildly interesting quirks of C
       ___________________________________________________________________
        
       Mildly interesting quirks of C
        
       Author : goranmoomin
       Score  : 206 points
       Date   : 2022-11-20 11:54 UTC (11 hours ago)
        
 (HTM) web link (gist.github.com)
 (TXT) w3m dump (gist.github.com)
        
       | photochemsyn wrote:
       | Related: "A primer on some C obfuscation tricks"
       | 
       | https://news.ycombinator.com/item?id=22961054
        
       | guenthert wrote:
       | > UB is impossible
       | 
       | What? UB is clearly _undesirable_, but assuming it is impossible
       | and deducing other outcomes must be meant are clearly wrong
       | assumptions by the compiler writer.
       | 
       | More sensible compilers (including older versions of clang) do the
       | right thing (TM) here and yield a compiler error.
       | 
       | There were earlier attempts at do-what-i-mean programming
       | languages. They are rightfully buried in history.
        
         | MauranKilom wrote:
         | > UB is clearly undesirable, but assuming it is impossible and
         | deducing other outcomes must be meant are clearly wrong
         | assumptions by the compiler writer.
         | 
         | Compilers can and absolutely do assume that UB is impossible in
         | this code (no integer overflow) and deduce other outcomes must
         | be meant (the loop operates on contiguous memory):
         | 
         |     void foo(char* arr, int32_t end)
         |     {
         |         for (int32_t i = 0; i != end; ++i)
         |             arr[i] = 0;
         |     }
         | 
         | (Based on code from the gist comments.)
        
         | cvoss wrote:
         | UB is not impossible; I think the author is being a little
         | cheeky there. But the standard does grant compilers extreme
         | liberties as far as how they deal with programs which can
         | execute UB. LLVM's choice of what to do with that liberty, in
         | this case, seems to be to assume the UB is unreachable and
         | continue legally optimizing the program under that assumption.
         | That's not a wrong assumption according to the definition of C.
         | 
         | It's debatable whether it's a _good_ assumption. But not wrong.
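         | 
         | A minimal sketch of that kind of reasoning, assuming a typical
         | optimizing compiler. The function name is made up for
         | illustration. Signed overflow is UB, so the compiler may assume
         | x + 1 never wraps and is free to fold the comparison to 1. The
         | assert only spells out the precondition that the optimizer
         | already takes for granted:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: "is x + 1 greater than x?"
 * For every x < INT_MAX the answer is 1. For x == INT_MAX the addition
 * would overflow, which is UB, so an optimizer may treat that case as
 * unreachable and reduce the body to `return 1;`. */
int next_is_greater(int x)
{
    assert(x < INT_MAX);  /* precondition; disappears under NDEBUG */
    return x + 1 > x;
}
```

         | With NDEBUG defined the assert vanishes, and a call with
         | INT_MAX has no defined result at all.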
        
       | antirez wrote:
       | The top comment in the gist looks like it came from the "Hacker
       | News Parody Thread".
        
         | xigoi wrote:
         | The parody thread should include a comment that references
         | another comment by its position, not realizing that it might
         | change.
        
       | hoosieree wrote:
       | 8. Modifiers to array sizes in parameter definitions
       | [https://godbolt.org/z/FnwYUs]
       | 
       |     void foo(int arr[static const restrict volatile 10]) {
       |         // static: the array contains at least 10 elements
       |         // const, volatile and restrict all apply to the array type.
       |     }
       | 
       | I imagine most of these depend on the C version, but this one
       | specifically bit me because one tool only supported c99 and the
       | other was c11 or something later.
        
       | teddyh wrote:
       | See also the _comp.lang.c Frequently Asked Questions_:
       | https://c-faq.com/
        
       | marcthe12 wrote:
       | C quirks. This is interesting. I have used some of the tricks
       | myself, #1,#2,#4,#5.
       | 
       | #2 and #5 can be combined to make an interesting hack. When
       | combined with memcpy you can do
       | 
       |     int *a = memcpy(&(int){0}, b, sizeof *b);
       | 
       | C23 typeof makes this even more interesting
       | 
       | If you want a challenge, here is a piece of standard-compliant C
       | code. Try to understand it. If you can, you are a master of C's
       | type system:
       | 
       |     static int* (*const *(*restrict x)[5])(
       |         volatile union {struct{int a;int b;};}[static const restrict 5],
       |         register enum{HELLO,WORLD} a) = {0};
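       | 
       | A small sketch of the memcpy trick mentioned above, wrapped in a
       | function so it can be tried out. The function name is invented
       | for illustration. It relies on two facts: a compound literal is
       | an lvalue, and memcpy returns its destination pointer:

```c
#include <assert.h>
#include <string.h>

/* Copies *b into an unnamed int (a compound literal) and reads it back.
 * &(int){0} is a valid address because the literal is an lvalue, and
 * memcpy hands that same address back as its return value. */
int copy_through_literal(const int *b)
{
    assert(b != NULL);
    int *a = memcpy(&(int){0}, b, sizeof *b);
    return *a;  /* the literal lives until the enclosing block ends */
}
```

       | The function returns the value rather than the pointer, since
       | the unnamed int stops existing when the function returns.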
        
       | flohofwoe wrote:
       | Another one, ad hoc struct declaration in the return type of a
       | function:
       | 
       |     struct bla_t { int a, b, c, d; } make_bla(void) {
       |         return (struct bla_t){ .a=1, .b=2, .c=3, .d=4 };
       |     }
       | 
       | https://www.godbolt.org/z/Pha7dPzeq
       | 
       | Also to be pedantic: "= {};" is not valid C (at least until C23)
       | and fails to compile on MSVC - GCC and Clang accept it as a non-
       | standard language extension though (the proper form would be "=
       | {0};").
        
       | sbf501 wrote:
       | The switch/case anywhere looks equally useful and dangerous, and
       | is so close to assembly that it really illustrates the low-level
       | capabilities of C.
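       | 
       | The best-known use of this is Duff's device, sketched here as a
       | 4x unrolled byte copy. This is not necessarily the example from
       | the gist, and the function name is made up. The switch jumps
       | into the middle of the do-while to handle the leftover bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Copy `count` bytes from `src` to `dst`, unrolled four times.
 * The case labels sit inside the loop body, so the switch enters the
 * loop partway through on the first pass (count % 4 leftover bytes),
 * and every later pass copies a full group of four. */
void duff_copy(char *dst, const char *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count == 0)
        return;
    size_t passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

       | In practice memcpy is the better choice; the point is only that
       | the compiler accepts case labels at any nesting depth inside
       | the switch body.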
        
       | Teknoman117 wrote:
       | I consider the array pointer stuff a bit of a foot-gun in C. I've
       | seen too many examples of people mixing up uint8_t[][] and
       | uint8_t**.
       | 
       | The "compound literals are lvalues" pattern I've seen many times
       | for inline initializing a struct that's only going to be around
       | as a parameter to a single function call.
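       | 
       | A sketch of both points, with invented function names. The first
       | function takes a real 2D array (one contiguous block), the
       | second takes a pointer to row pointers, and both are called with
       | compound literals that exist only for the call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A true 2D array: rows are contiguous, and the parameter is really
 * `const uint8_t (*grid)[3]`, a pointer to rows of 3 bytes. */
unsigned sum_grid(size_t rows, const uint8_t grid[][3])
{
    unsigned total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < 3; ++c)
            total += grid[r][c];
    return total;
}

/* A pointer to pointers: each row is a separate object reached through
 * an extra indirection. A uint8_t[2][3] is not compatible with this. */
unsigned sum_rows(size_t rows, size_t cols, const uint8_t *const *row_ptrs)
{
    unsigned total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += row_ptrs[r][c];
    return total;
}

/* Compound literals used as throwaway arguments for both shapes. */
unsigned demo_sums(void)
{
    static_assert(sizeof(uint8_t[2][3]) == 6, "2D array is one 6-byte block");
    unsigned a = sum_grid(2, (const uint8_t[][3]){ {1, 2, 3}, {4, 5, 6} });
    unsigned b = sum_rows(2, 3, (const uint8_t *const[]){
        (const uint8_t[]){1, 2, 3},
        (const uint8_t[]){4, 5, 6},
    });
    return a + b;
}
```

       | Casting a uint8_t[2][3] to uint8_t** to silence the compiler
       | would make sum_rows interpret the data bytes as addresses.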
        
       | PointyFluff wrote:
       | As someone who's moved on to Rust, I see this as one long list of
       | nightmares.
        
         | tmtvl wrote:
         | I also see a fair few elements on that list as being
         | problematic, to say the least. Can't stand Rust, though, so for
         | those times I really need high performance I try and keep my C
         | knowledge sharp-ish.
         | 
         | Fortunately GCC has a whole bucket-list of warnings that can be
         | enabled (I like compiling with -Wall -Wextra -pedantic, myself)
         | which can, combined with proper tooling, catch many issues.
        
         | rahen wrote:
         | I don't think the "I use Rust btw" comments contribute much to
         | the discussion.
         | 
         | C and Rust don't perfectly overlap, especially since Rust is
         | more a replacement to C++ than C.
        
       | xcdzvyn wrote:
       | While a few of these were interesting I'd love to see a short
       | technical explanation of each quirk for the feeble high-level
       | programmer (me). The first one for example, is foo initialised?
       | How so?
        
         | coliveira wrote:
         | The reason is that a struct doesn't generate a new scope, like
         | in C++. If you define something inside a struct it will also be
         | available outside of the struct.
        
         | veltas wrote:
         | I think it's aimed at C programmers. foo is a struct, so it's a
         | type, it's not a variable. The point is just that struct bar is
         | also defined by the definition of struct foo.
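         | 
         | A guess at the shape of that first example, with made-up member
         | names, to show what "also defined" means in practice:

```c
#include <assert.h>

/* `struct bar` is declared inside `struct foo`, but a C struct does not
 * open a new scope, so `struct bar` is visible at file scope as well. */
struct foo {
    struct bar {
        int x;
    } baz;
};

/* Uses the "inner" struct directly, with no foo:: style qualification. */
int bar_value(void)
{
    static_assert(sizeof(struct foo) >= sizeof(struct bar),
                  "foo contains a bar");
    struct bar b = { .x = 7 };
    struct foo f = { .baz = b };
    return f.baz.x;
}
```

         | In C++ the same declaration would need `foo::bar` outside of
         | foo.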
        
       | unnouinceput wrote:
       | Quote: "4. Flexible array members ..... int elems[]; // <--
       | flexible array member"
       | 
       | TIL that a dynamic array is also called flexible. Is this
       | generation, out of boredom, trying to redefine well-established
       | paradigms? Because, for me, a developer formed in the '90s,
       | "flexible" means maybe inheritance, or even better polymorphism.
       | There is nothing flexible about a dynamic array. Its structure is
       | well defined in the stack/heap, and with current compiler
       | optimizations can even be demoted to a simple static array for
       | faster access within CPU registers.
        
         | Jorengarenar wrote:
         | "Dynamic array" refers to a block of memory allocated via
         | malloc() which you just happen to use as an array.
         | 
         | "Flexible array member" [0] is when you have a _struct_ and its
         | last member is an array with unspecified size.
         | 
         | An example:
         | 
         |     #include <stdio.h>
         |     #include <stdlib.h>
         | 
         |     struct Foo {
         |         int len;
         |         int* arr; // dynamic "array"
         |     };
         | 
         |     struct Bar {
         |         int len;
         |         int arr[]; // FAM
         |     };
         | 
         |     int main()
         |     {
         |         const int n = 12;
         | 
         |         // have to allocate myself; no guarantee it will be
         |         // near the rest of the struct
         |         struct Foo* a = malloc(sizeof *a);
         |         a->arr = malloc(n * sizeof *(a->arr));
         | 
         |         // array is part of the memory allocated for struct
         |         struct Bar* x = malloc((sizeof *x) + n*(sizeof *(x->arr)));
         | 
         |         return 0;
         |     }
         | 
         | [0]: https://en.wikipedia.org/wiki/Flexible_array_member
        
           | unnouinceput wrote:
           | >"Dynamic array" refers to block of memory allocated via
           | malloc() which you just happen to use as array.<
           | 
           | No. A dynamic array is an array which can be expanded or
           | shrunk during its runtime life. The fact that C/C++ uses
           | malloc for that (and btw, it's not the only way to do it)
           | is its own problem. In other languages you have dynamic
           | arrays that can be expanded/shrunk without using an extra
           | line - main reason why nowadays Rust is a replacement for
           | C/C++
           | 
           | >[0]< From your own wiki reference: "the flexible array member
           | must be last"
           | 
           | LMAO, really? Well, that indeed is a bigger C quirk. In
           | Pascal, as an example, I can have it anywhere inside the
           | record (struct equivalent of C), and it can be just as
           | "flexible".
        
             | blep_ wrote:
             | It has to be last because it's not a pointer to the array,
             | it _is_ the array. The array elements are immediately after
             | the struct in memory. You can't resize it without
             | reallocating the whole struct.
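             | 
             | A minimal sketch of what that means for allocation and
             | resizing. The helper names are hypothetical; the struct
             | mirrors the Bar example earlier in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct Bar {
    int len;
    int arr[];   /* flexible array member: storage follows the struct */
};

/* sizeof(struct Bar) covers only the fixed part, never array elements. */
static_assert(offsetof(struct Bar, arr) <= sizeof(struct Bar),
              "the array starts at or before the end of the fixed part");

/* Allocate the fixed part plus room for n elements in a single block. */
struct Bar *bar_new(int n)
{
    assert(n >= 0);
    struct Bar *b = malloc(sizeof *b + (size_t)n * sizeof b->arr[0]);
    if (b != NULL)
        b->len = n;
    return b;
}

/* Resizing the array means reallocating the whole struct, so the result
 * may live at a new address. On failure returns NULL and leaves the
 * old block untouched. */
struct Bar *bar_resize(struct Bar *b, int n)
{
    assert(n >= 0);
    struct Bar *p = realloc(b, sizeof *p + (size_t)n * sizeof p->arr[0]);
    if (p != NULL)
        p->len = n;
    return p;
}
```

             | Callers have to replace every copy of the old pointer with
             | the one bar_resize returns, which is the cost of keeping
             | header and elements in one allocation.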
        
       | kazinator wrote:
       | Regarding 12, alignment of bitfields, how I believe it works is
       | that when the bitfield of type long is laid out, then the
       | structure so far is considered to be a vector of storage cells
       | whose size and alignment are those of long:
       | 
       |     struct foo {
       |         char a;
       |         long b: 16;
       |         char c;
       |     };
       | 
       | So, _a_ has been laid into the structure, so the current offset
       | is 1 byte. This is considered to be occupying a portion of an
       | existing _long_ type bitfield cell. In other words _a_ is
       | essentially taken to be an 8-bit field in the first _long_-sized
       | cell of the structure. That cell looks like it has 56 bits left
       | in it (if we assume 64 bit long). Since 56 > 16, the new bitfield
       | _b_ is placed into that cell. When that field is placed, the
       | placement offset becomes 3. The type of _c_ being char, that
       | offset is acceptable for _c_.
       | 
       | I've painstakingly reverse engineered the rules when developing
       | the FFI for TXR Lisp:
       | 
       |     1> (sizeof (struct foo (a char) (b (bit 16 long)) (c char)))
       |     8
       |     2> (alignof (struct foo (a char) (b (bit 16 long)) (c char)))
       |     8
       |     3> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) a)
       |     0
       |     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) b)
       |     ** ffi-offsetof: b is a bitfield in #<ffi-type (struct foo (a char) (b (bit 16 long)) (c char))>
       |     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) c)
       |     3
       | 
       | I've summarized my empirically-obtained understanding for the
       | benefit of users and anyone else doing similar work in a
       | different project.
       | 
       | https://www.nongnu.org/txr/txr-manpage.html#N-027D075C
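       | 
       | The same measurements can be taken from C directly. This is a
       | sketch with invented helper names; the TXR session above reports
       | 8, 8, 0 and 3, but the numbers are ABI-specific and another
       | compiler or target may lay the struct out differently:

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    char a;
    long b : 16;   /* bit-field of type long: implementation-defined */
    char c;
};

/* Hypothetical container for the measured values. */
struct layout {
    size_t size;
    size_t align;
    size_t a_offset;
    size_t c_offset;
};

/* Collects size, alignment and member offsets of struct foo.
 * offsetof cannot be applied to the bit-field b itself. */
struct layout foo_layout(void)
{
    static_assert(offsetof(struct foo, a) == 0, "first member is at offset 0");
    return (struct layout){
        .size     = sizeof(struct foo),
        .align    = _Alignof(struct foo),
        .a_offset = offsetof(struct foo, a),
        .c_offset = offsetof(struct foo, c),
    };
}
```

       | Printing the four fields with %zu on the target of interest
       | shows whether its compiler follows the rule described above.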
        
       | plq wrote:
       | Whenever the subject of C/C++ quirks is brought up, I always like
       | to point out the Deep C/C++ presentation:
       | 
       | http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
       | 
       | Source: https://freecomputerbooks.com/Deep-C-and-Cpp.html#downloadLi...
       | 
       | Previous discussion: https://news.ycombinator.com/item?id=3093323
       | 
       | It could be considered a bit dated at this point (it predates
       | C++11) but I find it still both entertaining and educational.
        
         | lanorienne wrote:
        
         | [deleted]
        
         | sureglymop wrote:
         | That was very interesting!
        
         | andrepd wrote:
         | Loved that, read it start to finish. C is already a minefield
         | and it looks positively tame when compared to C++!
        
       | veltas wrote:
       | "Flat Initializer Lists" is given as an example in K&R C I think,
       | at least the first edition, when writing those extra braces to
       | fill out an initializer must have felt very redundant.
       | 
       | These days many compilers will warn if you do this, however, as
       | it is rare people do this and usually indicates a
       | misunderstanding of the type used.
       | 
       | I think it's quite readable though, so it's a shame it causes
       | warnings. What do you think?
       | 
       |     struct { const char *name; int age; } records[] = {
       |         "John",   20,
       |         "Bertha", 40,
       |         "Andrew", 30,
       |     };
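       | 
       | For comparison, a sketch that puts the flat form next to the
       | fully braced one and checks that they initialize the same data.
       | The struct tag and function name are made up here:

```c
#include <assert.h>
#include <string.h>

struct record { const char *name; int age; };

/* Flat form: the braces around each element are elided. */
static const struct record flat[] = {
    "John",   20,
    "Bertha", 40,
    "Andrew", 30,
};

/* Fully braced form: one brace pair per struct. */
static const struct record braced[] = {
    { "John",   20 },
    { "Bertha", 40 },
    { "Andrew", 30 },
};

static_assert(sizeof flat == sizeof braced, "both arrays have 3 elements");

/* Returns 1 if the two arrays hold the same records, 0 otherwise. */
int same_records(void)
{
    for (size_t i = 0; i < sizeof flat / sizeof flat[0]; ++i)
        if (strcmp(flat[i].name, braced[i].name) != 0 ||
            flat[i].age != braced[i].age)
            return 0;
    return 1;
}
```

       | The flat version is the one that tends to draw a missing-braces
       | warning.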
        
         | blep_ wrote:
         | I find it slightly worse to read. It's C, so my brain is in
         | "newlines don't matter" reading mode, so I see an array of 6
         | things and then have to mentally split them back up.
        
       | fb03 wrote:
       | My favorite C "quirk": If you have an array and you want to
       | access an item of it, you can swap the variable and the index
       | number (put the variable name inside brackets and the number
       | outside):
       | 
       |     a[5]
       | 
       | is the same as:
       | 
       |     5[a]
       | 
       | Why? a[5] is actually sugar for *(a + 5), so by the commutative
       | property of addition, you can also do *(5 + a) to access the same
       | memory position :-)
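       | 
       | A quick sketch that checks the equivalence, with function names
       | invented for the occasion. It also covers the string-literal
       | variant that comes up elsewhere in this thread:

```c
#include <assert.h>

/* Reads element 5 of `a` in four equivalent ways and checks they agree. */
int sixth_element(const int *a)
{
    int v1 = a[5];
    int v2 = 5[a];        /* same as *(5 + a) */
    int v3 = *(a + 5);
    int v4 = *(5 + a);
    assert(v1 == v2 && v2 == v3 && v3 == v4);
    return v1;
}

/* A string literal is an array too, so it can sit on either side. */
char fifth_letter(void)
{
    assert("abcde"[4] == 4["abcde"]);
    return 4["abcde"];
}
```

       | It only works because one operand is a pointer and the other an
       | integer; which one goes inside the brackets makes no difference.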
        
         | FartyMcFarter wrote:
         | One funny variant is this expression: "abcde"[4]
        
           | whoopdedo wrote:
           | You mean: 4["abcde"]
        
             | FartyMcFarter wrote:
             | Actually yes, oops :)
        
         | jwilk wrote:
         | That's #15 on the list.
        
           | brookst wrote:
           | *15#
        
           | cptnapalm wrote:
           | &(list + 15) = https://gist.github.com/fay59/5ccbe684e6e56a7d
           | f8815c3486568f...
        
             | the_svd_doctor wrote:
             | *(list + 14) ?
        
           | yccs27 wrote:
           | That's #list on the 15.
        
         | [deleted]
        
       | [deleted]
        
       | camel-cdr wrote:
       | Here are two of my favorite obscure quirks of C:
       | 
       |     struct X { char x[8]; };
       |     struct X awoo(void);
       |     printf("%s\n", awoo().x);
       | 
       | The above is UB in <= C99 and valid in >= C11. [0]
       | 
       |     struct X { char b[8]; } foo();
       |     char *b = foo().b;
       |     printf("%s\n", b);
       | 
       | The above is UB in >= C11 and valid in <= C99. [1]
       | 
       | [0]
       | https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...
       | 
       | [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1285.htm
        
         | Asooka wrote:
         | I really wish both would be valid in C11. Or rather I wish I
         | had "systems-C" where all the undefined behaviour added for
         | high performance computing was filed off and defined as
         | "whatever the platform does".
        
           | masklinn wrote:
           | > all the undefined behaviour added for high performance
           | computing
           | 
           | UBs were added for cross-incompatibilities, where operations
           | were too "core" (and / or untestable) for IBs to be
           | acceptable. The reason was not performance (aside from not
           | imposing a runtime check where that would have been possible)
           | but portability:
           | 
           | > 3.4.3 undefined behavior behavior, upon use of a
           | nonportable or erroneous program construct or of erroneous
           | data, for which this International Standard imposes no
           | requirements
           | 
           | Those UBs were leveraged later on by optimising compilers,
           | because they provide constraints compensating for C's useless
           | type system.
           | 
           | So you can just use a non-optimising compiler or one which
           | only does simple optimisations (e.g. tcc), and see what the
           | compiler generates from your UBs.
        
             | tsimionescu wrote:
             | The standard also has implementation-defined behavior,
             | doesn't it?
        
       | rwmj wrote:
       | Reminds me a bit of _"Who Says C is Simple?"_ written by the
       | people who wrote a C parser & analyser in OCaml (CIL):
       | 
       | https://cil-project.github.io/cil/doc/html/cil/cil016.html
       | 
       | Also: https://cil-project.github.io/cil/doc/html/cil/cil012.html
        
         | still_grokking wrote:
         | > "Who Says C is Simple?"
         | 
         | People who don't know what "simple" means and confuse it with
         | "easy".
         | 
         | https://www.entropywins.wtf/blog/2017/01/02/simple-is-not-ea...
         | 
         | https://www.infoq.com/presentations/Simple-Made-Easy/
         | 
         | "Easy" things almost always lead to astonishing complexity.
         | 
         | Also it's easy to see just how complex C is: Have a look at a
         | formal description of it! (And compare to a truly simple
         | language like e.g. LISP).
         | 
         | https://github.com/kframework/c-semantics/tree/master/semant...
         | 
         | In contrast some basic Lambda calculus language semantics fit
         | 0.5 of a page in K.
         | 
         | https://www.youtube.com/watch?v=eSaIKHQOo4c
         | 
         | https://www.youtube.com/watch?v=y5Tf1EZVj8E
        
           | owl_vision wrote:
           | +1 for simple is not easy, yet with enough thinking and
           | ingenious ideas, it is achievable. Thanks for the links.
           | 
           | "simplicity is the ultimate sophistication." -- Leonardo da
           | Vinci
        
       | 752963e64 wrote:
        
       | still_grokking wrote:
       | After learning about a few of these I started to understand why
       | people coming from C always said that PHP is a well designed
       | language...
       | 
       | But OK, I understand that my mind is just not made for the
       | complexity of C. Most likely I'm not a real programmer.
       | 
       | I instantly get knots in my brain and start to bang my head
       | against the wall when I need to look at C code for too long.
       | Actually even C documentation is enough to trigger this. (I get
       | mad every time I have to look at a Linux system man page).
       | 
       | This is highly subjective of course. Other people seem to love C!
       | 
       | I'm more of a grug brain [1], who mostly only understands plain
       | pure functions.
       | 
       | Input in, output out. No magic. Everything else's too taxing.
       | 
       | [1] https://grugbrain.dev/
        
         | [deleted]
        
       | beyonddream wrote:
       | Can someone explain how "A constant-expression macro that tells
       | you if an expression is an integer constant" works ?
        
         | scatters wrote:
         | If `x` is a constant, `(x) * 0l` is a zero constant, so
         | `(void*)((x) * 0l)` is a null pointer constant. When a null
         | pointer constant is one branch of a ternary conditional, the
         | expression takes the (pointer) type of the other branch.
         | 
         | If `x` is not a constant, `(void*)((x) * 0l)` is a void pointer
         | to address 0 (which may not even be a null pointer at runtime,
         | since null may have a runtime address distinct from zero!). The
         | ternary conditional then unifies the types of the branches,
         | resulting in `void*`.
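         | 
         | A sketch of the same idea that inspects the resulting type
         | with C11 _Generic instead of a size comparison. This is a
         | swapped-in variant, not the gist's macro, and the macro and
         | function names are made up:

```c
#include <assert.h>

/* 1 if x is an integer constant expression, else 0.
 * (void *)((x) * 0l) is a null pointer constant only when x is constant;
 * in that case the conditional takes the type of the other branch, int *.
 * Otherwise the first branch is an ordinary void * and the conditional
 * has type void *. _Generic never evaluates its controlling expression. */
#define IS_INT_CONSTANT(x) \
    _Generic((1 ? (void *)((x) * 0l) : (int *)0), int *: 1, void *: 0)

static_assert(IS_INT_CONSTANT(42), "a literal is a constant expression");
static_assert(IS_INT_CONSTANT(sizeof(int) + 3), "so is sizeof arithmetic");

/* A function parameter is never an integer constant expression. */
int param_is_constant(int n)
{
    return IS_INT_CONSTANT(n);
}
```

         | Either way the trick rests on the special treatment of null
         | pointer constants in the conditional operator.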
        
         | beyonddream wrote:
         | My understanding of how it works is, with a constant value, the
         | compiler replaces (x) with the constant 0 and converts (void *)
         | into (int *), which makes the size comparison return true. But
         | I am not entirely sure :)
        
       | ainar-g wrote:
       | Are there any practical cases where you'd want "extern void foo"?
        
         | veltas wrote:
         | You could use it for getting an address that will be linked in
         | later. On GCC I get a warning (which I don't think I can mask)
         | for taking the address of such an object, because its
         | expression is type void. A better way of achieving this is
         | usually to declare something like extern unsigned char foo[]
         | instead, but that has a type other than void*.
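         | 
         | A sketch of the array alternative. The symbol name is
         | hypothetical, and the definition in the middle is only a
         | stand-in so that the example links by itself; in the scenario
         | described above a linker script or another object file would
         | provide the symbol:

```c
#include <assert.h>
#include <stddef.h>

/* Declaration only: "some bytes named blob_start exist somewhere".
 * The array type is incomplete, so sizeof cannot be applied to it. */
extern unsigned char blob_start[];

/* Stand-in definition (an assumption of this sketch, see above). */
unsigned char blob_start[4] = { 0xde, 0xad, 0xbe, 0xef };

/* The array decays to a pointer to its first byte, so no expression of
 * type void is involved in getting the address. */
const void *blob_address(void)
{
    const void *p = blob_start;
    assert(p != NULL);
    return p;
}
```

         | The address converts to void* implicitly wherever one is
         | needed.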
        
       ___________________________________________________________________
       (page generated 2022-11-20 23:00 UTC)