[HN Gopher] Lesser known tricks, quirks and features of C
       ___________________________________________________________________
        
       Lesser known tricks, quirks and features of C
        
       Author : jiripospisil
       Score  : 198 points
       Date   : 2023-07-01 13:59 UTC (9 hours ago)
        
 (HTM) web link (jorengarenar.github.io)
 (TXT) w3m dump (jorengarenar.github.io)
        
       | dundarious wrote:
       | The register keyword is still useful in compiler specific
       | contexts. e.g., for GCC-ish compilers like gcc, clang:
       | 
       |     long result;
       |     register long nr __asm__("rax") = __NR_close;
       |     register long rfd __asm__("rdi") = fd;
       |     __asm__ volatile ("syscall\n"
       |                       : "=a"(result)
       |                       : "r"(rfd), "0"(nr)
       |                       : "rcx", "r11", "memory", "cc");
       | 
       | The above is basically how you might implement the close(int)
       | syscall on x86-64.
       | 
       | You don't need to be doing embedded programming to find it useful
       | to dip down into assembly like that (though syscalls are perhaps
       | a bad example, even for a syscall not provided by your C library
       | -- that library probably provides a `syscall` function/macro that
       | does all this in a platform agnostic way).
       | 
       | Also, "%.*" is extremely useful with strings, i.e., "%.*s". Your
       | code base should be using length delimited strings (basically
       | `struct S { int len; char* str; };`) throughout, in which case
       | you can do `printf("%.*s\n", s.len, s.str);`
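
A minimal sketch of the length-delimited string idea above. `struct S` follows the shape given in the comment; the helper `format_s` and the demo function are hypothetical names added for illustration. It formats into a buffer with `snprintf` so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length-delimited string, shaped like the struct in the comment. */
struct S {
    int len;
    char *str;   /* not necessarily NUL-terminated */
};

/* Hypothetical helper: writes "<" + s + ">" into out and returns
   whatever snprintf returns. */
int format_s(char *out, size_t out_size, struct S s)
{
    /* A negative precision is treated as if no precision were given,
       which would make printf read until a NUL byte. */
    assert(s.len >= 0);
    /* The argument consumed by ".*" must be an int; it caps how many
       bytes of s.str are read, so no terminator is needed. */
    return snprintf(out, out_size, "<%.*s>", s.len, s.str);
}

void format_s_demo(void)
{
    char raw[5] = {'h', 'e', 'l', 'l', 'o'};   /* no terminator */
    struct S s = { 5, raw };
    char buf[32];

    assert(format_s(buf, sizeof buf, s) == 7);
    assert(strcmp(buf, "<hello>") == 0);
}
```

Shrinking `s.len` prints a prefix of the same buffer without copying it.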
        
         | messe wrote:
         | Um, no. You could remove the register/__asm__("reg")
         | qualifiers entirely and just specify "D" (rfd) as an input
         | parameter and the code would work fine. There is absolutely no
         | need for register there.
         | 
         | The only good use of register these days is for project-wide
         | globals in embedded contexts. IIRC one example of this is the
         | decompilation of Mario 64, where a certain register ALWAYS
         | contains the floating point value 1.0.
        
           | dundarious wrote:
           | Both are valid, but I much prefer the `register` method I
           | gave (documented in https://gcc.gnu.org/onlinedocs/gcc/Local-
           | Register-Variables....), as it is far more self-explanatory.
           | GCC's extended asm syntax has too many inscrutable constraint
           | and modifier codes even excluding these GCC-ext-asm-specific
           | codes to reference machine-specific registers by name. As
           | such, I totally disagree with your statement about "the only
           | good use". Given that I first learned about that method of
           | specifying registers by reading linux kernel source code, I
           | think others would disagree as well.
        
             | Dwedit wrote:
             | I think the GCC ASM syntax for specifying inputs and
             | outputs is quite clear enough, and doesn't require a
             | variable declaration with unusual syntax.
        
               | dundarious wrote:
               | I'm merely referring to the fact that I specified rdi by
               | writing "rdi" not "D". I can specify r10 by writing
               | "r10", but I can't remember how to specify that directly
               | in the inputs/outputs constraints -- a glance at
               | https://gcc.gnu.org/onlinedocs/gcc/Machine-
               | Constraints.html didn't show me how either, but I'm
               | guessing it's there.
               | 
               | [Edit: Although, on second glance, from
               | https://gcc.gnu.org/onlinedocs/gcc/Extended-
               | Asm.html#Input-O...:
               | 
               | > If you must use a specific register, but your Machine
               | Constraints do not provide sufficient control to select
               | the specific register you want, local register variables
               | may provide a solution (see Specifying Registers for
               | Local Variables).
               | 
               | indicates in the r10 case maybe you _must_ use the syntax
               | I gave?]
               | 
               | My preference is for the syntax that requires looking up
               | fewer tables in GCC docs, but as I said, the version you
               | prefer is fine too.
        
               | LegionMammal978 wrote:
               | Indeed, the register variable syntax is necessary for
               | many of the registers; there are only so many of them
               | that have been stuffed into the one-letter constraint.
               | I've used it before for making raw x86-64 Linux syscalls
               | (which can use r10, r8, and r9) without going through
               | errno, as part of a silly little project ([0]) to read
               | from a file descriptor without writing to process memory.
               | 
               | [0] https://pastebin.com/mepsedCC
        
               | dundarious wrote:
               | Nice. Yes, linux/tools/include/nolibc has syscall macros
               | that look near identical.
        
             | asveikau wrote:
             | One of the reasons for gcc's inline asm syntax being so
             | verbose is it tells the compiler which registers are used
             | and how. There is no indication in your last asm() that the
             | value of rax before the asm() is read from. This means the
             | compiler could assume it's safe to clobber rax just before.
             | After all, you assigned a value into rax and never read it,
             | a reasonable optimizer might say, why emit that first
             | assignment at all?
        
               | dundarious wrote:
               | I think you should read
               | https://gcc.gnu.org/onlinedocs/gcc/Extended-
               | Asm.html#Input-O... and the link I gave earlier, if you
               | think my example is something novel of my own
               | construction. It's basically straight from the GCC docs.
               | 
               | If your point is that I don't use result, that's because
               | it's a snippet written into Hacker News. I didn't write
               | the code to convert it into an errno and return -1 on
               | error, etc., but doing so would be perfectly valid, and
               | safe from your reasonable optimizer concerns.
        
               | asveikau wrote:
               | I see absolutely no examples in that link there where
               | they assign into a register via C code, not using it
               | anywhere else, and then assume you can read the same
               | value back from that register in an asm() statement
               | without declaring it as an input.
        
               | dundarious wrote:
               | I referenced both links (the latter links to the former),
               | and the former, which was in the first comment of mine
               | you replied to, contains the following examples:
               | 
               |     register int *p1 asm ("r0") = ...;
               |     register int *p2 asm ("r1") = ...;
               |     register int *result asm ("r0");
               |     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
               | 
               | and
               | 
               |     int t1 = ...;
               |     register int *p1 asm ("r0") = ...;
               |     register int *p2 asm ("r1") = t1;
               |     register int *result asm ("r0");
               |     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
               | 
               | In my example, nr is rax and listed in the input section,
               | rfd is rdi and also listed in the input section, and
               | result is rax and listed in the output section (I even
               | used your preferred syntax for specifying rax here).
               | Using result after the syscall asm statement is perfectly
               | valid.
        
           | gpderetta wrote:
           | Even on x86, there are some registers where there isn't a
           | corresponding exact input operand modifier, so register is
           | the only option. But I forgot which register.
        
         | pm215 wrote:
         | clang does not support specific-register variables for
         | registers which are allocatable by its register allocator
         | (https://clang.llvm.org/docs/UsersManual.html#gcc-
         | extensions-...) so this is gcc-only. If you care about
         | portability between gcc and clang you'll need some other
         | approach...
        
         | WalterBright wrote:
         | D's inline assembler:
         | 
         |     asm {
         |         mov RAX,__NR_close;
         |         mov RDI, fd;
         |         call syscall;
         |     }
         | 
         | The compiler automatically keeps track of which registers were
         | modified.
        
       | schemescape wrote:
       | I learned a few new things, but it would be more helpful if this
       | had info on whether these are standard and, if so, which standard
       | they are a part of.
        
       | thaliaarchi wrote:
       | The trick for preserving array lengths in function signatures
       | looks quite useful (e.g., `void foo(int p[static 1]);` for a non-
       | null pointer p). Unfortunately, I think the overloaded use of
       | `const` and `static` somewhat obfuscates its semantics.
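
A short sketch of the array-parameter syntax mentioned above. The function names `sum4` and `fill3` are made up for illustration. The `static N` form is a promise made by the caller; some compilers warn when they can see it being broken, but nothing enforces it.

```c
#include <assert.h>
#include <stddef.h>

/* "static 4" tells the compiler the caller passes a pointer to at least
   4 readable ints, which also implies the pointer is not null. Breaking
   that promise is undefined behavior. */
int sum4(const int a[static 4])
{
    int total = 0;
    for (size_t i = 0; i < 4; i++)
        total += a[i];
    return total;
}

/* A qualifier inside the brackets applies to the pointer itself, so this
   parameter has type "int *const": the callee cannot reseat p. */
void fill3(int p[const static 3], int value)
{
    for (size_t i = 0; i < 3; i++)
        p[i] = value;
}

void static_param_demo(void)
{
    int v[4] = {1, 2, 3, 4};
    assert(sum4(v) == 10);
    fill3(v, 7);                      /* 4 elements satisfy "at least 3" */
    assert(sum4(v) == 7 + 7 + 7 + 4);
}
```

Here `const` before the element type (as in `sum4`) qualifies the pointed-to ints, while `const` inside the brackets (as in `fill3`) qualifies the pointer, which is the overloading the comment refers to.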
        
       | asicsp wrote:
       | This article was already discussed here:
       | https://news.ycombinator.com/item?id=34855331 _(410 points | 4
       | months ago | 176 comments)_
       | 
       | But that link no longer works.
        
       | bryancoxwell wrote:
       | > I did a sloppy job of gathering some of them in this post (in
       | no particular order) with even sloppier short explanations
       | 
       | I wonder why developers tend to be so self deprecating
        
         | tyre wrote:
         | I think it's in part a preemptive defense against the endless
         | nitpicking from their audience.
        
           | camel-cdr wrote:
           | Yeah, I think it's this. IIRC one of the first places it was
           | posted is on the C_Programming reddit, where the bar for
           | "lesser known" is quite high.
        
         | badtension wrote:
         | Being afraid of failure and the impostor syndrome. You mark the
         | territory and lower the expectations to look better in the end
         | (even if only in your own eyes). A ton of people do it, it's
         | hard to get out of it.
        
         | dmvdoug wrote:
         | The earlier comments were so cynical I felt like I needed to
         | offer another possibility: maybe they set out to do this in a
         | more systematic way, then got so deep in the weeds they
         | realized it would take them forever to put it into more
         | systematic form, but they didn't want to just leave it sitting
         | there sight unseen. So they acknowledge it's sloppier than they
         | would like it to be, but hey, at least it's something. That's
         | not really self deprecating so much as just... being
         | transparent?
        
         | wongarsu wrote:
         | I assume it's just that being self-deprecating or humble
         | correlates with many traits that make you a good developer, so
         | people with those traits are more likely to end up in this
         | career path and stick around in it.
         | 
         | Just like being a sales person doesn't automatically make you
         | overconfident, but being overconfident makes you a good sales
         | person.
        
         | bmacho wrote:
         | For me it is what's written there: doing a less than
         | satisfactory job, and not wanting to do it correctly.
        
       | jnspts wrote:
       | C11 added _Generic to language, but turns out metaprogramming by
       | inhumanely abusing the preprocessor is possible even in pure C99:
       | meet Metalang99 library.
       | 
       | I'm actually working on a library doing just that! It's still in
       | very (very) early development, but maybe someone may find it to
       | be interesting. [1]
       | 
       | Link [2] is the implementation of a vector. Link [3] is a test
       | file implementing a vector of strings.
       | 
       | [1]: https://github.com/jenspots/libwheel
       | 
       | [2]:
       | https://github.com/jenspots/libwheel/blob/main/include/wheel...
       | 
       | [3]:
       | https://github.com/jenspots/libwheel/blob/main/tests/impl/st...
        
         | burstmode wrote:
         | C++ Metaprogramming is also just a bunch of sugarcoated
         | preprocessor macros and it was never something else.
        
           | idispatch wrote:
           | This is plainly not true
        
         | cempaka wrote:
         | The downvotes make me laugh, I did something pretty similar not
         | long after the _Generic keyword came out and remember getting a
         | pretty icy reception even though I was pretty up front about
         | how painful and crufty it is.
         | 
         | https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-...
        
       | buserror wrote:
       | I "abuse" unions and anonymous unions all the time, it's very
       | practical, and makes the 'user' code a lot clearer as you can
       | access the small 'namespace' as convenient. Here for example I
       | can access it as named coordinates, x,y points or a vector.
       | 
       |     typedef union c2_rect_t {
       |         struct {
       |             c2_pt_t tl, br;
       |         };
       |         c2_coord_t v[4];
       |         struct {
       |             c2_coord_t l,t,r,b;
       |         };
       |     } c2_rect_t;
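
A hedged sketch of how such a union might be used. The comment does not show `c2_coord_t` or `c2_pt_t`, so the definitions below are assumptions (a plain `int` and an x/y struct), and `c2_rect_width` is a hypothetical helper. Anonymous struct members require C11 or a compiler extension.

```c
#include <assert.h>

/* Assumed definitions: the comment does not show these two types. */
typedef int c2_coord_t;
typedef struct c2_pt_t { c2_coord_t x, y; } c2_pt_t;

typedef union c2_rect_t {
    struct { c2_pt_t tl, br; };          /* view 1: two corner points */
    c2_coord_t v[4];                     /* view 2: flat vector       */
    struct { c2_coord_t l, t, r, b; };   /* view 3: named edges       */
} c2_rect_t;

/* The three views only line up if there is no padding. */
_Static_assert(sizeof(c2_rect_t) == 4 * sizeof(c2_coord_t),
               "unexpected padding in c2_rect_t");

/* Hypothetical helper that reads through the "named edges" view. */
c2_coord_t c2_rect_width(const c2_rect_t *rc)
{
    return rc->r - rc->l;
}

void c2_rect_demo(void)
{
    c2_rect_t rc;
    rc.l = 10; rc.t = 20; rc.r = 110; rc.b = 70;

    assert(rc.tl.x == 10 && rc.tl.y == 20);    /* same storage as l, t */
    assert(rc.v[2] == 110 && rc.v[3] == 70);   /* same storage as r, b */

    rc.br.x = 60;                              /* write via one view... */
    assert(c2_rect_width(&rc) == 50);          /* ...read via another   */
}
```

All three members overlay the same four coordinates, so code can pick whichever spelling reads best at the call site.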
        
         | synergy20 wrote:
         | Confused by the code, can you elaborate more?
        
         | trentnelson wrote:
         | 100% agree! I use whatever tool in the C toolbox results in the
         | easiest-to-(read|grok) code, which means tons of anonymous
         | union/struct "abuse", bit fields, function pointer typedefs,
         | strictly adhering to "Cutler Normal Form".
        
       | projektfu wrote:
       | I used to use the comma operator to return a status code in the
       | same line as an operation, but for some reason nobody liked my
       | style.
       | 
       |     if (error_condition)
       |         return *result=0, ERR_CODE;
       | 
       | So, back to writing lots of statements.
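
A minimal sketch of the style described above. The function `checked_div` and the status values are invented for illustration. The comma operator evaluates its left operand, discards the value, then yields the right operand, and it binds more loosely than assignment.

```c
#include <assert.h>

enum { OK = 0, ERR_CODE = -1 };   /* illustrative status codes */

/* Hypothetical function: stores the quotient in *result and returns a
   status code. */
int checked_div(int a, int b, int *result)
{
    if (b == 0)
        return *result = 0, ERR_CODE;   /* parsed as (*result = 0), ERR_CODE */
    return *result = a / b, OK;
}

void checked_div_demo(void)
{
    int q = 123;
    assert(checked_div(7, 0, &q) == ERR_CODE && q == 0);
    assert(checked_div(7, 2, &q) == OK && q == 3);
}
```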
        
         | gpderetta wrote:
         | I have used it in C++ to take a scoped lock without naming it
         | and returning a mutex protected value:
         | 
         |     return scoped_lock{foo_mux_}, return foo_;
         | 
         | Also nobody likes it.
        
           | Joker_vD wrote:
           | Doesn't such an unnamed variable get immediately destructed,
           | right at the comma? I am pretty certain I'd hit
           | exactly that problem and had to switch to __LINE__-macro to
           | name such scoped locks.
        
           | _kst_ wrote:
           | I think you mean:
           | 
           |     return scoped_lock{foo_mux_}, foo_;
        
         | enriquto wrote:
         | I love this style (avoiding multiple-statement blocks) !
         | 
         | Sometimes you can still avoid multiple statements by
         | rearranging your code otherwise. For example, in your case, you
         | can set *result=0 at the beginning of the function. Other
         | times, you can also cram the assignment inside the condition
         | using short circuit evaluation; this trick somehow seems more
         | palatable to normies than the comma operator.
        
           | projektfu wrote:
           | Yeah, I felt that in cases (like when doing COM programming)
           | where the true result is almost always returned in a
           | parameter and the status code as the return value, it made a
           | lot of sense to me to combine those into one line wherever
           | they appeared. But, like the article says, this is a lesser-
           | known operator. In libc style, a similar thing makes sense,
           | e.g.
           | 
           |     return errno=ENOENT, (FILE*)0;
           | 
           | I don't know if anyone uses this style.
           | 
           | In K&R, they say that it's mostly used in for loops, such as
           | 
           |     for (i = 0, j = strlen(s) - 1; i < j; i++, j--) ...
           | 
           | So that is where I use it now.
        
       | omoikane wrote:
       | %n format specifier was lesser known until an IOCCC winner made
       | it famous.
       | 
       | https://news.ycombinator.com/item?id=23445546 - Tic-Tac-Toe in a
       | single call to printf
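
For readers who have not met `%n`, a small sketch follows. The function name `prefix_length` is made up. Note that some C libraries restrict or disable `%n` for security reasons, so this assumes an implementation that supports it.

```c
#include <assert.h>
#include <stdio.h>

/* Returns how many characters were produced before the %n directive. */
int prefix_length(const char *name)
{
    char buf[64];
    int at = -1;
    /* %n prints nothing. It stores the number of characters produced so
       far into the int that its argument points to. */
    snprintf(buf, sizeof buf, "hello, %s%n!", name, &at);
    return at;
}

void prefix_length_demo(void)
{
    /* "hello, " is 7 characters and "bob" adds 3 more. */
    assert(prefix_length("bob") == 10);
}
```

Because `%n` writes to memory chosen by an argument, it is the piece that lets a format string act on program state, which is what the linked tic-tac-toe entry builds on.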
        
       | Croftengea wrote:
       | > Compound literals are lvalues
       | 
       | > ((struct Foo){}).x = 4;
       | 
       | Do such lvalues have any real use?
        
         | eqvinox wrote:
         | The key thing about it being an lvalue is that you can take its
         | address -- you can only take the address of lvalues. Other than
         | that, no, no real use.
        
         | kzrdude wrote:
         | You could pass them to a function where something non-null is
         | required but you don't want to use it, like : `f(&(struct
         | Foo){0})`
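
A sketch of the throwaway-argument use described above. `parse_digit` is a hypothetical API that insists on a non-null out-parameter; `is_digit` is a caller that does not care about the stored value.

```c
#include <assert.h>
#include <stddef.h>

struct Foo { int x; };

/* Hypothetical API: requires a non-null out-parameter. */
int parse_digit(char c, struct Foo *out)
{
    assert(out != NULL);
    if (c < '0' || c > '9')
        return -1;
    out->x = c - '0';
    return 0;
}

/* Caller that only wants to know whether parsing succeeds. */
int is_digit(char c)
{
    /* The compound literal is an unnamed object, so taking its address
       is allowed. Inside a function it lives until the end of the
       enclosing block. */
    return parse_digit(c, &(struct Foo){0}) == 0;
}
```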
        
           | [deleted]
        
       | eqvinox wrote:
       | The %.* example is so close to hitting its single most useful
       | application:
       | 
       |     char *something;  /* no null termination */
       |     size_t something_length;
       |     printf("%.*s", (int)something_length, something);
       | 
       | Unfortunately, the .* argument has type (int), not size_t, and
       | it's signed... but if that's not a problem this is a great way to
       | format non-\0-terminated strings.
       | 
       | (And of course you can also use it to print a substring without
       | copying it around first.)
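
One way to deal with the `size_t` versus `int` mismatch mentioned above is to clamp the length before passing it. This is only a sketch; `precision_from_size` and `format_slice` are hypothetical helpers, and clamping silently truncates output for lengths above `INT_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Clamp a size_t length into the int range that ".*" expects. */
int precision_from_size(size_t n)
{
    return n > (size_t)INT_MAX ? INT_MAX : (int)n;
}

/* Hypothetical helper: formats the bytes text[start .. start+len) in
   brackets, without copying the substring anywhere first. */
int format_slice(char *out, size_t out_size,
                 const char *text, size_t start, size_t len)
{
    return snprintf(out, out_size, "[%.*s]",
                    precision_from_size(len), text + start);
}

void format_slice_demo(void)
{
    char buf[32];
    assert(format_slice(buf, sizeof buf, "hello world", 6, 3) == 5);
    assert(strcmp(buf, "[wor]") == 0);
}
```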
        
         | jrpelkonen wrote:
         | In this simple case, if the int cast is a problem, fwrite would
         | be an adequate alternative, don't you think?
        
           | PaulDavisThe1st wrote:
           | not for s(n)printf ...
        
             | LegionMammal978 wrote:
             | For that, the analogue would be memcpy; but both
             | alternatives lose the ease of surrounding the string with
             | other text, since you either have to do the length
             | calculations or define helper functions.
        
         | thaliaarchi wrote:
         | Somewhat related to this, printf alone in a loop is Turing-
         | complete, by using %-directives like that. It was introduced in
         | "Control-Flow Bending: On the Effectiveness of Control-Flow
         | Integrity" (Carlini, et al. 2015) and the authors have
         | implemented Brainfuck and an obfuscated tic-tac-toe with it.
         | 
         | [0]: https://nebelwelt.net/publications/files/15SEC.pdf
         | 
         | [1]: https://github.com/HexHive/printbf
         | 
         | [2]: https://github.com/carlini/printf-tac-toe
        
       | EPWN3D wrote:
       | Casting to and from void is a "lesser known" feature of C?
        
       | chrishill89 wrote:
       | > Multi-character constants
       | 
       | I asked on SO why C characters use `'` on both ends instead of
       | just one (e.g. why not just `'a` instead of `'a'`?). This seems
       | to have been the biggest reason.
        
         | qsort wrote:
         | The main reason for that is practicality. '\n' and '\0' are
         | also characters. You could somehow still parse it, but it would
         | be less clear and possibly need more escaping.
         | 
         | Multi-character constants are historical baggage.
        
           | dfox wrote:
           | > Multi-character constants are historical baggage.
           | 
           | It is more that it is an implementation detail of some
           | compilers that was then (ab)used by certain platforms.
        
           | chrishill89 wrote:
           | I don't see the issue. If it's a literal character it's one
           | character; if it's `\n` or `\0` then it's two; if it is an
           | octal escape it's four; and so on.
           | 
           | You have to parse them the same way in a character literal as
           | in a string literal, anyway.
        
             | chrishill89 wrote:
             | > if it is an octal escape it's four;
             | 
             | I just figured out that
             | 
             | 1. `\0` and octal numbers share the same prefix
             | 
             | 2. Octal numbers can have 1-3 digits (not fixed)
             | 
             | So maybe it's more tricky than I thought.
        
               | _kst_ wrote:
               | An octal-escape-sequence is a backslash followed by 1, 2,
               | or 3 octal digits.
               | 
               | '\0' is just another octal escape sequence, not a
               | special-case syntax for the null character.
               | 
               | "\0", "\00", and "\000" all represent the same value;
               | "\0000" is a string literal containing a null character
               | followed by the digit '0'.
               | 
               | Hexadecimal escape sequences can be arbitrarily long. If
               | you need a string containing character 0x12 followed by
               | the digit '3', you can write "\x12" "3".
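
The escape-sequence rules above can be checked at compile time. A small sketch, with made-up names; `sizeof` on a string literal counts the implicit terminating null as well.

```c
#include <assert.h>

/* An octal escape takes 1 to 3 digits, so these are all one character. */
_Static_assert(sizeof("\0")    == 2, "one NUL plus terminator");
_Static_assert(sizeof("\00")   == 2, "still a single octal escape");
_Static_assert(sizeof("\000")  == 2, "three octal digits is the maximum");
_Static_assert(sizeof("\0000") == 3, "NUL, digit '0', terminator");

/* Written as "\x123" this would be a single hex escape with value 0x123,
   which does not fit in an 8-bit char. Splitting the literal ends the
   escape; adjacent literals are concatenated afterwards. */
const char hex_then_digit[] = "\x12" "3";
_Static_assert(sizeof(hex_then_digit) == 3, "0x12, '3', terminator");

void escape_demo(void)
{
    const char s[] = "\0000";
    assert(s[0] == 0 && s[1] == '0' && s[2] == 0);
    assert(hex_then_digit[0] == 0x12 && hex_then_digit[1] == '3');
}
```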
        
         | Ontonator wrote:
         | Not relevant to C, of course, but Ruby supports something like
         | this with `?a` being equivalent to `"a"` (both of which are
         | strings, since Ruby doesn't distinguish strings from
         | characters). From what I've seen, it is recommended against in
         | most styles, I assume because it is harder to read for most
         | people.
        
           | djur wrote:
           | In older days before Ruby had encoding-aware strings, ?a
           | would return the ASCII integer value of 'a'. It made sense in
           | that context but is now pretty much a quirky fossil.
        
       | bluetomcat wrote:
       | Fun fact: the order of type qualifiers (const, volatile,
       | restrict), type specifiers (char, int, long, short, float,
       | double, signed, unsigned) and storage-class specifiers (auto,
       | register, static, extern, typedef) is not enforced at the current
       | indirection level. This means that the following declarations are
       | identical:
       | 
       |     long long int x;
       |     int long long x;
       |     long int long x;
       | 
       |     typedef int myint;
       |     int typedef myint;
       | 
       |     const char *s;
       |     char const *s;
       | 
       |     const char * const volatile restrict *ss;
       |     const char * volatile const restrict *ss;
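
A sketch that lets the compiler confirm some of these equivalences, assuming a C11 compiler for `_Generic`. The names `myint2`, `s1`, `s2` and `same_types` are illustrative. Placing a storage-class specifier anywhere but first is marked obsolescent by the standard, and some compilers warn about it at higher warning levels.

```c
#include <assert.h>

/* Redeclaring an object is only accepted when the types are compatible,
   so this compiles only if all three spellings name the same type. */
extern long long int x;
extern int long long x;
extern long int long x;

/* A storage-class specifier placed after the type specifier. */
typedef int myint;
int typedef myint2;

const char *s1;
char const *s2;

int same_types(void)
{
    /* _Generic selects the branch whose type matches the expression. */
    int a = _Generic((int long long)0, long long int: 1, default: 0);
    int b = _Generic((myint2)0, myint: 1, default: 0);
    int c = _Generic(s2, const char *: 1, default: 0);
    return a && b && c;
}

void same_types_demo(void)
{
    assert(same_types() == 1);
}
```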
        
         | qsort wrote:
         | But preference in ordering immediately qualifies you as coming
         | from the east or the west const.
        
           | dmvdoug wrote:
           | I like the idea of someone sitting down and looking at
           | someone else's code, leaning back with satisfaction after
           | they notice the programmer's preference. "I like the cut of
           | their jib."
        
             | jfghi wrote:
             | I remember some really nice macro usage.
        
       | thumbuddy wrote:
       | This is the sprinkles on the icing of the five tier cake why C
       | scares me. Thanks for sharing this, I'm sure it will help someone
       | but I sincerely hope the never write C again.
        
         | lelanthran wrote:
         | > This is the sprinkles on the icing of the five tier cake why
         | C scares me. Thanks for sharing this, I'm sure it will help
         | someone but I sincerely hope the never write C again.
         | 
         | I looked through this list, and I gotta ask, which items
         | exactly do you find scary? Most other popular languages have
         | similar, if not worse, quirks than the ones in this particular
         | list.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-07-01 23:00 UTC)