[HN Gopher] Lesser known tricks, quirks and features of C
___________________________________________________________________
Lesser known tricks, quirks and features of C
Author : jiripospisil
Score  : 198 points
Date   : 2023-07-01 13:59 UTC (9 hours ago)
(HTM) web link (jorengarenar.github.io)
(TXT) w3m dump (jorengarenar.github.io)
| dundarious wrote:
| The register keyword is still useful in compiler specific
| contexts. e.g., for GCC-ish compilers like gcc, clang:
|     long result;
|     register long nr __asm__("rax") = __NR_close;
|     register long rfd __asm__("rdi") = fd;
|     __asm__ volatile ("syscall\n" : "=a"(result) : "r"(rfd), "0"(nr)
|         : "rcx", "r11", "memory", "cc");
|
| The above is basically how you might implement the close(int)
| syscall on x86-64.
|
| You don't need to be doing embedded programming to find it useful
| to dip down into assembly like that (though syscalls are perhaps
| a bad example, even for a syscall not provided by your C library
| -- that library probably provides a `syscall` function/macro that
| does all this in a platform agnostic way).
|
| Also, "%.*" is extremely useful with strings, i.e., "%.*s". Your
| code base should be using length delimited strings (basically
| `struct S { int len; char* str; };`) throughout, in which case
| you can do `printf("%.*s\n", s.len, s.str);`
| messe wrote:
| Um, no. You could remove the register/__asm__("reg")
| qualifiers entirely and just specify "D" (rfd) as an input
| parameter and the code would work fine. There is absolutely no
| need for register there.
|
| The only good use of register these days is for project-wide
| globals in embedded contexts. IIRC one example of this is the
| decompilation of Mario 64, where a certain register ALWAYS
| contains the floating point value 1.0.
| dundarious wrote:
| Both are valid, but I much prefer the `register` method I
| gave (documented in
| https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables....),
| as it is far more self-explanatory.
| GCC's extended asm syntax has too many inscrutable constraint
| and modifier codes even excluding these GCC-ext-asm-specific
| codes to reference machine-specific registers by name. As
| such, I totally disagree with your statement about "the only
| good use". Given that I first learned about that method of
| specifying registers by reading linux kernel source code, I
| think others would disagree as well.
| Dwedit wrote:
| I think the GCC ASM syntax for specifying inputs and
| outputs is quite clear enough, and doesn't require a
| variable declaration with unusual syntax.
| dundarious wrote:
| I'm merely referring to the fact that I specified rdi by
| writing "rdi" not "D". I can specify r10 by writing
| "r10", but I can't remember how to specify that directly
| in the inputs/outputs constraints -- a glance at
| https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html
| didn't show me how either, but I'm
| guessing it's there.
|
| [Edit: Although, on second glance, from
| https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Input-O...:
|
| > If you must use a specific register, but your Machine
| Constraints do not provide sufficient control to select
| the specific register you want, local register variables
| may provide a solution (see Specifying Registers for
| Local Variables).
|
| indicates in the r10 case maybe you _must_ use the syntax
| I gave?]
|
| My preference is for the syntax that requires looking up
| fewer tables in GCC docs, but as I said, the version you
| prefer is fine too.
| LegionMammal978 wrote:
| Indeed, the register variable syntax is necessary for
| many of the registers; there are only so many of them
| that have been stuffed into the one-letter constraint.
| I've used it before for making raw x86-64 Linux syscalls
| (which can use r10, r8, and r9) without going through
| errno, as part of a silly little project ([0]) to read
| from a file descriptor without writing to process memory.
|
| [0] https://pastebin.com/mepsedCC
| dundarious wrote:
| Nice. Yes, linux/tools/include/nolibc has syscall macros
| that look near identical.
| asveikau wrote:
| One of the reasons for gcc's inline asm syntax being so
| verbose is it tells the compiler which registers are used
| and how. There is no indication in your last asm() that the
| value of rax before the asm() is read from. This means the
| compiler could assume it's safe to clobber rax just before.
| After all, you assigned a value into rax and never read it,
| a reasonable optimizer might say, why emit that first
| assignment at all?
| dundarious wrote:
| I think you should read
| https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Input-O...
| and the link I gave earlier, if you
| think my example is something novel of my own
| construction. It's basically straight from the GCC docs.
|
| If your point is that I don't use result, that's because
| it's a snippet written into Hacker News. I didn't write
| the code to convert it into an errno and return -1 on
| error, etc., but doing so would be perfectly valid, and
| safe from your reasonable optimizer concerns.
| asveikau wrote:
| I see absolutely no examples in that link there where
| they assign into a register via C code, not using it
| anywhere else, and then assume you can read the same
| value back from that register in an asm() statement
| without declaring it as an input.
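For comparison, the constraint-only form messe describes earlier in the thread (naming rdi through the "D" constraint and rax through "a", with no register variables) could be sketched roughly as follows. This is an illustration, not code from any commenter: the function name `raw_close` is made up, the asm path is only taken on x86-64 Linux with a GCC-compatible compiler, and the libc fallback for other platforms is an assumption added so the sketch compiles elsewhere.

```c
#include <errno.h>
#include <unistd.h>

#if defined(__linux__) && defined(__x86_64__) && defined(__GNUC__)
#include <sys/syscall.h>   /* provides __NR_close */

/* Raw close(2): syscall number goes in rax ("0" ties it to the "=a"
 * output), fd goes in rdi ("D"). The syscall instruction clobbers
 * rcx and r11. Returns 0 on success or a negative errno value. */
long raw_close(int fd)
{
    long result;
    __asm__ volatile ("syscall"
                      : "=a"(result)
                      : "0"((long)__NR_close), "D"((long)fd)
                      : "rcx", "r11", "memory", "cc");
    return result;
}
#else
/* Fallback (assumption of this sketch): go through libc and mimic
 * the kernel's negative-errno return convention. */
long raw_close(int fd)
{
    return close(fd) == 0 ? 0 : -(long)errno;
}
#endif
```

As LegionMammal978 notes above, this approach stops working once an argument has to live in a register without its own constraint letter (r10, r8, r9), which is where local register variables come back in.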
| dundarious wrote:
| I referenced both links (the latter links to the former),
| and the former, which was in the first comment of mine
| you replied to, contains the following examples:
|     register int *p1 asm ("r0") = ...;
|     register int *p2 asm ("r1") = ...;
|     register int *result asm ("r0");
|     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
|
| and
|     int t1 = ...;
|     register int *p1 asm ("r0") = ...;
|     register int *p2 asm ("r1") = t1;
|     register int *result asm ("r0");
|     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
|
| In my example, nr is rax and listed in the input section,
| rfd is rdi and also listed in the input section, and
| result is rax and listed in the output section (I even
| used your preferred syntax for specifying rax here).
| Using result after the syscall asm statement is perfectly
| valid.
| gpderetta wrote:
| Even on x86, there are some registers where there isn't a
| corresponding exact input operand modifier, so register is
| the only option. But I forgot which register.
| pm215 wrote:
| clang does not support specific-register variables for
| registers which are allocatable by its register allocator
| (https://clang.llvm.org/docs/UsersManual.html#gcc-extensions-...)
| so this is gcc-only. If you care about
| portability between gcc and clang you'll need some other
| approach...
| WalterBright wrote:
| D's inline assembler:
|     asm {
|         mov RAX,__NR_close;
|         mov RDI, fd;
|         call syscall;
|     }
|
| The compiler automatically keeps track of which registers were
| modified.
| schemescape wrote:
| I learned a few new things, but it would be more helpful if this
| had info on whether these are standard and, if so, which standard
| they are a part of.
| thaliaarchi wrote:
| The trick for preserving array lengths in function signatures
| looks quite useful (e.g., `void foo(int p[static 1]);` for a
| non-null pointer p). Unfortunately, I think the overloaded use of
| `const` and `static` somewhat obfuscates its semantics.
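The `static` array-parameter syntax mentioned just above is standard C since C99. A minimal sketch of how it reads in practice follows; the function names `first` and `sum3` are made up for illustration. Note that `static N` is a promise made by the caller, not a run-time check: a compiler may warn when it can see a null pointer or a too-short array being passed, but it is not required to.

```c
/* `p[static 1]`: the caller promises p points to at least one int,
 * which implies p is non-null. Hypothetical helper for illustration. */
int first(const int p[static 1])
{
    return p[0];
}

/* `a[static 3]`: the caller promises at least three elements, so
 * reading a[0] through a[2] needs no length parameter. */
int sum3(const int a[static 3])
{
    return a[0] + a[1] + a[2];
}
```

Calling `sum3(v)` with `int v[4] = {1, 2, 3, 4};` is fine, as is `sum3(v + 1)`; passing `v + 2` would break the promise and is undefined behavior even though it compiles.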
| asicsp wrote:
| This article was already discussed here:
| https://news.ycombinator.com/item?id=34855331 _(410 points | 4 months ago | 176 comments)_
|
| But that link no longer works.
| bryancoxwell wrote:
| > I did a sloppy job of gathering some of them in this post (in
| no particular order) with even sloppier short explanations
|
| I wonder why developers tend to be so self deprecating
| tyre wrote:
| I think it's in part a preemptive defense against the endless
| nitpicking from their audience.
| camel-cdr wrote:
| Yeah, I think it's this. IIRC one of the first places it was
| posted is on the C_Programming reddit, where the bar for
| "lesser known" is quite high.
| badtension wrote:
| Being afraid of failure and the impostor syndrome. You mark the
| territory and lower the expectations to look better in the end
| (even if only in your own eyes). A ton of people do it, it's
| hard to get out of it.
| dmvdoug wrote:
| The earlier comments were so cynical I felt like I needed to
| offer another possibility: maybe they set out to do this in a
| more systematic way, then got so deep in the weeds they
| realized it would take them forever to put it into more
| systematic form, but they didn't want to just leave it sitting
| there sight unseen. So they acknowledge it's sloppier than they
| would like it to be, but hey, at least it's something. That's
| not really self deprecating so much as just... being
| transparent?
| wongarsu wrote:
| I assume it's just that being self-deprecating or humble
| correlates with many traits that make you a good developer, so
| people with those traits are more likely to end up in this
| career path and stick around in it.
|
| Just like being a sales person doesn't automatically make you
| overconfident, but being overconfident makes you a good sales
| person.
| bmacho wrote:
| For me it is what's written there: doing a less than
| satisfactory job, and not wanting to do it correctly.
| jnspts wrote:
| C11 added _Generic to language, but turns out metaprogramming by
| inhumanely abusing the preprocessor is possible even in pure C99:
| meet Metalang99 library.
|
| I'm actually working on a library doing just that! It's still in
| very (very) early development, but maybe someone may find it to
| be interesting. [1]
|
| Link [2] is the implementation of a vector. Link [3] is a test
| file implementing a vector of strings.
|
| [1]: https://github.com/jenspots/libwheel
|
| [2]:
| https://github.com/jenspots/libwheel/blob/main/include/wheel...
|
| [3]:
| https://github.com/jenspots/libwheel/blob/main/tests/impl/st...
| burstmode wrote:
| C++ Metaprogramming is also just a bunch of sugarcoated
| preprocessor macros and it was never something else.
| idispatch wrote:
| This is plainly not true
| cempaka wrote:
| The downvotes make me laugh, I did something pretty similar not
| long after the _Generic keyword came out and remember getting a
| pretty icy reception even though I was pretty up front about
| how painful and crufty it is.
|
| https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-...
| buserror wrote:
| I "abuse" unions and anonymous unions all the time, it's very
| practical, and makes the 'user' code a lot clearer as you can
| access the small 'namespace' as convenient. Here for example I
| can access it as named coordinates, x,y points or a vector.
|     typedef union c2_rect_t {
|         struct { c2_pt_t tl, br; };
|         c2_coord_t v[4];
|         struct { c2_coord_t l,t,r,b; };
|     } c2_rect_t;
| synergy20 wrote:
| confused by the code, can you elaborate more
| trentnelson wrote:
| 100% agree! I use whatever tool in the C toolbox results in the
| easiest-to-(read|grok) code, which means tons of anonymous
| union/struct "abuse", bit fields, function pointer typedefs,
| strictly adhering to "Cutler Normal Form".
| projektfu wrote:
| I used to use the comma operator to return a status code in the
| same line as an operation, but for some reason nobody liked my
| style.
|     if (error_condition) return *result=0, ERR_CODE;
|
| So, back to writing lots of statements.
| gpderetta wrote:
| I have used it in C++ to take a scoped lock without naming it and
| returning a mutex protected value:
|     return scoped_lock{foo_mux_}, return foo_;
|
| Also nobody likes it.
| Joker_vD wrote:
| Doesn't such an unnamed variable get immediately
| destructed, right at the comma? I am pretty certain I'd hit
| exactly that problem and had to switch to __LINE__-macro to
| name such scoped locks.
| _kst_ wrote:
| I think you mean:
|     return scoped_lock{foo_mux_}, foo_;
| enriquto wrote:
| I love this style (avoiding multiple-statement blocks) !
|
| Sometimes you can still avoid multiple statements by
| rearranging your code otherwise. For example, in your case, you
| can set *result=0 at the beginning of the function. Other
| times, you can also cram the assignment inside the condition
| using short circuit evaluation; this trick somehow seems more
| palatable to normies than the comma operator.
| projektfu wrote:
| Yeah, I felt that in cases (like when doing COM programming)
| where the true result is almost always returned in a
| parameter and the status code as the return value, it made a
| lot of sense to me to combine those into one line wherever
| they appeared. But, like the article says, this is a
| lesser-known operator. In libc style, a similar thing makes sense,
| e.g.
|     return errno=ENOENT, (FILE*)0;
|
| I don't know if anyone uses this style.
|
| In K&R, they say that it's mostly used in for loops, such as
|     for (i = 0, j = strlen(s) - 1; i < j; i++, j--) ...
|
| So that is where I use it now.
| omoikane wrote:
| %n format specifier was lesser known until an IOCCC winner made
| it famous.
|
| https://news.ycombinator.com/item?id=23445546 - Tic-Tac-Toe in a
| single call to printf
| Croftengea wrote:
| > Compound literals are lvalues
|
| > ((struct Foo){}).x = 4;
|
| Do such lvalues have any real use?
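One thing lvalue-ness buys is that a compound literal has an address, so it can be handed to a function expecting a pointer without declaring a named temporary. A minimal sketch, with `struct Foo`'s fields and the names `bump` and `demo` made up for illustration; it uses `{ 0 }` rather than the empty `{}` from the quote, since empty initializer braces only became standard C in C23.

```c
struct Foo { int x; int y; };

/* Hypothetical helper: modifies the struct through the pointer and
 * returns the new value of x. */
int bump(struct Foo *f)
{
    f->x += 1;
    return f->x;
}

int demo(void)
{
    /* At block scope a compound literal is an unnamed object with
     * automatic storage that lives until the enclosing block ends,
     * so keeping its address in p for the rest of the block is fine. */
    struct Foo *p = &(struct Foo){ .x = 41, .y = 0 };

    /* A throwaway object created directly in the argument list. */
    return bump(p) + bump(&(struct Foo){ 0 });
}
```

Here `demo()` adds 42 (from the first object) and 1 (from the throwaway one). The direct assignment form in the quote, `((struct Foo){}).x = 4;`, is legal for the same reason but discards the object immediately.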
| eqvinox wrote:
| The key thing about it being an lvalue is that you can take its
| address -- you can only take the address of lvalues. Other than
| that, no, no real use.
| kzrdude wrote:
| You could pass them to a function where something non-null is
| required but you don't want to use it, like : `f(&(struct
| Foo){0})`
| [deleted]
| eqvinox wrote:
| The %.* example is so close to hitting its single most useful
| application:
|     char *something; /* no null termination */
|     size_t something_length;
|     printf("%.*s", (int)something_length, something);
|
| Unfortunately, the .* argument has type (int), not size_t, and
| it's signed... but if that's not a problem this is a great way to
| format non-\0-terminated strings.
|
| (And of course you can also use it to print a substring without
| copying it around first.)
| jrpelkonen wrote:
| In this simple case, if the int cast is a problem, fwrite would
| be an adequate alternative, don't you think?
| PaulDavisThe1st wrote:
| not for s(n)printf ...
| LegionMammal978 wrote:
| For that, the analogue would be memcpy; but both
| alternatives lose the ease of surrounding the string with
| other text, since you either have to do the length
| calculations or define helper functions.
| thaliaarchi wrote:
| Somewhat related to this, printf alone in a loop is
| Turing-complete, by using %-directives like that. It was introduced in
| "Control-Flow Bending: On the Effectiveness of Control-Flow
| Integrity" (Carlini, et al. 2015) and the authors have
| implemented Brainfuck and an obfuscated tic-tac-toe with it.
|
| [0]: https://nebelwelt.net/publications/files/15SEC.pdf
|
| [1]: https://github.com/HexHive/printbf
|
| [2]: https://github.com/carlini/printf-tac-toe
| EPWN3D wrote:
| Casting to and from void is a "lesser known" feature of C?
| chrishill89 wrote:
| > Multi-character constants
|
| I asked on SO why C characters use `'` on both ends instead of
| just one (e.g. why not just `'a` instead of `'a'`?). This seems
This seems | to have been the biggest reason. | qsort wrote: | The main reason for that is practicality. '\n' and '\0' are | also characters. You could somehow still parse it, but it would | be less clear and possibly need more escaping. | | Multi-character constants are historical baggage. | dfox wrote: | > Multi-character constants are historical baggage. | | It is more that it is an implementation detail of some | compilers that was then (ab)used by certain platforms. | chrishill89 wrote: | I don't see the issue. If it's a literal character it's one | character; if it's `\n` or or `\0` then it's two; if it is an | octal escape it's four; and so on. | | You have to parse them the same way in a character literal as | in a string literal, anyway. | chrishill89 wrote: | > if it is an octal escape it's four; | | I just figured out that | | 1. `\0` and octal numbers share the same prefix | | 2. Octal numbers can have 1-3 digits (not fixed) | | So maybe it's more tricky than I thought. | _kst_ wrote: | An octal-escape-sequence is a backslash followed by 1, 2, | or 3 octal digits. | | '\0' is just another octal escape sequence, not a | special-case syntax for the null character. | | "\0", "\00", and "\000" all represent the same value; | "\0000" is a string literal containing a null character | followed by the digit '0'. | | Hexadecimal escape sequences can be arbitrarily long. If | you need a string containing character 0x12 followed by | the digit '3', you can write "\x12" "3". | Ontonator wrote: | Not relevant to C, of course, but Ruby supports something like | this with `?a` being equivalent to `"a"` (both of which are | strings, since Ruby doesn't distinguish strings from | characters). From what I've seen, it is recommended against in | most styles, I assume because it is harder to read for most | people. | djur wrote: | In older days before Ruby had encoding-aware strings, ?a | would return the ASCII integer value of 'a'. 
It made sense in | that context but is now pretty much a quirky fossil. | bluetomcat wrote: | Fun fact: the order of type qualifiers (const, volatile, | restrict), type specifiers (char, int, long, short, float, | double, signed, unsigned) and storage-class specifiers (auto, | register, static, extern, typedef) is not enforced at the current | indirection level. This means that the following declarations are | identical: long long int x; int long | long x; long int long x; typedef int myint; | int typedef myint; const char *s; char const | *s; const char * const volatile restrict *ss; | const char * volatile const restrict *ss; | qsort wrote: | But preference in ordering immediately qualifies you as coming | from the east or the west const. | dmvdoug wrote: | I like the idea of someone sitting down and looking at | someone else's code, leaning back with satisfaction after | they notice the programmer's preference. "I like the cut of | their jib." | jfghi wrote: | I remember some really nice macro usage. | thumbuddy wrote: | This is the sprinkles on the icing of the five teir cake why C | scares me. Thanks for sharing this, I'm sure it will help someone | but I sincerely hope the never write C again. | lelanthran wrote: | > This is the sprinkles on the icing of the five teir cake why | C scares me. Thanks for sharing this, I'm sure it will help | someone but I sincerely hope the never write C again. | | I looked through this list, and I gotta ask, which items | exactly do you find scary? Most other popular languages have | similar, if not worse, quirks than the ones in this particular | list. | [deleted] ___________________________________________________________________ (page generated 2023-07-01 23:00 UTC)