[HN Gopher] When static makes your C code 10 times faster ___________________________________________________________________ When static makes your C code 10 times faster Author : rostayob Score : 142 points Date : 2021-07-04 13:12 UTC (9 hours ago) (HTM) web link (mazzo.li) (TXT) w3m dump (mazzo.li) | kahlonel wrote: | I think a simple "const" would have also done the trick. | Sometimes -O3 is clever enough to figure out that the value is | never written, so it makes it an asm constant. | painchoc wrote: | It is mostly about the fact that if the variable is not static, | then it's non-local to the translation unit and can be modified | from everywhere else. So its value needs to be loaded and a | plain and slow division is applied. | | Having it local or const makes the compiler able to inline it | and do a simple bitwise and with a constant. | | So yes, make your variables static const by default (if you | really need global). | jheriko wrote: | const_cast and linkage make const weaker than static on a | variable that doesn't change. | jokoon wrote: | So the compiler cannot always optimize... | peter_d_sherman wrote: | >"When modulus is static, gcc / clang know that it is private to | the current compilation unit, and therefore they can inline the | value itself. Then, they turn the expensive _div_ into a much | cheaper _and_ - since | | _mod'ing by a power of two -- is equal to bitwise and of that | number minus one!_ | | All you need to do is keep the bits lower than that power of two, | which is what the and will do." | EdSchouten wrote: | Back in 2012 I observed exactly the same thing. I actually | managed to get a warning for this added to Clang, called | -Wmissing-variable-declarations. When set, warnings are generated | in case a non-static global variable is defined without an | external declaration that precedes it. | | I worked on this as part of FreeBSD, which is why their base | system is nowadays built with that flag enabled. | rostayob wrote: | Thanks, that's good to know! | Hello71 wrote: | static can also make your code 10 times smaller: note that in the | linked godbolt, there are actually two copies of both loop | functions: one regular, and one inlined. this is because the | compiler wants to inline the function, but is required to | generate an additional one in case someone else will be calling | it. what's more, at least on Linux, this copy cannot be removed | from the final executable even if nobody calls it, unless a) the | compilation is done with -ffunction-sections and the linking is | done with --gc-sections, or b) LTO is enabled. adding static to | the function declaration resolves this issue. | | the situation is even worse with ELF dynamic libraries due to the | interaction of two rules: a) by default, all functions are | exported, and b) by default, all functions can be interposed, | e.g. by LD_PRELOAD. here, if you specify -fPIC in the compilation | arguments (as is required to produce a modern dynamic library), | _inlining is totally disabled_. for small functions, the call | overhead can be substantial. | alok-g wrote: | I think this is beyond simply making the variable | static/constant. It is the specific value of the constant that is | allowing the division to be substituted with bitwise AND, which | then makes it so much faster. I wonder how much the speedup would | be if some other near-random value is there for the constant | (which is likely beyond the purpose at hand). | ddulaney wrote: | This can be accomplished more simply and reliably by marking | modulus as const. The compiler currently has to reason about the | whole compilation unit to determine that modulus is not modified, | which works. However, if future code modifies modulus (either on | purpose or accidentally) or something changes that prevents the | compiler from performing global reasoning, the optimization will | be lost. By marking the actual intention, any modifications of | modulus turn into compiler errors. Plus, if it becomes important | to expose modulus to another compilation unit, now that's | possible. | | This is a common issue with C code (including lots of code I've | written). It's really easy to forget to const something, which | forces the compiler to do global reasoning or to generate worse | code. I've gotten into the habit of making things const unless I | know I plan on mutating them, but I wish there was tooling that | encouraged it. (BTW, this is something Rust does well by making | things constant by default and requiring "mut" if it's mutable.) | forrestthewoods wrote: | > It's really easy to forget to const something, which forces | the compiler to do global reasoning or to generate worse code. | | Global mutable state is pure evil. Don't write globals. | | It shouldn't be easy to forget const on a global because a | mutable global should produce immediate revulsion and nausea. | | (I don't really consider a const global to be "a global". So | ordinarily I'd just say globals are evil don't write globals. | But I'm trying to be explicit here.) | krapht wrote: | This is one of those sayings that I don't think is helpful. | What it should be is: limit variable scope to the smallest | thing it can be. | | Nobody is being fooled about global state when you have a | singleton database connection, event bus router, or network | stack. I don't think your program is better when you pass in | i/o functionality to every single class context in the | constructor. | | Similarly a mega-class that encapsulates everything your | program does is also a code smell. There's no point to a | private variable when everything can access it. | forrestthewoods wrote: | I respectfully disagree on both points. | | > I don't think your program is better when you pass in i/o | functionality to every single class context in the | constructor. | | Abstracting over I/O transport is an excellent thing to do. | This allows you to do things like easily record and replay | a network stream. Which is useful for both debugging and | automated tests. | | I/O comes in a kazillion flavors. Networked, interprocess, | serial port, file, synthetic, etc etc. It's definitely | something that should be abstracted around and not doing so | is something I've deeply regretted in the past. | | > a mega-class that encapsulates everything your program | does is also a code smell | | Ok I agree it can have a foul odor. But even this can be | advantageous. | | Once upon a time Blizzard gave a GDC presentation about | Overwatch. Kill-cam replays are notoriously difficult in | video games. | | Blizzard's solution to this was delightfully elegant. They | made two copies of their world. One perpetually runs on | latest. One takes snapshots of the world every N frames. | When a player dies their viewport switches to the old | snapshot which then simulates and renders for ~6-10 | seconds. When the replay finishes or skips the viewport | switches back to the main game, which never stopped | receiving updates. This was a relatively trivial | implementation given the complete lack of globals and | singletons. | | A mega-class lets you run parallel instances of your | "world". It's also a nice pattern when you want to build-up | and tear-down your world in-process and guarantee no stale | state. For example when running tests you likely want | certain tests to "start clean". It's nice to be able to do | this without restarting the entire process. | | I'll double-down that globals are evil. They are a | sometimes necessary evil. Or the least bad choice. But my | experience is that not using globals is almost always | simpler, more elegant, more flexible, and ultimately | preferable. | wuxb wrote: | I started to use const whenever possible after being familiar | with some compiler optimizations and the Haskell pl. The point | is knowing how to give the compiler an easier job. | rostayob wrote: | Author here -- I agree that const is better. The perf | difference I encountered was due to the static keyword though, | which is why the blog post talks about that specific issue. | 95014_refugee wrote: | The perf difference was due to the compiler being able to | infer 'const' by way of 'static'. | | Advocating 'static' when your actual intent is 'const' does | less experienced readers a disservice; they will assume that | 'static' is meant to make things faster, and be disappointed | when it doesn't work for non-constant values. | rostayob wrote: | I am not advocating static. I'm advocating for looking at | what the compiler outputs when surprising behavior is | encountered. The example is extracted from a larger piece | of code, and I reported the minimal case as-is. | | If anything, the title is meant to be read as "isn't it | amusing that something apparently unrelated such as | `static` causes a performance improvement". | | That said, I have added a note clarifying this at top of | the post now. | wyldfire wrote: | If there were any expressions that took the address of the | variable, then even both `static const` qualifiers wouldn't | work for a sufficiently paranoid compiler. | giomasce wrote: | Are you sure? Isn't it UB to modify a const object? It will | probably end up in a non-writable memory page. | | EDIT: gcc seems to agree with me: you can see the optimized | version here[1] and the unoptimzed version if you remove | "const". | | [1] https://godbolt.org/z/KWrW45rK8 | Snoonan999 wrote: | Const is to state that this module is not allowed to | modify. Think on what'const volatile int foo' means. | | Mostly seen in embedded space. | not2b wrote: | This is why language standards specify what the compiler | can assume and call out some behavior as undefined, exactly | so compilers don't have to be paranoid and produce code | that sucks. If an underlying object is const, the compiler | is allowed to assume that it does not change (it is valid | to cast away const on a pointer or reference, but not if | the object itself was declared const). | toast0 wrote: | > (it is valid to cast away const on a pointer or | reference, but not if the object itself was declared | const). | | Isn't it valid to cast to non-const for a const, but only | invalid to modify the const through the casted pointer? | [deleted] | why_only_15 wrote: | Interestingly GCC will still optimize the loop function | even if you have code that modifies the modulus. | https://godbolt.org/z/EE9PnrY7s | [deleted] | jheriko wrote: | i think this is most compilers. | kevin_b_er wrote: | I agree. Static alone was the incorrect choice. If the global | were being modified elsewhere in the file then this | optimization would also not be possible. | | The core optimization is modulus % constant (and a power-of-2 | as well). The static just enabled the optimizer to do better | heavy lifting to get there. A const would've made the intent | clear to the human reader and the compiler. | | static const would've been best. | daneel_w wrote: | Use a const instead. It's the right tool for the job. | fallingfrog wrote: | Wouldn't making that const or using #define be a bit cleaner? | | Honestly if I as the programmer knew that I was really trying to | select bits from a number I'd just use a binary and directly. In | that specific situation I think the intent is more clear that | way. Like: | | //select the bottom 8 bits | | unsigned bottom8 = val & 0xff; | sdfdf4434r34r wrote: | As an embedded programmer working on small micros I make every | single function static (including third party code, which I | modify and bring into the source tree and curate myself), gives | you global/link time optimizations and dead code elimination for | free, leads to better code even at -O0, -Og and -O1, static const | configuration structs/values gets optimized away, and so on. | | Really wish it was the default. | [deleted] | kevin_thibedeau wrote: | It also gives you a nice private namespace for the translation | unit so you can confidently modify any of the static objects | without concern for impacting external code. | | I once had a gray beard chew me out over changing a function | signature when revising things. I had to point out politely | that it was static and that anyone who managed to link to the | function had to be breaking a lot of rules to do so. | GlitchMr wrote: | Without `static`, compiler exports a symbol. $ | cat value.c int value = 42; int get_value() | { return value; } $ make value.o | gcc -c -o value.o value.c $ nm value.o | 0000000000000000 T get_value U | _GLOBAL_OFFSET_TABLE_ 0000000000000000 D value | | This symbol, not being `const` can be modified by any other | compilation unit. $ cat main.c #include | <stdio.h> int value; int get_value(); | int main() { value = 123456789; | printf("%d\n", get_value()); } $ make main.o | gcc -c -o main.o main.c $ cc value.o main.o | $ ./a.out 123456789 | | Compiler when generating an object file has to assume the value | of exported non-const symbol can change. It's necessary to tell | the compiler that the value cannot change, either by not | exporting the symbol by using `static` or making the value of it | `const`. In example provided in your article `static` makes sense | (or even `static const`) as I don't think there is a reason to | export this global. | rostayob wrote: | Yes, see last few paragraphs of the post, in which I also | speculate why GCC doesn't infer that there is only one | compilation unit. | makapuf wrote: | I quite like C syntax, and I sometimes long for a language that | would change a few elements (default to const and static, | different/safer standard api for strings). Some might call it | Go but a simple slightly updated C would be nice. | mhh__ wrote: | That is pretty much D (assuming you only want to use the | C-like bits) - it's not const by default but const is much | easier to use and much more violent | | And arrays are bounds checked by default, no more issues with | them (and strings are arrays) | nicoburns wrote: | Have you seen zig? It's a modern language that tries to keep | the simplicity and explicitness of C (it does change the | syntax a bit though). | api wrote: | Division is slow, which is something most programmers don't know. | If you can binary AND instead of MOD this can be a huge win. | | Multiplication is also very fast, usually one or two cycles on | larger chips. | jeffbee wrote: | If your compiler doesn't do this for divisors known at compile | time, get your money back. | IvanK_net wrote: | I think every programmer should know, that logical AND is many | times faster than Modulus (which is at least as hard as a | division), and use & instead of % right in his code for powers of | two (and not expect it to be done by a compiler). | toast0 wrote: | That's bitwise AND. | Rompect wrote: | This is like the easiest thing for a compiler to detect and | optimize. | codeflo wrote: | True, but something to be aware of: If the compiler uses | bitwise operations to implement %, it emits special handling | of negative values (it's really more of a remainder operator | than a modulus). You can avoid that either by using unsigned | numbers, or & as suggested: | | https://godbolt.org/z/vdz59q994 | simias wrote: | If I could travel back in time I'd tell Dennis to make "static" | the implicit default, and have a special keyword like "public" or | "export" for items that are meant to be accessible from outside | the compilation unit. | | I'd also ask him to make "switch" break by default. | | Then I'd go kill Hitler or something. | glouwbug wrote: | Because, admittedly, non-default switch breaks are on par with | Hitler | phaemon wrote: | No, even if you could kill Hitler, then WW2 doesn't happen, | computers aren't invented yet, so time travel isn't invented | yet so you can't head back to kill him. Read your briefing | notes. | | Also | https://tvtropes.org/pmwiki/pmwiki.php/Main/HitlersTimeTrave... | wyldfire wrote: | Apparently everyone kills Hitler on their first trip [1]. | | [1] https://www.tor.com/2011/08/31/wikihistory/ | andi999 wrote: | But then you would need 'unbreak' or just 'goto' to the next | case. No duffs device no cigar. | | Probably a module/namespace system would be the biggest | improvement. | jcranmer wrote: | The only real case I use switch fallthrough for is when you | have two cases with the same code, e.g.: | switch (foo) { case 1: case 2: /* common | body */ break; case 3: /* body 3 */ | break; } | | It's not cognitively hard to make an empty case body | fallthrough to the next run, but include an implicit break at | the end of every nontrivial body. | | Supporting Duff's Device is not a compelling feature to | support--irreducible loops are basically going to destroy any | hope of optimization you might accrue. | jackewiehose wrote: | > The only real case | | which is probably the case in 90% of all my switch'es. | Worst use of a time machine ever. | pjmlp wrote: | > But then you would need 'unbreak' or just 'goto' to the | next case. No duffs device no cigar. | | From Algol's linage point of view, a very good outcome. | jimsmart wrote: | FWIW: Go's switch statements are break by default, with an | explicit 'fallthrough' keyword if one wishes to override that | behaviour. | cpeterso wrote: | clang and gcc have a -Wimplicit-fallthrough warning flag | that will warn about switch cases that fall through without | either C++17's [[fallthrough]] attribute or a /* | fallthrough */ comment. | | I made Firefox's code base able to compile with -Wimplicit- | fallthrough. About one hundred fall through cases needed to | be annotated and about 2-3 were actual bugs (though minor). | Someone wrote: | Yes, but you would need way fewer of such phrases than you | need now. Making the exceptional case take more code is the | way to go, IMO. | | Many languages also support something like | case 1,2,4: | | or even case 1-10, 20-22, 34: | | That further decreases the need for falling through. | wyldfire wrote: | We have learned so much over the decades of using C. Lowest- | visibility-by-default is just one of the many good choices of | Rust and other C successors. The example here is for codegen | benefits but reducing visibility benefits encapsulation too. | nyc_pizzadev wrote: | My first C intuition would be defining this value as a macro. If | I was in C++, then a const would make sense. | lr4444lr wrote: | Great write up: a precise problem that digs into the internals | pointedly to teach a simple concept. This is exactly how I tell | the junior devs where I work to do lunch talks that give people | some concrete, memorable piece of learning that makes them better | in their practical work. | rostayob wrote: | Author here -- just to be clear, I agree that marking the | variable as const is the "right" thing to do here. I reported the | investigation as-is because removing a static declaration made | the code slower, which I then narrowed down to the isolated bit | of code. | codeflo wrote: | I agree with many of the sibling comments, static vs. non-static | is almost a complete red-herring. | | Static only means that the variable is local to the translation | unit (the C file). The relevant difference in the example is | actually the const-ness of the variable, which you may put | explicitly, but which a powerful compiler can also infer here in | the static case. | | Other than this optimization possibility, const and static are | orthogonal concepts. I'm not sure to what extent the article | author is aware of this. | | So the lesson should be: use const (or #define) if you mean to | have a constant. It's still a good idea to also make things | static, but the real reason for that is to avoid name collisions | with variables in other C files. | jheriko wrote: | this is totally wrong in my experience, although some of the | understanding is right. by using static to limit to the | compilation unit the compiler can workout the value is | constant. | | const doesn't do this thanks to const_cast etc. maybe things | have changed, but const on a file level variable doesn't do | this reliably, or at least hasn't for considerable lengths of | time. | | of course 'static const' is the better answer. :) | codeflo wrote: | I hope your comment wasn't intended to be as hostile as it | reads to me, but I do wonder what your experience was | exactly, because there might be a misinterpretation. | | From a technical standpoint, using const_cast to modify a | constant is Undefined Behavior. This was explicitly specified | that way to make constants inlineable by the compiler. | _Every_ compiler that I 've ever used does this, it's a | trivial but very effective optimization. | | Proof by Godbolt: https://godbolt.org/z/djEdvee4s | | Observe how GCC completely eliminates the contents of | undefined_mutate_modulus (which it's allowed to -- UB means | it can do anything with that function, and it chooses the | simplest possible thing) rather than de-optimizing mod4_const | like you suggested. Compilers are smart. | nicetryguy wrote: | Const / static allows for "immediate" ASM instruction generation: | which means the value is known at compile time so it can compare | it directly inline as opposed to the overhead of comparing it to | a labeled memory address. It's generally good practice whenever | possible. | omegalulw wrote: | Isn't that constexpr? | jheriko wrote: | what?!?! | | static doesn't imply the value can be known, only in some cases | where its not modified in that compilation unit (which is | trivial to detect with SSA) | | const is something else. | jheriko wrote: | really? people mentioned const? | | you can tell how much low-level optimisation they have done if | they think its gonna change codegen reliably, or at ll. | einpoklum wrote: | See my other comment here regarding const. | arthur2e5 wrote: | Link time optimization would enable a similar change even without | a code edit. Using static is good, but it's a good idea to figure | out how to just let other people's code run fast too. | synergy20 wrote: | use const is a better choice, I changed static to const, the | result is the same here. | malkia wrote: | In ideal world const, constexpr, explicit (for constructors), and | no default implicit conversions (and others that I'v missed) | should've been the default... | jheriko wrote: | the biggest win here is informing the compiler sufficiently to | swap out div for and. | | the use of static is just a tool to inform the compiler that the | value is a constant (which const /might/) | [deleted] | einpoklum wrote: | Title rephrase: | | When the compiler can assume your values don't change magically, | it can optimize their use. | | This is true for restricted pointers, for global-scope variables | which can only be accessed in the same translation unit, for | stuff in inlined functions (often), etc. | | -------------------------------------------- | | const is a bit shifty. const makes the compiler restrict what it | allows you to write, but it can still not really assume other | functions don't break constness via casting: void | i_can_change_x_yeah_i_can_just_watch_me(const int* x) { | *(int*) x = x + 1; } | | now, if the compiler sees the code, then fine (maybe), but when | all you see is: void sly(const int* x); | | You can't assume the value pointed to by x can change. See this | on GodBolt: https://godbolt.org/z/fGEMj9Meo | | and it could well be the same for constants too. But somehow it | isn't: | | https://godbolt.org/z/fqGzh7o8z | missblit wrote: | Specifically it's well-defined behavior to mutate an object | after const_cast-ing away constness if the object wasn't const | to begin with (const references or const pointers can refer to | non-const objects). | | In your first example you have `int x = 1;` which isn't const, | so the compiler has to assume that `f` may mutate it after | const casting. | | In your second example you have `const int x = 1;` which is | const so the compiler can assume the value will never change. | einpoklum wrote: | You must be right. But - it's so easy to forget that! Or - | never to be told that in the first place. ___________________________________________________________________ (page generated 2021-07-04 23:00 UTC)