[HN Gopher] When static makes your C code 10 times faster
       ___________________________________________________________________
        
       When static makes your C code 10 times faster
        
       Author : rostayob
       Score  : 142 points
       Date   : 2021-07-04 13:12 UTC (9 hours ago)
        
 (HTM) web link (mazzo.li)
 (TXT) w3m dump (mazzo.li)
        
       | kahlonel wrote:
       | I think a simple "const" would have also done the trick.
       | Sometimes -O3 is clever enough to figure out that the value is
       | never written, so it makes it an asm constant.
        
         | painchoc wrote:
         | It is mostly about the fact that if the variable is not static,
         | then it's non-local to the translation unit and can be modified
         | from everywhere else. So its value needs to be loaded and a
         | plain and slow division is applied.
         | 
         | Having it local or const makes the compiler able to inline it
         | and do a simple bitwise and with a constant.
         | 
         | So yes, make your variables static const by default (if you
         | really need global).
        
         | jheriko wrote:
         | const_cast and linkage make const weaker than static on a
         | variable that doesn't change.
        
       | jokoon wrote:
       | So the compiler cannot always optimize...
        
       | peter_d_sherman wrote:
       | >"When modulus is static, gcc / clang know that it is private to
       | the current compilation unit, and therefore they can inline the
       | value itself. Then, they turn the expensive _div_ into a much
       | cheaper _and_ - since
       | 
       |  _mod'ing by a power of two -- is equal to bitwise and of that
       | number minus one!_
       | 
       | All you need to do is keep the bits lower than that power of two,
       | which is what the and will do."
        
       | EdSchouten wrote:
       | Back in 2012 I observed exactly the same thing. I actually
       | managed to get a warning for this added to Clang, called
       | -Wmissing-variable-declarations. When set, warnings are generated
       | in case a non-static global variable is defined without an
       | external declaration that precedes it.
       | 
       | I worked on this as part of FreeBSD, which is why their base
       | system is nowadays built with that flag enabled.
        
         | rostayob wrote:
         | Thanks, that's good to know!
        
       | Hello71 wrote:
       | static can also make your code 10 times smaller: note that in the
       | linked godbolt, there are actually two copies of both loop
       | functions: one regular, and one inlined. this is because the
       | compiler wants to inline the function, but is required to
       | generate an additional one in case someone else will be calling
       | it. what's more, at least on Linux, this copy cannot be removed
       | from the final executable even if nobody calls it, unless a) the
       | compilation is done with -ffunction-sections and the linking is
       | done with --gc-sections, or b) LTO is enabled. adding static to
       | the function declaration resolves this issue.
       | 
       | the situation is even worse with ELF dynamic libraries due to the
       | interaction of two rules: a) by default, all functions are
       | exported, and b) by default, all functions can be interposed,
       | e.g. by LD_PRELOAD. here, if you specify -fPIC in the compilation
       | arguments (as is required to produce a modern dynamic library),
       | _inlining is totally disabled_. for small functions, the call
       | overhead can be substantial.
        
       | alok-g wrote:
       | I think this is beyond simply making the variable
       | static/constant. It is the specific value of the constant that is
       | allowing the division to be substituted with bitwise AND, which
       | then makes it so much faster. I wonder how much the speedup would
       | be if some other near-random value is there for the constant
       | (which is likely beyond the purpose at hand).
        
       | ddulaney wrote:
       | This can be accomplished more simply and reliably by marking
       | modulus as const. The compiler currently has to reason about the
       | whole compilation unit to determine that modulus is not modified,
       | which works. However, if future code modifies modulus (either on
       | purpose or accidentally) or something changes that prevents the
       | compiler from performing global reasoning, the optimization will
       | be lost. By marking the actual intention, any modifications of
       | modulus turn into compiler errors. Plus, if it becomes important
       | to expose modulus to another compilation unit, now that's
       | possible.
       | 
       | This is a common issue with C code (including lots of code I've
       | written). It's really easy to forget to const something, which
       | forces the compiler to do global reasoning or to generate worse
       | code. I've gotten into the habit of making things const unless I
       | know I plan on mutating them, but I wish there was tooling that
       | encouraged it. (BTW, this is something Rust does well by making
       | things constant by default and requiring "mut" if it's mutable.)
        
         | forrestthewoods wrote:
         | > It's really easy to forget to const something, which forces
         | the compiler to do global reasoning or to generate worse code.
         | 
         | Global mutable state is pure evil. Don't write globals.
         | 
         | It shouldn't be easy to forget const on a global because a
         | mutable global should produce immediate revulsion and nausea.
         | 
         | (I don't really consider a const global to be "a global". So
         | ordinarily I'd just say globals are evil don't write globals.
         | But I'm trying to be explicit here.)
        
           | krapht wrote:
           | This is one of those sayings that I don't think is helpful.
           | What it should be is: limit variable scope to the smallest
           | thing it can be.
           | 
           | Nobody is being fooled about global state when you have a
           | singleton database connection, event bus router, or network
           | stack. I don't think your program is better when you pass in
           | i/o functionality to every single class context in the
           | constructor.
           | 
           | Similarly a mega-class that encapsulates everything your
           | program does is also a code smell. There's no point to a
           | private variable when everything can access it.
        
             | forrestthewoods wrote:
             | I respectfully disagree on both points.
             | 
             | > I don't think your program is better when you pass in i/o
             | functionality to every single class context in the
             | constructor.
             | 
             | Abstracting over I/O transport is an excellent thing to do.
             | This allows you to do things like easily record and replay
             | a network stream. Which is useful for both debugging and
             | automated tests.
             | 
             | I/O comes in a kazillion flavors. Networked, interprocess,
             | serial port, file, synthetic, etc etc. It's definitely
             | something that should be abstracted around and not doing so
             | is something I've deeply regretted in the past.
             | 
             | > a mega-class that encapsulates everything your program
             | does is also a code smell
             | 
             | Ok I agree it can have a foul odor. But even this can be
             | advantageous.
             | 
             | Once upon a time Blizzard gave a GDC presentation about
             | Overwatch. Kill-cam replays are notoriously difficult in
             | video games.
             | 
             | Blizzard's solution to this was delightfully elegant. They
             | made two copies of their world. One perpetually runs on
             | latest. One takes snapshots of the world every N frames.
             | When a player dies their viewport switches to the old
             | snapshot which then simulates and renders for ~6-10
             | seconds. When the replay finishes or skips the viewport
             | switches back to the main game, which never stopped
             | receiving updates. This was a relatively trivial
             | implementation given the complete lack of globals and
             | singletons.
             | 
             | A mega-class lets you run parallel instances of your
             | "world". It's also a nice pattern when you want to build-up
             | and tear-down your world in-process and guarantee no stale
             | state. For example when running tests you likely want
             | certain tests to "start clean". It's nice to be able to do
             | this without restarting the entire process.
             | 
             | I'll double-down that globals are evil. They are a
             | sometimes necessary evil. Or the least bad choice. But my
             | experience is that not using globals is almost always
             | simpler, more elegant, more flexible, and ultimately
             | preferable.
        
         | wuxb wrote:
         | I started to use const whenever possible after being familiar
         | with some compiler optimizations and the Haskell pl. The point
         | is knowing how to give the compiler an easier job.
        
         | rostayob wrote:
         | Author here -- I agree that const is better. The perf
         | difference I encountered was due to the static keyword though,
         | which is why the blog post talks about that specific issue.
        
           | 95014_refugee wrote:
           | The perf difference was due to the compiler being able to
           | infer 'const' by way of 'static'.
           | 
           | Advocating 'static' when your actual intent is 'const' does
           | less experienced readers a disservice; they will assume that
           | 'static' is meant to make things faster, and be disappointed
           | when it doesn't work for non-constant values.
        
             | rostayob wrote:
             | I am not advocating static. I'm advocating for looking at
             | what the compiler outputs when surprising behavior is
             | encountered. The example is extracted from a larger piece
             | of code, and I reported the minimal case as-is.
             | 
             | If anything, the title is meant to be read as "isn't it
             | amusing that something apparently unrelated such as
             | `static` causes a performance improvement".
             | 
             | That said, I have added a note clarifying this at top of
             | the post now.
        
         | wyldfire wrote:
         | If there were any expressions that took the address of the
         | variable, then even both `static const` qualifiers wouldn't
         | work for a sufficiently paranoid compiler.
        
           | giomasce wrote:
           | Are you sure? Isn't it UB to modify a const object? It will
           | probably end up in a non-writable memory page.
           | 
           | EDIT: gcc seems to agree with me: you can see the optimized
           | version here[1] and the unoptimzed version if you remove
           | "const".
           | 
           | [1] https://godbolt.org/z/KWrW45rK8
        
             | Snoonan999 wrote:
             | Const is to state that this module is not allowed to
             | modify. Think on what'const volatile int foo' means.
             | 
             | Mostly seen in embedded space.
        
             | not2b wrote:
             | This is why language standards specify what the compiler
             | can assume and call out some behavior as undefined, exactly
             | so compilers don't have to be paranoid and produce code
             | that sucks. If an underlying object is const, the compiler
             | is allowed to assume that it does not change (it is valid
             | to cast away const on a pointer or reference, but not if
             | the object itself was declared const).
        
               | toast0 wrote:
               | > (it is valid to cast away const on a pointer or
               | reference, but not if the object itself was declared
               | const).
               | 
               | Isn't it valid to cast to non-const for a const, but only
               | invalid to modify the const through the casted pointer?
        
               | [deleted]
        
               | why_only_15 wrote:
               | Interestingly GCC will still optimize the loop function
               | even if you have code that modifies the modulus.
               | https://godbolt.org/z/EE9PnrY7s
        
           | [deleted]
        
           | jheriko wrote:
           | i think this is most compilers.
        
         | kevin_b_er wrote:
         | I agree. Static alone was the incorrect choice. If the global
         | were being modified elsewhere in the file then this
         | optimization would also not be possible.
         | 
         | The core optimization is modulus % constant (and a power-of-2
         | as well). The static just enabled the optimizer to do better
         | heavy lifting to get there. A const would've made the intent
         | clear to the human reader and the compiler.
         | 
         | static const would've been best.
        
       | daneel_w wrote:
       | Use a const instead. It's the right tool for the job.
        
       | fallingfrog wrote:
       | Wouldn't making that const or using #define be a bit cleaner?
       | 
       | Honestly if I as the programmer knew that I was really trying to
       | select bits from a number I'd just use a binary and directly. In
       | that specific situation I think the intent is more clear that
       | way. Like:
       | 
       | //select the bottom 8 bits
       | 
       | unsigned bottom8 = val & 0xff;
        
       | sdfdf4434r34r wrote:
       | As an embedded programmer working on small micros I make every
       | single function static (including third party code, which I
       | modify and bring into the source tree and curate myself), gives
       | you global/link time optimizations and dead code elimination for
       | free, leads to better code even at -O0, -Og and -O1, static const
       | configuration structs/values gets optimized away, and so on.
       | 
       | Really wish it was the default.
        
         | [deleted]
        
         | kevin_thibedeau wrote:
         | It also gives you a nice private namespace for the translation
         | unit so you can confidently modify any of the static objects
         | without concern for impacting external code.
         | 
         | I once had a gray beard chew me out over changing a function
         | signature when revising things. I had to point out politely
         | that it was static and that anyone who managed to link to the
         | function had to be breaking a lot of rules to do so.
        
       | GlitchMr wrote:
       | Without `static`, compiler exports a symbol.                   $
       | cat value.c          int value = 42;              int get_value()
       | {             return value;         }              $ make value.o
       | gcc    -c -o value.o value.c              $ nm value.o
       | 0000000000000000 T get_value                          U
       | _GLOBAL_OFFSET_TABLE_         0000000000000000 D value
       | 
       | This symbol, not being `const` can be modified by any other
       | compilation unit.                   $ cat main.c         #include
       | <stdio.h>              int value;         int get_value();
       | int main() {             value = 123456789;
       | printf("%d\n", get_value());         }              $ make main.o
       | gcc    -c -o main.o main.c              $ cc value.o main.o
       | $ ./a.out         123456789
       | 
       | Compiler when generating an object file has to assume the value
       | of exported non-const symbol can change. It's necessary to tell
       | the compiler that the value cannot change, either by not
       | exporting the symbol by using `static` or making the value of it
       | `const`. In example provided in your article `static` makes sense
       | (or even `static const`) as I don't think there is a reason to
       | export this global.
        
         | rostayob wrote:
         | Yes, see last few paragraphs of the post, in which I also
         | speculate why GCC doesn't infer that there is only one
         | compilation unit.
        
         | makapuf wrote:
         | I quite like C syntax, and I sometimes long for a language that
         | would change a few elements (default to const and static,
         | different/safer standard api for strings). Some might call it
         | Go but a simple slightly updated C would be nice.
        
           | mhh__ wrote:
           | That is pretty much D (assuming you only want to use the
           | C-like bits) - it's not const by default but const is much
           | easier to use and much more violent
           | 
           | And arrays are bounds checked by default, no more issues with
           | them (and strings are arrays)
        
           | nicoburns wrote:
           | Have you seen zig? It's a modern language that tries to keep
           | the simplicity and explicitness of C (it does change the
           | syntax a bit though).
        
       | api wrote:
       | Division is slow, which is something most programmers don't know.
       | If you can binary AND instead of MOD this can be a huge win.
       | 
       | Multiplication is also very fast, usually one or two cycles on
       | larger chips.
        
         | jeffbee wrote:
         | If your compiler doesn't do this for divisors known at compile
         | time, get your money back.
        
       | IvanK_net wrote:
       | I think every programmer should know, that logical AND is many
       | times faster than Modulus (which is at least as hard as a
       | division), and use & instead of % right in his code for powers of
       | two (and not expect it to be done by a compiler).
        
         | toast0 wrote:
         | That's bitwise AND.
        
         | Rompect wrote:
         | This is like the easiest thing for a compiler to detect and
         | optimize.
        
           | codeflo wrote:
           | True, but something to be aware of: If the compiler uses
           | bitwise operations to implement %, it emits special handling
           | of negative values (it's really more of a remainder operator
           | than a modulus). You can avoid that either by using unsigned
           | numbers, or & as suggested:
           | 
           | https://godbolt.org/z/vdz59q994
        
       | simias wrote:
       | If I could travel back in time I'd tell Dennis to make "static"
       | the implicit default, and have a special keyword like "public" or
       | "export" for items that are meant to be accessible from outside
       | the compilation unit.
       | 
       | I'd also ask him to make "switch" break by default.
       | 
       | Then I'd go kill Hitler or something.
        
         | glouwbug wrote:
         | Because, admittedly, non-default switch breaks are on par with
         | Hitler
        
         | phaemon wrote:
         | No, even if you could kill Hitler, then WW2 doesn't happen,
         | computers aren't invented yet, so time travel isn't invented
         | yet so you can't head back to kill him. Read your briefing
         | notes.
         | 
         | Also
         | https://tvtropes.org/pmwiki/pmwiki.php/Main/HitlersTimeTrave...
        
         | wyldfire wrote:
         | Apparently everyone kills Hitler on their first trip [1].
         | 
         | [1] https://www.tor.com/2011/08/31/wikihistory/
        
         | andi999 wrote:
         | But then you would need 'unbreak' or just 'goto' to the next
         | case. No duffs device no cigar.
         | 
         | Probably a module/namespace system would be the biggest
         | improvement.
        
           | jcranmer wrote:
           | The only real case I use switch fallthrough for is when you
           | have two cases with the same code, e.g.:
           | switch (foo) {       case 1:       case 2:         /* common
           | body */         break;       case 3:         /* body 3 */
           | break;       }
           | 
           | It's not cognitively hard to make an empty case body
           | fallthrough to the next run, but include an implicit break at
           | the end of every nontrivial body.
           | 
           | Supporting Duff's Device is not a compelling feature to
           | support--irreducible loops are basically going to destroy any
           | hope of optimization you might accrue.
        
             | jackewiehose wrote:
             | > The only real case
             | 
             | which is probably the case in 90% of all my switch'es.
             | Worst use of a time machine ever.
        
           | pjmlp wrote:
           | > But then you would need 'unbreak' or just 'goto' to the
           | next case. No duffs device no cigar.
           | 
           | From Algol's linage point of view, a very good outcome.
        
           | jimsmart wrote:
           | FWIW: Go's switch statements are break by default, with an
           | explicit 'fallthrough' keyword if one wishes to override that
           | behaviour.
        
             | cpeterso wrote:
             | clang and gcc have a -Wimplicit-fallthrough warning flag
             | that will warn about switch cases that fall through without
             | either C++17's [[fallthrough]] attribute or a /*
             | fallthrough */ comment.
             | 
             | I made Firefox's code base able to compile with -Wimplicit-
             | fallthrough. About one hundred fall through cases needed to
             | be annotated and about 2-3 were actual bugs (though minor).
        
           | Someone wrote:
           | Yes, but you would need way fewer of such phrases than you
           | need now. Making the exceptional case take more code is the
           | way to go, IMO.
           | 
           | Many languages also support something like
           | case 1,2,4:
           | 
           | or even                 case 1-10, 20-22, 34:
           | 
           | That further decreases the need for falling through.
        
       | wyldfire wrote:
       | We have learned so much over the decades of using C. Lowest-
       | visibility-by-default is just one of the many good choices of
       | Rust and other C successors. The example here is for codegen
       | benefits but reducing visibility benefits encapsulation too.
        
       | nyc_pizzadev wrote:
       | My first C intuition would be defining this value as a macro. If
       | I was in C++, then a const would make sense.
        
       | lr4444lr wrote:
       | Great write up: a precise problem that digs into the internals
       | pointedly to teach a simple concept. This is exactly how I tell
       | the junior devs where I work to do lunch talks that give people
       | some concrete, memorable piece of learning that makes them better
       | in their practical work.
        
       | rostayob wrote:
       | Author here -- just to be clear, I agree that marking the
       | variable as const is the "right" thing to do here. I reported the
       | investigation as-is because removing a static declaration made
       | the code slower, which I then narrowed down to the isolated bit
       | of code.
        
       | codeflo wrote:
       | I agree with many of the sibling comments, static vs. non-static
       | is almost a complete red-herring.
       | 
       | Static only means that the variable is local to the translation
       | unit (the C file). The relevant difference in the example is
       | actually the const-ness of the variable, which you may put
       | explicitly, but which a powerful compiler can also infer here in
       | the static case.
       | 
       | Other than this optimization possibility, const and static are
       | orthogonal concepts. I'm not sure to what extent the article
       | author is aware of this.
       | 
       | So the lesson should be: use const (or #define) if you mean to
       | have a constant. It's still a good idea to also make things
       | static, but the real reason for that is to avoid name collisions
       | with variables in other C files.
        
         | jheriko wrote:
         | this is totally wrong in my experience, although some of the
         | understanding is right. by using static to limit to the
         | compilation unit the compiler can workout the value is
         | constant.
         | 
         | const doesn't do this thanks to const_cast etc. maybe things
         | have changed, but const on a file level variable doesn't do
         | this reliably, or at least hasn't for considerable lengths of
         | time.
         | 
         | of course 'static const' is the better answer. :)
        
           | codeflo wrote:
           | I hope your comment wasn't intended to be as hostile as it
           | reads to me, but I do wonder what your experience was
           | exactly, because there might be a misinterpretation.
           | 
           | From a technical standpoint, using const_cast to modify a
           | constant is Undefined Behavior. This was explicitly specified
           | that way to make constants inlineable by the compiler.
           | _Every_ compiler that I 've ever used does this, it's a
           | trivial but very effective optimization.
           | 
           | Proof by Godbolt: https://godbolt.org/z/djEdvee4s
           | 
           | Observe how GCC completely eliminates the contents of
           | undefined_mutate_modulus (which it's allowed to -- UB means
           | it can do anything with that function, and it chooses the
           | simplest possible thing) rather than de-optimizing mod4_const
           | like you suggested. Compilers are smart.
        
       | nicetryguy wrote:
       | Const / static allows for "immediate" ASM instruction generation:
       | which means the value is known at compile time so it can compare
       | it directly inline as opposed to the overhead of comparing it to
       | a labeled memory address. It's generally good practice whenever
       | possible.
        
         | omegalulw wrote:
         | Isn't that constexpr?
        
         | jheriko wrote:
         | what?!?!
         | 
         | static doesn't imply the value can be known, only in some cases
         | where its not modified in that compilation unit (which is
         | trivial to detect with SSA)
         | 
         | const is something else.
        
       | jheriko wrote:
       | really? people mentioned const?
       | 
       | you can tell how much low-level optimisation they have done if
       | they think its gonna change codegen reliably, or at ll.
        
         | einpoklum wrote:
         | See my other comment here regarding const.
        
       | arthur2e5 wrote:
       | Link time optimization would enable a similar change even without
       | a code edit. Using static is good, but it's a good idea to figure
       | out how to just let other people's code run fast too.
        
       | synergy20 wrote:
       | use const is a better choice, I changed static to const, the
       | result is the same here.
        
       | malkia wrote:
       | In ideal world const, constexpr, explicit (for constructors), and
       | no default implicit conversions (and others that I'v missed)
       | should've been the default...
        
       | jheriko wrote:
       | the biggest win here is informing the compiler sufficiently to
       | swap out div for and.
       | 
       | the use of static is just a tool to inform the compiler that the
       | value is a constant (which const /might/)
        
       | [deleted]
        
       | einpoklum wrote:
       | Title rephrase:
       | 
       | When the compiler can assume your values don't change magically,
       | it can optimize their use.
       | 
       | This is true for restricted pointers, for global-scope variables
       | which can only be accessed in the same translation unit, for
       | stuff in inlined functions (often), etc.
       | 
       | --------------------------------------------
       | 
       | const is a bit shifty. const makes the compiler restrict what it
       | allows you to write, but it can still not really assume other
       | functions don't break constness via casting:                 void
       | i_can_change_x_yeah_i_can_just_watch_me(const int* x)       {
       | *(int*) x = x + 1;       }
       | 
       | now, if the compiler sees the code, then fine (maybe), but when
       | all you see is:                 void sly(const int* x);
       | 
       | You can't assume the value pointed to by x can change. See this
       | on GodBolt: https://godbolt.org/z/fGEMj9Meo
       | 
       | and it could well be the same for constants too. But somehow it
       | isn't:
       | 
       | https://godbolt.org/z/fqGzh7o8z
        
         | missblit wrote:
         | Specifically it's well-defined behavior to mutate an object
         | after const_cast-ing away constness if the object wasn't const
         | to begin with (const references or const pointers can refer to
         | non-const objects).
         | 
         | In your first example you have `int x = 1;` which isn't const,
         | so the compiler has to assume that `f` may mutate it after
         | const casting.
         | 
         | In your second example you have `const int x = 1;` which is
         | const so the compiler can assume the value will never change.
        
           | einpoklum wrote:
           | You must be right. But - it's so easy to forget that! Or -
           | never to be told that in the first place.
        
       ___________________________________________________________________
       (page generated 2021-07-04 23:00 UTC)