[HN Gopher] Effortless Performance Improvements in C++: std::vector
       ___________________________________________________________________
        
       Effortless Performance Improvements in C++: std::vector
        
       Author : jandeboevrie
       Score  : 18 points
       Date   : 2023-03-08 20:30 UTC (2 hours ago)
        
 (HTM) web link (julien.jorge.st)
 (TXT) w3m dump (julien.jorge.st)
        
       | cmovq wrote:
       | Tokenizing by storing strings in a vector is almost never what
       | you want for high-performance code, as it will result in an
       | allocation for each token (except tokens short enough to fit in
       | the small-string buffer).
       | 
       | If you can keep the original source string around, consider using
       | std::vector<std::string_view> with each string_view pointing to
       | part of the original text.
       | 
       | An even better approach is to avoid using an intermediary vector
       | altogether if all you need is to process the tokens one-by-one
       | and store them in a map. You could have `std::string_view
       | parse_next_token(std::string_view *text);` which advances the
       | source text and returns the next token.
        
         | jeffbee wrote:
         | Right? Effortless performance improvements without having to
         | make it a "journey" and write 50 blog posts: absl::StrSplit
         | will stop this vector reallocation _and_ get rid of those
         | string copies, and if you want you can just iterate over it
         | without storing anything.
        
         | Negitivefrags wrote:
         | I like to implement an iterator that returns string_view when
         | dereferenced. You can make the ergonomics quite nice with a
         | helper class.
         | 
         | for( auto value : from_csv( str ) ) { ... }
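A minimal sketch of the helper-class approach described above. Only the name `from_csv` comes from the comment; the implementation is an assumption: it splits on plain commas, ignores CSV quoting and escaping, and yields std::string_view fields that point into the source text.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical range over the comma-separated fields of a string.
// Dereferencing the iterator yields a std::string_view into the source,
// so iterating allocates nothing.
class from_csv {
public:
    explicit from_csv(std::string_view text) : text_(text) {}

    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string_view;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string_view*;
        using reference = std::string_view;

        iterator() = default;  // the end iterator
        explicit iterator(std::string_view text) : rest_(text), done_(false) { advance(); }

        reference operator*() const { return field_; }

        iterator& operator++() {
            if (last_) {
                done_ = true;  // the final field has been consumed
            } else {
                advance();
            }
            return *this;
        }
        iterator operator++(int) { iterator old = *this; ++*this; return old; }

        friend bool operator==(const iterator& a, const iterator& b) {
            if (a.done_ || b.done_) return a.done_ == b.done_;
            return a.field_.data() == b.field_.data();
        }
        friend bool operator!=(const iterator& a, const iterator& b) { return !(a == b); }

    private:
        // Cuts the next field off the front of rest_.
        void advance() {
            const std::size_t comma = rest_.find(',');
            field_ = rest_.substr(0, comma);
            last_ = (comma == std::string_view::npos);
            rest_.remove_prefix(last_ ? rest_.size() : comma + 1);
        }

        std::string_view rest_;
        std::string_view field_;
        bool last_ = false;
        bool done_ = true;
    };

    iterator begin() const { return iterator(text_); }
    iterator end() const { return iterator(); }

private:
    std::string_view text_;
};
```

With this, `for (auto value : from_csv(str)) { ... }` performs no allocations; the trade-off is that `str` must stay alive while the views are in use.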
        
       | beached_whale wrote:
       | For vectors of trivial types, push_back is almost always slower
       | than resizing up front and then trimming down if needed. The
       | memset that zeroes the elements is minuscule compared to the
       | cost of push_back. If you don't know how many elements you need,
       | guess; it will probably still be faster. If you run out, add
       | another block.
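A sketch of the resize-then-trim pattern, using a hypothetical `collect_even` function and a caller-supplied size guess. Whether it actually beats push_back depends on the element type and workload, so treat the speedup as the commenter's claim to measure rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: copy the even values of `input` into a new vector.
// The output is sized up front from `guess`, filled through an index, and
// trimmed at the end instead of calling push_back for every element.
std::vector<int> collect_even(const std::vector<int>& input, std::size_t guess) {
    std::vector<int> out;
    out.resize(guess == 0 ? 1 : guess);  // value-initializes (zeroes) the ints
    std::size_t n = 0;                   // number of slots actually written
    for (int x : input) {
        if (x % 2 != 0) continue;
        if (n == out.size()) {
            out.resize(out.size() * 2);  // ran out: add another block (here, double)
        }
        out[n++] = x;
    }
    out.resize(n);  // trim down to the elements actually written
    return out;
}
```

`resize(n)` at the end only shrinks the size; the spare capacity stays allocated unless you also call `shrink_to_fit()`, which is a non-binding request.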
        
       | adzm wrote:
       | I also love using flat_map and similar containers, which implement
       | a map as a sorted vector. Lookup is blazing fast. And perhaps surprisingly,
       | allocating a new vector and copying everything over is actually
       | pretty fast too.
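A minimal sketch of the idea behind flat_map: a map stored as a sorted std::vector of key/value pairs. `sorted_vector_map` is a hypothetical name; this is not the API of boost::container::flat_map or of C++23's std::flat_map (which keeps keys and values in separate containers).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sorted-vector map: all entries live in one contiguous buffer,
// kept sorted by key.
template <typename K, typename V>
class sorted_vector_map {
public:
    // Inserts a new entry or overwrites an existing one. O(n): entries after the
    // insertion point are shifted, but that is a contiguous move.
    void insert_or_assign(const K& key, V value) {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && it->first == key) {
            it->second = std::move(value);
        } else {
            data_.emplace(it, key, std::move(value));
        }
    }

    // O(log n) lookup: binary search over contiguous, cache-friendly memory.
    std::optional<V> find(const K& key) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && it->first == key) return it->second;
        return std::nullopt;
    }

    std::size_t size() const { return data_.size(); }

private:
    // Orders an entry against a bare key, as std::lower_bound requires.
    static bool key_less(const std::pair<K, V>& entry, const K& key) {
        return entry.first < key;
    }

    std::vector<std::pair<K, V>> data_;
};
```

Insertion is O(n) because later elements shift, which is why flat maps suit read-heavy workloads where lookups dominate.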
        
       | vitus wrote:
       | IMO any discussion of std::vector::reserve should be accompanied
       | by warnings that it can actually make your program slower if used
       | improperly.
       | 
       | https://en.cppreference.com/w/cpp/container/vector/reserve
       | 
       | > Correctly using reserve() can prevent unnecessary
       | reallocations, but inappropriate uses of reserve() (for instance,
       | calling it before every push_back() call) may actually increase
       | the number of reallocations (by causing the capacity to grow
       | linearly rather than exponentially) and result in increased
       | computational complexity and decreased performance.
       | 
       | The cost of vector dynamic reallocation has gone down
       | dramatically since C++11 introduced move constructors: when the
       | vector<string> in the example reallocates, it moves each element
       | (comparable to copying 3 pointers) instead of copying the
       | string's underlying heap allocation. Of course, that relies on
       | std::string's move constructor being declared noexcept. So, when
       | used properly, reserve will speed up your program, just often
       | not as much as you might expect.
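A sketch that counts reallocations (observed as capacity changes) for the three cases in the cppreference note: no reserve, reserve() before every push_back, and a single reserve() up front. The function names are hypothetical, and the exact counts depend on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes n ints, calling reserve_step before each push_back, and counts how
// often the capacity changed (i.e. how often the buffer was reallocated).
template <typename ReserveStep>
std::size_t count_reallocs(std::size_t n, ReserveStep reserve_step) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = v.capacity();
        reserve_step(v, n);
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) ++reallocs;
    }
    return reallocs;
}

// No reserve: push_back grows the capacity geometrically (about log n reallocations).
void no_reserve(std::vector<int>&, std::size_t) {}

// The anti-pattern: reserve one more slot before every push_back. Common
// implementations allocate exactly what reserve() asks for, so capacity grows
// linearly and nearly every push_back reallocates (O(n^2) element moves overall).
void reserve_each_time(std::vector<int>& v, std::size_t) { v.reserve(v.size() + 1); }

// Proper use: reserve the final size once, before the first push_back.
void reserve_once(std::vector<int>& v, std::size_t n) {
    if (v.empty()) v.reserve(n);
}
```

On common implementations, `reserve_each_time` reallocates on every push_back, while plain push_back reallocates only about log n times and `reserve_once` allocates exactly once.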
        
         | einpoklum wrote:
         | > The cost of vector dynamic reallocation has gone down
         | dramatically since C++11 introduced move constructors
         | 
         | 1. It's gone down, but it's still very high.
         | 
         | 2. It hasn't gone down for types like std::string_view,
         | for which moving and copying take about the same amount of
         | effort.
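The noexcept caveat from the two comments above can be observed directly. This sketch uses two hypothetical element types that differ only in whether their move constructor is noexcept, and counts copies made while a vector grows. std::vector relocates with the equivalent of std::move_if_noexcept, so it falls back to copying when moving might throw and a copy constructor is available.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// std::string's move constructor is noexcept, so vector<string> can move its
// elements on reallocation, stealing each heap buffer instead of copying it.
static_assert(std::is_nothrow_move_constructible_v<std::string>);

// Counts copy constructions made while a vector reallocates.
inline int g_copies = 0;

// Hypothetical type whose move constructor is declared noexcept.
struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++g_copies; }
    NoexceptMove(NoexceptMove&&) noexcept {}
};

// Hypothetical type whose move constructor may throw (not declared noexcept).
struct MaybeThrowingMove {
    MaybeThrowingMove() = default;
    MaybeThrowingMove(const MaybeThrowingMove&) { ++g_copies; }
    MaybeThrowingMove(MaybeThrowingMove&&) {}
};

// Appends n default-constructed elements (no reserve) and returns how many
// copies the resulting reallocations made.
template <typename T>
int copies_during_growth(int n) {
    g_copies = 0;
    std::vector<T> v;
    for (int i = 0; i < n; ++i) v.emplace_back();
    return g_copies;
}
```

For a type like std::string_view, moving and copying are the same operation (copying a pointer and a length), so move semantics do not make its reallocation any cheaper, which is einpoklum's second point.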
        
       | peterept wrote:
       | To avoid memory allocations, if you can modify the source
       | string in place, an alternative is to return
       | std::vector<char*> and modify the string to replace the
       | separators with '\0'.
       | 
       | Of course, as that post suggests, use reserve() so the vector
       | itself allocates as little as possible. (My strsplit function
       | takes the reserve size as an optional parameter so each caller
       | can tune it.)
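A minimal sketch of the in-place approach, assuming a writable, NUL-terminated buffer and a single-character separator. `split_in_place` and its `reserve_hint` parameter are hypothetical, modeled on the optional reserve size the comment mentions. The source text is modified, and every returned pointer is a C string into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-place splitter: overwrites each separator with '\0' and
// returns a pointer to the start of every field. No per-field allocation;
// the pointers are valid only while `text` lives, and the original string is
// destroyed. Each field can be used with <cstring> functions such as strlen.
std::vector<char*> split_in_place(char* text, char sep, std::size_t reserve_hint = 0) {
    assert(text != nullptr);
    std::vector<char*> fields;
    fields.reserve(reserve_hint);  // caller-supplied capacity hint
    fields.push_back(text);        // the first field starts at the beginning
    for (char* p = text; *p != '\0'; ++p) {
        if (*p == sep) {
            *p = '\0';                // terminate the current field
            fields.push_back(p + 1);  // the next field starts after the separator
        }
    }
    return fields;
}
```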
        
         | jeffbee wrote:
         | That's just strtok, and programming C++ as if you are an
         | unreformed C programmer is always a mistake. If you want to not
         | copy the strings, use string_view. We also have
         | std::ranges::split_view, std::views::split, etc.
        
       ___________________________________________________________________
       (page generated 2023-03-08 23:00 UTC)