[HN Gopher] Effortless Performance Improvements in C++: std::vector
___________________________________________________________________
Effortless Performance Improvements in C++: std::vector

Author : jandeboevrie
Score  : 18 points
Date   : 2023-03-08 20:30 UTC (2 hours ago)

(HTM) web link (julien.jorge.st)
(TXT) w3m dump (julien.jorge.st)

| cmovq wrote:
| Tokenizing by storing strings in a vector is almost never what
| you want for high-performance code, as it will result in an
| allocation for each token.
|
| If you can keep the original source string around, consider using
| std::vector<std::string_view> with each string_view pointing to
| part of the original text.
|
| An even better approach is to avoid using an intermediary vector
| altogether if all you need is to process the tokens one-by-one
| and store them in a map. You could have `std::string_view
| parse_next_token(std::string_view *text);` which advances the
| source text and returns the next token.

| jeffbee wrote:
| Right? Effortless performance improvements without having to
| make it a "journey" and write 50 blog posts: absl::StrSplit
| will stop this vector reallocation _and_ get rid of those
| string copies, and if you want you can just iterate over it
| without storing anything.

| Negitivefrags wrote:
| I like to implement an iterator that returns string_view when
| dereferenced. You can make the ergonomics quite nice with a
| helper class.
|
| for( auto value : from_csv( str ) ) { ... }

| beached_whale wrote:
| With a vector of trivial types, push_back is almost always slower
| than resizing up front and then trimming down if needed. The
| memset to zero it out is minuscule compared to the cost of
| push_back. If you don't know how many elements you need, guess,
| and it will probably still be faster. If you run out, resize by
| another block.

| adzm wrote:
| I also love using flat_map etc which implements a map as a sorted
| vector. Lookup is blazing fast.
| And perhaps surprisingly, allocating a new vector and copying
| everything over is actually pretty fast too.

| vitus wrote:
| IMO any discussion of std::vector::reserve should be accompanied
| by warnings that it can actually make your program slower if used
| improperly.
|
| https://en.cppreference.com/w/cpp/container/vector/reserve
|
| > Correctly using reserve() can prevent unnecessary
| reallocations, but inappropriate uses of reserve() (for instance,
| calling it before every push_back() call) may actually increase
| the number of reallocations (by causing the capacity to grow
| linearly rather than exponentially) and result in increased
| computational complexity and decreased performance.
|
| The cost of vector dynamic reallocation has gone down
| dramatically since C++11 introduced move constructors -- in the
| example of a vector<string>, reallocation actually moves each
| string (which is comparable to copying 3 pointers, as opposed to
| copying the pointed-to allocation of the underlying string). Of
| course, that specific case relies on std::string's move
| constructor being defined as noexcept. So, when used properly,
| reserve will speed up your program, just often not as much as you
| might expect.

| einpoklum wrote:
| > The cost of vector dynamic reallocation has gone down
| dramatically since C++11 introduced move constructors
|
| 1. It's gone down, but it's still very high.
|
| 2. It hasn't gone down for types like std::string_view, for
| which moving and copying take about the same amount of effort.

| peterept wrote:
| To avoid memory allocations, and if you can modify the source
| string in place, then an alternative is to return
| std::vector<char*> and modify the string to replace the
| separators with '\0'.
|
| Of course, as that post suggests, use reserve() to keep the
| vector itself as efficient as possible. (In my strsplit call I
| pass the reserve size in as an optional parameter so each caller
| can tune it.)
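
A minimal sketch of the in-place split peterept describes, assuming a
single-character separator and C++17. The name split_in_place and its
expected_tokens parameter are hypothetical, chosen for illustration;
this is not peterept's actual strsplit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// Splits `text` in place: every `sep` is overwritten with '\0' and a
// pointer to the start of each token is collected. No per-token
// allocation happens; the only allocation is the vector's own storage.
// `expected_tokens` is an optional hint passed to reserve(); 0 means
// "no hint".
std::vector<char*> split_in_place(std::string& text, char sep,
                                  std::size_t expected_tokens = 0)
{
    std::vector<char*> tokens;
    if (expected_tokens != 0)
        tokens.reserve(expected_tokens);  // once, up front, not per push_back

    char* token_start = text.data();      // non-const data() needs C++17
    char* const end = token_start + text.size();

    for (char* p = token_start; p != end; ++p) {
        if (*p == sep) {
            *p = '\0';                    // terminate the current token
            tokens.push_back(token_start);
            token_start = p + 1;
        }
    }
    // The last token is terminated by the string's own trailing '\0'.
    tokens.push_back(token_start);
    return tokens;
}
```

The returned pointers refer into text, so they stay valid only while
text is alive and unmodified. An empty input yields one empty token,
and adjacent separators yield empty tokens between them.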

| jeffbee wrote:
| That's just strtok, and programming C++ as if you are an
| unreformed C programmer is always a mistake. If you don't want to
| copy the strings, use string_view. We also have
| std::ranges::split_view, std::views::split, etc.
___________________________________________________________________
(page generated 2023-03-08 23:00 UTC)