[HN Gopher] In C++, is empty() faster than comparing the size wi... ___________________________________________________________________ In C++, is empty() faster than comparing the size with zero? Author : ibobev Score : 93 points Date : 2021-10-27 08:44 UTC (1 days ago) (HTM) web link (lemire.me) (TXT) w3m dump (lemire.me) | sillycross wrote: | > Amazingly, we find that the GCC compilers is able to compile | Travis' is_empty C++ function to constant-time code. | | It's actually an interesting example where undefined behavior | allowed compiler optimization: | | (1) dereferencing an invalid pointer is UD | | (2) signed integer overflow is UD. | | This allows the compiler to assume that the program never crashes | and the counter never overflows. The loop is then optimized out | knowing that it is read-only thus has no side-effects. | em3rgent0rdr wrote: | I suppose it is a safe assumption that the counter will never | overlow _if_ memory space is not large enough to possibly hold | the maximum positive integer number of data structs. Though if | say was using a smaller int for size than what the hypothetical | size that virtual memory could handle, such as a short int in a | 32-bit memory system, then that assumption may not be true. But | on a 64-bit linux system which only allows up to 128 TiB of | virtual address space for individual processes, a 64-bit signed | int could be as large as 2^63, which would be larger than the | hypothetical maximum size that a 128 TiB virtual memory could | reference, so the assumption that the size counter could never | overflow would be safe. | joosters wrote: | Maybe I'm splitting hairs, but it's not specifically the | _presence_ of 'undefined behavior' that allows the compiler | optimization. Instead, it's the language specification. The C | spec says that integers cannot be relied upon to overflow. The | result of this is that compilers are then free to assume that | the program they are compiling has _NO_ undefined behavior in | it, and so the optimization is possible. | | EDIT: To make it clearer, you could imagine an alternate | version of C that aborted the program if an integer overflowed. | Then there would be no undefined behavior at all - but the | optimization is still possible. It's not the UB that helps us | here, it's the language spec telling us what behavior is | reliable and what is not. | adrian_b wrote: | You need not imagine an alternate version of C, such a | version of C is provided by any decent C compiler. | | For example, with gcc you can use either the option "-ftrapv" | or the better option "-fsanitize=undefined -fsanitize- | undefined-trap-on-error" and the program will abort on | integer overflows (with the second option it will also abort | for many other error conditions, e.g. access out of bounds). | joosters wrote: | Sure, but I understand why the C spec does not define this | - because not all processors can trap these events, at all | times and in all situations, without added costs. And C is | very much of the opinion that it doesn't want added costs - | but it lets you pick and choose what you want, hence the | compiler options. | | I don't think there's an obvious 'good' and 'bad' choice | here. | admax88qqq wrote: | I think the "good" choice is to remove undefined | behaviour and accept that C will be slighlty slower on | esoteric hardware. | bregma wrote: | > You need not imagine an alternate version of C, such a | version of C is provided by any decent C compiler. | | Classic misunderstanding of undefined behaviour. In this | case it's still undefined behaviour, but the vendor has | said "this is what my compiler will do with this particular | undefined behaviour under these circumstances". Vendors are | allowed to do anything they want when code containing | undefined behaviour is submitted to their compiler, | including doing something you might consider sane. | joosters wrote: | Saying 'this is what I will do when an int can't hold the | assigned value' is _defining_ the behavior, ergo it is no | longer undefined! | | _Vendors are allowed to do anything they want when code | containing undefined behaviour is submitted to their | compiler_ | | Yes, and his compiler is saying that it will trap these | events. Are you suggesting that the compiler will | secretly ignore this setting and then go on to do all | kinds of insane things? | bregma wrote: | > Saying 'this is what I will do when an int can't hold | the assigned value' is defining the behavior, ergo it is | no longer undefined! | | Defining the behaviour of a particular compiler under | specific circumstances does not equate to defining the | behaviour of the C language. | joosters wrote: | You are reading words that nobody wrote. No-one claimed | that they were redefining the 'C spec' by passing flags | to their compiler - the standards board's job is still | safe! | adrian_b wrote: | I agree that leaving such important behavior as undefined | is a failure for a programming language standard, because | the standard fails to guarantee that a program will do | the right things independently of what compiler has been | used and with what compiler options. | | Nevertheless, in such cases it is the responsibility of | the programmer to either write programs in which it is | guaranteed that events for which the program behavior is | undefined will never happen or to choose compiler options | to handle such events. | | If the programmers do their job right, then the compiler | is free to optimize like when the undefined cases do not | exist. | | Unlike the case with integer overflow, there are cases | where the programming language standards correctly leave | some behavior as undefined, e.g. the order of evaluation | for function arguments. For that kind of undefined | behavior the programmer must take care to write only | programs whose effects do not depend on it. | | Data dependent events like integer overflow cannot always | be avoided, so compilers must have options to generate | exceptions when they happen. | umanwizard wrote: | The behavior is defined, but by the compiler | documentation, not the language standard. Different | documents can define different things. | IshKebab wrote: | > To make it clearer, you could imagine an alternate version | of C that aborted the program if an integer overflowed. Then | there would be no undefined behavior at all - but the | optimization is still possible. | | The optimisation wouldn't be possible in that case because | then the program wouldn't abort when the integer overflowed. | It would break the defined behaviour that overflow=abort. | minitech wrote: | > Then there would be no undefined behavior at all - but the | optimization is still possible. | | But then an optimization could change the defined behavior of | aborting to not aborting, which is essentially what undefined | behavior means and really bad if you don't treat it exactly | like undefined behavior. | Noughmad wrote: | > It's actually an interesting example where undefined behavior | allowed compiler optimization | | That is literally reason why any behavior is considered | undefined. So that the compiler can skip checks to produce | better optimized code. | secondcoming wrote: | And so you get an optimal, but broken, program. This helps | nobody. | thamer wrote: | It's interesting that size() _has_ to be constant-time. From what | I understand of the standard linked from the blog post, it looks | like size() being constant-time is a property of Container and | not each individual method (SS 23.2.1, page 747). | | What does this mean for implementations that want to have these | properties but have internal structures that make this inherently | difficult? Do they have to recompute a cached size value during | other operations, and amortize the cost of size() on insert, for | example? | | I can see it being tricky in some situations: imagine a | collection with expiring data, for example. You might want to | prune the expired items in the background, but still have size() | return the number of elements currently not expired. How would | someone do this while maintaining an exact size() value when the | background clean-up will inherently take some time? | plorkyeran wrote: | The standard's requirements for the standard library types | don't imply any corresponding requirement for types you define | yourself. If you want to write a container which can't | reasonably do O(1) size() then you just don't do O(1) size(). | If such a container was added to the standard library, the | standard library's Container requirements will be adjusted to | permit that. | chakkepolja wrote: | Exactly, most STL algorithms, for example, can be applied | using iterators as well. But anyway if GP has specific | requirements iterator invalidation might be a problem too, | they wouldn't probably want to constrain their interface to | that of STL containers. | tialaramex wrote: | Wait, isn't the _point_ of the STL algorithms that they use | iterators? | | For sure the iterator over a vector may be very cheap to | use, but it is still an iterator, even if you tacitly know | it's just pointer arithmetic under the hood. | brandmeyer wrote: | Once upon a time, size() wasn't required to be constant-time. | This showed up in some standard library implementations of | std::list::size which actually did count all the things. | | The painful ABI break at C++11 is a big part of why | implementors were so opposed to allowing another ABI break in | C++20. | jeffbee wrote: | It's also why other people in the C++ community are opposed | to ABIs themselves. | brandmeyer wrote: | The live-at-head model really only works well in a cloud | computing or JIT environments. It really shouldn't have | been a surprise to find that this model is unpopular | outside of those narrow use-cases. | jcelerier wrote: | in my personal experience it also works great for desktop | software | fhrow4484 wrote: | > What does this mean for implementations that want to have | these properties but have internal structures that make this | inherently difficult | | I'm not sure I can think of an example of such a data | structure? | | my understanding is that whatever your data structure is, you | can keep a "size_t current_size;" private counter which you | atomically increment/decrement on every insertion/deletion. | size() is simply: size_t size() { return | current_size; } | | In your background clean-up example structure, if the | background thread is removing N elements, it must decrease | current_size by N. Atomically! (i.e. remove the element and | decrease the current_size using a mutex::lock/unlock) | thamer wrote: | My point was that the background thread would remove elements | periodically, by checking something like: | if (it->expiry_time < now) erase(it); | | Where erase() could certainly remove the item and update the | size, but it would remove elements _after_ they have expired. | So if size() must return the actual number of unexpired | elements _present right now_ , the cached counter you mention | would not be up to date. | | This is what I meant by "have size() return the number of | elements currently not expired". What you're describing is | "size returns the number of elements currently not expired OR | expired but not yet cleaned up", which is different. | klyrs wrote: | > I'm not sure I can think of an example of such a data | structure? | | C style strings are the classical example. Computing the size | (strlen) takes linear time. Not saying that this is a good | idea or anything, but it's extremely common. | | One could imagine a binary tree implemented in a similar | manner. It costs a little time and memory to keep track of | the current size, but it almosy always pays off. | johntb86 wrote: | Some std::list::splice(const_iterator pos, | list& other, const_iterator first, const_iterator last) | | implementations used to be constant-time, since they could | just change some pointers in the spliced nodes. Nowadays they | take linear time, since they need to count how many nodes are | transferred so they can update the sizes correctly. | ape4 wrote: | Interesting, a good case for another std::list<> like | std::non_counting_list<> | jfrunyon wrote: | Wouldn't this still be constant-time today? It doesn't need | to count the nodes because the other list already has a | count of its nodes. | hermitdev wrote: | No, because you'd still need book keeping around the | length of [first, last), even if the implementation does | under-the-hood O(1) movement of nodes in [first, last) | from other to *this. Computing the length of [first, | last) is still an O(n) operation for forward iterators. | | Maybe you're assuming [first, last) is the entirety of | other? For that case, there are overloads for splicing an | entire list in O(1) time: void splice( | const_iterator pos, list& other ); void splice( | const_iterator pos, list&& other ); | jfrunyon wrote: | This seems like an extremely niche use-case, compared to | the prevalence of use-cases needing to check the size of | a list. | MauranKilom wrote: | No, because you can splice in the middle. Given just the | iterators you simply have no other way than counting. | | (The reason why it used to be constant time is that it | didn't have to count because it didn't have to compute | the new size). | tialaramex wrote: | > What does this mean for implementations that want to have | these properties but have internal structures that make this | inherently difficult? | | Such implementations are de facto forbidden in the STL. | | Although its containers have vague names like "forward_list" | today the C++ STL's containers are in practice very specific | data structures that have the specified properties because | other data structures would miss one or more of the | requirements. | | This suits "stability at all costs" C++ programmers fine. | Better that every C++ program is 10% slower than that their | buggy library from 1995 has to be fixed to actually do what the | standard says rather than whatever worked in 1995. | | If you care about this but you like C++ you just don't use the | STL. Third parties are free to make containers that use data | structures that aren't old enough to stand for President of the | United States of America, some of them aren't even old enough | to legally _drive_ in the United States of America. | loup-vaillant wrote: | If you're _actually_ at the point where you care about that kind | of performance, you may want to shy away from the STL, and use | (or write) something simpler instead. | pansa2 wrote: | On the other hand, if you're _not_ at the point where you care | about maximum performance, you may want to shy away from C++ | altogether. | | This implies that there are very few situations in which C++ | with STL is the right choice - which seems contrary to how | widely it's actually used. | TillE wrote: | I've always hated this bit of conventional wisdom, which has | become less and less applicable over the past couple decades. | STL implementations are now quite highly optimized within their | spec. With the one little caveat that they're generally | optimized for CPU cycles rather than per object memory usage. | | But you should _understand_ the performance characteristics of | the STL. A std::vector is super fast for basically everything | except for dynamic resizing. If performance is a concern, your | first approach should be to see if you can avoid reallocation. | Often it 's very simple to just preallocate a reasonable | maximum, or calculate an exact maximum based on your input. | | Finally, if you truly need to do weird stuff with resizing, you | can look at other containers or write your own. | glandium wrote: | I thought the conventional wisdom was to avoid the STL if you | need consistent performance across platforms, because they | all have different pitfalls. Things might be different with | libc++, now, though, since you can use it in most cases. | jb1991 wrote: | This was just discussed yesterday also: | https://news.ycombinator.com/item?id=29007821 | ufo wrote: | The most impressive example I've ever seen of this kind of | compiler cleverness is the following function from the | "convoluted isEven" competition: bool | isEven(int n) { return n == 0 || !isEven(n + 1); | } | | Clang somehow optimizes it to a constant-time implementation, | even if we use "unsigned"! https://gcc.godbolt.org/z/nGbn1T | whimsicalism wrote: | Is this code not relying on UB? Seems like it could/should be | optimized to `return true` | poizan42 wrote: | It's UB for positive integers, so the compiler can assume it | never happens. It's correct for negative integers, so the | compiler just optimizes it for negative integers and ignores | the positive ones - in this case with the end result that it | works for positive integers as well. | | If the argument is changed to unsigned int then it seems to | be correct and portable because the C standard[1] requires | UINT_MAX to be of the form 2^n - 1. | | So the one with signed argument is UB for positive integers | but happens to work by accident (but the nasal demons may | still lurk just around the corner, e.g. the compiler may also | assume that it's never called with a positive argument which | may affect the compilation of callers). The one with unsigned | argument is correct because unsigned overflow is well-defined | to wrap around to zero, combined with the restrictions on max | unsigned int. | | [1]: From C17 6.2.6.2: | | For unsigned integer types other than unsigned char, the bits | of the object representation shall be divided into two | groups: value bits and padding bits (there need not be any of | the latter). If there are N value bits, each bit shall | represent a different power of 2 between 1 and 2 _(N-1), so | that objects of that type shall be capable of representing | values from 0 to 2_ (N-1) using a pure binary representation; | this shall be known as the value representation. | carnitine wrote: | What's the UB? | [deleted] | whimsicalism wrote: | Calling it on any positive int causes signed integer | overflow, which is UB. | nawgz wrote: | signed integer overflow for positive N | edflsafoiewq wrote: | It's only UB for positive n. For negative n, it is an is-even | test, so it can't be optimized to return true. | whimsicalism wrote: | Oh yeah - negative numbers exist, doh! | joosters wrote: | Isn't the unsigned version the correct one? Presumably the int | version will hit the same problem as in the original article, | in that the C spec says that ints are not treated as | overflowing? (i.e. the compiler cannot assume that adding one | will eventually turn negative and then later on reach 0) | BeeOnRope wrote: | Well the int version returns the correct answer for "empty()" | unlike unsigned. The size of a list of 2^32-1 elements should | not be reported as zero, nor as "empty". | | In this case, the compiler has to enforce wrap semantics for | unsigned, which is an applicaton-level level bug if it | occurs, and doesn't need to enforce it for int, which happens | to lead to the right answer for empty (i.e., it checks if it | has at least one element, which continues to return the | correct "not empty" answer for 2^32-1 or 2^31-1 elements). | | Both get the wrong answer for size() (over different | domains), if you allow lists that large: they can't even | represent very large lists. | DSMan195276 wrote: | Yes that's right - unsigned is defined to wrap, but ints have | undefined behavior if you add to the max value (or do other | silly things with them). Thus the int version given could be | optimized to do whatever when a positive integer is passed, | since it only doesn't have UB when negatives or zero are | passed. | TheCoelacanth wrote: | > the compiler cannot assume that adding one will eventually | turn negative and then later on reach 0 | | Signed overflow is undefined behavior, so it is allowed to | assume that. It's also allowed to assume anything else that | it wants to. | | Once you cause a signed overflow, all guarantees are out the | window and the compiler can do anything it wants to. | spatulon wrote: | Is the LLVM optimisation pass that does this hyper-specialised | for this one particular trick, and, if so, is it valuable in | practice? | | Here's an example where (I think) the two functions are | equivalent, so I'm a little disappointed it couldn't make the | same optimisation on the second one. (At least it still | optimised the tail call to a jump.) | | https://godbolt.org/z/K9rMajsjK | Someone wrote: | No, it is very generic. See | https://kristerw.blogspot.com/2019/04/how-llvm-optimizes- | geo..., https://blog.matthieud.me/2020/exploring-clang-llvm- | optimiza... (discussed at | https://news.ycombinator.com/item?id=28207207) | | (I remember reading a page with more examples, but cannot | find it) | papito wrote: | In a Wooooorld, where newer generation of developers does not | grasp the concept of database indexes, and network latency is | assumed to be zero, one man..... ___________________________________________________________________ (page generated 2021-10-28 23:00 UTC)