[HN Gopher] In C++, is empty() faster than comparing the size wi...
       ___________________________________________________________________
        
       In C++, is empty() faster than comparing the size with zero?
        
       Author : ibobev
       Score  : 93 points
       Date   : 2021-10-27 08:44 UTC (1 days ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | sillycross wrote:
       | > Amazingly, we find that the GCC compilers is able to compile
       | Travis' is_empty C++ function to constant-time code.
       | 
       | It's actually an interesting example where undefined behavior
       | allowed compiler optimization:
       | 
       | (1) dereferencing an invalid pointer is UD
       | 
       | (2) signed integer overflow is UD.
       | 
       | This allows the compiler to assume that the program never crashes
       | and the counter never overflows. The loop is then optimized out
       | knowing that it is read-only thus has no side-effects.
        
         | em3rgent0rdr wrote:
         | I suppose it is a safe assumption that the counter will never
         | overlow _if_ memory space is not large enough to possibly hold
         | the maximum positive integer number of data structs. Though if
         | say was using a smaller int for size than what the hypothetical
         | size that virtual memory could handle, such as a short int in a
         | 32-bit memory system, then that assumption may not be true. But
         | on a 64-bit linux system which only allows up to 128 TiB of
         | virtual address space for individual processes, a 64-bit signed
         | int could be as large as 2^63, which would be larger than the
         | hypothetical maximum size that a 128 TiB virtual memory could
         | reference, so the assumption that the size counter could never
         | overflow would be safe.
        
         | joosters wrote:
         | Maybe I'm splitting hairs, but it's not specifically the
         | _presence_ of  'undefined behavior' that allows the compiler
         | optimization. Instead, it's the language specification. The C
         | spec says that integers cannot be relied upon to overflow. The
         | result of this is that compilers are then free to assume that
         | the program they are compiling has _NO_ undefined behavior in
         | it, and so the optimization is possible.
         | 
         | EDIT: To make it clearer, you could imagine an alternate
         | version of C that aborted the program if an integer overflowed.
         | Then there would be no undefined behavior at all - but the
         | optimization is still possible. It's not the UB that helps us
         | here, it's the language spec telling us what behavior is
         | reliable and what is not.
        
           | adrian_b wrote:
           | You need not imagine an alternate version of C, such a
           | version of C is provided by any decent C compiler.
           | 
           | For example, with gcc you can use either the option "-ftrapv"
           | or the better option "-fsanitize=undefined -fsanitize-
           | undefined-trap-on-error" and the program will abort on
           | integer overflows (with the second option it will also abort
           | for many other error conditions, e.g. access out of bounds).
        
             | joosters wrote:
             | Sure, but I understand why the C spec does not define this
             | - because not all processors can trap these events, at all
             | times and in all situations, without added costs. And C is
             | very much of the opinion that it doesn't want added costs -
             | but it lets you pick and choose what you want, hence the
             | compiler options.
             | 
             | I don't think there's an obvious 'good' and 'bad' choice
             | here.
        
               | admax88qqq wrote:
               | I think the "good" choice is to remove undefined
               | behaviour and accept that C will be slighlty slower on
               | esoteric hardware.
        
             | bregma wrote:
             | > You need not imagine an alternate version of C, such a
             | version of C is provided by any decent C compiler.
             | 
             | Classic misunderstanding of undefined behaviour. In this
             | case it's still undefined behaviour, but the vendor has
             | said "this is what my compiler will do with this particular
             | undefined behaviour under these circumstances". Vendors are
             | allowed to do anything they want when code containing
             | undefined behaviour is submitted to their compiler,
             | including doing something you might consider sane.
        
               | joosters wrote:
               | Saying 'this is what I will do when an int can't hold the
               | assigned value' is _defining_ the behavior, ergo it is no
               | longer undefined!
               | 
               |  _Vendors are allowed to do anything they want when code
               | containing undefined behaviour is submitted to their
               | compiler_
               | 
               | Yes, and his compiler is saying that it will trap these
               | events. Are you suggesting that the compiler will
               | secretly ignore this setting and then go on to do all
               | kinds of insane things?
        
               | bregma wrote:
               | > Saying 'this is what I will do when an int can't hold
               | the assigned value' is defining the behavior, ergo it is
               | no longer undefined!
               | 
               | Defining the behaviour of a particular compiler under
               | specific circumstances does not equate to defining the
               | behaviour of the C language.
        
               | joosters wrote:
               | You are reading words that nobody wrote. No-one claimed
               | that they were redefining the 'C spec' by passing flags
               | to their compiler - the standards board's job is still
               | safe!
        
               | adrian_b wrote:
               | I agree that leaving such important behavior as undefined
               | is a failure for a programming language standard, because
               | the standard fails to guarantee that a program will do
               | the right things independently of what compiler has been
               | used and with what compiler options.
               | 
               | Nevertheless, in such cases it is the responsibility of
               | the programmer to either write programs in which it is
               | guaranteed that events for which the program behavior is
               | undefined will never happen or to choose compiler options
               | to handle such events.
               | 
               | If the programmers do their job right, then the compiler
               | is free to optimize like when the undefined cases do not
               | exist.
               | 
               | Unlike the case with integer overflow, there are cases
               | where the programming language standards correctly leave
               | some behavior as undefined, e.g. the order of evaluation
               | for function arguments. For that kind of undefined
               | behavior the programmer must take care to write only
               | programs whose effects do not depend on it.
               | 
               | Data dependent events like integer overflow cannot always
               | be avoided, so compilers must have options to generate
               | exceptions when they happen.
        
               | umanwizard wrote:
               | The behavior is defined, but by the compiler
               | documentation, not the language standard. Different
               | documents can define different things.
        
           | IshKebab wrote:
           | > To make it clearer, you could imagine an alternate version
           | of C that aborted the program if an integer overflowed. Then
           | there would be no undefined behavior at all - but the
           | optimization is still possible.
           | 
           | The optimisation wouldn't be possible in that case because
           | then the program wouldn't abort when the integer overflowed.
           | It would break the defined behaviour that overflow=abort.
        
           | minitech wrote:
           | > Then there would be no undefined behavior at all - but the
           | optimization is still possible.
           | 
           | But then an optimization could change the defined behavior of
           | aborting to not aborting, which is essentially what undefined
           | behavior means and really bad if you don't treat it exactly
           | like undefined behavior.
        
         | Noughmad wrote:
         | > It's actually an interesting example where undefined behavior
         | allowed compiler optimization
         | 
         | That is literally reason why any behavior is considered
         | undefined. So that the compiler can skip checks to produce
         | better optimized code.
        
           | secondcoming wrote:
           | And so you get an optimal, but broken, program. This helps
           | nobody.
        
       | thamer wrote:
       | It's interesting that size() _has_ to be constant-time. From what
       | I understand of the standard linked from the blog post, it looks
       | like size() being constant-time is a property of Container and
       | not each individual method (SS 23.2.1, page 747).
       | 
       | What does this mean for implementations that want to have these
       | properties but have internal structures that make this inherently
       | difficult? Do they have to recompute a cached size value during
       | other operations, and amortize the cost of size() on insert, for
       | example?
       | 
       | I can see it being tricky in some situations: imagine a
       | collection with expiring data, for example. You might want to
       | prune the expired items in the background, but still have size()
       | return the number of elements currently not expired. How would
       | someone do this while maintaining an exact size() value when the
       | background clean-up will inherently take some time?
        
         | plorkyeran wrote:
         | The standard's requirements for the standard library types
         | don't imply any corresponding requirement for types you define
         | yourself. If you want to write a container which can't
         | reasonably do O(1) size() then you just don't do O(1) size().
         | If such a container was added to the standard library, the
         | standard library's Container requirements will be adjusted to
         | permit that.
        
           | chakkepolja wrote:
           | Exactly, most STL algorithms, for example, can be applied
           | using iterators as well. But anyway if GP has specific
           | requirements iterator invalidation might be a problem too,
           | they wouldn't probably want to constrain their interface to
           | that of STL containers.
        
             | tialaramex wrote:
             | Wait, isn't the _point_ of the STL algorithms that they use
             | iterators?
             | 
             | For sure the iterator over a vector may be very cheap to
             | use, but it is still an iterator, even if you tacitly know
             | it's just pointer arithmetic under the hood.
        
         | brandmeyer wrote:
         | Once upon a time, size() wasn't required to be constant-time.
         | This showed up in some standard library implementations of
         | std::list::size which actually did count all the things.
         | 
         | The painful ABI break at C++11 is a big part of why
         | implementors were so opposed to allowing another ABI break in
         | C++20.
        
           | jeffbee wrote:
           | It's also why other people in the C++ community are opposed
           | to ABIs themselves.
        
             | brandmeyer wrote:
             | The live-at-head model really only works well in a cloud
             | computing or JIT environments. It really shouldn't have
             | been a surprise to find that this model is unpopular
             | outside of those narrow use-cases.
        
               | jcelerier wrote:
               | in my personal experience it also works great for desktop
               | software
        
         | fhrow4484 wrote:
         | > What does this mean for implementations that want to have
         | these properties but have internal structures that make this
         | inherently difficult
         | 
         | I'm not sure I can think of an example of such a data
         | structure?
         | 
         | my understanding is that whatever your data structure is, you
         | can keep a "size_t current_size;" private counter which you
         | atomically increment/decrement on every insertion/deletion.
         | size() is simply:                   size_t size() { return
         | current_size; }
         | 
         | In your background clean-up example structure, if the
         | background thread is removing N elements, it must decrease
         | current_size by N. Atomically! (i.e. remove the element and
         | decrease the current_size using a mutex::lock/unlock)
        
           | thamer wrote:
           | My point was that the background thread would remove elements
           | periodically, by checking something like:
           | if (it->expiry_time < now) erase(it);
           | 
           | Where erase() could certainly remove the item and update the
           | size, but it would remove elements _after_ they have expired.
           | So if size() must return the actual number of unexpired
           | elements _present right now_ , the cached counter you mention
           | would not be up to date.
           | 
           | This is what I meant by "have size() return the number of
           | elements currently not expired". What you're describing is
           | "size returns the number of elements currently not expired OR
           | expired but not yet cleaned up", which is different.
        
           | klyrs wrote:
           | > I'm not sure I can think of an example of such a data
           | structure?
           | 
           | C style strings are the classical example. Computing the size
           | (strlen) takes linear time. Not saying that this is a good
           | idea or anything, but it's extremely common.
           | 
           | One could imagine a binary tree implemented in a similar
           | manner. It costs a little time and memory to keep track of
           | the current size, but it almosy always pays off.
        
           | johntb86 wrote:
           | Some                 std::list::splice(const_iterator pos,
           | list& other, const_iterator first, const_iterator last)
           | 
           | implementations used to be constant-time, since they could
           | just change some pointers in the spliced nodes. Nowadays they
           | take linear time, since they need to count how many nodes are
           | transferred so they can update the sizes correctly.
        
             | ape4 wrote:
             | Interesting, a good case for another std::list<> like
             | std::non_counting_list<>
        
             | jfrunyon wrote:
             | Wouldn't this still be constant-time today? It doesn't need
             | to count the nodes because the other list already has a
             | count of its nodes.
        
               | hermitdev wrote:
               | No, because you'd still need book keeping around the
               | length of [first, last), even if the implementation does
               | under-the-hood O(1) movement of nodes in [first, last)
               | from other to *this. Computing the length of [first,
               | last) is still an O(n) operation for forward iterators.
               | 
               | Maybe you're assuming [first, last) is the entirety of
               | other? For that case, there are overloads for splicing an
               | entire list in O(1) time:                   void splice(
               | const_iterator pos, list& other );         void splice(
               | const_iterator pos, list&& other );
        
               | jfrunyon wrote:
               | This seems like an extremely niche use-case, compared to
               | the prevalence of use-cases needing to check the size of
               | a list.
        
               | MauranKilom wrote:
               | No, because you can splice in the middle. Given just the
               | iterators you simply have no other way than counting.
               | 
               | (The reason why it used to be constant time is that it
               | didn't have to count because it didn't have to compute
               | the new size).
        
         | tialaramex wrote:
         | > What does this mean for implementations that want to have
         | these properties but have internal structures that make this
         | inherently difficult?
         | 
         | Such implementations are de facto forbidden in the STL.
         | 
         | Although its containers have vague names like "forward_list"
         | today the C++ STL's containers are in practice very specific
         | data structures that have the specified properties because
         | other data structures would miss one or more of the
         | requirements.
         | 
         | This suits "stability at all costs" C++ programmers fine.
         | Better that every C++ program is 10% slower than that their
         | buggy library from 1995 has to be fixed to actually do what the
         | standard says rather than whatever worked in 1995.
         | 
         | If you care about this but you like C++ you just don't use the
         | STL. Third parties are free to make containers that use data
         | structures that aren't old enough to stand for President of the
         | United States of America, some of them aren't even old enough
         | to legally _drive_ in the United States of America.
        
       | loup-vaillant wrote:
       | If you're _actually_ at the point where you care about that kind
       | of performance, you may want to shy away from the STL, and use
       | (or write) something simpler instead.
        
         | pansa2 wrote:
         | On the other hand, if you're _not_ at the point where you care
         | about maximum performance, you may want to shy away from C++
         | altogether.
         | 
         | This implies that there are very few situations in which C++
         | with STL is the right choice - which seems contrary to how
         | widely it's actually used.
        
         | TillE wrote:
         | I've always hated this bit of conventional wisdom, which has
         | become less and less applicable over the past couple decades.
         | STL implementations are now quite highly optimized within their
         | spec. With the one little caveat that they're generally
         | optimized for CPU cycles rather than per object memory usage.
         | 
         | But you should _understand_ the performance characteristics of
         | the STL. A std::vector is super fast for basically everything
         | except for dynamic resizing. If performance is a concern, your
         | first approach should be to see if you can avoid reallocation.
         | Often it 's very simple to just preallocate a reasonable
         | maximum, or calculate an exact maximum based on your input.
         | 
         | Finally, if you truly need to do weird stuff with resizing, you
         | can look at other containers or write your own.
        
           | glandium wrote:
           | I thought the conventional wisdom was to avoid the STL if you
           | need consistent performance across platforms, because they
           | all have different pitfalls. Things might be different with
           | libc++, now, though, since you can use it in most cases.
        
       | jb1991 wrote:
       | This was just discussed yesterday also:
       | https://news.ycombinator.com/item?id=29007821
        
       | ufo wrote:
       | The most impressive example I've ever seen of this kind of
       | compiler cleverness is the following function from the
       | "convoluted isEven" competition:                   bool
       | isEven(int n) {             return n == 0 || !isEven(n + 1);
       | }
       | 
       | Clang somehow optimizes it to a constant-time implementation,
       | even if we use "unsigned"! https://gcc.godbolt.org/z/nGbn1T
        
         | whimsicalism wrote:
         | Is this code not relying on UB? Seems like it could/should be
         | optimized to `return true`
        
           | poizan42 wrote:
           | It's UB for positive integers, so the compiler can assume it
           | never happens. It's correct for negative integers, so the
           | compiler just optimizes it for negative integers and ignores
           | the positive ones - in this case with the end result that it
           | works for positive integers as well.
           | 
           | If the argument is changed to unsigned int then it seems to
           | be correct and portable because the C standard[1] requires
           | UINT_MAX to be of the form 2^n - 1.
           | 
           | So the one with signed argument is UB for positive integers
           | but happens to work by accident (but the nasal demons may
           | still lurk just around the corner, e.g. the compiler may also
           | assume that it's never called with a positive argument which
           | may affect the compilation of callers). The one with unsigned
           | argument is correct because unsigned overflow is well-defined
           | to wrap around to zero, combined with the restrictions on max
           | unsigned int.
           | 
           | [1]: From C17 6.2.6.2:
           | 
           | For unsigned integer types other than unsigned char, the bits
           | of the object representation shall be divided into two
           | groups: value bits and padding bits (there need not be any of
           | the latter). If there are N value bits, each bit shall
           | represent a different power of 2 between 1 and 2 _(N-1), so
           | that objects of that type shall be capable of representing
           | values from 0 to 2_ (N-1) using a pure binary representation;
           | this shall be known as the value representation.
        
           | carnitine wrote:
           | What's the UB?
        
             | [deleted]
        
             | whimsicalism wrote:
             | Calling it on any positive int causes signed integer
             | overflow, which is UB.
        
             | nawgz wrote:
             | signed integer overflow for positive N
        
           | edflsafoiewq wrote:
           | It's only UB for positive n. For negative n, it is an is-even
           | test, so it can't be optimized to return true.
        
             | whimsicalism wrote:
             | Oh yeah - negative numbers exist, doh!
        
         | joosters wrote:
         | Isn't the unsigned version the correct one? Presumably the int
         | version will hit the same problem as in the original article,
         | in that the C spec says that ints are not treated as
         | overflowing? (i.e. the compiler cannot assume that adding one
         | will eventually turn negative and then later on reach 0)
        
           | BeeOnRope wrote:
           | Well the int version returns the correct answer for "empty()"
           | unlike unsigned. The size of a list of 2^32-1 elements should
           | not be reported as zero, nor as "empty".
           | 
           | In this case, the compiler has to enforce wrap semantics for
           | unsigned, which is an applicaton-level level bug if it
           | occurs, and doesn't need to enforce it for int, which happens
           | to lead to the right answer for empty (i.e., it checks if it
           | has at least one element, which continues to return the
           | correct "not empty" answer for 2^32-1 or 2^31-1 elements).
           | 
           | Both get the wrong answer for size() (over different
           | domains), if you allow lists that large: they can't even
           | represent very large lists.
        
           | DSMan195276 wrote:
           | Yes that's right - unsigned is defined to wrap, but ints have
           | undefined behavior if you add to the max value (or do other
           | silly things with them). Thus the int version given could be
           | optimized to do whatever when a positive integer is passed,
           | since it only doesn't have UB when negatives or zero are
           | passed.
        
           | TheCoelacanth wrote:
           | > the compiler cannot assume that adding one will eventually
           | turn negative and then later on reach 0
           | 
           | Signed overflow is undefined behavior, so it is allowed to
           | assume that. It's also allowed to assume anything else that
           | it wants to.
           | 
           | Once you cause a signed overflow, all guarantees are out the
           | window and the compiler can do anything it wants to.
        
         | spatulon wrote:
         | Is the LLVM optimisation pass that does this hyper-specialised
         | for this one particular trick, and, if so, is it valuable in
         | practice?
         | 
         | Here's an example where (I think) the two functions are
         | equivalent, so I'm a little disappointed it couldn't make the
         | same optimisation on the second one. (At least it still
         | optimised the tail call to a jump.)
         | 
         | https://godbolt.org/z/K9rMajsjK
        
           | Someone wrote:
           | No, it is very generic. See
           | https://kristerw.blogspot.com/2019/04/how-llvm-optimizes-
           | geo..., https://blog.matthieud.me/2020/exploring-clang-llvm-
           | optimiza... (discussed at
           | https://news.ycombinator.com/item?id=28207207)
           | 
           | (I remember reading a page with more examples, but cannot
           | find it)
        
       | papito wrote:
       | In a Wooooorld, where newer generation of developers does not
       | grasp the concept of database indexes, and network latency is
       | assumed to be zero, one man.....
        
       ___________________________________________________________________
       (page generated 2021-10-28 23:00 UTC)