[HN Gopher] Advancing the state of the art for std:unordered_map...
       ___________________________________________________________________
        
       Advancing the state of the art for std:unordered_map
       implementations
        
       Author : fanf2
       Score  : 35 points
       Date   : 2022-06-19 18:12 UTC (4 hours ago)
        
 (HTM) web link (bannalia.blogspot.com)
 (TXT) w3m dump (bannalia.blogspot.com)
        
       | 0x23 wrote:
       | There also has been a lot research lately in the field. For
       | example, [1,2] has a lot of new methods on concurrent or space
       | efficient hash tables.
       | 
       | [1] http://algo2.iti.kit.edu/maier.php
       | 
       | [2] https://github.com/TooBiased/growt
        
       | syspec wrote:
       | Really interesting post, love reading these kind of details into
       | how collection types I take for granted work, and are iterated on
        
       | kccqzy wrote:
       | > Back in the day, open-addressing approaches were not regarded
       | as sufficiently mature, so closed addressing was taken as the
       | safe implementation of choice.
       | 
       | I always chuckle when I read statements like this because it
       | didn't match my own experience as a learner. When I learned
       | computer science, I was taught hash tables right after arrays,
       | but before linked lists or even pointers. So open addressing had
       | always been the default for hash tables for me. I'm curious to
       | hear what other people's learning journeys are like.
        
         | RicoElectrico wrote:
         | I lack experience to tell - does anybody use chaining in
         | production?
        
           | zeroonetwothree wrote:
           | unordered_map uses chaining
        
           | masklinn wrote:
           | IIRC php and go use separate chaining (unless php switched
           | while I wasn't looking).
        
           | ghusbands wrote:
           | It is conceptually simpler, especially once deletion is
           | involved, and is the more common implementation.
        
       | OskarS wrote:
       | The fact that the STL api leaks these kind of implementation
       | details is such a shame. Multiplied globally over all users of
       | unordered_map, the amount of CPU time that has been wasted
       | because of that decision boggles the mind. Has to be one of the
       | biggest library design blunders in the history of the industry
       | (up there with C strings being null-terminated). They should just
       | add a new map to the standard with a better API.
        
         | BeetleB wrote:
         | > The fact that the STL api leaks these kind of implementation
         | details is such a shame.
         | 
         | There were good reasons for it to do so. C++ is often used for
         | performance reasons, and requiring low level access was a means
         | to prevent some implementation being standards compliant yet
         | with horrible performance.
         | 
         | I once had a coworker who had spent some time on the standards
         | committee. He said they would dive into the technical details
         | and tradeoffs to an insane level (utilizing journal papers,
         | theses, and real world benchmarks). He often would quip that
         | the amount of times some coworker came to him with a claim that
         | they have a better implementation than the standard's and were
         | correct, was zero. Sure, if you're a Boost developer, you may
         | have the chops to improve on the standard, but you really need
         | to be at that level.
         | 
         | The other issue, of course, is that when you go into such
         | detail, then yes - the standard is a bit brittle. The solution
         | is for the standard to introduce another unordered_map (with a
         | different name) that allows for open addressing to the
         | standard, and keep the old one for backwards compatibility.
         | Anyone who's spent time with the algorithms library knows that
         | this is a common approach: Adding new functions to improve
         | deficiencies in the previous ones.
         | 
         | Anti-disclaimer: I hate C++. Not a fan.
        
         | forrestthewoods wrote:
         | C++ STL design is broken. The language shouldn't define an API
         | for std, it should provide an implementation. AFAIK no other
         | major language defines the std library the way C++ does. It's
         | so bad and problematic.
        
           | MauranKilom wrote:
           | Sorry, can you elaborate? Some things in the STL require
           | compiler support, so you can't really "provide" an
           | implementation in the narrow sense.
           | 
           | Also, how does providing a concrete/reference implementation
           | prevent the API problems? Hyrum's law [0] means that you
           | would make the situation _worse_ because now people will
           | depend on every detail you can think of. And if you want to
           | improve the implementation in a new version, you now are
           | (most likely) breaking ABI on _every_ platform at once,
           | instead of giving different platforms the ability to find
           | compromises as needed. Just look at the recent  "no ABI break
           | in C++23" discussions to see how contentious that topic is.
           | 
           | [0]: https://www.hyrumslaw.com/
        
             | forrestthewoods wrote:
             | STL should "just" be a library written against the language
             | spec. The fact that there are so many different, unrelated
             | implementations is crazy.
             | 
             | The STL spec is so over designed that they strongly imply,
             | if not mandate, a particular implementation. But they don't
             | actually provide an implementation so every platform has to
             | implement one. However those implementations select
             | different trade-offs which means different behaviors so
             | actually developers wind up writing their own version that
             | is consistent. It's basically the worst outcome.
        
       | olliej wrote:
       | I am always sad that the default STL collections leak so much
       | implementation detail that they are stuck on such abysmally
       | performing algorithms.
       | 
       | That every serious project ends up with its own hashmap, etc is
       | such a shame.
        
       | usefulcat wrote:
       | IME, if you want a fast hash map, the very first thing you should
       | do is ignore anything that uses closed addressing. If you really
       | d I need one of the extra guarantees provided by
       | std::unordered_map, see if you can't get it by using a container
       | of unique_ptr<T> or shared_ptr<T> instead of unordered_map<T>.
        
       ___________________________________________________________________
       (page generated 2022-06-19 23:00 UTC)