[HN Gopher] Advancing the state of the art for std:unordered_map... ___________________________________________________________________ Advancing the state of the art for std:unordered_map implementations Author : fanf2 Score : 35 points Date : 2022-06-19 18:12 UTC (4 hours ago) (HTM) web link (bannalia.blogspot.com) (TXT) w3m dump (bannalia.blogspot.com) | 0x23 wrote: | There also has been a lot research lately in the field. For | example, [1,2] has a lot of new methods on concurrent or space | efficient hash tables. | | [1] http://algo2.iti.kit.edu/maier.php | | [2] https://github.com/TooBiased/growt | syspec wrote: | Really interesting post, love reading these kind of details into | how collection types I take for granted work, and are iterated on | kccqzy wrote: | > Back in the day, open-addressing approaches were not regarded | as sufficiently mature, so closed addressing was taken as the | safe implementation of choice. | | I always chuckle when I read statements like this because it | didn't match my own experience as a learner. When I learned | computer science, I was taught hash tables right after arrays, | but before linked lists or even pointers. So open addressing had | always been the default for hash tables for me. I'm curious to | hear what other people's learning journeys are like. | RicoElectrico wrote: | I lack experience to tell - does anybody use chaining in | production? | zeroonetwothree wrote: | unordered_map uses chaining | masklinn wrote: | IIRC php and go use separate chaining (unless php switched | while I wasn't looking). | ghusbands wrote: | It is conceptually simpler, especially once deletion is | involved, and is the more common implementation. | OskarS wrote: | The fact that the STL api leaks these kind of implementation | details is such a shame. Multiplied globally over all users of | unordered_map, the amount of CPU time that has been wasted | because of that decision boggles the mind. Has to be one of the | biggest library design blunders in the history of the industry | (up there with C strings being null-terminated). They should just | add a new map to the standard with a better API. | BeetleB wrote: | > The fact that the STL api leaks these kind of implementation | details is such a shame. | | There were good reasons for it to do so. C++ is often used for | performance reasons, and requiring low level access was a means | to prevent some implementation being standards compliant yet | with horrible performance. | | I once had a coworker who had spent some time on the standards | committee. He said they would dive into the technical details | and tradeoffs to an insane level (utilizing journal papers, | theses, and real world benchmarks). He often would quip that | the amount of times some coworker came to him with a claim that | they have a better implementation than the standard's and were | correct, was zero. Sure, if you're a Boost developer, you may | have the chops to improve on the standard, but you really need | to be at that level. | | The other issue, of course, is that when you go into such | detail, then yes - the standard is a bit brittle. The solution | is for the standard to introduce another unordered_map (with a | different name) that allows for open addressing to the | standard, and keep the old one for backwards compatibility. | Anyone who's spent time with the algorithms library knows that | this is a common approach: Adding new functions to improve | deficiencies in the previous ones. | | Anti-disclaimer: I hate C++. Not a fan. | forrestthewoods wrote: | C++ STL design is broken. The language shouldn't define an API | for std, it should provide an implementation. AFAIK no other | major language defines the std library the way C++ does. It's | so bad and problematic. | MauranKilom wrote: | Sorry, can you elaborate? Some things in the STL require | compiler support, so you can't really "provide" an | implementation in the narrow sense. | | Also, how does providing a concrete/reference implementation | prevent the API problems? Hyrum's law [0] means that you | would make the situation _worse_ because now people will | depend on every detail you can think of. And if you want to | improve the implementation in a new version, you now are | (most likely) breaking ABI on _every_ platform at once, | instead of giving different platforms the ability to find | compromises as needed. Just look at the recent "no ABI break | in C++23" discussions to see how contentious that topic is. | | [0]: https://www.hyrumslaw.com/ | forrestthewoods wrote: | STL should "just" be a library written against the language | spec. The fact that there are so many different, unrelated | implementations is crazy. | | The STL spec is so over designed that they strongly imply, | if not mandate, a particular implementation. But they don't | actually provide an implementation so every platform has to | implement one. However those implementations select | different trade-offs which means different behaviors so | actually developers wind up writing their own version that | is consistent. It's basically the worst outcome. | olliej wrote: | I am always sad that the default STL collections leak so much | implementation detail that they are stuck on such abysmally | performing algorithms. | | That every serious project ends up with its own hashmap, etc is | such a shame. | usefulcat wrote: | IME, if you want a fast hash map, the very first thing you should | do is ignore anything that uses closed addressing. If you really | d I need one of the extra guarantees provided by | std::unordered_map, see if you can't get it by using a container | of unique_ptr<T> or shared_ptr<T> instead of unordered_map<T>. ___________________________________________________________________ (page generated 2022-06-19 23:00 UTC)