[HN Gopher] A simple hash table in C ___________________________________________________________________ A simple hash table in C Author : todsacerdoti Score : 42 points Date : 2023-06-12 22:58 UTC (1 days ago) (HTM) web link (theleo.zone) (TXT) w3m dump (theleo.zone) | samsquire wrote: | Thank you for this article. | | I have been working on trying to design a contiguous hash table | data structure that has these two mutual requirements: | | a) allows deep clones (due to contiguous memory copyable with | memcpy) | | b) supports arbitrarily nestable hash tables | | The closest library I know about is smolworld ( | https://github.com/snej/smol_world ) but I don't know how easy it | can be cloned. | | These requirements rules out multiple mallocs: I do one single | malloc and expect that to be enough for the entire hashmap. | | I'm not sure if these constraints might also force a fixed number | of buckets and a fixed capacity. | | The clonable property requires interior mutability be limited, | because if you memcpy pointers they would cause structure | sharing, that I'm trying to avoid, hence a deep clone. | | Rationale: these are the use cases I have for such a data | structure. The first is cheap copy on write. I also have a left- | right concurrency control hashmap, but I feel the properties of | this data structure are even better. The first is a sharding in | multithreading design: do a cheap memcpy for each thread and | shard the processing and merge at the end, without any | synchronization cost while the threads are working. I know that | deep cloning is slow in Java and presumably C if you do it with | loops rather than memcpy. Another use case is efficient | serialization for network. | | In other words, a protobuf but easily deep clonable without | loops. | teo_zero wrote: | If you pre-allocate a large enough buffer, you can keep a | pointer to mark the boundary between the part where you have | already stored data and the part still empty. Then re-implement | malloc() to just move such pointer. Of course you must check | for running out of allocated memory, etc. This simple approach | assumes that you seldom delete entries, because it never | reclaims the room occupied by deleted entries. | jmacjmac wrote: | Nice article! I think eventually you need to move to macros to | support multiple key/value types in C. Just leaving some macro | implementations for reference: | | - https://github.com/attractivechaos/klib/blob/master/khash.h | | - https://github.com/tezc/sc/tree/master/map | | - https://github.com/troydhanson/uthash | hot_gril wrote: | I think uthash is the one I used long ago. Fun and easy. | coffeeri wrote: | I've had lately a look at QEMUs internals and saw their thread | safe implementation of a hash table, capable of concurrent reads: | qht [0]. | | If the author sees this, you might want to take a look at it. | | [0] https://github.com/qemu/qemu/blob/master/util/qht.c | torstenvl wrote: | OP doesn't include a license, but OP author's GitHub is mostly | MIT or CC-BY. | | Since QEMU's implementation is GPL, it's probably best if OP | author does _not_ take a look at it. | Retr0id wrote: | That's not how software licenses work. | zabzonk wrote: | if the initial array is going to be of fixed size, why not: | typedef struct HashTable { Entry *buckets[4]; | int nBuckets; } HashTable; | | which removes one layer of dynamic memory management. | the-smug-one wrote: | >resize the buckets array once it gets too full | | Because of that, basically. | zabzonk wrote: | i don't see where in his code this is done. possibly me being | stupid. | the-smug-one wrote: | Oh sorry, it's from "future improvements" :), I assumed | that was the train of thought. You're entirely correct, | however. | jandrese wrote: | I was surprised recently when looking at different hash tables | that have been implemented in C to discover that the standard | library includes its own hash table. They are even part of POSIX. | There is a reason you have never heard of it, or if you have you | have never used it. In true POSIX fashion they are close to | useless. The implementation doesn't allow you to modify the table | after it has been created, you have to pass in all the data when | you create the table. There is no add or delete and you can't | change any value you pull from the table. It also stores the data | in a static area local to the function so you can only use a | single table in a program at a time. It doggedly commits every C | stdlib sin possible. | zabzonk wrote: | there are no hash tables in the Standard C Library - POSIX does | not define Standard C. | | more info: https://stackoverflow.com/questions/6118539/why-are- | there-no... | dvh wrote: | 1. Implement simple hash table in C | | 2. Implement bloom filter | | 3. Download list of Bitcoin wallets with non-zero amount of BTC | | 4. Generate long list of random key pairs | | 5. ??? | | 6. Profit | version_five wrote: | I was looking for something like this a few months ago and found | this which also may be of interest: | | https://news.ycombinator.com/item?id=26590234 | | How to implement a hash table in C (benhoyt.com) 302 points by | benhoyt on March 26, 2021, 158 comments | benhoyt wrote: | Thanks for sharing! I do like the OP's article, but the more | the merrier! My "How to implement a hash table in C" article | [1] is for whatever reason one of my most popular articles. If | only C had a slightly better standard library. :-) | | In my article I use "open addressing" [2] instead of linked | lists for collisions, as 1) it tends to be simpler, as you only | have one data structure to manage (array, rather than array and | linked list), and 2) it tends to be faster, as linked lists are | slow on modern CPUs because they require jumping around in | memory. | | [1] https://benhoyt.com/writings/hash-table-in-c/ | | [2] https://en.wikipedia.org/wiki/Open_addressing | version_five wrote: | Hey, thanks for writing it up! Like I said, I was looking for | exactly this a few months ago, and between the explanation | and the code it was a big help. | [deleted] | the-smug-one wrote: | This hashtable sure mallocs a lot! | | Look here for example: | | Entry *newVal = malloc(sizeof(Entry)); newVal->key = strdup(key); | | Why not: | | Entry* e = malloc(sizeof(Entry) + strlen(key)); | | Now you can also skip the pointer to the key, saving 8 bytes for | each entry. Also, just return the hashtable by value, it's so | small. | | I think that if we're coding in C, C++ or Zig, then it can be fun | to actually use the lower level power to our advantage :). | aloisklink wrote: | Even better, you can use [C99 Flexible array members][1], e.g. | something like: struct KeyedEntry { | struct Entry entry; char key[]; // flexible array | member }; | | It's not too much more useful when just using `char`, but for | other data types, it's a bit cleaner, since it handles | alignment/padding better. | | [1]: https://en.wikipedia.org/wiki/Flexible_array_member, or | https://beej.us/guide/bgc/html/split/structs-ii-more-fun-wit... | | _Edit_: Something like this might not be suitable for the | author, since they did mention how they wanted each `struct` to | have a fixed size. ___________________________________________________________________ (page generated 2023-06-13 23:01 UTC)