[HN Gopher] Fleur - A bloom filter implementation in C ___________________________________________________________________ Fleur - A bloom filter implementation in C Author : adulau Score : 50 points Date : 2022-07-26 12:13 UTC (1 days ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | lambdaloop wrote: | I love that name! Based on the metrics at the bottom, is it right | that it's 3 times faster than the Go implementation? What makes | it so fast? | FreakLegion wrote: | Without looking at the implementations I'd guess it's Go's | bounds checking. | drfuchs wrote: | Could be faster still if it didn't calloc/free a "Fingerprint" | buffer each time you add or query an entry. Since its individual | BloomFilters are not thread-safe anyway, it could allocate the | buffer once at BloomFilter creation time and be done with it. | mitchellh wrote: | If you want a networked version (note: not based on Fleur): | https://github.com/armon/bloomd In the source there is a | "libbloom" as well that you could copy directly into your | codebase if you wanted an embedded version. The networked version | has a simple Redis-like text protocol so you can just use | `telnet` to play around with it, too. | | Also written in C, with great performance (no comparison to this | one, I haven't done it), and has been used in production by many | companies for many years (since ~2012 or 2013). | | Just pointing out other implementations if anyone is curious! | llimllib wrote: | Neat! Because it's a thing I do[1], the hash algorithms in use: | | Fleur (and DCSO/bloom and DCSO/flor): fnv | | bloomd: a combination of SpookyHash and murmur[2] | | [1]: https://llimllib.github.io/bloomfilter-tutorial/ (I'll | update it to add bloomd and spooky) | | [2]: | https://github.com/armon/bloomd/blob/23c19a7f5cbb35d7c3d970b... | | [3]: I have a vague recollection of somebody telling me why | combining two hashes in the way bloomd does for k >= 4 is a | good idea but I can't remember - anybody have a good reference | for me to link to? (edit: nvm, I already link the paper on my | page! sheesh) | FreakLegion wrote: | It's in the source, _Less Hashing, Same Performance: Building | a Better Bloom Filter_ : https://www.eecs.harvard.edu/~michae | lm/postscripts/rsa2008.p... | | I've used this trick at scale on network gear and it works | great. | llimllib wrote: | hah yeah I noticed a minute later, the link is right in the | comments, and I'd already linked the paper in my article! I | feel like this is one of those things I re-learn every 5 | years | ascar wrote: | I vaguely remember having read about multiple hashes for | bloom filters in the past, but I have trouble extracting | the actual approach from the linked paper. | | Could you give a summary? The paper is quite mathematical | and seems to lack a clear description of how to actually | use the two hashes without reading in depth. | jstrieb wrote: | I appreciate the Go-style docstring comments in the code! | | For anyone trying to understand Bloom filters who may be | interested in checking out an alternative C implementation | (designed to be compiled to WebAssembly), I'll link mine below. | | I used it to build a Bloom filter of every Hacker News submission | ever for a privacy-preserving browser extension that tells you if | the page you're currently on has been submitted before. | | https://github.com/jstrieb/hackernews-button/blob/master/blo... ___________________________________________________________________ (page generated 2022-07-27 23:00 UTC)