[HN Gopher] Fleur - A bloom filter implementation in C
       ___________________________________________________________________
        
       Fleur - A bloom filter implementation in C
        
       Author : adulau
       Score  : 50 points
       Date   : 2022-07-26 12:13 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | lambdaloop wrote:
       | I love that name! Based on the metrics at the bottom, is it right
       | that it's 3 times faster than the Go implementation? What makes
       | it so fast?
        
         | FreakLegion wrote:
         | Without looking at the implementations I'd guess it's Go's
         | bounds checking.
        
       | drfuchs wrote:
       | Could be faster still if it didn't calloc/free a "Fingerprint"
       | buffer each time you add or query an entry. Since its individual
       | BloomFilters are not thread-safe anyway, it could allocate the
       | buffer once at BloomFilter creation time and be done with it.
        
       | mitchellh wrote:
       | If you want a networked version (note: not based on Fleur):
       | https://github.com/armon/bloomd In the source there is a
       | "libbloom" as well that you could copy directly into your
       | codebase if you wanted an embedded version. The networked version
       | has a simple Redis-like text protocol so you can just use
       | `telnet` to play around with it, too.
       | 
       | Also written in C, with great performance (no comparison to this
       | one, I haven't done it), and has been used in production by many
       | companies for many years (since ~2012 or 2013).
       | 
       | Just pointing out other implementations if anyone is curious!
        
         | llimllib wrote:
         | Neat! Because it's a thing I do[1], the hash algorithms in use:
         | 
         | Fleur (and DCSO/bloom and DCSO/flor): fnv
         | 
         | bloomd: a combination of SpookyHash and murmur[2]
         | 
         | [1]: https://llimllib.github.io/bloomfilter-tutorial/ (I'll
         | update it to add bloomd and spooky)
         | 
         | [2]:
         | https://github.com/armon/bloomd/blob/23c19a7f5cbb35d7c3d970b...
         | 
         | [3]: I have a vague recollection of somebody telling me why
         | combining two hashes in the way bloomd does for k >= 4 is a
         | good idea but I can't remember - anybody have a good reference
         | for me to link to? (edit: nvm, I already link the paper on my
         | page! sheesh)
        
           | FreakLegion wrote:
           | It's in the source, _Less Hashing, Same Performance: Building
           | a Better Bloom Filter_ : https://www.eecs.harvard.edu/~michae
           | lm/postscripts/rsa2008.p...
           | 
           | I've used this trick at scale on network gear and it works
           | great.
        
             | llimllib wrote:
             | hah yeah I noticed a minute later, the link is right in the
             | comments, and I'd already linked the paper in my article! I
             | feel like this is one of those things I re-learn every 5
             | years
        
             | ascar wrote:
             | I vaguely remember having read about multiple hashes for
             | bloom filters in the past, but I have trouble extracting
             | the actual approach from the linked paper.
             | 
             | Could you give a summary? The paper is quite mathematical
             | and seems to lack a clear description of how to actually
             | use the two hashes without reading in depth.
        
       | jstrieb wrote:
       | I appreciate the Go-style docstring comments in the code!
       | 
       | For anyone trying to understand Bloom filters who may be
       | interested in checking out an alternative C implementation
       | (designed to be compiled to WebAssembly), I'll link mine below.
       | 
       | I used it to build a Bloom filter of every Hacker News submission
       | ever for a privacy-preserving browser extension that tells you if
       | the page you're currently on has been submitted before.
       | 
       | https://github.com/jstrieb/hackernews-button/blob/master/blo...
        
       ___________________________________________________________________
       (page generated 2022-07-27 23:00 UTC)