[HN Gopher] Roaring bitmaps: what they are and how they work
       ___________________________________________________________________
        
       Roaring bitmaps: what they are and how they work
        
       Author : voberoi
       Score  : 122 points
        Date   : 2022-09-02 15:35 UTC (1 day ago)
        
 (HTM) web link (vikramoberoi.com)
 (TXT) w3m dump (vikramoberoi.com)
        
       | celeritascelery wrote:
       | I see this common theme among very fast practical algorithms
       | (like timsort or pdqsort) where there is not some secret math or
       | algorithm, rather they are just a bunch of special cases based on
       | heuristics. Often they involve specific knowledge of the hardware
       | instead of treating software as abstract. To me this is the big
       | difference between "computer science" and "computer engineering".
        
         | coldtea wrote:
         | > _I see this common theme among very fast practical algorithms
         | (like timsort or pdqsort) where there is not some secret math
         | or algorithm, rather they are just a bunch of special cases
         | based on heuristics_
         | 
          | The core here, which is bitmap storage plus the basic
          | optimization, is simple but solid and general.
          | Mathematically it takes less space to store data as
          | positions on a bitmap rather than as fully spelled-out
          | values, and it takes even less space to segregate the
          | bitmaps into chunks.
          | 
          | This will hold, and be smaller and faster, on any computer.
          | So it's not a special case based on heuristics, tied either
          | to the frequencies or sample characteristics of some
          | particular data set, or to specific CPU peculiarities or
          | whatever.
         | 
         | More exotic optimizations piled on top, sure. But "compressed
         | bitmaps" themselves as a concept, not.
        
         | voberoi wrote:
         | I had the same takeaway when I read these papers. It felt like
         | a game of whack-a-mole.
         | 
          | There's a massive performance benefit to doing this at the
          | cost of implementation complexity. I haven't studied the
          | implementations or tried my hand at one, but I get the
          | impression that these are tough to implement correctly in a
          | way that takes full advantage of the hardware.
         | 
         | (In that sense, it's awesome that the researchers also did the
         | legwork to implement and maintain a library!)
        
         | jandrewrogers wrote:
         | Much of this is just threshold effects for algorithm
         | performance, not special cases per se.
         | 
         | You don't even need to have specific knowledge of the hardware
         | as long as you can identify cross-over points where one
         | algorithm starts significantly outperforming others. An old HPC
         | trick is to write software that thoroughly measures several
         | algorithm strategies on your specific hardware environment and
         | then code-gens an algorithm that has an optimal set of
         | strategies and strategy-switching thresholds. The meta-
         | algorithm is mostly focused on cheaply detecting these
         | thresholds at runtime.
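          | 
          | As a toy illustration of that shape (the names are made up,
          | and a real harness would measure CROSSOVER on the target
          | machine rather than hard-code it):
          | 
          |   #include <stddef.h>
          | 
          |   #define CROSSOVER 4096  /* assumed: found by calibration */
          | 
          |   /* membership test over a sorted array that switches
          |    * strategy at the measured threshold */
          |   int contains(const unsigned short *a, size_t n,
          |                unsigned short key) {
          |       if (n < CROSSOVER) {            /* brute force */
          |           for (size_t i = 0; i < n; i++)
          |               if (a[i] == key) return 1;
          |           return 0;
          |       }
          |       size_t lo = 0, hi = n;          /* binary search */
          |       while (lo < hi) {
          |           size_t mid = lo + (hi - lo) / 2;
          |           if (a[mid] < key) lo = mid + 1; else hi = mid;
          |       }
          |       return lo < n && a[lo] == key;
          |   }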
        
         | fuckstick wrote:
          | No, this is still comp sci. Comp sci doesn't mean that practical
         | considerations in algorithm design are ignored. Computer
         | engineering has to do with the actual design and construction
         | of computing machinery. It's an accredited engineering field
         | and the undergraduate coursework includes electronics,
         | semiconductors, computer architecture and generally a lot of
         | overlap with electrical engineering. I had to take one dumb
         | programming course - all the data structures and algorithms
         | stuff I had to learn elsewhere.
        
       | jerven wrote:
        | I love roaringbitmaps and maybe even overuse this library.
        | On the other hand, it made a lot of things possible for
        | legacy.uniprot.org over the years that would have been
        | unaffordable otherwise. Quick faceting using permanent
        | filters over 500+ million documents would have been really
        | hard without it.
        
         | voberoi wrote:
         | I'm really curious about this. Are you able to share a bit
         | about your use case and how you use roaring bitmaps outright
         | (vs. through search infra like Solr or ElasticSearch)?
        
       | pvillano wrote:
       | I thought the trick was going to be indirection.
       | 
        | sorted-list/bitmap/runs, in a two-level tree. cool.
       | 
        | it's technically missing a sorted complement-list, i.e. only
        | the items that are missing, although the worst case for runs
        | only uses twice as much space, so that's more complexity
        | without much savings, esp. considering real workloads
       | 
       | performs better with sequential ids than random uuids because you
       | use fewer pages
       | 
       | further research
       | 
       | investigating the effect of adding more layers to the tree
       | 
       | using additional set representations e.g. balanced tree
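        | 
        | roughly, I picture the containers like this (illustrative C,
        | not the actual library types):
        | 
        |   #include <stdint.h>
        | 
        |   typedef struct { uint16_t start, len; } run;
        | 
        |   /* one container per high-16-bit key; the tag picks the
        |    * representation for the low 16 bits */
        |   typedef struct {
        |       uint16_t high;
        |       enum { ARRAY, BITMAP, RUNS } tag;
        |       union {
        |           struct { uint16_t *vals; uint32_t n; } array;
        |           uint64_t *bits;              /* 2^16 bits */
        |           struct { run *r; uint32_t n; } runs;
        |       } u;
        |   } container;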
        
       | nattaylor wrote:
        | I enjoyed this walkthrough. I'm interested in RB because
        | it's used in Lucene, but I never dug in and assumed (without
        | much thought) that it had to do with consecutive runs of 1s
        | and 0s. Wrong!
        
         | LouisSayers wrote:
         | Compressing consecutive 1's and 0's was the first idea that
         | popped into my head as well. Then when they started to talk
         | about inserting into the 800,000th position my brain went "What
         | if they just reversed the array and stored some metadata as the
         | type of array they're dealing with?"
         | 
          | It's funny that in the end Roaring bitmaps are essentially
          | a modified data structure and process.
         | 
         | The way things are going, I imagine someone will at some point
         | take all the different tricks we have of inserting, sorting,
         | finding, deleting data, take millions of datasets and run some
         | machine learning type process to create libraries that can
         | perform these operations optimally.
        
           | nattaylor wrote:
            | I was thinking a similar thought: 4096 seemed quite
            | magic (although it seems to be chosen based on
            | research), and RB perf could probably be tuned based on
            | the workload.
        
             | nicoburns wrote:
              | I don't think 4096 is arbitrary. It's an array of
              | 16-bit integers, and 4096 * 16 = 65536 bits, exactly
              | the size of the full 2^16-bit bitmap. So 4096 is the
              | boundary point below which an array of integers uses
              | less memory than a traditional bitmap.
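              | 
              | The same boundary in bytes (a quick sketch, not
              | library code):
              | 
              |   #include <stddef.h>
              | 
              |   /* n low-16-bit members as an array: 2 B each */
              |   size_t array_bytes(size_t n) { return 2 * n; }
              | 
              |   /* full 2^16-bit bitmap: 8192 bytes, always */
              |   size_t bitmap_bytes(void) { return (1 << 16) / 8; }
              | 
              |   /* 2 * n == 8192 exactly when n == 4096 */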
        
               | nattaylor wrote:
               | Oops, thanks for pointing that out
        
       | TacticalCoder wrote:
       | > When a set is sparse, traditional bitmaps compress poorly.
       | 
       | They waste space. But if you were to compress them, they'd
       | compress _very_ well. As in, for example, if you were to compress
       | them using roaring bitmaps.
        
       | nullc wrote:
       | Those who do not know judy1 are doomed to reinvent it.
       | 
       | http://judy.sourceforge.net/
        
         | latchkey wrote:
         | http://nothings.org/computer/judy/
        
         | froh wrote:
         | which possibly is a good thing
         | 
         | the Judy project seems inactive to me:
         | 
         | https://sourceforge.net/p/judy/bugs/
         | 
          | this short HN thread resonates with the intuition that
          | judy1 is stalled; its last optimistic comment points to an
          | orphaned comment in the bug list above
         | 
         | https://news.ycombinator.com/item?id=32188204
        
       | Xeoncross wrote:
        | Can someone explain how this is laid out in memory and what
        | it would look like serialized? It's easy to have an array of
        | size N that fits all bits and just write it to disk. You can
        | also mmap a file as a large array of bytes, just mutate it,
        | and let the OS handle the disk syncs, if you're okay with
        | losing some recent state changes if the machine crashes.
        | 
        | What would an array of pointers to X containers, each of
        | size N, look like on disk?
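        | 
        | My guess would be offsets instead of pointers, something
        | like this (illustrative only, not necessarily Roaring's
        | actual on-disk format):
        | 
        |   #include <stdint.h>
        | 
        |   /* header: count, then one entry per container; payloads
        |    * follow back to back. readers mmap the file and find a
        |    * payload at base + offset, so no pointer fixups. */
        |   typedef struct {
        |       uint16_t key;          /* high 16 bits */
        |       uint8_t  type;         /* array / bitmap / runs */
        |       uint32_t cardinality;
        |       uint32_t offset;       /* payload byte offset */
        |   } container_header;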
        
         | extesy wrote:
          | This article has a section titled "how Roaring bitmaps are
          | represented in memory". I suggest you actually read the
          | article; it will likely answer all your questions.
        
       | pvillano wrote:
        | I'm reading this on a runway with terrible internet and the
        | figures slowly load to reveal... text :/
        
       | superjan wrote:
        | It looks a bit too simplistic: blocks with up to 4k items
        | are stored as sorted lists. Having to insert into a sorted
        | list of up to 4k items seems very expensive to me.
        
         | jasonwatkinspdx wrote:
         | The threshold was chosen empirically. Modern processors are
         | _very_ fast when operating on memory sequentially, as the
         | prefetcher achieves the best case scenario. The cost of a cache
         | line miss is so substantial this ends up dominating things
         | until  "n" is in the 1000s. A lot of programmers' intuition is
         | off the mark on this point.
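          | 
          | Concretely, the worst case is one memmove over at most
          | ~8 KiB of contiguous memory (a sketch, assuming a
          | 4096-entry uint16_t array container; not the library's
          | exact code):
          | 
          |   #include <stdint.h>
          |   #include <string.h>
          | 
          |   static void array_insert(uint16_t *a, uint32_t *n,
          |                            uint16_t v) {
          |       uint32_t lo = 0, hi = *n;
          |       while (lo < hi) {         /* find insert point */
          |           uint32_t mid = (lo + hi) / 2;
          |           if (a[mid] < v) lo = mid + 1; else hi = mid;
          |       }
          |       if (lo < *n && a[lo] == v) return;  /* present */
          |       memmove(a + lo + 1, a + lo,
          |               (*n - lo) * sizeof *a);     /* one sweep */
          |       a[lo] = v;
          |       (*n)++;
          |   }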
        
         | jandrewrogers wrote:
         | CPUs have a strong performance bias toward sequential memory
         | access and there are large threshold effects at work here. The
         | block size used is not arbitrary. Improvements in prefetching
         | and cache line utilization can have such large performance
         | benefits that it more than justifies any apparent increase in
         | computational cost because of how the code is organized to
         | obtain those improvements.
         | 
         | Most developers do not have a good intuition for the efficiency
         | and scalability of sequential brute-force on modern
         | architectures. There is a cross-over threshold but I think many
         | people would be surprised at how high it is in practice.
        
         | foota wrote:
         | The 4k items are only 512 bytes though, so expensive yes but
         | not overly so.
        
           | skitter wrote:
            | They are two bytes per item; it's the bitmaps (>4k
            | entries) that use 1 bit per item.
        
           | Veserv wrote:
           | No. The person you are replying to is talking about the
           | sparse leaf case where the contained integers are actually
           | being stored (to be more precise the low bits are being
           | stored) in a sorted structure. The switch to the bitmap
           | occurs when the sparse structure takes the same number of
           | bits to directly encode the sparsely contained items. In this
           | case they use a 2^16 bitmap and encode sparse elements using
            | 16 bits, so it occurs at ((2^16 / 16) == 2^12) elements.
        
       ___________________________________________________________________
       (page generated 2022-09-03 23:00 UTC)