[HN Gopher] Cache Oblivious Algorithms
       ___________________________________________________________________
        
       Cache Oblivious Algorithms
        
       Author : tardygrade
       Score  : 158 points
       Date   : 2020-06-27 15:31 UTC (7 hours ago)
        
 (HTM) web link (jiahai-feng.github.io)
 (TXT) w3m dump (jiahai-feng.github.io)
        
       | CogitoCogito wrote:
       | I love cache oblivious algorithms in principle, but I feel they
       | break my ultimate rule: you can't please everyone. At the end of
       | the day, the restrictions presented by your system are
       | everything. Without those restrictions you have no problem to
       | solve. Not taking them into account seems crazy.
       | 
        | However, avoiding details as much as is feasible is very
        | powerful and should be kept in mind. I feel like simple
        | approaches like streaming computations (not necessarily cache
        | oblivious) share many of the same underlying principles.
        
       | remcob wrote:
       | I tried cache-oblivious algorithms for some numerical code, but
       | found them underwhelming. The cache-oblivious model does not
       | consider the effects of prefetching and cache line associativity.
       | Both can make a huge difference.
       | 
       | They are a great starting point and work well for the RAM/Swap
       | level of the memory hierarchy, but for the L1/L2 caches manual
        | tuning still pays off. Unfortunately, this makes the
        | implementation non-oblivious and even dependent on the specific
        | CPU used.
        
         | diroussel wrote:
          | Yeah. Treating them as a starting point is a good way to
          | think about the approach.
         | 
         | I had some code that was slower than I intuitively expected it
         | to be. I looked around for approaches and came across cache
         | oblivious algorithms. In my case I was thinking of RAM as a
         | cache and some of the inputs would fit in RAM and some were
         | larger than RAM. All inputs and outputs were in network drives.
          | The final output was 4.5 GB. Using cache oblivious algorithms
          | as inspiration, I managed to get my runtime down from 5 mins
          | to 20 seconds (numbers are approximate, as this was nearly ten
          | years ago now).
         | 
         | I was quite pleased.
        
       | [deleted]
        
       | jwatzman wrote:
        | I'd not heard of the van Emde Boas layout before, and it sounds
        | super cool! I wish the author had included a simple example
        | before focusing on the general case -- I find it hard to get an
        | intuition for the general without a specific example! In
        | particular, after some searching, I found the diagram at the
        | top-left of page 4 of https://www.cs.au.dk/~gerth/papers/soda02.pdf
        | to be immensely helpful in getting my head around the concept.
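        | 
        | For readers wanting a concrete picture, here is a minimal sketch
        | (my own illustration, not taken from the article) of that
        | recursive layout in Python: a complete binary tree of height h
        | is split into a top subtree of height h//2 and its bottom
        | subtrees; the top is laid out first, then each bottom subtree,
        | recursively.

```python
def veb_order(height):
    """Return the BFS indices (root = 1) of a complete binary tree of the
    given height, listed in van Emde Boas (recursively blocked) order."""
    def recurse(root, h):
        if h == 1:
            return [root]
        top_h = h // 2                # height of the top subtree
        bottom_h = h - top_h          # height of each bottom subtree
        order = recurse(root, top_h)  # lay out the top subtree first
        # Leaves of the top subtree, in BFS numbering: the nodes at depth
        # top_h - 1 below `root` are root * 2^(top_h - 1) + i.
        top_leaves = [root * (1 << (top_h - 1)) + i
                      for i in range(1 << (top_h - 1))]
        for leaf in top_leaves:       # then each bottom subtree, left to right
            order += recurse(2 * leaf, bottom_h)      # left child's subtree
            order += recurse(2 * leaf + 1, bottom_h)  # right child's subtree
        return order
    return recurse(1, height)

# veb_order(4) groups the 15-node tree into triangles of 3 nodes each:
# [1, 2, 3,  4, 8, 9,  5, 10, 11,  6, 12, 13,  7, 14, 15]
```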
        
       | vld_lzr wrote:
       | I implemented a bunch of cache oblivious search trees and
       | benchmarked them against their classical counterparts. The
       | results showed that while the cache oblivious ones were more
       | cache efficient, they were slower overall.
       | 
        | The implementations and benchmarks can be found here if anyone
        | is curious:
        | https://gitlab.com/VladLazar/cache-oblivious-data-structures
        
         | corysama wrote:
         | Your readme lists the cache-oblivious data structures you
         | tested. But, it would be nice to also list what baselines you
          | tested them against. I'm curious whether they were explicitly
          | cache-aware.
        
       | TristanDaCunha wrote:
       | I think there might be a mistake in this article, or at least
       | something requiring clarification.
       | 
       | > For convenience, suppose the binary tree is complete and has
       | height H=2^K.
       | 
       | What is K? It's never stated. I'd usually assume H = log N, if N
       | is the number of nodes in a balanced tree.
        
         | DavidSJ wrote:
         | I think K is just an arbitrary natural number. The author is
         | assuming the height is a power of 2, and giving the name K to
         | the log_2 of that height.
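          | 
          | Under that reading, the convenience of H = 2^K is that the
          | recursive halving in the van Emde Boas layout always lands on
          | integer, power-of-two heights all the way down to 1. A tiny
          | sketch (my own, assuming this interpretation):

```python
def split_heights(h):
    """Subtree heights produced by repeatedly halving a tree of height h,
    as the van Emde Boas recursion does."""
    heights = [h]
    while h > 1:
        h //= 2
        heights.append(h)
    return heights

# With K = 3, i.e. H = 2**3 = 8, every level of the recursion gets a
# power-of-two height: split_heights(8) == [8, 4, 2, 1]
```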
        
       | whoevercares wrote:
        | I remember my professor Michael Bender teaching that M/B stuff,
        | and it was interesting (somehow related to origami). Sadly, I've
        | pretty much forgotten everything now.
        
       | legulere wrote:
       | Don't most CPUs have 64 byte cache lines anyway?
        
         | karmakaze wrote:
          | It's not only about the smallest unit but about the unit size
          | at every cache level. Imagine the packed structure sitting on
          | an SSD and mapped to linear memory addresses. Certainly at the
          | smallest scale the cache line is what matters, but there are
          | also benefits to having likely-to-be-accessed data be
          | contiguous at other scales, e.g. the OS memory page or the SSD
          | i/o unit.
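          | 
          | One rough way to see the multi-scale point (my own
          | back-of-envelope sketch, assuming 8-byte nodes): in a plain
          | BFS array layout, a root-to-leaf walk touches roughly one new
          | block per level once it gets deep enough, and that holds at
          | every block size -- cache line and page alike.

```python
NODE_BYTES = 8  # assumed node size, for illustration only

def blocks_touched(depth, block_bytes):
    """Distinct blocks hit by the BFS-index path from the root to the
    leftmost leaf of a complete binary tree stored in BFS order."""
    per_block = block_bytes // NODE_BYTES
    path = [1 << d for d in range(depth)]  # BFS indices 1, 2, 4, 8, ...
    return len({i // per_block for i in path})

# For a depth-20 walk, the BFS layout fans out past both granularities:
# blocks_touched(20, 64)   == 18   (64-byte cache lines)
# blocks_touched(20, 4096) == 12   (4 KiB pages)
```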
        
         | loeg wrote:
         | All AMD64 anyway.
        
       ___________________________________________________________________
       (page generated 2020-06-27 23:00 UTC)