[HN Gopher] Cache Oblivious Algorithms
___________________________________________________________________

Cache Oblivious Algorithms

Author : tardygrade
Score  : 158 points
Date   : 2020-06-27 15:31 UTC (7 hours ago)

(HTM) web link (jiahai-feng.github.io)
(TXT) w3m dump (jiahai-feng.github.io)

| CogitoCogito wrote:
| I love cache-oblivious algorithms in principle, but I feel they break my ultimate rule: you can't please everyone. At the end of the day, the restrictions presented by your system are everything. Without those restrictions you have no problem to solve. Not taking them into account seems crazy.
|
| However, avoiding details as much as is feasible is very powerful and should be kept in mind. I feel like simple approaches such as streaming computations (not necessarily cache oblivious) share many of the same underlying principles.

| remcob wrote:
| I tried cache-oblivious algorithms for some numerical code, but found them underwhelming. The cache-oblivious model does not consider the effects of prefetching and cache line associativity. Both can make a huge difference.
|
| They are a great starting point and work well for the RAM/swap level of the memory hierarchy, but for the L1/L2 caches manual tuning still pays off. This unfortunately makes the implementation non-oblivious and even dependent on the specific CPU used.

| diroussel wrote:
| Yeah. Thinking of them as a starting point is a good way to think of the approach.
|
| I had some code that was slower than I intuitively expected it to be. I looked around for approaches and came across cache-oblivious algorithms. In my case I was thinking of RAM as a cache: some of the inputs would fit in RAM and some were larger than RAM. All inputs and outputs were on network drives. The final output was 4.5 GB.
| Using cache-oblivious algorithms as an inspiration, I managed to get my runtime down from 5 minutes to 20 seconds (numbers are approximate, as this was nearly ten years ago now).
|
| I was quite pleased.

| [deleted]

| jwatzman wrote:
| I'd not heard about the van Emde Boas layout before, and it sounds super cool! I wish the author had included a simple example before focusing on the general case -- I find it hard to get an intuition for the general without a specific example! In particular, after some searching, I found the diagram at the top-left of page 4 of https://www.cs.au.dk/~gerth/papers/soda02.pdf to be immensely helpful in getting my head around the concept.

| vld_lzr wrote:
| I implemented a bunch of cache-oblivious search trees and benchmarked them against their classical counterparts. The results showed that while the cache-oblivious ones were more cache-efficient, they were slower overall.
|
| The implementations and benchmarks can be found here if anyone is curious: https://gitlab.com/VladLazar/cache-oblivious-data-structures

| corysama wrote:
| Your readme lists the cache-oblivious data structures you tested. But it would be nice to also list which baselines you tested them against. I'm curious whether they were explicitly cache-aware.

| TristanDaCunha wrote:
| I think there might be a mistake in this article, or at least something requiring clarification.
|
| > For convenience, suppose the binary tree is complete and has height H=2^K.
|
| What is K? It's never stated. I'd usually assume H = log N, if N is the number of nodes in a balanced tree.

| DavidSJ wrote:
| I think K is just an arbitrary natural number. The author is assuming the height is a power of 2, and giving the name K to the log_2 of that height.

| whoevercares wrote:
| I remember my professor Michael Bender teaching that M/B stuff, and it was interesting (somehow related to origami). Sadly, I've pretty much forgotten everything now.
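[Editor's note: since a concrete example of the van Emde Boas layout is asked for above, here is a minimal Python sketch. The function name and the 1-based heap indexing (node n has children 2n and 2n+1) are choices made for illustration, not taken from the article. It also shows why the article assumes H = 2^K: halving a power-of-two height keeps every recursive sub-height a power of two, so the splits stay even.]

```python
def veb_order(root, height):
    """Return the nodes of a complete binary subtree (rooted at `root`,
    with `height` levels, 1-based heap indexing) in van Emde Boas order:
    lay out the top half-height subtree first, then each bottom subtree
    contiguously, recursing on both."""
    if height == 1:
        return [root]
    top_h = height // 2          # with height = 2^K this split is always even
    bot_h = height - top_h
    order = veb_order(root, top_h)
    # Roots of the bottom subtrees: all descendants of `root` at depth top_h.
    bottoms = [root]
    for _ in range(top_h):
        bottoms = [c for n in bottoms for c in (2 * n, 2 * n + 1)]
    for r in bottoms:
        order += veb_order(r, bot_h)
    return order

# A 15-node tree (height 4): top triangle {1,2,3} first, then the four
# bottom triangles {4,8,9}, {5,10,11}, {6,12,13}, {7,14,15}.
layout = veb_order(1, 4)
print(layout)  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each recursive "triangle" ends up contiguous in memory, so a root-to-leaf search touches O(log_B N) blocks for any block size B, which is the whole point of the layout.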
| legulere wrote:
| Don't most CPUs have 64-byte cache lines anyway?

| karmakaze wrote:
| It's not only about the smallest unit but about the unit size at every cache level. Imagine the packed structure were on an SSD and mapped to linear memory addresses. Certainly at the smallest scale the cache line is what matters, but there are also benefits to having likely-to-be-accessed data be contiguous at other scales, e.g. the OS memory page or the SSD I/O unit.

| loeg wrote:
| All AMD64, anyway.
___________________________________________________________________
(page generated 2020-06-27 23:00 UTC)
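[Editor's note: on the 64-byte question above, the L1 line size can be queried at runtime rather than assumed. A best-effort Python sketch; the `SC_LEVEL1_DCACHE_LINESIZE` sysconf name is Linux/glibc-specific, and 64 bytes is the usual value on x86-64, hence the fallback.]

```python
import os

def cache_line_size(default=64):
    """Best-effort L1 data cache line size in bytes.

    Uses POSIX sysconf where the Linux-specific name is available;
    falls back to `default` (64, typical for x86-64) elsewhere or
    when the kernel reports 0.
    """
    try:
        size = os.sysconf("SC_LEVEL1_DCACHE_LINESIZE")
        if size > 0:
            return size
    except (ValueError, OSError):
        pass  # name unknown on this platform (e.g. macOS)
    return default

print(cache_line_size())
```

A cache-aware structure would bake this number in at build time; the cache-oblivious pitch is precisely that you never have to look it up.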