[HN Gopher] Constant Time LFU
___________________________________________________________________
 
Constant Time LFU
 
Author : lukastyrychtr
Score  : 19 points
Date   : 2020-08-27 21:19 UTC (1 hour ago)
 
(HTM) web link (arpitbhayani.me)
(TXT) w3m dump (arpitbhayani.me)
 
| rrobukef wrote:
| Cache sizes are nowhere near the size where an O(log n) factor is
| significant. In 1 TB of RAM you can fit at most 2^36 entries (a
| 64-bit key plus a 64-bit value is 16 bytes, and 2^40 / 2^4 =
| 2^36, with no room left for additional data structures). A factor
| of 36 is easily dominated by CPU-cache behaviour, so this
| analysis should not be done with time complexities but with
| deeper tools.
|
| While binary heaps are notorious for bad CPU-cache behaviour,
| improvements exist. Thus only benchmarks will convince me.
| 
| NightMKoder wrote:
| In practice I think cache eviction policies have trended toward
| optimizing _memory_ usage. Storing two pointers per cache entry
| for LRU on 64-bit CPUs costs 16 bytes of data per entry - which
| adds up quickly, especially if the cached value itself is small,
| e.g. a like count stored as an int64 or a float. This approach to
| LFU has an even higher overhead.
|
| There are a few cool alternatives. The first and most obvious:
| use a good dense hash table implementation. For eviction
| strategies, random eviction actually does pretty well in the
| general case and requires no extra memory - great if you're
| caching an int32->int32 mapping and the cost of recomputing isn't
| astronomical. You can also try the CLOCK algorithm, which needs
| just one bit of storage per cache entry (a sketch of it is at the
| bottom of this page).
|
| From an abstract perspective, all page replacement algorithms can
| also be used as cache eviction policies. There's a whole list of
| these at
| https://en.wikipedia.org/wiki/Page_replacement_algorithm#Pag...
| 
| ridiculous_fish wrote:
| I read it; I'll try to summarize the idea with some intuition.
|
| A classic LRU cache has a map from key to value, with the entries
| of the map arranged in a doubly linked list. When a value is
| accessed, it is promoted to the front of the list, so the last
| item in the list is the least recently accessed.
|
| In an LFU cache, you might try the same idea, except each node
| also carries a counter. When an item is accessed, increment its
| counter, then swap it forward past neighbors with lower counts so
| the list stays sorted. This way an infrequently accessed item
| does not get promoted.
|
| However, this has a problem for certain access patterns. If every
| item has the same counter value, then accessing the last item in
| the list forces it to migrate up the whole list - O(n). You could
| use a heap instead and get O(log n) time, but that's still not
| the constant time that LRU provides.
|
| What's proposed here is to bucket items by counter. Each bucket
| has a linked list of entries, and each entry knows its bucket.
| The buckets themselves are arranged in a linked list, ordered by
| count. When an item in the count-N bucket is accessed, it
| conceptually unhooks itself from the count-N bucket and adds
| itself to the count-N+1 bucket, creating that bucket if
| necessary. Each access is a constant number of pointer updates,
| and eviction pops from the lowest-count bucket at the end of the
| bucket list, so both are O(1).
|
| Incidentally, a tip for writing this sort of cache: make your
| list circular, starting with a sentinel node (I call it the
| "mouth" since it eats its tail). This eliminates all the null
| checks in the linked list manipulation.
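|
| Here is a minimal sketch of the bucket idea in Python (my own
| naming, not the article's code; I also simplify the linked list
| of buckets into a dict from count to an insertion-ordered set of
| keys, plus a running minimum count - same scheme, less pointer
| bookkeeping):
|
|   from collections import defaultdict, OrderedDict
|
|   class LFUCache:
|       def __init__(self, capacity):
|           self.capacity = capacity
|           self.vals = {}        # key -> value
|           self.counts = {}      # key -> access count
|           # count -> keys with that count, oldest first
|           self.buckets = defaultdict(OrderedDict)
|           self.min_count = 0    # smallest count present
|
|       def _touch(self, key):
|           # Unhook key from its count-c bucket, add to count-c+1.
|           c = self.counts[key]
|           del self.buckets[c][key]
|           if not self.buckets[c]:
|               del self.buckets[c]
|               if self.min_count == c:
|                   self.min_count = c + 1
|           self.counts[key] = c + 1
|           self.buckets[c + 1][key] = None
|
|       def get(self, key):
|           if key not in self.vals:
|               return None
|           self._touch(key)
|           return self.vals[key]
|
|       def put(self, key, value):
|           if self.capacity <= 0:
|               return
|           if key in self.vals:
|               self.vals[key] = value
|               self._touch(key)
|               return
|           if len(self.vals) >= self.capacity:
|               # Evict the oldest key in the lowest-count bucket.
|               victim, _ = self.buckets[self.min_count].popitem(last=False)
|               if not self.buckets[self.min_count]:
|                   del self.buckets[self.min_count]
|               del self.vals[victim]
|               del self.counts[victim]
|           self.vals[key] = value
|           self.counts[key] = 1
|           self.buckets[1][key] = None
|           self.min_count = 1
|
| For example:
|
|   c = LFUCache(2)
|   c.put("a", 1); c.put("b", 2)
|   c.get("a")          # "a" now has count 2
|   c.put("c", 3)       # evicts "b", the least frequently used
|   assert c.get("b") is None and c.get("c") == 3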
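|
| And since CLOCK came up in this thread: here is a second-chance
| sketch in the same spirit (again my own naming, not a particular
| library's API) - a fixed table of slots with one reference bit
| each, and a "hand" that sweeps on eviction, clearing bits until
| it finds an unreferenced victim:
|
|   class ClockCache:
|       def __init__(self, capacity):
|           self.capacity = capacity
|           self.slots = []      # [key, value, ref_bit] per slot
|           self.index = {}      # key -> slot position
|           self.hand = 0
|
|       def get(self, key):
|           pos = self.index.get(key)
|           if pos is None:
|               return None
|           self.slots[pos][2] = 1   # mark as recently used
|           return self.slots[pos][1]
|
|       def put(self, key, value):
|           if self.capacity <= 0:
|               return
|           pos = self.index.get(key)
|           if pos is not None:
|               self.slots[pos][1] = value
|               self.slots[pos][2] = 1
|               return
|           if len(self.slots) < self.capacity:
|               self.index[key] = len(self.slots)
|               self.slots.append([key, value, 1])
|               return
|           # Sweep, giving referenced slots a second chance.
|           while self.slots[self.hand][2]:
|               self.slots[self.hand][2] = 0
|               self.hand = (self.hand + 1) % self.capacity
|           del self.index[self.slots[self.hand][0]]
|           self.slots[self.hand] = [key, value, 1]
|           self.index[key] = self.hand
|           self.hand = (self.hand + 1) % self.capacity
___________________________________________________________________
(page generated 2020-08-27 23:00 UTC)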