[HN Gopher] Constant Time LFU
       ___________________________________________________________________
        
       Constant Time LFU
        
       Author : lukastyrychtr
       Score  : 19 points
       Date   : 2020-08-27 21:19 UTC (1 hour ago)
        
 (HTM) web link (arpitbhayani.me)
 (TXT) w3m dump (arpitbhayani.me)
        
       | rrobukef wrote:
       | Cache sizes are nowhere near the size where an O(log n)
       | factor is significant. In 1 TB of RAM you can fit at most
       | 2^36 entries (64-bit key to 64-bit value, without additional
       | data structures). A factor of 36 is easily dominated by
       | CPU-cache behaviour. This analysis should not be done with
       | time complexities but with deeper tools.
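       | 
       | (Back-of-the-envelope arithmetic behind that 2^36 figure, as a
       | quick Python sketch - assumes 1 TiB of RAM and a bare 16 bytes
       | per entry, no overhead:)
       | 
       |     import math
       | 
       |     ram_bytes = 1 << 40           # 1 TiB
       |     bytes_per_entry = 8 + 8       # 64-bit key + 64-bit value
       |     entries = ram_bytes // bytes_per_entry
       |     assert entries == 1 << 36     # 2^36 entries
       |     print(math.log2(entries))     # 36.0 -> the log factor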
       | 
       | While binary heaps are notorious for bad CPU-cache behaviour,
       | improvements exist. Thus only benchmarks will convince me.
        
       | NightMKoder wrote:
       | In practice I think cache eviction policies have trended toward
       | optimizing _memory_ usage. Storing two pointers per cache entry
       | for LRU on 64-bit CPUs uses 16 bytes of data per entry - which
       | adds up quickly! Especially if you're storing e.g. a like count
       | (so an int64 or float per cache value). This approach to LFU has
       | an even higher overhead.
       | 
       | There are a few cool alternatives. The first and most obvious:
       | use a good dense hash table implementation! For eviction
       | strategies: a random eviction strategy actually does pretty
       | well in the general case and requires no extra memory - great
       | if you're caching an int32->int32 mapping and the cost of
       | recomputing isn't astronomical. You can also try the CLOCK
       | algorithm, which requires just 1 bit of storage per cache
       | entry.
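       | 
       | (Rough sketch of CLOCK in Python, just to show the shape of it
       | - one reference bit per slot and a sweeping "hand"; names are
       | made up, illustrative only:)
       | 
       |     class ClockCache:
       |         def __init__(self, capacity):
       |             self.capacity = capacity
       |             self.slots = []      # (key, value) pairs
       |             self.ref = []        # one reference bit per slot
       |             self.index = {}      # key -> slot number
       |             self.hand = 0
       | 
       |         def get(self, key):
       |             i = self.index.get(key)
       |             if i is None:
       |                 return None
       |             self.ref[i] = 1      # mark as recently used
       |             return self.slots[i][1]
       | 
       |         def put(self, key, value):
       |             i = self.index.get(key)
       |             if i is not None:
       |                 self.slots[i] = (key, value)
       |                 self.ref[i] = 1
       |                 return
       |             if len(self.slots) < self.capacity:
       |                 self.index[key] = len(self.slots)
       |                 self.slots.append((key, value))
       |                 self.ref.append(1)
       |                 return
       |             # evict: sweep the hand, clearing set bits, until
       |             # a slot with a clear bit is found
       |             while self.ref[self.hand]:
       |                 self.ref[self.hand] = 0
       |                 self.hand = (self.hand + 1) % self.capacity
       |             del self.index[self.slots[self.hand][0]]
       |             self.index[key] = self.hand
       |             self.slots[self.hand] = (key, value)
       |             self.ref[self.hand] = 1
       |             self.hand = (self.hand + 1) % self.capacity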
       | 
       | From an abstract perspective, all page replacement algorithms can
       | also be used as cache eviction policies. There's a whole list of
       | these at
       | https://en.wikipedia.org/wiki/Page_replacement_algorithm#Pag... .
        
       | ridiculous_fish wrote:
       | I read it; I will try to summarize the idea with some intuition.
       | 
       | A classic LRU cache has a map from key to value, with the entries
       | of the map arranged in a doubly linked list. When a value is
       | accessed, it is promoted to the front of the list. So the last
       | item in the list is least-recently accessed.
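       | 
       | (In Python, collections.OrderedDict is essentially that
       | structure - a hash map threaded through a doubly linked list -
       | so a toy version is a few lines; sketch only:)
       | 
       |     from collections import OrderedDict
       | 
       |     class LRUCache:
       |         def __init__(self, capacity):
       |             self.capacity = capacity
       |             self.data = OrderedDict()
       | 
       |         def get(self, key):
       |             if key not in self.data:
       |                 return None
       |             self.data.move_to_end(key)    # promote to "front"
       |             return self.data[key]
       | 
       |         def put(self, key, value):
       |             if key in self.data:
       |                 self.data.move_to_end(key)
       |             self.data[key] = value
       |             if len(self.data) > self.capacity:
       |                 # drop the least-recently used entry
       |                 self.data.popitem(last=False)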
       | 
       | In an LFU cache, you might try the same idea, except each node
       | also has a counter. When an item is accessed, increment its
       | counter, and then swap with previously-higher-counter items to
       | keep the list sorted. In this way an infrequently accessed item
       | does not get promoted.
       | 
       | However this has a problem for certain access patterns. If every
       | item has the same counter value, then accessing the last item in
       | the list will force it to migrate up the whole list - O(N). You
       | could use a heap instead, and get log time, but that's still not
       | the constant time that LRU provides.
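       | 
       | (The sorted-list variant and its worst case, sketched in
       | Python - illustrative only:)
       | 
       |     # entries: list of [key, count], kept sorted by count,
       |     # highest first
       |     def touch(entries, i):
       |         entries[i][1] += 1
       |         # bubble the entry forward past smaller counts
       |         while i > 0 and entries[i-1][1] < entries[i][1]:
       |             entries[i-1], entries[i] = entries[i], entries[i-1]
       |             i -= 1
       |         return i
       | 
       |     entries = [["a", 3], ["b", 3], ["c", 3], ["d", 3]]
       |     touch(entries, 3)   # "d" walks the whole list - O(N)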
       | 
       | What's proposed here is to bucket items by counter. Each bucket
       | has a linked list of entries, and each entry knows its bucket.
       | The buckets themselves are arranged in a linked list. When an
       | item in the N bucket is accessed, it conceptually unhooks itself
       | from the N bucket and adds itself to the N+1 bucket, creating it
       | if necessary.
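       | 
       | (A rough Python sketch of that bucketing idea. Instead of
       | chaining the buckets into a linked list it keeps them in a
       | dict keyed by count and tracks the minimum count separately -
       | same O(1) behaviour, just fewer pointers to spell out:)
       | 
       |     from collections import OrderedDict, defaultdict
       | 
       |     class LFUCache:
       |         def __init__(self, capacity):
       |             self.capacity = capacity
       |             self.items = {}    # key -> (value, count)
       |             self.buckets = defaultdict(OrderedDict)
       |             self.min_count = 0
       | 
       |         def _touch(self, key):
       |             value, count = self.items[key]
       |             del self.buckets[count][key]
       |             if not self.buckets[count]:
       |                 del self.buckets[count]
       |                 if self.min_count == count:
       |                     self.min_count = count + 1
       |             self.buckets[count + 1][key] = None
       |             self.items[key] = (value, count + 1)
       | 
       |         def get(self, key):
       |             if key not in self.items:
       |                 return None
       |             self._touch(key)
       |             return self.items[key][0]
       | 
       |         def put(self, key, value):
       |             if self.capacity <= 0:
       |                 return
       |             if key in self.items:
       |                 self.items[key] = (value, self.items[key][1])
       |                 self._touch(key)
       |                 return
       |             if len(self.items) >= self.capacity:
       |                 # evict the oldest key in the lowest bucket
       |                 bucket = self.buckets[self.min_count]
       |                 old, _ = bucket.popitem(last=False)
       |                 if not bucket:
       |                     del self.buckets[self.min_count]
       |                 del self.items[old]
       |             self.items[key] = (value, 1)
       |             self.buckets[1][key] = None
       |             self.min_count = 1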
       | 
       | Incidentally, a tip for writing this sort of cache: make your
       | list circular, starting with a sentinel node (I call it the
       | "mouth" since it eats its tail). This eliminates all the null
       | checks in the linked list manipulation.
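       | 
       | (For example, with a circular sentinel the insert/remove code
       | has no branches at all - quick sketch:)
       | 
       |     class Node:
       |         def __init__(self, value=None):
       |             self.value = value
       |             self.prev = self.next = self   # circular on itself
       | 
       |     def insert_after(node, new):
       |         new.prev, new.next = node, node.next
       |         node.next.prev = new
       |         node.next = new
       | 
       |     def remove(node):
       |         node.prev.next = node.next
       |         node.next.prev = node.prev
       | 
       |     mouth = Node()                  # sentinel; list is empty
       |     a = Node("a"); insert_after(mouth, a)
       |     b = Node("b"); insert_after(mouth, b)  # b is now the front
       |     remove(a)                       # no None checks anywhere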
        
       ___________________________________________________________________
       (page generated 2020-08-27 23:00 UTC)