[HN Gopher] A 5x reduction in RAM usage with Zoekt memory optimi...
       ___________________________________________________________________
        
       A 5x reduction in RAM usage with Zoekt memory optimizations
        
       Author : janisz
       Score  : 56 points
       Date   : 2021-08-19 18:38 UTC (4 hours ago)
        
 (HTM) web link (about.sourcegraph.com)
 (TXT) w3m dump (about.sourcegraph.com)
        
       | cbm-vic-20 wrote:
       | Not directly related to the linked content, but "N times less"
       | never really made that much sense to me. In this case, I'm
       | guessing "5x reduction" means "80% less"? Or "20% of the previous
       | usage"?
        
         | DistractionRect wrote:
         | N times less usually just means it uses 1/N. So this would mean
         | it uses only 20% of the memory it used originally.
        
           | rusk wrote:
           | Yeah, it doesn't make sense: I would have thought the x in
           | 5x represents a multiplication rather than a division.
           | 
           | It would make more sense to me to say a fifth, or 1/5.
        
         | ksec wrote:
         | Yes, it is marketing speak for 80% less, and generally
         | speaking it works much better than a percentage.
        
           | nytgop77 wrote:
           | I saw advertisements saying "now 20% cheaper!" when the price
           | change (original->new) was 30EUR->25EUR.
           | 
           | Marketing always finds a way: 5/30=16.7% of the old price,
           | but 5/25=20% of the new one.
        
         | tyingq wrote:
         | _" We went from 1400KB of RAM per repo to 310KB with no
         | measurable latency changes."_
         | 
         | So, not exactly, but close. ~22% of previous usage.
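
A small sketch of the arithmetic in this subthread, showing how "N times less", "percent of previous usage", and "percent less" describe the same change. The helper name `describe_reduction` is hypothetical; only the 1400KB and 310KB figures come from the quoted article.

```python
def describe_reduction(before: float, after: float) -> dict:
    """Express one before/after change in the three forms used in the thread."""
    fraction_left = after / before
    return {
        "times_less": before / after,                # the "Nx reduction" figure
        "percent_of_previous": 100 * fraction_left,  # "20% of the previous usage"
        "percent_less": 100 * (1 - fraction_left),   # "80% less"
    }


# An exact 5x reduction leaves 1/5 of the original, i.e. 80% less.
exact = describe_reduction(5, 1)

# The figures quoted from the article: 1400KB -> 310KB of RAM per repo.
quoted = describe_reduction(1400, 310)
print(quoted)
```

With the quoted figures the ratio comes out at roughly 4.5x, which is why "5x" is close but not exact.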
        
         | nijaru wrote:
         | Yes, that is correct. It's not always the most intuitive
         | language.
         | 
         | 5 times less than 20 would be 4, or (20 * 1/5), just as 20 is
         | 5 times more than 4.
        
       | cinntaile wrote:
       | If anyone else but me wonders where the name comes from, Zoekt
       | means Seek.
       | 
       | Context: "Zoekt, en gij zult spinazie eten" - Jan Eertink
       | 
       | ("seek, and ye shall eat spinach" - My primary school teacher)
       | https://github.com/google/zoekt
        
       | Scaevolus wrote:
       | Author here, ask me anything! :-)
        
       | beff_jesos wrote:
       | Is this available on the self-hosted version as well now? I am
       | getting RAM issues on the AWS-hosted cluster.
        
       | therealmarv wrote:
       | Related: The last time I checked, standard Ubuntu did not have
       | RAM compression (ZRAM) enabled by default (unlike current
       | Windows and Mac, which have had it for years). It helps a lot
       | with programs like browsers.
        
         | klysm wrote:
         | I understand why it's opt-in though; it's not a clear win in
         | all cases.
        
       | kevincox wrote:
       | I'm surprised that they are storing Unicode characters instead of
       | bytes. For example, Rust's regex library works on bytes: Unicode
       | patterns are compiled into byte patterns, so it doesn't need to
       | worry about Unicode and variable-length characters when matching.
       | 
       | You'd think that for code, where the vast majority is ASCII, this
       | would be a huge improvement. I guess the downside is that searches
       | for emoji and other "long" characters would need to look up more
       | index entries. However, I would expect that, given how rare those
       | searches are, it would be beneficial overall.
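
To make the trade-off in this comment concrete, here is a minimal sketch comparing trigrams taken over code points with trigrams taken over UTF-8 bytes. Both function names are hypothetical and this is not Zoekt's or the Rust regex crate's code; it only illustrates why "long" characters need more index lookups at the byte level.

```python
def codepoint_trigrams(text: str) -> list:
    """Every run of three consecutive Unicode code points."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def byte_trigrams(text: str) -> list:
    """Every run of three consecutive bytes of the UTF-8 encoding."""
    data = text.encode("utf-8")
    return [data[i:i + 3] for i in range(len(data) - 2)]


# For ASCII text the two views line up one to one.
ascii_query = "func"

# U+1F642 takes four bytes in UTF-8, so this three-code-point query is one
# code point trigram but four byte trigrams.
emoji_query = "a\U0001F642b"

print(len(codepoint_trigrams(emoji_query)), len(byte_trigrams(emoji_query)))
```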
        
         | Scaevolus wrote:
         | The source code itself is stored as UTF-8, but the trigrams
         | were represented as Unicode codepoints in the index. The last
         | optimization packs the ASCII trigrams for efficiency.
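
As an illustration of what representing trigrams as code points and packing the ASCII ones can look like, here is a rough sketch. The bit layout is an assumption made for this example (21 bits per code point, 7 bits per ASCII character); the thread does not spell out the layout Zoekt actually uses, and both function names are hypothetical.

```python
RUNE_BITS = 21   # Unicode code points go up to 0x10FFFF, which fits in 21 bits
ASCII_BITS = 7   # ASCII characters are below 128, which fits in 7 bits


def pack_codepoint_trigram(s: str) -> int:
    """Pack three code points into one integer of at most 63 bits."""
    if len(s) != 3:
        raise ValueError("expected exactly three code points")
    a, b, c = (ord(ch) for ch in s)
    return (a << (2 * RUNE_BITS)) | (b << RUNE_BITS) | c


def pack_ascii_trigram(s: str) -> int:
    """Pack three ASCII characters into 21 bits (assumed layout)."""
    if len(s) != 3 or not s.isascii():
        raise ValueError("expected exactly three ASCII characters")
    a, b, c = (ord(ch) for ch in s)
    return (a << (2 * ASCII_BITS)) | (b << ASCII_BITS) | c


# The general form needs a 64-bit slot; the ASCII form fits in 3 bytes.
wide = pack_codepoint_trigram("abc")
narrow = pack_ascii_trigram("abc")
print(wide.bit_length(), narrow.bit_length())
```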
        
           | kevincox wrote:
           | But that's my point. Why not just store bytes? Then you don't
           | have to worry about packing; it is always packed.
           | 
           | If I understand correctly they are using 8 bytes for 3
           | codepoints. They could instead use 3 bytes for 3 bytes. This
           | would use significantly less memory and would rarely be less
           | selective. (If that was a concern they could probably
           | consider 4-grams instead of trigrams and still use less
           | memory.)
           | 
           | This also doesn't preclude splitting into the first 2
           | characters and the last 1.
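
A hedged sketch of the byte-keyed variant proposed here, including the split into the first two bytes and the last one. The nested-dictionary layout and the names `build_split_index` and `lookup` are assumptions for illustration, not how Zoekt stores its index.

```python
from collections import defaultdict


def build_split_index(data: bytes) -> dict:
    """Map first two bytes -> last byte -> offsets where that trigram starts."""
    index = defaultdict(lambda: defaultdict(list))
    for i in range(len(data) - 2):
        # The key is 3 raw bytes: 2 for the outer level, 1 for the inner level.
        index[data[i:i + 2]][data[i + 2]].append(i)
    return index


def lookup(index: dict, trigram: bytes) -> list:
    """Return the start offsets of a 3-byte trigram, or [] if it is absent."""
    if len(trigram) != 3:
        raise ValueError("expected exactly three bytes")
    return index.get(trigram[:2], {}).get(trigram[2], [])


idx = build_split_index("abcabc".encode("utf-8"))
print(lookup(idx, b"abc"))
```

Each key here is 3 bytes of raw UTF-8 rather than an 8-byte packed value, and a 4-byte 4-gram key would still be half that size.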
        
       ___________________________________________________________________
       (page generated 2021-08-19 23:00 UTC)