[HN Gopher] A 5x reduction in RAM usage with Zoekt memory optimi...
___________________________________________________________________

A 5x reduction in RAM usage with Zoekt memory optimizations

Author : janisz
Score  : 56 points
Date   : 2021-08-19 18:38 UTC (4 hours ago)

(HTM) web link (about.sourcegraph.com)
(TXT) w3m dump (about.sourcegraph.com)

| cbm-vic-20 wrote:
| Not directly related to the linked content, but "N times less"
| never really made that much sense to me. In this case, I'm
| guessing "5x reduction" means "80% less"? Or "20% of the previous
| usage"?
|
| DistractionRect wrote:
| N times less usually just means it uses 1/N. So this would mean
| it uses only 20% of the memory it used originally.
|
| rusk wrote:
| Yeah, it doesn't make sense, as I would have thought the x in 5x
| represents multiplication rather than division.
|
| It would make more sense to me to say a fifth, or 1/5.
|
| ksec wrote:
| Yes, it is marketing speak for 80% less. And generally speaking,
| it works much better than a percentage.
|
| nytgop77 wrote:
| I saw advertisements saying "now 20% cheaper!", while the price
| change (original->new) was 30EUR->25EUR.
|
| Marketing always finds a way.. 5/30=16.6%, 5/25=20%
|
| tyingq wrote:
| _"We went from 1400KB of RAM per repo to 310KB with no
| measurable latency changes."_
|
| So, not exactly, but close. ~22% of previous usage.
|
| [deleted]
|
| nijaru wrote:
| Yes, that is correct. It's not always the most intuitive
| language.
|
| 5 times less than 20 would be 4, or (20 * 1/5), the same as 20
| is 5 times more than 4.
|
| cinntaile wrote:
| If anyone else but me wonders where the name comes from, Zoekt
| means Seek.
|
| Context: "Zoekt, en gij zult spinazie eten" - Jan Eertink
|
| ("seek, and ye shall eat spinach" - My primary school teacher)
| https://github.com/google/zoekt
|
| Scaevolus wrote:
| Author here, ask me anything! :-)
|
| beff_jesos wrote:
| Is this available on the self-hosted version as well now? I am
| getting RAM issues on the AWS-hosted cluster.
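
The "N times less" arithmetic discussed earlier in the thread can be checked with a short sketch. It uses only figures quoted in the comments (1400KB and 310KB per repo, and the 30EUR to 25EUR price change); the function names are illustrative assumptions and do not come from the article.

```python
def reduction_factor(before, after):
    """How many times smaller `after` is than `before` (the "Nx" figure)."""
    return before / after


def percent_remaining(before, after):
    """`after` expressed as a percentage of `before`."""
    return 100 * after / before


def percent_less(before, after):
    """The drop, measured against the original value."""
    return 100 * (before - after) / before


def percent_of_new(before, after):
    """The same drop, measured against the new (smaller) value."""
    return 100 * (before - after) / after


if __name__ == "__main__":
    # RAM per repo, as quoted from the article: 1400KB -> 310KB
    print(round(reduction_factor(1400, 310), 2))   # factor of the reduction
    print(round(percent_remaining(1400, 310), 1))  # share of previous usage
    # Price example from the thread: 30EUR -> 25EUR
    print(round(percent_less(30, 25), 1))          # relative to the old price
    print(round(percent_of_new(30, 25), 1))        # relative to the new price
```

With the quoted numbers this gives a factor of about 4.5 and roughly 22% of the previous usage, which matches the "not exactly, but close" reading. The price example shows how the same 5EUR drop is about 16.7% of the old price but 20% of the new one.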
| therealmarv wrote:
| Related: last time I checked, a standard Ubuntu install does not
| have RAM compression (ZRAM) enabled by default (unlike current
| Windows and Mac, which have had it for years). It helps a lot
| with programs like browsers.
|
| klysm wrote:
| I understand why it's opt-in though; it's not a clear win in all
| cases.
|
| kevincox wrote:
| I'm surprised that they are storing Unicode characters instead of
| bytes. For example, Rust's regex library works on bytes, and
| Unicode patterns are compiled into byte patterns, which means
| that it doesn't need to worry about Unicode and variable-length
| characters when matching.
|
| You'd think that for code, where the vast majority is ASCII, this
| would be a huge improvement. I guess the downside is that
| searches for emoji and other "long" characters would need to look
| up more index entries. However, I would expect that, given how
| rare those are, it would be beneficial overall.
|
| Scaevolus wrote:
| The source code itself is stored as UTF-8, but the trigrams
| were represented as Unicode codepoints in the index. The last
| optimization packs the ASCII trigrams for efficiency.
|
| kevincox wrote:
| But that's my point. Why not just store bytes? Then you don't
| have to worry about packing; it is always packed.
|
| If I understand correctly, they are using 8 bytes for 3
| codepoints. They could instead use 3 bytes for 3 bytes. This
| would use significantly less memory and would rarely be less
| selective. (If that was a concern, they could probably
| consider 4-grams instead of trigrams and still use less
| memory.)
|
| This also doesn't preclude the splitting into the first 2
| characters and last 1.

___________________________________________________________________
(page generated 2021-08-19 23:00 UTC)
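
The trade-off kevincox describes (8 bytes for three codepoints versus 3 bytes for three bytes) can be illustrated with a minimal sketch. It assumes each codepoint is given a 21-bit slot inside one 64-bit value, which is one plausible way to reach "8 bytes for 3 codepoints"; the thread does not confirm the exact layout, and every name below is hypothetical rather than Zoekt's actual code.

```python
# Hypothetical sketch: none of these names come from Zoekt.
BITS_PER_CODEPOINT = 21  # Unicode codepoints end at 0x10FFFF, which fits in 21 bits
CODEPOINT_TRIGRAM_BYTES = 8  # three 21-bit slots = 63 bits, stored in a 64-bit word
BYTE_TRIGRAM_BYTES = 3       # three raw UTF-8 bytes


def pack_codepoint_trigram(s):
    """Pack exactly three codepoints into a single integer key."""
    if len(s) != 3:
        raise ValueError("expected exactly three codepoints")
    a, b, c = (ord(ch) for ch in s)
    return (a << (2 * BITS_PER_CODEPOINT)) | (b << BITS_PER_CODEPOINT) | c


def codepoint_trigrams(text):
    """All overlapping codepoint trigrams of `text`, as packed integers."""
    return [pack_codepoint_trigram(text[i:i + 3]) for i in range(len(text) - 2)]


def byte_trigrams(text):
    """All overlapping 3-byte windows over the UTF-8 encoding of `text`."""
    data = text.encode("utf-8")
    return [data[i:i + 3] for i in range(len(data) - 2)]


if __name__ == "__main__":
    ascii_text = "foo()"
    emoji_text = "a\U0001F600b"  # 'a', one 4-byte emoji, 'b'
    print(len(codepoint_trigrams(ascii_text)), len(byte_trigrams(ascii_text)))
    print(len(codepoint_trigrams(emoji_text)), len(byte_trigrams(emoji_text)))
```

For pure ASCII input both schemes produce the same number of trigrams, so the byte form simply uses smaller keys. For the emoji string, the codepoint scheme yields one trigram while the byte scheme yields four, which is the "look up more index entries" cost mentioned in the thread.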