[HN Gopher] Meow Hash (2018) ___________________________________________________________________ Meow Hash (2018) Author : DeathArrow Score : 327 points Date : 2021-10-29 14:27 UTC (8 hours ago) (HTM) web link (mollyrocket.com) (TXT) w3m dump (mollyrocket.com) | Cilvic wrote: | I know +1 is against the rules, but I think this deserves a | positive comment. I enjoyed reading it, thanks for taking the | time and giving back to the ecosystem. | anonova wrote: | A detailed analysis of Meow Hash: https://peter.website/meow-hash-cryptanalysis | | It's not the highest-quality of hash functions (see the SMHasher | benchmarks), but it is fast. A great alternative is XXH3 | (https://cyan4973.github.io/xxHash/), which has seen far more | usage in practice. | thomasahle wrote: | XXH3 has issues too. There are certain inputs, independent of | the seed, on which it collides much more often than it should. | | On the other hand a hash like UMash guarantees low collisions | on any input: | https://engineering.backtrace.io/2020-08-24-umash-fast-enoug... | jdcarter wrote: | I'm using XXHash3 for verifying data integrity of large (10+MB) | blobs. Very fast and appears to work very well in my testing-- | it's never missed any bit error I've thrown at it. | | Aside: when storing hashes, be sure to store the hash type as | well so that you can change it later if needed, e.g. | "xxh3-[hash value]". RFC-6920 also has things to say about | storing hash types and values, although I haven't seen its | format in common use. | DeathArrow wrote: | Useless analysis since the author says it's not a cryptographic | hash but useful as a fast hash for change detection. | | "we wanted a fast, non-cryptographic hash for use in change | detection and deduplication" | | >A great alternative is XXH3 | | Meow Hash is twice as fast. | iratewizard wrote: | Meow hash is also written by people who initially thought | SHA-1 was acceptable for large scale change hashing.
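jdcarter's aside above (store the algorithm name alongside the digest, e.g. "xxh3-[hash value]") can be sketched like this. blake2b from Python's stdlib stands in for whatever fast hash is actually chosen, and the function names here are illustrative, not any particular library's API:

```python
import hashlib

def tagged_digest(data: bytes, algo: str = "blake2b") -> str:
    """Return 'algo-hexdigest' so the algorithm can be swapped later
    without invalidating stored values of older algorithms."""
    return f"{algo}-{hashlib.new(algo, data).hexdigest()}"

def verify(data: bytes, tagged: str) -> bool:
    """Recompute using whichever algorithm the stored tag names."""
    algo, _, expected = tagged.partition("-")
    return hashlib.new(algo, data).hexdigest() == expected
```

Old records keep verifying with their recorded algorithm while new records can use a different one, which is the migration property jdcarter is after.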
| MauranKilom wrote: | If you had made it one section into the analysis, you would | have seen that _at the time_ MeowHash made certain | cryptographic claims that the author set out to disprove. | | The readme has since been updated. I didn't check whether any | algorithmic changes were made on top, but the discussion of | the analysis on github didn't point to a lot of low-hanging | fruit. | [deleted] | LeoPanthera wrote: | It's not useless analysis, because even for non-cryptographic | hashes you want the likelihood of any arbitrary hash to be | roughly equal. A hash function which "prefers" certain | outputs has a far higher probability of collision. | IncRnd wrote: | Don't you think asset planting is an attack against a game's | pipeline? | | The author of the article's page claims the hash is not | cryptographic but actually goes on to make security claims | about the hash. People who do not understand cryptography | should be careful about making such claims. The author appears | to understand this more than your comment demonstrates. | | For example, a claim about change detection is a | cryptographic claim of detecting preimage attacks. In a | threat model, a security professional would determine whether | a first preimage or a second preimage attack is what should | be guarded in attack scenarios. Then, the professional would | help with analysis, determining mitigations, defense in | depth, and prioritization of fixing the vulnerabilities | exposed by how the hash is used. | | A hash cannot be considered standalone. It is the | architecture and use-case where the hash's security | properties are used to determine what security properties of | the application are fulfilled. | | So, if the author is correct, which seems to be the case, | then meowhash should not be used in a production environment | outside of the most simplistic checks.
It seems faster for | its intended use case to simply check for a single bit | difference between two images - no hash required. | null000 wrote: | > it seems faster for its intended use... | | But then you have to store the entire before & after | locally? That's the entire point of using a hash for change | detection. | IncRnd wrote: | > But then you have to store the entire before & after | locally? | | Yes, there is difference-between-two (as you say) and | there is integrity (modification detection). In the case | of comparing new assets in a pipeline to those that were | created earlier, it sounds plausible both copies would be | present. | | > That's the entire point of using a hash for change | detection. | | This is called integrity protection. Change detection is | the incorrect term to use here. Please see what I | referenced earlier for first and second preimage. | ilitirit wrote: | I can vouch for xxHash (it does what it says on the can), but | I'm really curious to hear from those who have experience with | meow hash. | kazinator wrote: | > _Because we have to process hundreds of gigabytes of art assets | to build game packages, we wanted a fast, non-cryptographic hash | for use in change detection and deduplication. We had been using | a cryptographic hash (SHA-1), but it was unnecessarily slowing | things down._ | | That's because you were doing a potentially silly thing: hashing | every byte. | | What does a hash tell you? If two hashes are different, it | confirms that the objects are different. If two hashes are the | same, they could be the same object or they could be different. | | For that purpose, you don't have to hash everything: just hash | the areas where objects are likely to differ. | | If two 15 gigabyte files are likely to have differences in their | first kilobyte, then hashing just the first kilobyte (negligibly | fast) will work just as well as hashing 15 gigabytes. | | If two files are of different length, then they are not the same.
| How about: | | hash(<length> + <first kilobyte> + <last kilobyte>) | | For deduplication, a 32 bit CRC could probably be reasonably | efficient. No matter what hash you use, you have to verify the | collisions. If CRC32 is enough to make false positives rare over | your type of data set, it's good enough. | andruc wrote: | "just hash the areas where objects are likely to differ" | | Correct me if I'm wrong but given these are art assets, | wouldn't "determining where the objects are likely to differ" | itself be an expensive/hard problem? | cb321 wrote: | It really depends. The early bytes probably contain things | like image dimensions which might discriminate a lot - or not | at all. They might contain palette-style color information | which might discriminate tons - or not at all. Most things | with hash functions depend upon distributions of inputs | relative to the function, though. | | I think it would be accurate to say "Determining where they | are likely to differ is not necessarily hard, but it could | be." If it happens to be easy then you can radically change | the scale of the problem. | | File length alone is a weak hash actively maintained by any | file system and so available "for free" (or one stat call). | And it can work surprisingly well, as the grandparent alludes | to. | | EDIT: You actually do not need to even open(2) files that are | not part of "same size clusters", never mind hash their data. | This alone can really squash the problem size over a more | naive approach even if you did feel the need to hash the | whole file. | | EDIT: And in terms of performance, if this data has to | actually be read off a drive then the hash need not go any | faster than the drive IO which is probably much, much slower | than most candidate hash functions. Even PCIe5.0 NVMes are | only like 14 GB/s, SATA buses like 0.75 GB/s, etc. 
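kazinator's hash(<length> + <first kilobyte> + <last kilobyte>) suggestion can be sketched roughly like this. blake2b is only a stdlib stand-in for whatever fast hash you would actually use, and quick_fingerprint is a made-up name:

```python
import hashlib
import os

KB = 1024

def quick_fingerprint(path: str) -> str:
    """Hash the file length plus its first and last kilobyte.
    A cheap change-detection fingerprint, not a full-content digest:
    changes confined to the untouched middle will go unnoticed."""
    size = os.path.getsize(path)
    h = hashlib.blake2b(size.to_bytes(8, "little"))  # length folds in for free
    with open(path, "rb") as f:
        h.update(f.read(KB))            # first kilobyte (or whole file if small)
        if size > KB:
            f.seek(max(size - KB, KB))  # last kilobyte, without re-reading overlap
            h.update(f.read(KB))
    return h.hexdigest()
```

As the surrounding comments note, length-differing files already diverge via the size prefix, so they never need their bytes read at all when grouped by size first.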
| kazinator wrote: | To avoid the hash being entirely oblivious to bytes beyond | a small block at the start, we could take additional | samples of the file at exponentially increasing offsets: | | Hash the first kilobyte. | | Hash the 16th kilobyte | | Hash the 256th kilobyte | | Hash the 4096th kilobyte (4M) | | Hash the 65536th kilobyte (64M) | | Hash the 1048576th kilobyte (1G) | | Then 16G, 256G, ... | | It might be worth including the last kilobyte. (I think I | mentioned that.) If the file format has trailing metadata, | that can help, especially if there is any sort of checksum | in it. | cb321 wrote: | Sure. I agree various sampling strategies are a good | design, as long as they don't add up to much IO. Making | that a "user parameter" to be customizable seems ideal. | What you suggest may make a fine default - if adapted to | varying file sizes somehow. E.g., that Nim dups program I | mentioned elsewhere could just take a sequence of slices | instead of only one..maybe just ignoring out of bounds | slices. { And for size clusters where the size < N, screw | it - just hash the whole thing, probably. ;-) } | cb321 wrote: | This is roughly how | https://github.com/c-blake/cligen/blob/master/examples/dups.... | can work. There is little question that changing the scale of | the problem very much lessens performance sensitivity to the | hash. | pepoluan wrote: | CRC32 is actually very slow on modern CPUs. So if you can use | an algorithm that is both (1) faster _and_ (2) better quality, | why not use that algo instead? | kazinator wrote: | > _CRC32 is actually very slow on modern CPUs_ | | Did you read my whole comment? If we are hashing just a few | kilobytes of a multi-gigabyte file, it almost certainly | doesn't matter. | midnightclubbed wrote: | I wouldn't categorize it as 'very slow', at worst 50% slower | than 128bit XXH3 hash (and faster than the 32bit XXH hash).
| | Modern intel processors have the CRC32C instructions and ARM | has both CRC32 and CRC32C. There is a reluctance in some of | the hash comparisons to use the hardware CRC implementations | for compatibility reasons, but most modern CPUs have them. | | The main argument for using a hardware CRC32 (and to some | extent a software one too) is simplicity. If your use-case | needs decent (but not blinding) performance, then there is | something to be said for a hash implementation that can be | written in ~10 lines of code. You still have to handle the | possibility of collisions but any 32bit hash has that | restriction, which is fine for many uses. | ed25519FUUU wrote: | > _The Meow hash is a high-speed hash function named after the | character Meow in Meow the Infinite._ | | As someone who doesn't watch cartoons I'm constantly surprised at | the influence cartoons and anime are having on modern tech, | especially with regards to computer science. | phtrivier wrote: | In this case, the comic Meow the Infinite is actually done by | the same game shop behind the meow hash. | caeril wrote: | Not sure you can call MR a "game shop". Pretty sure it's just | Casey, with Anna on contract. | arduinomancer wrote: | To add some context, Casey is one of the creators of Meow the | Infinite | alecco wrote: | (2018) | amenghra wrote: | Would be interesting to compare it to siphash. | adamgordonbell wrote: | Found this in the readme: Due to recent | discoveries by Peter Schmidt-Nielsen, we have decided to | reclassify Meow hash 0.5/calico from level 3 to level 1. This | means that we recommend not to use this hash for message | authentication codes, or for hash tables in scenarios where | collision induced denial-of-service attacks are a concern. | For level 3/MAC capabilities consider migrating to SipHash. If | the performance of SipHash is not satisfying, continuing to use | Meow 0.5 for hash tables is better than migrating to another | fast hash.
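The readme's advice quoted above (keyed SipHash for DoS-resistant hash tables) amounts to mixing a secret per-process seed into the bucket computation so an attacker cannot precompute colliding keys. A rough sketch using blake2b's keyed mode as a stdlib stand-in, since Python doesn't expose SipHash directly; the names are illustrative:

```python
import hashlib
import os

# Secret per-process seed: table slots become unpredictable to outsiders,
# which is the property the readme's "level 3" classification is about.
SEED = os.urandom(16)

def keyed_slot(key: bytes, nbuckets: int, seed: bytes = SEED) -> int:
    """Keyed hash -> bucket index. blake2b keyed mode stands in here;
    SipHash is the primitive the readme actually recommends."""
    digest = hashlib.blake2b(key, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little") % nbuckets
```

An unkeyed fast hash in the same position lets anyone who knows the function flood one bucket, turning O(1) lookups into O(n), which is the collision-induced denial of service the readme warns about.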
| CodesInChaos wrote: | Siphash has pretty bad performance on large inputs. From what I | remember, even MD5 was faster. | Comevius wrote: | Google made faster Siphash variants, and also HighwayHash, | which is much faster. | | https://github.com/google/highwayhash | kortex wrote: | This is great! Many applications (like dedupe) don't need full | crypto guarantees. | | If you need something fast and crypto secure, I recommend | checking out Blake3/b3sum. I'm just learning about XXH3 in this | thread so I cannot comment on how it stacks up, but I love b3sum | for fast file hashing. | | https://github.com/BLAKE3-team/BLAKE3/tree/master/b3sum | scriptdevil wrote: | I recently used seahash in rust for a similar job. This seems to | fall in the same niche. | buildbot wrote: | This could be super useful for deduplicating large photo | datasets! | | It might be worth changing this link to the github, there's a lot | more info on the hash there. | NKosmatos wrote: | Came here to say the same thing :-) Anyone know of a free(ish) | software based on a very fast hashing algorithm for checking a | large number of images? My intention is to check for bit-rot | and other unforeseen data corruptions between backup disks; I | assume many others must have a similar use case. | bityard wrote: | > This could be super useful for deduplicating large photo | datasets! | | That is indeed the use case mentioned in the article. A hash | like this is useful for deduplicating large _file_ datasets, but | for deduplicating images in particular, you really want a | perceptual hash, because two image files can _look_ identical | but have slightly (or wildly) different byte streams. The | trade-off is that perceptual hashes are notorious for false | positives in large datasets. | kayamon wrote: | I once saw a guy dedupe a folder by batch-converting them to | low quality JPEG thumbnails then sorting by size.
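A minimal perceptual hash of the kind bityard describes is a difference hash: compare neighbouring pixels and keep one bit per comparison, so near-identical images land a small Hamming distance apart regardless of byte-level differences. A toy sketch assuming the image has already been decoded and downscaled to a 9x8 grayscale grid (real code would resize with something like Pillow):

```python
def dhash(pixels, w=9, h=8):
    """Difference hash over a w*h grayscale grid (row-major list of ints).
    Each bit records whether a pixel is brighter than its right neighbour,
    so the result tracks image structure, not exact bytes."""
    bits = 0
    for row in range(h):
        for col in range(w - 1):
            left = pixels[row * w + col]
            right = pixels[row * w + col + 1]
            bits = (bits << 1) | (left > right)
    return bits  # 64-bit integer for the default 9x8 grid

def hamming(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")
```

This also shows the false-positive trade-off bityard mentions: dissimilar images with similar brightness gradients can land close together, so candidates still need confirmation.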
| buildbot wrote: | Yep, just happy someone was looking into it, I currently use | xxHash for this. | | Perceptual hashing for photos isn't great in my opinion; for | example, I have many raw photos that look very similar, where the | only difference may be a slightly higher shutter speed, or a | correction of the camera angle. I'm guessing many perceptual | hashes would mark these as duplicates. Maybe that's what many | want; it's not what I want personally. | pepoluan wrote: | And what's wrong with xxHash? | | According to xxHash's performance comparison page, xxHash | is faster than MeowHash. | da_big_ghey wrote: | Seahash came out very well last time I looked; any | comparison? https://docs.rs/seahash/4.1.0/seahash/ | pepoluan wrote: | XXHash author last updated the following page on Aug 10 of this | year: | | https://github.com/Cyan4973/xxHash/wiki/Performance-comparis... | wiskinator wrote: | I love this website because I went to the link, didn't know what | a PWA was, went to the root website and _STILL_ don't know what | a PWA is. | | I'm going to try the old trick of saying the wrong thing so | someone corrects me. | | Clearly a PWA is a Pug With Attitude. Is this a Jonathon and | Noodle sponsored app? | PebblesRox wrote: | A PWA is a Progressive Web App. | | https://developer.mozilla.org/en-US/docs/Web/Progressive_web... | bowmessage wrote: | I think you may have commented on the wrong article. | | (Did you mean to comment on "Publish Your PWAs to the iOS App | Store" [0])? | | [0] https://news.ycombinator.com/item?id=29040793 | apavlo wrote: | SMHasher benchmark numbers are here (including Meow Hash): | | https://github.com/rurban/smhasher#readme | CodesInChaos wrote: | The quality column on that page seems dubious, since it lists | test failures for secure cryptographic hashes. | pkhuong wrote: | Cryptographic hashes mostly look at collisions for the whole | output (or precisely truncated outputs).
There can still be | patterns or clumping in short ranges of the output bits for | certain input families, as long as the full outputs differ. | | For example, the identity function is extremely resistant to | collisions. However, it doesn't mix the input bits, so it's | not a great hash function. | lmilcin wrote: | Now the only issue is, I will be an instant subject of ridicule | if my team sees me using something called "Meow Hash". | 0des wrote: | Why? | failrate wrote: | Too bad, there's also the very useful Squirrel3 Noise Hash. | recursive wrote: | Not everyone can do it, but I recommend ignoring ridicule | that's not well founded. | jnwatson wrote: | You better not use a VCS that's also a common insult then. | NelsonMinar wrote: | I wish there was a consensus non-cryptographic hash algorithm. | Something that hashes arbitrary data to 128 or 256 bit keys with | good randomness, few collisions on typical input data, and | universal implementation. | | Most programmers I know reach for SHA-1 (or if we're being | honest, MD5). But a cryptohash is really wasteful if you don't | need the cryptographic properties. | | Last I looked at this Murmurhash was the closest to widely used | with some folks saying Metrohash was better. But that was years | ago. Judging by this discussion xxHash is the new hotness? | bmn__ wrote: | > good randomness, few collisions on typical input data, and | universal implementation | | The relevant standards body NIST has issued SHA-3 in 2015. | | Spread the word that programmers should not reach for outdated | hash algos. <https://valerieaurora.org/hash.html> | tptacek wrote: | The chart on that page has always squicked me out a bit, | because SHA-2 is certainly still "considered strong" by | cryptography engineers. | pepoluan wrote: | xxHash is not really "new", it's been available since 2014. 
At | least, that's the oldest Python bindings available on PyPI: | | https://pypi.org/project/xxhash/0.0.1/ | | Sure, as it developed, xxHash variants appeared. The latest in | the family is XXH3 and it has already been in use at least since | 2019: | | http://fastcompression.blogspot.com/2019/03/presenting-xxh3.... | chronogram wrote: | You want to have XXH128 for that right? 128 bit, portable, | virtually impossible to collide, and only slightly slower than | XXH3 while still way faster than most options. | pspeter3 wrote: | I'm curious too about the differences between Murmurhash and | xxHash. I've been using Murmur in my side projects. | pkhuong wrote: | The production of seed-independent collisions for various | versions of murmurhash (e.g., | http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash...) | motivated siphash. In general, when there's no | positive proof of collision bounds, I would assume (un)lucky | inputs can collide much more often than usual, even when you | change the seed. | | The difference between murmurhash and xxhash in that regard | is that anyone can download code to generate murmurhash | collisions, while code to make xxhash cry is still | unknown/secret. | | XXH3 is also a lot faster for long strings, and I believe | comparable to murmurhash for short inputs. | tromp wrote: | > I wish there was a consensus non-cryptographic hash algorithm | | I think Siphash serves that role pretty well. It's one of the | most secure, if not the most secure, among non-cryptographic hashes, and | quite speedy, especially with SIMD implementations. One need | only consider alternatives if one wants to trade off a bit of | security for even higher speed. | NelsonMinar wrote: | Siphash is more like a crypto hash, right? According to | smhasher its performance is somewhere between a non-crypto | hash like xxHash or Meow and something like SHA-1.
| tromp wrote: | As its Wikipedia page says: | | Although designed for use as a hash function to ensure | security, SipHash is fundamentally different from | cryptographic hash functions like SHA in that it is only | suitable as a message authentication code: a keyed hash | function like HMAC. That is, SHA is designed so that it is | difficult for an attacker to find two messages X and Y such | that SHA(X) = SHA(Y), even though anyone may compute | SHA(X). SipHash instead guarantees that, having seen Xi and | SipHash(Xi, k), an attacker who does not know the key k | cannot find (any information about) k or SipHash(Y, k) for | any message Y [?] {Xi} which they have not seen before. | Traubenfuchs wrote: | > we have benchmarked all the ones we could find | | Those results would increase the value of this article from "very | low" to "medium". | chalst wrote: | The README for xxhash has benchmarks covering fast hashes | including Meow: | | https://github.com/Cyan4973/xxHash/wiki/Performance-comparis... | cb321 wrote: | I have measured (on a Skylake core) the hash from Daniel Lemire, | Owen Kaser, "Faster 64-bit universal hashing using carry-less | multiplications, Journal of Cryptographic Engineering (to | appear)" at t_cycles = (0.00741 +- 0.00098 cycles/byte) * bytes + | (37.43 +- 0.48) which is roughly 135 B/cycle (675 GB/s at 5 GHz) | or over 2x faster than the claim here. Admittedly, that 37 cycle | fixed cost is probably quite a bit higher, and I don't know how | well the bit mixing compares. | | Parenthetically, a time equation (like t_cycles = coef * bytes + | fixedCost) is a more convenient way to summarize performance of | this kind of thing than the constant short string/long string | mumble-mumble. Yes, loop unrolling/etc. can create step-ish | functions and a linear regression is pretty approximate (but my | stderrs are pretty small above), but even so..much less tripping | over words. 
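The time equation cb321 advocates above (t_cycles = coef * bytes + fixedCost) is just an ordinary least-squares fit over per-size minimum timings; a small sketch, with the function name being illustrative:

```python
def fit_time_equation(sizes, cycles):
    """Ordinary least squares for t = coef * bytes + fixed_cost,
    the summary form proposed above instead of separate
    'short string' vs 'long string' throughput figures."""
    n = len(sizes)
    mx = sum(sizes) / n
    my = sum(cycles) / n
    sxx = sum((x - mx) ** 2 for x in sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes, cycles))
    coef = sxy / sxx          # cycles per byte (reciprocal = bytes/cycle)
    fixed = my - coef * mx    # per-call setup/finalizer cost in cycles
    return coef, fixed
```

As cb321 notes, feeding this the minimum of many runs per input size keeps branch predictors and caches warm, so the fit models the best-case path rather than memory-system noise.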
| pkhuong wrote: | umash (https://github.com/backtrace-labs/umash) has a similar | PH block structure (and similar universality | guarantees), but was designed for decent bit mixing (enough to | satisfy smhasher, unlike CLHASH, which needs an additional | finalizer) with a lower fixed time cost: 22 cycles for a | one-byte hash on a 2.5 GHz Xeon 8175M. | | I'm not sure how one would use that linear regression. What | kind of hardware offers 675 GB/s of memory bandwidth? 140 | bytes/cycle is easily more than twice the L2 read bandwidth | offered by any COTS chip I'm aware of. There are also warm up | effects past the fixed cost of setup and finalizers that slow | down hashing for short inputs. For what range of input sizes | (and hot/cold cache state) would you say the regression is a | useful model? | cb321 wrote: | I was only doing short < 1024 byte strings in L1 (like URI | lengths-ish). I'm well aware no memory system on typical HW | can deliver that (EDIT: "at large space scale"). The article | itself mentions a similar calculation (but at their slower | scale) which is why I included it. | | And yes, warm up absolutely matters. My numbers are for the | best case path which is among the only easy things to even | _define_, IMO. Worst and average are sooooo much | harder/assumption-riddled. Then you argue a lot about said | assumptions. | | I do the minimum times of many runs in an inner loop and the | regression on those min-time results. So, the min-loops will | warm up everything like branch predictors, etc. I'd suspect | the regression model would hold up to usable L1 cache (e.g. | 30 KiB or so). Beyond that, it's mostly just measuring the | memory system, not the hash function..and I'm not sure what | the point of that is. Separation of concerns would suggest L1 | perf is most relevant to the "hash calculation part". (IMO, | many benchmarks confuse this.) | | And, yeah..modern memory systems suck.
Simple back of the | envelope would suggest some DIMM-oriented workload would be | waiting on the DIMMs like 95% of the time, usually much | higher on Xeons with their low single core BW (maybe 98%!). | The Meow article sort of alludes to that as well with less | detail. Single core Xeon BW is always a huge disappointment. | Forces you to go (very) parallel just to saturate DIMMs | (EDIT: which in turn cannot saturate even a single L1!). | | BTW, it sounds like you know all of this (and maybe more) and | I don't mean to suggest otherwise, but I thought some | numbers/details might assist other passersby. | | { EDIT: Also, the stderr on that slope is ~13%. So, it's | possible 135 is really 128 (1.05x lower), the 2 cache | line/cycle load limit or something like that. This doesn't | mean the 640 GB/s rate number is wrong - it just underscores | how mismatched memory systems are at their various scales. } | | { EDIT: and yes, 128/16 = 8x faster than meow, not 2x. Oops. | } | pbsd wrote: | 1/135 cycles per byte on Skylake is just plain impossible, even | if the hash consisted of simply one xor per 32 bytes of input. | | The lower bound for CLHASH would be the cost of one carryless | multiplication per 16 bytes of input, or in other words 1/16 ~ | 0.0625 cycles per byte, since you can only issue one such | multiplication per cycle. | cb321 wrote: | Feel free to measure it yourself instead of just speculating. | (And as mentioned elsewhere, it is probably 1/128.) { EDIT: | and I never measured meow - it could be faster than 1/16 | cycle/byte but poorly assessed; just be sure to use a min | loop and small data. }
| arduinomancer wrote: | A collision was reported a couple months ago | | https://github.com/cmuratori/meow_hash/issues/80 | Accacin wrote: | The back and forth in that thread is incredibly interesting | too. | cbsmith wrote: | As noted in the post title, the original article is from 2018. I | had either forgotten about Meow hash or just never knew about it, | so I appreciate the post, but I was curious about the timing of | the post. | bmn__ wrote: | > I was curious about the timing of the post. | | cmuratori mentioned meowhash in passing in a recent (i.e. few | days ago) Refterm or optimisation video. It is possible that | user DeathArrow watched this, too, and decided to submit the | project homepage to HN. | cateye wrote: | I really enjoyed the clear and concise way it is explained in | the documentation. | | While I was totally not interested in a hash function whatsoever, | I read the whole thing :) | rurban wrote: | For comparison: https://github.com/rurban/smhasher/ | | It's indeed pretty fast (among the top 5), but not particularly | good. It fails some basic tests. Rather use xxh3, t1ha or | farmhash | daniel-thompson wrote: | Can you elaborate on what it fails?
| https://mollyrocket.com/meowhash claims: | | > It cleanly passes every test in smhasher | pepoluan wrote: | MeowHash page last updated 2018. | | smhasher page last updated several days ago. | [deleted] | oefrha wrote: | Discussed at the time: | https://news.ycombinator.com/item?id=18262627 | dang wrote: | Thanks! Macroexpanded: | | _Meow Hash: A high-speed non-cryptographic hash function_ - | https://news.ycombinator.com/item?id=18262627 - Oct 2018 (68 | comments) | | Also: | | _Cryptanalysis of Meow Hash_ - | https://news.ycombinator.com/item?id=27978878 - July 2021 (9 | comments) | wolf550e wrote: | How does it compare to XXH3? | | https://github.com/Cyan4973/xxHash | | https://github.com/rurban/smhasher#readme says XXH3 is 16GB/s | while meow is 17GB/s. | caeril wrote: | > How does it compare to XXH3? | | XX is a slightly better hashing function, meow is slightly | faster. | | Take your pick. Life is about tradeoffs. | pepoluan wrote: | Well, the page for MeowHash was written in 2018. | | xxHash's Performance Comparison page was last updated in 2021, | and on that page XXH3 is faster than MeowHash: | | https://github.com/Cyan4973/xxHash/wiki/Performance-comparis... | DeathArrow wrote: | i7-9700K can reach up to 4.9 GHz and Meow Hash can hash at | a rate of 16 bytes per cycle. That means 78 GB/s. | | xxHash rate is 31.5 GB/s on that CPU. | wolf550e wrote: | Scroll down, XXH3 is 59 GB/s using AVX2 while meow is 58 | GB/s in their benchmark. | snapetom wrote: | I've been using Hashids (https://hashids.org/). It's been fine | performance-wise and well supported across languages. | andruc wrote: | Care to share your definition of fine, and give at least a | ballpark on the size of the assets you're using it on? | snapetom wrote: | It's a pretty small app for some researchers. Front end is | Angular, backend is an API in Bottle. At most, there's 10 | people generating data, and about 40 people reading data.
| Most times it's 10 users uploading and 5-10 users reading. | The readers are external partners. Hashid is used to obfuscate ids | we have in another system so these external people cannot tie | the data back, due to privacy issues. It generates about 1000 | rows of data across 4 PostgreSQL tables in a given day. | | I like how the algorithm has been well supported across | languages. We have no problems with it on the Python side. An | analyst runs monthly reports using R against the data (so | about 30k records pulled). Numbers are summarized and stats | are generated. The hashing isn't a bottleneck at all. ___________________________________________________________________ (page generated 2021-10-29 23:00 UTC)