[HN Gopher] Fuzzy Joins (Minhash) ___________________________________________________________________ Fuzzy Joins (Minhash) Author : yellowflash Score : 43 points Date : 2022-06-24 04:37 UTC (2 days ago) (HTM) web link (blog.yellowflash.in) (TXT) w3m dump (blog.yellowflash.in) | goldenkey wrote: | Brilliant stuff. Isn't the XORing essentially just equivalent to | a 1-time pad -- which isn't very hash-like. I'd think using a | PRNG [1] with the initial hash value as the seed, to generate | more values, would be more effective. | | https://en.wikipedia.org/wiki/Pseudorandom_number_generator | davesque wrote: | Yeah, I had a similar intuition about that approach possibly | being weak or problematic. However, if my cursory reading of | the algorithm is at all correct, the goal may simply be to | consistently choose another random hash value (a different min | hash) and it may work fine. In other words, the same hash | values would be considered the minimum after XORing with a | random number _and_ they will be different from the initial | hash that was selected as the minimum without XORing. Those are | the behaviors that matter and not so much that the hashes that | result from the XORing are cryptographically secure. ___________________________________________________________________ (page generated 2022-06-26 23:00 UTC)