[HN Gopher] Fuzzy Joins (Minhash)
       ___________________________________________________________________
        
       Fuzzy Joins (Minhash)
        
       Author : yellowflash
       Score  : 43 points
       Date   : 2022-06-24 04:37 UTC (2 days ago)
        
        
       | goldenkey wrote:
       | Brilliant stuff. Isn't the XORing essentially equivalent to a
       | one-time pad, which isn't very hash-like? I'd think using a
       | PRNG [1] with the initial hash value as the seed, to generate
       | more values, would be more effective.
       | 
       | [1] https://en.wikipedia.org/wiki/Pseudorandom_number_generator
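
A minimal sketch of the PRNG idea suggested above, not the article's code. The names `base_hash` and `prng_signature` are hypothetical, and CRC32 is only a stand-in for whatever base hash the article uses. Each token's base hash seeds a PRNG, and the i-th PRNG output serves as that token's i-th hash value.

```python
import random
import zlib


def base_hash(token: str) -> int:
    # Stand-in 32-bit base hash (assumption: the article's hash is not shown here).
    return zlib.crc32(token.encode("utf-8"))


def prng_signature(tokens, k=4):
    """Return k minimums, one per derived hash function.

    The i-th hash of a token is the i-th 32-bit output of a PRNG
    seeded with the token's base hash, so it is the same in every set.
    """
    mins = [None] * k
    for tok in set(tokens):
        rng = random.Random(base_hash(tok))  # deterministic per token
        for i in range(k):
            value = rng.getrandbits(32)
            if mins[i] is None or value < mins[i]:
                mins[i] = value
    return mins
```

Because the seed depends only on the token, two sets that share a token derive the same k values for it, which is what makes the minimums comparable across sets.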
        
         | davesque wrote:
          | Yeah, I had a similar intuition that the approach might be
          | weak or problematic. However, if my cursory reading of the
          | algorithm is at all correct, the goal may simply be to
          | consistently select a different element as the minimum (a
          | different min hash), and XORing may work fine for that. In
          | other words, XORing every hash with the same random number
          | reorders the hashes the same way for every set, so matching
          | sets still agree on the minimum, _and_ that minimum will
          | usually differ from the one selected without XORing. Those
          | are the behaviors that matter, not whether the hashes that
          | result from the XORing are cryptographically secure.
        
       ___________________________________________________________________
       (page generated 2022-06-26 23:00 UTC)