[HN Gopher] The BLAKE3 cryptographic hash function
       ___________________________________________________________________
        
       The BLAKE3 cryptographic hash function
        
       Author : erwan
       Score  : 116 points
       Date   : 2020-01-09 17:29 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | loeg wrote:
       | Looks like they've taken _Too Much Crypto_ to heart[1] and
       | dropped the number of rounds from Blake2B 's 12 down to 7 for
       | Blake3:
       | 
       | https://github.com/BLAKE3-team/BLAKE3/blob/master/reference_...
       | 
       | https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref...
       | 
       | Which, yeah, that alone will get you a significant improvement
       | over Blake2B. But definitely doesn't account for the huge
       | improvement they're showing. Most of that is the ability to take
       | advantage of AVX512 parallelism, I think. The difference will be
       | more incremental on AVX2-only amd64 or other platforms, I think.
       | 
       | [1]: Well, TMC recommended 8 rounds for Blake2B and 7 for
       | Blake2S.
        
         | zokier wrote:
         | > Looks like they've taken Too Much Crypto to heart
         | 
         | Not surprising considering that one of they is the author of
         | Too Much Crypto
        
       | nabla9 wrote:
       | That's impressive speedup. I just installed it and holy moly it
       | really is fast. All those extra cores can finally get busy. ;)
       | 
       | b3sum -l 256 big-2.6Gfile                 real 0m0.384s
       | user 0m2.302s       sys  0m0.175s
       | 
       | b2sum -l 256 big-2.6Gfile                 rear  0m3.616s
       | user  0m3.360s       sys   0m0.256s
       | 
       | (Intel(r) Core(tm) i7-8550U CPU @ 1.80GHz x 8 )
       | 
       | EDIT: ah, the catch. blake3 targets 128 bit security. It competes
       | with SipHash for speed and security
       | 
       | EDIT2 scratch the previous edit.
        
         | oconnor663 wrote:
         | > ah, the catch. blake3 targets 128 bit security. It competes
         | with SipHash for speed and security.
         | 
         | No no, BLAKE3 is a general-purpose cryptographic hash just like
         | BLAKE2, SHA-2, and SHA-3. The confusion here is that a hash
         | function's security level is half of its output size, because
         | of the birthday problem. BLAKE3, like BLAKE2s and SHA-256, has
         | a 256-bit output and a 128-bit security level. (BLAKE3 also
         | supports extendable output, but that doesn't affect the
         | security level.)
         | 
         | > holy moly it really is fast
         | 
         | Thank you :)
        
           | nabla9 wrote:
           | Thanks for clarifying that.
        
       | cogman10 wrote:
       | Where is this useful?
       | 
       | I'm guessing not for password hashes simply because a fast hash
       | is bad for passwords (makes brute forcing/rainbow tables easier).
       | 
       | So is this mostly just for file signing?
        
         | hadlock wrote:
         | Concurrent filesystems, high availability systems etc
        
         | beefhash wrote:
         | Fast hashes are useful for signing, MACs (symmetric
         | "signatures" so to speak), key derivation (HKDF and all kinds
         | of Diffie-Hellman handshakes come to mind), as part of
         | cryptographically secure PRNGs (though most of the world has
         | moved on to stream ciphers for that instead) and probably more.
         | 
         | While programming, just try to think of a scenario where having
         | a mapping between some kind of arbitrary data (and maybe a key)
         | and a fixed-size, uniformly random-looking output could be
         | useful. Opportunities to sprinkle some hashes on things come up
         | quite often when you look for them.
        
           | bscphil wrote:
           | > as part of cryptographically secure PRNGs (though most of
           | the world has moved on to stream ciphers for that instead)
           | 
           | My understanding is that plenty of stream ciphers _are_ based
           | on hashes. For example each block of the stream can be
           | hash(key + nonce + block counter + constants) that you xor
           | with your plaintext (or don 't, if you just want a CSPRNG).
        
           | lame-robot-hoax wrote:
           | So I'm not super familiar with things like this, but for
           | example, WireGuard uses BLAKE2 for hashing. What level of
           | undertaking would it be to move from BLAKE2 to BLAKE3 in
           | regards to WireGuard? Can you just pop out BLAKE2 and pop in
           | BLAKE3?
        
             | aidenn0 wrote:
             | Assuming wireguard hashes data shorter than 4k (i.e. most
             | network packets), there is no reason to switch; BLAKE3 is
             | only faster than BLAKE2 on data longer than 4k.
        
             | beefhash wrote:
             | The two hashes aren't compatible, so a hash of the same
             | message will yield two different hashes under BLAKE2 and
             | BLAKE3.
             | 
             | As far as I can tell, BLAKE2 has effectively all the
             | properties BLAKE3 has (arbitrary output length, keyed hash
             | mode), so the upgrade for communications boils down to
             | negotiating/determining which of the two hash functions to
             | use over the wire (with all the downsides that come with
             | agility of cryptographic primitives); for stored hashes,
             | they have to be recomputed and replaced (or you could store
             | a flag is BLAKE2/is BLAKE3 and update them as you touch
             | hashes, kind of similar to how password hashes are swapped
             | at login time).
             | 
             | Note that BLAKE3 existing doesn't _break_ BLAKE2. It 's
             | perfectly fine to just keep trucking BLAKE2, it's just that
             | BLAKE3 has better performance characteristics that make it
             | very attractive.
        
         | lisper wrote:
         | Hashes are useful in a wide variety of cryptographic
         | applications, of which file signing is just one example. But
         | you are correct, you would not want to use this by itself for
         | password hashing. But that is true of any cryptographic hash.
         | not just Blake3.
        
         | capableweb wrote:
         | Anywhere where you do hashing of stuff and you'd like it to be
         | faster. Content addressable systems use hashing to generate
         | identifiers for content (where the same content has the same
         | identifier, and different content for sure has different ID).
         | If you have a 2GB file you want a identifier for, using BLAKE3
         | would make that a lot faster.
         | 
         | > Capable of verified streaming and incremental updates, again
         | because it's a Merkle tree.
         | 
         | I don't really understand what this means (verified streaming +
         | incremental updates), could someone clarify? Merkle Trees are
         | simple (https://en.wikipedia.org/wiki/Merkle_tree for people
         | who don't know)
        
           | mauricio wrote:
           | From their BLAKE3 paper (https://github.com/BLAKE3-team/BLAKE
           | 3-specs/blob/master/blak...) they give an example of verified
           | streaming in 6.4.
           | 
           | Basically to verify a video file using serial hash functions
           | you need to download the entire video file before you can
           | perform the hashing. In BLAKE3 you can verify each chunk of
           | the video as it is being streamed because the hash internally
           | is just a Merkle Tree.
        
             | capableweb wrote:
             | Thanks for the link mauricio! That's very interesting and
             | makes a ton of sense. Gonna have a lot of fun playing
             | around with BLAKE3 for different things.
        
             | the8472 wrote:
             | As far as I can tell the crate doesn't expose APIs needed
             | to construct or verify the sibling/uncle hashes that would
             | be used in partial verification.
        
               | bjoli wrote:
               | Just thinking of writing such code myself is enough to
               | get nightmares.
        
               | oconnor663 wrote:
               | Verified streaming is implemented by
               | https://github.com/oconnor663/bao.
        
       | s_tec wrote:
       | It looks like the speedup is coming from two main changes.
       | 
       | The first change is reducing the number of rounds from 10 to 7.
       | Think of it like making a smoothie - you add bits of fruit to the
       | drink (the input data), then pulse the blades to blend it up
       | (making the output hash). This change basically runs the blades
       | for 7 seconds instead of 10 seconds each time they add fruit.
       | They cite evidence that the extra 3 seconds aren't doing much -
       | once the fruit's fully liquid, extra blending doesn't help - but
       | I worry that this reduces the security margin. Maybe those extra
       | 3 rounds aren't useful against current attacks, but they may be
       | useful against unknown future attacks.
       | 
       | The other change they make is to break the input into 1KiB
       | chunks, then hash each chunk independently. Finally, they combine
       | the individual chunk hashes into a single big hash using a binary
       | tree. The benefit is that if you have 4KiB of data, you can use
       | 4-way SIMD instructions to process all four chunks
       | simultaneously. The more data you have, the more parallelism you
       | can unlock, unlike traditional hash functions that process
       | everything sequentially. On the flip side, modern SIMD
       | instructions can handle 2 x 32-bit instructions just as fast as 1
       | x 64-bit instructions, so building the algorithm out of 32-bit
       | arithmetic doesn't cost anything, but gives a big boost to low-
       | end 32-bit CPU's that struggle with 64-bit arithmetic. The tree
       | structure is a big win overall.
        
         | loeg wrote:
         | Minor nitpick: From 12 rounds for Blake2B; 10 is correct for
         | Blake2S.
        
         | zokier wrote:
         | > but I worry that this reduces the security margin. Maybe
         | those extra 3 rounds aren't useful against current attacks, but
         | they may be useful against unknown future attacks.
         | 
         | This was covered in more detail in previous "Too Much Crypto"
         | paper [1], which argued that many standards have excessively
         | high round counts. Note that Aumasson is author of both Blake3
         | and Too Much Crypto
         | 
         | [1] https://news.ycombinator.com/item?id=21917505
        
         | hinkley wrote:
         | Thinking back to Schneier's running commentary on SHA3, using
         | hierarchical hashes was part of a general attempt to increase
         | the internal state space of the hashes. SHA1 exposes all of the
         | bits. So once you get two prefixes with the same hash, adding a
         | common suffix results in the same output hash.
         | 
         | With hashes of hashes, the prefixes have to be the same length,
         | and possibly very short.
        
       | [deleted]
        
       | dpc_pw wrote:
       | So this is bao + blake2?
       | 
       | I remember watching Bao, a general purpose cryptographic tree
       | hash, and perhaps the fastest hash function in the world:
       | https://www.youtube.com/watch?v=Dya9c2DXMqQ a while ago.
       | 
       | Nice job!
        
         | oconnor663 wrote:
         | Yep that's me :) The Bao project evolved into BLAKE3, and the
         | latest version -- which I literally just released -- is now
         | based on BLAKE3.
        
           | Exuma wrote:
           | Hey I like your speaking style, clear, and funny. Nice work.
        
           | dpc_pw wrote:
           | Excuse my confusion. I understand "the Bao project evolved
           | into BLAKE3", but "is now based on BLAKE3" confuses me. Bao
           | is based on blake3? But isn't bao ... the blake3 itself now?
           | Circular dependency detected.
        
             | oconnor663 wrote:
             | Ha, yes, that's confusing. The Bao project was originally
             | two things: 1) a custom tree hash mode, and 2) an encoding
             | format and verified streaming implementation based on that
             | tree hash. The first half evolved into BLAKE3. Now the Bao
             | project itself is just the second half.
        
               | prilanoth wrote:
               | Hi, some questions...
               | 
               | The README lists 4 designers, including yourself. However
               | the Bao project doesn't list anybody, so presumably you
               | are the only designer. What exactly were the
               | contributions of the other 3 people to warrant being
               | listed?
               | 
               | At what point did the Bao project become "BLAKE3" and
               | why?
        
               | loeg wrote:
               | All three others are principals of the Blake or Blake2
               | design and major implementations.
        
         | aidenn0 wrote:
         | IIRC Bao was already based off of blake2s.
        
       | kzrdude wrote:
       | Just one variant, that's refreshing. And performance is
       | impressive. What's the short input performance like? Say for 64
       | bytes of input.
        
         | oconnor663 wrote:
         | The Performance section of the spec (https://github.com/BLAKE3-
         | team/BLAKE3-specs/blob/master/blak...) has a paragraph about
         | short input performance. 64 bytes happens to be the BLAKE3
         | block size, and performance at that length or shorter is best
         | in class. Look at the left edge of Figure 3
         | (https://i.imgur.com/smGHAKA.png).
        
           | nabla9 wrote:
           | 64 bytes happens to be the typical cache line size so it
           | makes sense to use it as a block size.
        
       | memco wrote:
       | I would be interested in how this compares on the smhasher
       | against some of the other fastest hash competitors like meow hash
       | or xxhash.
        
         | loeg wrote:
         | I am also curious about how it performs as a PRF in places
         | where e.g. Chacha20 is used as a keystream generator now. Also
         | as a reduced round variant in places where non-cryptographic
         | PRNGs are used for very fast RNG needs: JSF, SFC, Lehmer,
         | Splitmix, PCG.
         | 
         | In my extremely limited testing (on AVX2, but not AVX512
         | hardware), (buffered) reduced (four) round Chacha is only about
         | 1.5-2x slower than fast non-cryptographic PRNGs like JSF, SFC,
         | Lehmer, or pcg64_fast (all with Clang -O2 -flto, the fast PRNGs
         | are header-only implementations and only chacha is two files).
         | 
         | This thing still uses 7 rounds, but that is easy to tune down.
         | Very neat.
        
         | rurban wrote:
         | Give me two days.
        
       | bjoli wrote:
       | Does anybody know of benchmarks for ARM? Or any research trying
       | to break it?
       | 
       | The numbers look astonishing.
        
         | oconnor663 wrote:
         | Take a look at Figure 5 (https://i.imgur.com/Izs23wf.png) in
         | the spec (https://github.com/BLAKE3-team/BLAKE3-specs/blob/mast
         | er/blak...). That benchmark was done on a Raspberry Pi Zero,
         | which is a 32-bit ARM1176.
        
           | bjoli wrote:
           | Thanks! Impressive as well!
        
         | loeg wrote:
         | I'm also curious for benchmarks on amd64 without AVX512, as
         | that's the majority of existing x86 metal and the entire AMD
         | product line, even the new Zen2 stuff.
        
       ___________________________________________________________________
       (page generated 2020-01-09 23:00 UTC)