[HN Gopher] The BLAKE3 cryptographic hash function ___________________________________________________________________ The BLAKE3 cryptographic hash function Author : erwan Score : 116 points Date : 2020-01-09 17:29 UTC (5 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | loeg wrote: | Looks like they've taken _Too Much Crypto_ to heart[1] and | dropped the number of rounds from Blake2B 's 12 down to 7 for | Blake3: | | https://github.com/BLAKE3-team/BLAKE3/blob/master/reference_... | | https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref... | | Which, yeah, that alone will get you a significant improvement | over Blake2B. But definitely doesn't account for the huge | improvement they're showing. Most of that is the ability to take | advantage of AVX512 parallelism, I think. The difference will be | more incremental on AVX2-only amd64 or other platforms, I think. | | [1]: Well, TMC recommended 8 rounds for Blake2B and 7 for | Blake2S. | zokier wrote: | > Looks like they've taken Too Much Crypto to heart | | Not surprising considering that one of they is the author of | Too Much Crypto | nabla9 wrote: | That's impressive speedup. I just installed it and holy moly it | really is fast. All those extra cores can finally get busy. ;) | | b3sum -l 256 big-2.6Gfile real 0m0.384s | user 0m2.302s sys 0m0.175s | | b2sum -l 256 big-2.6Gfile rear 0m3.616s | user 0m3.360s sys 0m0.256s | | (Intel(r) Core(tm) i7-8550U CPU @ 1.80GHz x 8 ) | | EDIT: ah, the catch. blake3 targets 128 bit security. It competes | with SipHash for speed and security | | EDIT2 scratch the previous edit. | oconnor663 wrote: | > ah, the catch. blake3 targets 128 bit security. It competes | with SipHash for speed and security. | | No no, BLAKE3 is a general-purpose cryptographic hash just like | BLAKE2, SHA-2, and SHA-3. The confusion here is that a hash | function's security level is half of its output size, because | of the birthday problem. BLAKE3, like BLAKE2s and SHA-256, has | a 256-bit output and a 128-bit security level. (BLAKE3 also | supports extendable output, but that doesn't affect the | security level.) | | > holy moly it really is fast | | Thank you :) | nabla9 wrote: | Thanks for clarifying that. | cogman10 wrote: | Where is this useful? | | I'm guessing not for password hashes simply because a fast hash | is bad for passwords (makes brute forcing/rainbow tables easier). | | So is this mostly just for file signing? | hadlock wrote: | Concurrent filesystems, high availability systems etc | beefhash wrote: | Fast hashes are useful for signing, MACs (symmetric | "signatures" so to speak), key derivation (HKDF and all kinds | of Diffie-Hellman handshakes come to mind), as part of | cryptographically secure PRNGs (though most of the world has | moved on to stream ciphers for that instead) and probably more. | | While programming, just try to think of a scenario where having | a mapping between some kind of arbitrary data (and maybe a key) | and a fixed-size, uniformly random-looking output could be | useful. Opportunities to sprinkle some hashes on things come up | quite often when you look for them. | bscphil wrote: | > as part of cryptographically secure PRNGs (though most of | the world has moved on to stream ciphers for that instead) | | My understanding is that plenty of stream ciphers _are_ based | on hashes. For example each block of the stream can be | hash(key + nonce + block counter + constants) that you xor | with your plaintext (or don 't, if you just want a CSPRNG). | lame-robot-hoax wrote: | So I'm not super familiar with things like this, but for | example, WireGuard uses BLAKE2 for hashing. What level of | undertaking would it be to move from BLAKE2 to BLAKE3 in | regards to WireGuard? Can you just pop out BLAKE2 and pop in | BLAKE3? | aidenn0 wrote: | Assuming wireguard hashes data shorter than 4k (i.e. most | network packets), there is no reason to switch; BLAKE3 is | only faster than BLAKE2 on data longer than 4k. | beefhash wrote: | The two hashes aren't compatible, so a hash of the same | message will yield two different hashes under BLAKE2 and | BLAKE3. | | As far as I can tell, BLAKE2 has effectively all the | properties BLAKE3 has (arbitrary output length, keyed hash | mode), so the upgrade for communications boils down to | negotiating/determining which of the two hash functions to | use over the wire (with all the downsides that come with | agility of cryptographic primitives); for stored hashes, | they have to be recomputed and replaced (or you could store | a flag is BLAKE2/is BLAKE3 and update them as you touch | hashes, kind of similar to how password hashes are swapped | at login time). | | Note that BLAKE3 existing doesn't _break_ BLAKE2. It 's | perfectly fine to just keep trucking BLAKE2, it's just that | BLAKE3 has better performance characteristics that make it | very attractive. | lisper wrote: | Hashes are useful in a wide variety of cryptographic | applications, of which file signing is just one example. But | you are correct, you would not want to use this by itself for | password hashing. But that is true of any cryptographic hash. | not just Blake3. | capableweb wrote: | Anywhere where you do hashing of stuff and you'd like it to be | faster. Content addressable systems use hashing to generate | identifiers for content (where the same content has the same | identifier, and different content for sure has different ID). | If you have a 2GB file you want a identifier for, using BLAKE3 | would make that a lot faster. | | > Capable of verified streaming and incremental updates, again | because it's a Merkle tree. | | I don't really understand what this means (verified streaming + | incremental updates), could someone clarify? Merkle Trees are | simple (https://en.wikipedia.org/wiki/Merkle_tree for people | who don't know) | mauricio wrote: | From their BLAKE3 paper (https://github.com/BLAKE3-team/BLAKE | 3-specs/blob/master/blak...) they give an example of verified | streaming in 6.4. | | Basically to verify a video file using serial hash functions | you need to download the entire video file before you can | perform the hashing. In BLAKE3 you can verify each chunk of | the video as it is being streamed because the hash internally | is just a Merkle Tree. | capableweb wrote: | Thanks for the link mauricio! That's very interesting and | makes a ton of sense. Gonna have a lot of fun playing | around with BLAKE3 for different things. | the8472 wrote: | As far as I can tell the crate doesn't expose APIs needed | to construct or verify the sibling/uncle hashes that would | be used in partial verification. | bjoli wrote: | Just thinking of writing such code myself is enough to | get nightmares. | oconnor663 wrote: | Verified streaming is implemented by | https://github.com/oconnor663/bao. | s_tec wrote: | It looks like the speedup is coming from two main changes. | | The first change is reducing the number of rounds from 10 to 7. | Think of it like making a smoothie - you add bits of fruit to the | drink (the input data), then pulse the blades to blend it up | (making the output hash). This change basically runs the blades | for 7 seconds instead of 10 seconds each time they add fruit. | They cite evidence that the extra 3 seconds aren't doing much - | once the fruit's fully liquid, extra blending doesn't help - but | I worry that this reduces the security margin. Maybe those extra | 3 rounds aren't useful against current attacks, but they may be | useful against unknown future attacks. | | The other change they make is to break the input into 1KiB | chunks, then hash each chunk independently. Finally, they combine | the individual chunk hashes into a single big hash using a binary | tree. The benefit is that if you have 4KiB of data, you can use | 4-way SIMD instructions to process all four chunks | simultaneously. The more data you have, the more parallelism you | can unlock, unlike traditional hash functions that process | everything sequentially. On the flip side, modern SIMD | instructions can handle 2 x 32-bit instructions just as fast as 1 | x 64-bit instructions, so building the algorithm out of 32-bit | arithmetic doesn't cost anything, but gives a big boost to low- | end 32-bit CPU's that struggle with 64-bit arithmetic. The tree | structure is a big win overall. | loeg wrote: | Minor nitpick: From 12 rounds for Blake2B; 10 is correct for | Blake2S. | zokier wrote: | > but I worry that this reduces the security margin. Maybe | those extra 3 rounds aren't useful against current attacks, but | they may be useful against unknown future attacks. | | This was covered in more detail in previous "Too Much Crypto" | paper [1], which argued that many standards have excessively | high round counts. Note that Aumasson is author of both Blake3 | and Too Much Crypto | | [1] https://news.ycombinator.com/item?id=21917505 | hinkley wrote: | Thinking back to Schneier's running commentary on SHA3, using | hierarchical hashes was part of a general attempt to increase | the internal state space of the hashes. SHA1 exposes all of the | bits. So once you get two prefixes with the same hash, adding a | common suffix results in the same output hash. | | With hashes of hashes, the prefixes have to be the same length, | and possibly very short. | [deleted] | dpc_pw wrote: | So this is bao + blake2? | | I remember watching Bao, a general purpose cryptographic tree | hash, and perhaps the fastest hash function in the world: | https://www.youtube.com/watch?v=Dya9c2DXMqQ a while ago. | | Nice job! | oconnor663 wrote: | Yep that's me :) The Bao project evolved into BLAKE3, and the | latest version -- which I literally just released -- is now | based on BLAKE3. | Exuma wrote: | Hey I like your speaking style, clear, and funny. Nice work. | dpc_pw wrote: | Excuse my confusion. I understand "the Bao project evolved | into BLAKE3", but "is now based on BLAKE3" confuses me. Bao | is based on blake3? But isn't bao ... the blake3 itself now? | Circular dependency detected. | oconnor663 wrote: | Ha, yes, that's confusing. The Bao project was originally | two things: 1) a custom tree hash mode, and 2) an encoding | format and verified streaming implementation based on that | tree hash. The first half evolved into BLAKE3. Now the Bao | project itself is just the second half. | prilanoth wrote: | Hi, some questions... | | The README lists 4 designers, including yourself. However | the Bao project doesn't list anybody, so presumably you | are the only designer. What exactly were the | contributions of the other 3 people to warrant being | listed? | | At what point did the Bao project become "BLAKE3" and | why? | loeg wrote: | All three others are principals of the Blake or Blake2 | design and major implementations. | aidenn0 wrote: | IIRC Bao was already based off of blake2s. | kzrdude wrote: | Just one variant, that's refreshing. And performance is | impressive. What's the short input performance like? Say for 64 | bytes of input. | oconnor663 wrote: | The Performance section of the spec (https://github.com/BLAKE3- | team/BLAKE3-specs/blob/master/blak...) has a paragraph about | short input performance. 64 bytes happens to be the BLAKE3 | block size, and performance at that length or shorter is best | in class. Look at the left edge of Figure 3 | (https://i.imgur.com/smGHAKA.png). | nabla9 wrote: | 64 bytes happens to be the typical cache line size so it | makes sense to use it as a block size. | memco wrote: | I would be interested in how this compares on the smhasher | against some of the other fastest hash competitors like meow hash | or xxhash. | loeg wrote: | I am also curious about how it performs as a PRF in places | where e.g. Chacha20 is used as a keystream generator now. Also | as a reduced round variant in places where non-cryptographic | PRNGs are used for very fast RNG needs: JSF, SFC, Lehmer, | Splitmix, PCG. | | In my extremely limited testing (on AVX2, but not AVX512 | hardware), (buffered) reduced (four) round Chacha is only about | 1.5-2x slower than fast non-cryptographic PRNGs like JSF, SFC, | Lehmer, or pcg64_fast (all with Clang -O2 -flto, the fast PRNGs | are header-only implementations and only chacha is two files). | | This thing still uses 7 rounds, but that is easy to tune down. | Very neat. | rurban wrote: | Give me two days. | bjoli wrote: | Does anybody know of benchmarks for ARM? Or any research trying | to break it? | | The numbers look astonishing. | oconnor663 wrote: | Take a look at Figure 5 (https://i.imgur.com/Izs23wf.png) in | the spec (https://github.com/BLAKE3-team/BLAKE3-specs/blob/mast | er/blak...). That benchmark was done on a Raspberry Pi Zero, | which is a 32-bit ARM1176. | bjoli wrote: | Thanks! Impressive as well! | loeg wrote: | I'm also curious for benchmarks on amd64 without AVX512, as | that's the majority of existing x86 metal and the entire AMD | product line, even the new Zen2 stuff. ___________________________________________________________________ (page generated 2020-01-09 23:00 UTC)