[HN Gopher] Building a Curve25519 Hardware Accelerator ___________________________________________________________________ Building a Curve25519 Hardware Accelerator Author : parsecs Score : 163 points Date : 2021-07-15 14:37 UTC (8 hours ago) (HTM) web link (www.bunniestudios.com) (TXT) w3m dump (www.bunniestudios.com) | upofadown wrote: | >The "double ratchet" algorithm is integral to modern end-to-end- | encrypted chat apps, such as Signal, WhatsApp, and Matrix. | | Signal protocol uses a hash ratchet for the message to message | forward secrecy. So you don't need to do the expensive key | agreement stuff unless that hash ratchet falls out of | synchronization. | | It isn't really clear if any messaging application really needs | message to message forward secrecy in the first place. You really | only need to do that expensive key agreement operation when you | want to get rid of some old messages that someone has captured | off the wire. That can be reasonably done at the end of a session | or even weekly. Very few people need messages that have not even | scrolled off the screen to be forward secret, assuming they need | it at all. | tptacek wrote: | _You really only need to do that expensive key agreement | operation when you want to get rid of some old messages that | someone has captured off the wire._ | | Could you perhaps share more of your analysis here? I'd like to | be able to better follow what you are trying to say. | upofadown wrote: | If you are using a temporary shared secret then after you | forget that shared secret you will need a new secret which | you will have to generate. I mean nothing more than that. | tptacek wrote: | Right, why would anyone ever be better off holding that | temporary secret over many messages and many days? | upofadown wrote: | My entire point is that it is not really necessary to do | shared secret generation at any but a very slow rate. | Even if you don't so it the Signal protocol way it is | still perfectly reasonable to only do it once in a while. | That might be a short while but it could reasonably be | quite long. | tptacek wrote: | Why is it reasonable to only do shared secret generation | once in awhile? What's the win to doing that? The win to | not doing it is that it narrows the blast radius of a | single key compromise to a small number of messages (or, | in full ratchets, to a single message). Why would I ever, | ever want to _increase_ the blast radius of a key? | admax88q wrote: | I'm not sure you gain much by moving the forward secrecy to a | weekly or session based level, and I think you lose a lot in | terms of simplicity. | | It is simpler to design and analyze if it applies at every | message. It's a design for the best case. No longer need to | make judgement calls on if 1 week is enough, or maybe 2, or | maybe some middle ground. | toast0 wrote: | > That can be reasonably done at the end of a session or even | weekly. | | At the end of a session might be reasonable cryptographically, | but isn't really reasonable in terms of application | development; you don't consistently get to have use CPU and | network when a session ends and you may not get access again | until a new session begins. A weekly trigger is more likely to | be practical, but that may not work consistently either. Better | to do the ratchetting as you're doing the messaging to | guarantee it gets done. | sliken wrote: | Say it takes a $5,000 machine 24 hours to break the code. | Clearly it's much more attractive to an attacker if they can | break a weeks worth of messages than 10 seconds worth. | OJFord wrote: | > It's a weird thing about academics -- they like to write papers | and "share ideas", but it's very hard to get source code from | them. | | I suppose at least part of the reason, especially for things like | crypto and statistics, is that they don't want to get bogged down | in plausibly-correct claims of fatal flaws, and be perhaps | repeatedly (over years hence) put in a defensive position of | having to prove the criticism _in_ correct or have everyone | assume the work was wrong? | | I can sort of understand that. The same's true for the papers | themselves of course, but I can see it being more annoying for | code. ('But it's never going to actually be in that state', etc.) | JJMcJ wrote: | Donald Knuth once wrote a paper with some source code. | | He warned his readers about just copying the code, perhaps half | jokingly. | | He said, approximately, "this code has been proven correct, but | not tested." | bunnie wrote: | You're probably right. And assuming you are, then I think maybe | the core problem may be that academics even tolerate the petty | undercutting of reputation based not on the merit of the big | ideas, but rather minor flaws in the proofs of concept. | | There's also a problem in allowing papers to get away with | making arbitrary claims that are by definition irreproducible. | Every paper always includes some section about how many gates a | design took up and how fast it ran -- but there aren't entirely | consistent standards for reporting these things; it's often too | easy to hide significant setup costs when running benchmarks. | | This leads to an incentive to overstate results, because | without any threat of checks it would be disadvantageous to not | cherry-pick the absolute best numbers you could get through a | peer review committee. | | I get the desire to escape petty criticism: but it seems like a | weak argument against the myriad of reproducibility and | integrity hazards such a lack of transparency begets, | especially for a line of work that ostensibly distinguishes | itself on high standards for merit and integrity. | | It also would have been nice to just receive a polite response | of the form of "sorry, I don't support this anymore, I've moved | on..." from anyone I had reached out to. Before starting on | this project, I emailed four or five groups that had all | published various papers on Curve25519 accelerators hoping to | get even a broken non-compiling repo that could have helped me | sanity-check my understanding of the thornier concepts | presented in the papers. | | Not even a single response...hence the motivation to write up | notes and share them. Seems a waste of funding to have so many | implementations, but no way to actually use any of them... | staticassertion wrote: | Are you saying this based on experience? My assumption was that | it was a number of factors. | | 1. Researchers are often weak coders, and they may be self | conscious about their code. | | 2. My own experience is that researcher code is totally | undocument, or broken. I've repeatedly found that the code | linked to from a paper has no straightforward way to be built, | or appears to have compiler errors etc. | | 3. It's more work, and researchers have the incentive to put | out a compelling paper, not to put out code (because we don't | hold them to that standard). | | My own experiences have led to me to mostly dismiss papers | where I can't find the code or can't get it to build. | kzrdude wrote: | Given all that, it's understandable that it's not released | and completely reasonable - a support structure is needed if | the code should be released. | staticassertion wrote: | I'm setting the bar super low. A `./configure` and a readme | with basic info about the project and how to build/run it | would be quite exceptional. I don't need them to be | triaging github issues and merging in PRs. | gww wrote: | In the biological sciences there is a huge push for the code to | be released alongside the paper. Many publications (but not | enough yet) include a complete pipeline in a workflow language | that downloads, processes the data and generates the figures | used in the paper. | toomuchtodo wrote: | What's the language de jour typically leveraged to declare | these workflows? | semi-extrinsic wrote: | Since GP mentions biology, I would guess R is the first | choice. | jlrubin wrote: | Hi Bunnie! | | Curious if you also have a power usage comparison, which I'm | guessing is also relevant in the mobile context. | korethr wrote: | This is neat. The ever-present admonition against rolling one's | own crypto lurks in the back of my mind, though. It is not my | intent here to impugn Bunnie's competence, though I do wonder if | he's run the design or implementation past any cryptographers to | try to verify that the design or implementation are sound, and | don't subtly break critical assumptions vital to the security of | the algorithm. | pinewurst wrote: | But he's not rolling his own crypto - in an algorithm sense - | purely trying to speed the execution of an existing algorithm | which can be easily validated against other existing | implementations | korethr wrote: | Sure, but is it not possible that a hardware acceleration, if | not also proven correct, could subtly undermine an otherwise | valid software implementation? | | I do not assert that this is the case generally, nor | particularly here -- I would be quite happy to be wrong about | this. But if there's one thing I've taken from my admittedly | minor study of crypto is that the most trivial of details can | be critically important, and failure to get a detail exactly | correct can compromise an entire algorithm and anything built | thereupon. | | I would love to see hardware acceleration for ed25519, just | like there's been hardware acceleration for AES. It could | help drive adoption of an algorithm that is less fiddly and | easy to get wrong than RSA, and maybe in another 15 years, | I'd be able to purchase IT equipment that supports ed25519 | and not just hardware accelerated RSA and AES. | IfOnlyYouKnew wrote: | Excellent deep dive on this beast that had me similarly question | my intelligence in the past. | | As a former academic: the reason sharing source code is/was rare | is that it tends to be extremely ugly, and that you don't get any | credit for publishing it where it matters (I. e. tenure | committees). To go from something that works to something that | you would allow people to see takes about as much time as the | original work. | | (As a side note: I feared for the worst when I saw, among tags | like "hacking" and "open source", the tag "feminism" in the side | bar. I enjoyed being wrong.) | vlovich123 wrote: | I don't think that's a good reason. Production code is also | ugly. | sowbug wrote: | This work is associated with Precursor | (https://www.bunniestudios.com/blog/?p=5921), which is Bunnie's | project to build a mobile hardware platform from the ground up. | jazzyjackson wrote: | TIL 255 is divisible by 17 | | Edit: thanks to my replies I know several more things now lol big | day for me | pavpanchekha wrote: | 256 = 16 * 16 | | 255 = 256 - 1 = 16 * 16 - 1 = 15 * 17 | Sesse__ wrote: | 2^p - 1 can only be prime (a so-called Mersenne prime) if p is | prime. 2^8 - 1 thus cannot be a prime, since 8 is composite. | | (Note that is it not _sufficient_ that p is prime, but it is | necessary.) | pfortuny wrote: | (a^2-1)=(a+1)(a-1). | jandrese wrote: | It is also divisible by 5. | ChuckMcM wrote: | This is an excellent read[1]. It reminded me of the process I | went through putting together an FFT engine in an FPGA although I | think the small execution engine is just stellar, and not | something I had considered. | | [1] To be fair I find most of the stuff Bunnie writes about to be | worth reading! :-) | staticassertion wrote: | > Wait, what? So many articles and journals I read on the topic | just talk about prime fields, modular reduction and blah blah | blah like it's asking someone to buy milk and eggs at the grocery | store. | | No kidding, it's so frustrating! I have to read some articles/ | papers like 20x and open a bunch of wikipedia tabs to understand | wtf they're talking about. If they gave a simple, high level | explanation it would save tons of time - while a wikipedia | article is going to be very in depth, it's not tailored to the | context of what I'm reading, so I often have to look at a whole | bunch of things to start to build a good model in my head for wtf | they're trying to convey. | | This post, on the other hand, is perfect for me. Thank you so | much for writing it I'm learning a ton. | dragontamer wrote: | https://www.youtube.com/watch?v=xE4jEKx9fTM | | GF5 is the 3rd smallest finite field (GF2 and GF3 are smaller), | but GF2 and GF3 are too small to really see where the beauty of | all this math lies. | | ------ | | GF5 (which is simply "arithmetic modulo 5") has all the | properties of Real-numbers "we care about", and none of the | hassles of decimals or partial numbers. | | GF5 is simply arithmetic modulo 5. Its really easy to do. 4 + 2 | == 1. (Aka: 4 + 2 mod 5 == 6 mod 5 == 1). Addition and | multiplication are easily proven to be the same. | | The fun starts when you start doing roots, exponents and | logarithms (!!!). Take the definition of addition / | multiplication above (aka: addition / multiplication modulo 5) | and redefine square-root, exponents, and logarithms to be | consistent with it. | | That is to say: 2 * 3 == 1. Which means that 3 == (1/2) (yes, | "one half" is now 3 in GF5). Which means 3 is 2^(-1). | | * 2^2 == 4. | | * 2^2 * (2^-1) == 4 * 3 == 12 mod 5 == 2. | | Which means 2^2 * 2^-1 == 2^(2-1) and look, exponents work as | expected. Feel free to play around with all the numbers: you'll | find that exponents work once you match this new "definition" | of exponents. | | --------- | | Because exponents work, logarithms work. Because everything | works, we can have complicated ellipical curves defined over | the numbers {0, 1, 2, 3, 4} (aka: the numbers mod 5) and have | everything work out mathematically. | | -------- | | The hard part is proving that all of this works in GF(5^2) aka | GF25. It turns out that you can't just "mod 25" the stuff, you | need to do an extension field (and basically make a complex | number out of things). | | Once you understand GF25 extension fields and its relationship | with GF5, then it becomes easy to generalize that to GF2 and | its extension fields GF(2^8) (aka: binary numbers of 8 bits) or | GF(2^32) (aka: 32-bit numbers). | | Turns out for extension fields (like GF(5^2) aka GF25) you need | to redefine multiplication and addition as well and then | rebuild all the math from the ground up off of this new | definition of addition and new definition of multiplication. | And again, this is most important for GF(2^32), aka the 32-bit | number field that's common on today's computers. | | Bonus points: GF(2^32)'s new addition operator is just an xor. | Only multiplication is difficult. | | But once you do that, it all __JUST WORKS__ and is amazing. | | ---------- | | Galois Fields are a way to "cheat" a definition of logarithm, | exponents, square roots, multiplication, division, and addition | (by redefining these operations). | | It turns out that these operations continue to work "as | expected" in the Galois Field... as long as you're cool with | the new definitions of them. | | ------ | | EDIT: I'm slightly wrong. Seems like Curve25519 is a prime | field and not an extension field. So this is more similar to | GF5 than it is to GF(2^32). Still, other elliptical curves are | in GF(2^32) or other extension fields, so the study of | extension fields is useful for other curves. | | Curve25519 can be understood with just the GF(5) description I | gave earlier in this post. Prime fields are really simple: just | mod(the prime number) and you're set. There's some weird | properties that can accelerate speed (Any division by Foo is | equivalent to multiplying by (1/Foo), which might be faster). | fogof wrote: | > GF5 is the 3rd smallest finite field (GF2 and GF3 are | smaller) | | There is also a field of 4 elements, since 4 is a power of a | prime. | dragontamer wrote: | Fair. I guess I was implicitly staying away from extension | fields because people's attention span will start drifing | away the minute I start to attempt to explain how to | perform modulo polynomial arithmetic over irreducible | polynomials. | | Prime fields (GF2, GF3, GF5...) are just easier to explain | than extension fields (GF (2^2), GF(3^2), GF(5^2)... | GF(2^8)...) | Kalium wrote: | This feels, to me, like the complaint of someone with a general | technical grounding going into a narrow specialization and | discovering they understand none of the terminology. | | This is _exactly_ how I felt when I was learning Haskell and | hearing about eta conversions and pointfreeness and so on. Each | new bit of terminology had a precise definition, but | specialists writing primarily for one another were not inclined | to offer definitions each time. They didn 't need them and | neither did their anticipated audience. I had to learn the | terminology and jargon so I could understand what they were | talking about, and high-level abstracted explanations were not | going to be enough to participate usefully. I had to learn the | basics. | | Similarly, Watson & Crick's seminal paper might not have been | the best of all possible times to stop and define an X-ray or | how crystallography worked. | | So yes, just instantiate the metaclass, implement the | interface, and go buy milk. Easy. You just have to learn what | each of those things actually means, especially if the details | might be important (and in cryptography, that's always the | case). | bunnie wrote: | Thanks! It's good to know I'm not alone in feeling this way. | | This is actually an abridged version, if you click into the | link for the "datasheet" (https://ci.betrusted.io/betrusted- | soc/doc/engine.html) you can find all the gory details down to | the register level... | staticassertion wrote: | Awesome, much appreciated, I'll dig into it. I was actually | just using the dalek crate myself quite heavily this week, so | this is great timing. | amirhirsch wrote: | I liked your struggle with the corner-case :) ___________________________________________________________________ (page generated 2021-07-15 23:00 UTC)