[HN Gopher] Mathematicians discover a perfect way to multiply (2019)
___________________________________________________________________

Mathematicians discover a perfect way to multiply (2019)

Author : mmphosis
Score  : 118 points
Date   : 2020-07-22 18:10 UTC (4 hours ago)

(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)

| mgrennan wrote:
| I can't believe this is a thing. I showed this to my grade school
| teacher and she told me I was doing it WRONG. I just knew it
| worked, so I kept using it.
| glouwbug wrote:
| Looks like Joris is now at SFU according to his homepage:
| http://www.texmacs.org/joris/main/joris.html.
| 
| Would be cool to run into him someday.
| bordercases wrote:
| Yeah, now that SFU got the hosting for Canada's West Coast
| supercomputing, they've been attracting top-shelf talent.
| albertzeyer wrote:
| > Regardless of what computers look like in the future, Harvey
| and van der Hoeven's algorithm will still be the most efficient
| way to multiply.
| 
| This is only true if you look at the big-O notation alone.
| However, if you also look at the constants involved, taking
| C * n * log(n) + C2 as an upper bound on the number of operations
| for the calculation, there can be faster algorithms with lower
| constants C and C2, and those algorithms will also be of
| practical interest.
| 
| Also, as I understand it, it has not actually been proven (yet)
| that O(n log(n)) is the fastest possible class.
| 
| You could also study the average runtime instead of the
| worst-case upper bound. There are cases where the multiplication
| can be done much more efficiently than O(n log(n)). An algorithm
| could be clever, detect many such cases, and then maybe end up in
| a strictly better average runtime class. I'm not sure it has been
| proven that this cannot be the case.
| 
| Similarly, sorting algorithms are still being optimized more and
| more, although the worst-case complexity is always O(n log(n))
| (ignoring integer sorting for now). But average- and best-case
| runtime complexity are relevant, as are memory usage complexity
| and other properties, and of course the constants, which matter
| in practice.
| 
| A big C2 basically corresponds to the galactic algorithms
| mentioned in another thread here.
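To make the point about constants concrete, here is a toy Python
sketch. The cost models and the constants c1 and c2 are invented
purely for illustration, not measured from any real implementation;
it just finds where a hypothetical n log n algorithm with a large
constant overtakes a Karatsuba-style O(n^1.585) one.

    import math

    # Toy cost models. The constants are made up for illustration;
    # real values depend on the implementation and the hardware.
    def karatsuba_cost(n, c1=1.0):
        return c1 * n ** math.log2(3)   # log2(3) ~ 1.585

    def nlogn_cost(n, c2=50.0):
        return c2 * n * math.log2(n)

    # First power of two at which the n log n model is cheaper.
    n = 2
    while karatsuba_cost(n) <= nlogn_cost(n):
        n *= 2
    print(f"n log n model wins from roughly n = {n} digits")

Crank c2 up far enough and the crossover moves out to inputs nobody
ever multiplies, which is exactly the galactic-algorithm situation
discussed below.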
| mellosouls wrote:
| While this approach may have practical uses, the "sufficiently
| large numbers" bound got me reading about galactic algorithms...
| 
| https://en.wikipedia.org/wiki/Galactic_algorithm
| adfsgkinio wrote:
| I found this line amusing:
| 
| > This means it will not reach its stated efficiency until the
| numbers have at least 2^1729^12 bits (at least 10^10^38 digits),
| which is vastly larger than the number of atoms in the known
| universe.
| 
| "Vastly larger" is one of the biggest understatements I've ever
| heard. It's roughly 10^10^38 times larger.
| 
| I can't think of a better way to write that line. I just found it
| funny.
| nurettin wrote:
| Previously on HN: https://news.ycombinator.com/item?id=19474280
| dang wrote:
| Also https://news.ycombinator.com/item?id=19672835
| 
| and https://news.ycombinator.com/item?id=19644374
| mikorym wrote:
| Title, paper version:
| 
| > Integer multiplication in time O(n log n)
| 
| https://hal.archives-ouvertes.fr/hal-02070778/document
| andyljones wrote:
| For anyone else just looking for an outline of the new algorithm:
| last two paragraphs of p. 4, first two paragraphs of p. 5.
| 
| Coming from a position of a few abstract algebra classes in
| college many years ago, all the words and notation are familiar,
| but I am a long way from being able to follow it.
| zitterbewegung wrote:
| If the title had "almost the perfect way" it would be the perfect
| way to write this title.
| 
| But leaving that out increases your earnings by a constant
| factor.
| rdtsc wrote:
| The traditional method in the article looks a bit different from
| how I learned it, which was this way:
| https://www.wikihow.com/Do-Double-Digit-Multiplication
| 
| The overall description of it looks right, but in the picture the
| order of operations would be more like B, C, D, A, not A, B, C,
| D.
| Sharlin wrote:
| Yeah, that's a bit of a nonstandard order, but multiplication, of
| course, is associative and commutative, so it doesn't really
| matter.
| JadeNB wrote:
| Indeed--the advantage of doing it as you describe is that each
| new multiplication gives you a new digit that can be recorded
| immediately, rather than having to record the '2' from the 2 x 6
| multiplication and later amend it to '5'.
| thelazydogsback wrote:
| This is amazing to me:
| 
| > "Their conjecture was based on a hunch that an operation as
| fundamental as multiplication must have a limit more elegant than
| n x log n x log(log n)"
| 
| Only in maths (and sometimes in its "implementation"/expression
| in nature in conjunction with physical systems) can such a
| subjective viewpoint be so powerful.
| 
| It also blows me away, at least at first, that multiplying using
| an _FFT_ would be the fastest method so far. (I guess I have an
| older mindset based on how slow FFTs used to be.) I wonder, in
| actual implementations, at how many digits the perf curves meet
| at break-even compared to other methods.
| gowld wrote:
| When you say the FFT is slow, did you know a way to multiply
| large numbers that was fast? Or did you assume one existed,
| without evidence?
| thelazydogsback wrote:
| Huh?
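For the curious, the core trick behind FFT-based multiplication
(the idea underlying Schonhage-Strassen and, in far more elaborate
form, the new algorithm) fits in a few lines of Python. This sketch
leans on numpy's floating-point FFT, so it is exact only for modest
operand sizes where rounding error stays below 0.5; the real
algorithms use exact, ring-based transforms instead.

    import numpy as np

    def fft_multiply(a: int, b: int) -> int:
        # Digits as polynomial coefficients, least significant
        # digit first.
        da = [int(d) for d in str(a)][::-1]
        db = [int(d) for d in str(b)][::-1]
        size = 1
        while size < len(da) + len(db):
            size *= 2               # pad to a power-of-two length
        # Pointwise product of transforms = convolution of digits.
        fa = np.fft.rfft(da, size)
        fb = np.fft.rfft(db, size)
        conv = np.rint(np.fft.irfft(fa * fb, size)).astype(np.int64)
        result, carry = 0, 0
        for i, c in enumerate(conv):        # propagate the carries
            carry += int(c)
            result += (carry % 10) * 10 ** i
            carry //= 10
        return result

    assert fft_multiply(12345, 6789) == 12345 * 6789

As for the break-even question: bignum libraries such as GMP switch
between schoolbook, Karatsuba, Toom-Cook, and FFT-based methods as
the operands grow, with the thresholds tuned per CPU.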
| amelius wrote:
| > Two decades ago, computers performed addition much faster than
| multiplication. The speed gap between multiplication and addition
| has narrowed considerably over the past 20 years, to the point
| where multiplication can be even faster than addition in some
| chip architectures. With some hardware, "you could actually do
| addition faster by telling the computer to do a multiplication
| problem, which is just insane," Harvey said.
| 
| Is this true for popular architectures such as Intel and ARM?
| corsix wrote:
| An Intel Haswell chip, when doing floating-point vector
| operations, can dispatch one addition per cycle, or two fused
| multiply-adds. The latency is worse for the FMA (5 cycles versus
| 3 cycles), but if you've got enough parallel accumulations to
| hide that latency, then using FMAs for addition will be faster
| than addition.
| 
| This is a consequence of Intel optimising for FLOPS (or,
| equivalently, for matrix multiplication), as an FMA counts as two
| FLOPS, versus addition counting as just one.
| gowld wrote:
| Is that just because multiplication is so commonly needed that
| there is double hardware allocated for FMA?
| 
| Like, if I had a special-purpose adding machine and a special-
| purpose multiplying machine, and I bought a second multiplying
| machine, then I could multiply large numbers faster than I could
| add them, but that's not insane, that's resource allocation.
| jcranmer wrote:
| A floating-point add works roughly like this:
| 
| 0. Extract mantissa and exponents from the inputs [basically free
| in a HW implementation].
| 
| 1. Compare the two exponents and shift the mantissas so that they
| share the same scale.
| 
| 2. Add the two mantissas together.
| 
| 3. Normalize the result back into a floating-point number.
| 
| Now look at the process for a floating-point multiply operation:
| 
| 0. Extract mantissa and exponents from the inputs [basically free
| in a HW implementation].
| 
| 1. Add the two exponents together.
| 
| 2. Multiply the two mantissas together.
| 
| 3. Normalize the result back into a floating-point number.
| 
| Steps 0 and 3 are identical in both cases. I'm not really
| knowledgeable about hardware multiplier implementations, but
| doing an additional "add" operation can be pretty close to free
| in some implementations. In any case, adding an extra adder to
| the process is going to be quite cheap, even if not free.
| 
| What this means is that there is not much extra hardware needed
| to turn an FMUL ALU op into an FMA ALU op... and if you make just
| a single FMA ALU op, you can use that hardware to implement FADD,
| FMUL, and FMA with nothing more than microcode.
| 
| In other words, instead of thinking of FMA as something worth
| pouring more resources into, think of FADD as not worth enough to
| be made its own independent unit.
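Those addition steps can be mimicked in a few lines of Python,
using math.frexp and math.ldexp to stand in for the hardware's
extract and normalize stages. This is a loose sketch, not how any
FPU is actually built: it assumes positive inputs, truncates
instead of rounding, and ignores signs and special values.

    import math

    def fp_add(x: float, y: float, bits: int = 52) -> float:
        # Step 0: extract mantissa (in [0.5, 1)) and exponent.
        mx, ex = math.frexp(x)
        my, ey = math.frexp(y)
        # Fixed-width integer mantissas, as hardware would use.
        ix, iy = int(mx * 2 ** bits), int(my * 2 ** bits)
        # Step 1: compare exponents, shift the smaller operand.
        if ex < ey:
            ix, iy, ex, ey = iy, ix, ey, ex
        iy >>= ex - ey              # align to the larger exponent
        # Step 2: add the mantissas.
        s = ix + iy
        # Step 3: normalize back into a floating-point number.
        return math.ldexp(s, ex - bits)

    print(fp_add(1.5, 0.375))       # 1.875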
| ummonk wrote:
| Well, the algorithm for floating point addition is very
| complicated, so regardless of whether they were gaming the FLOPS,
| you would expect floating-point multiplication to be at least as
| fast as floating-point addition on most architectures.
| amelius wrote:
| > the algorithm for floating point addition is very complicated
| 
| Ah yes, of course. I think it is slightly misleading that the
| article is mostly about large integer multiplication, while that
| remark holds for floating-point operations.
| ummonk wrote:
| Not really. AFAIK integer multiplication still usually has lower
| instructions-per-cycle throughput than addition, and obviously,
| even if it were faster, that would only apply to 32-bit, 64-bit,
| etc. integers, not big-integer multiplication.
| coliveira wrote:
| I think the article is talking about floating-point addition.
| Integer addition is very hard to beat.
| mumbisChungo wrote:
| I think I'm misunderstanding something about the article. In what
| way is this method perfect, as is mentioned multiple times? Is it
| not just an incremental improvement over standard practice?
| ghastmaster wrote:
| "Perfect" in this case is a qualitative exaggeration bordering on
| hyperbole.
| 
| > At the end of February, a team of computer scientists at Aarhus
| University posted a paper arguing that if another unproven
| conjecture is also true, this is indeed the fastest way
| multiplication can be done.
| mumbisChungo wrote:
| I would say it's firmly hyperbole/clickbait, upon further
| investigation.
| bo1024 wrote:
| I think it's fair. Nobody thinks a faster algorithm than n log n
| will ever be possible. It's like a mini P != NP.
| hinkley wrote:
| In computational complexity, we have some proofs that a certain
| cost is the absolute bare minimum that an optimal solution must
| expend. We have the same for some mechanical systems too.
| 
| If you get the order down to the proven limit, especially for the
| first time, people sometimes use the word "perfect", even if
| there are potentially other solutions with lower constant
| overhead, or lower practical overhead (see also Timsort, which
| uses insertion sort for leaf operations).
| 
| > Second, in that same paper Schonhage and Strassen conjectured
| that there should be an even faster algorithm than the one they
| found -- a method that needs only n x log n single-digit
| operations -- and that such an algorithm would be the fastest
| possible.
| 
| Is this saying that there is a proof in their paper that n log n
| is the cheapest? If so, it certainly beats around the bush.
| remcob wrote:
| A conjecture is a proposition that is suspected to be true. So
| there would be no proof in the paper, but likely some arguments
| for why the authors expect it to be true.
| hinkley wrote:
| If a scientist were writing the article about the paper, yes. But
| everyone has been quick to point out that is definitely not the
| case.
| timwaagh wrote:
| No (people think there isn't, but it's not been proven).
| EvgeniyZh wrote:
| There is a (conditional) proof that n log n is optimal:
| https://arxiv.org/abs/1902.10935
| ogogmad wrote:
| It's perfect in the sense that its running time is O(n log n),
| and there is evidence that this is the fastest running time
| possible (at least up to a constant factor).
| mumbisChungo wrote:
| And that evidence is... an unproven conjecture, which itself is
| also heavily qualified?
| waynecochran wrote:
| I would say most mathematical conjectures are stronger than most
| scientific theories. What I mean is that the conjectures are
| easily falsifiable (i.e., counterexamples, if found, can easily
| be verified) and have been shown to be true for an enormous
| number of cases. So unproven mathematical conjectures are not
| just "wild hypotheses."
| ogogmad wrote:
| According to the article, it rests on a plausible conjecture. I
| need to read further.
| 
| [edit]
| 
| According to the abstract of the paper referred to in the article
| (https://arxiv.org/abs/1902.10935), it rests on a "_central
| conjecture in the area of network coding_". Make of that what you
| will.
| mumbisChungo wrote:
| From the abstract:
| 
| > In this work, we prove that if a central conjecture in the area
| of network coding is true, then any constant degree boolean
| circuit for multiplication must have size Omega(n lg n), thus
| almost completely settling the complexity of multiplication
| circuits
| 
| So, given an unproven assumption, and also given that we are
| constrained to doing multiplication on a boolean circuit, this
| may be "perfect" in a sense.
| 
| Seems like the title is mostly just clickbait.
| hinkley wrote:
| Yeah, but in the backstory they also mention that _Kolmogorov_
| conjectured that the optimal solution was O(n^2).
| 
| Conjectures aren't doing so hot in this area, it seems.
| ChrisLomont wrote:
| > Conjectures aren't doing so hot in this area, it seems.
| 
| For that one counter-example, how many conjectures turned out to
| be true?
| timwaagh wrote:
| Ah, but that's not what this game is about. It's about what you
| can prove. And conjectures are by definition things you can't
| prove yet. So the longer they stand and the more paths they
| block, the more frustrating it gets.
| dogfoods wrote:
| RTFA and find out!
| dogfoods wrote:
| It achieves a theoretical lower bound on the complexity. This is
| what the article says, in like 1000 words, because it's written
| for mathematical illiterates.
| senkora wrote:
| They do not prove a lower bound.
| 
| > Harvey and van der Hoeven's algorithm proves that
| multiplication can be done in n x log n steps. However, it
| doesn't prove that there's no faster way to do it. Establishing
| that this is the best possible approach is much more difficult.
| At the end of February, a team of computer scientists at Aarhus
| University posted a paper arguing that if another unproven
| conjecture is also true, this is indeed the fastest way
| multiplication can be done.
| mumbisChungo wrote:
| Yes, a lower bound for multiplication using a boolean circuit,
| resting on an unproven conjecture.
| 
| Even if the conjecture is proven to be true, we may find a more
| efficient way to perform this operation than a boolean circuit.
| tdons wrote:
| Respect :). I implemented Karatsuba a while ago and wondered if
| an even faster method was available. It is!
| 
| I admire the perseverance, grit, and intelligence needed to crack
| hard nuts like this.
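Since Karatsuba came up, here is a minimal recursive sketch in
Python for non-negative integers. The trick is that splitting each
number in half needs only three half-size multiplications instead
of four, which is where the O(n^1.585) bound comes from. A real
implementation would split on machine words and tune the base-case
threshold rather than recursing down to single digits.

    def karatsuba(x: int, y: int) -> int:
        # Base case: small enough for the hardware multiplier.
        if x < 10 or y < 10:
            return x * y
        m = max(x.bit_length(), y.bit_length()) // 2
        xh, xl = x >> m, x & ((1 << m) - 1)   # x = xh * 2^m + xl
        yh, yl = y >> m, y & ((1 << m) - 1)   # y = yh * 2^m + yl
        a = karatsuba(xh, yh)                 # high * high
        b = karatsuba(xl, yl)                 # low * low
        # (xh + xl)(yh + yl) - a - b = xh*yl + xl*yh:
        # both cross terms for the price of one multiplication.
        c = karatsuba(xh + xl, yh + yl) - a - b
        return (a << (2 * m)) + (c << m) + b

    assert karatsuba(12345, 6789) == 12345 * 6789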
| jacobolus wrote:
| This new method will be marginally faster if you ever need to
| multiply two numbers on a scale where writing down the bits fills
| every Planck-length cube in the known universe.
| 
| Otherwise, the "practical" benefit is limited to saving a few
| symbols when writing abstract math papers.
| the8472 wrote:
| > This new method will be marginally faster if you ever need to
| multiply two numbers on a scale where writing down the bits fills
| every Planck-length cube in the known universe.
| 
| If we're talking about physical implementations, then you would
| also have to take the delay between those Planck volumes into
| account.
| [deleted]
| JakeStone wrote:
| Gotta admit, I started chuckling at "We use [the fast Fourier
| transform] in a much more violent way..." I've known some math
| geeks in my day, and this sounded like them.
___________________________________________________________________
(page generated 2020-07-22 23:00 UTC)