[HN Gopher] Mathematicians discover a perfect way to multiply (2019)
       ___________________________________________________________________
        
       Mathematicians discover a perfect way to multiply (2019)
        
       Author : mmphosis
       Score  : 118 points
       Date   : 2020-07-22 18:10 UTC (4 hours ago)
        
 (HTM) web link (www.quantamagazine.org)
 (TXT) w3m dump (www.quantamagazine.org)
        
       | mgrennan wrote:
        | I can't believe this is a thing. I showed this to my grade school
        | teacher and she told me I was doing it WRONG. I just knew it
        | worked, so I kept using it.
        
       | glouwbug wrote:
       | Looks like Joris is now at SFU according to his homepage:
       | http://www.texmacs.org/joris/main/joris.html.
       | 
        | Would be cool to run into him someday.
        
         | bordercases wrote:
          | Yeah, now that SFU is hosting Canada's West Coast
          | supercomputing facility, they've been attracting top-shelf
          | talent.
        
       | albertzeyer wrote:
       | > Regardless of what computers look like in the future, Harvey
       | and van der Hoeven's algorithm will still be the most efficient
       | way to multiply.
       | 
        | This is only true if you look solely at the big-O notation.
        | However, if you also consider the constants, taking C * n *
        | log(n) + C2 as an upper bound on the number of operations,
        | there can be faster algorithms with smaller constants C and C2.
        | And those algorithms will also be of practical interest.
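        | 
        | As a toy illustration (the constants here are made up, not
        | taken from any real implementation), an n log n method with a
        | big constant only overtakes a plain n^2 method past a sizable
        | n:
        | 
        |     #include <math.h>
        |     #include <stdio.h>
        |     
        |     /* Hypothetical cost models: schoolbook n^2 versus an
        |        n log n algorithm with a large constant C = 1000. */
        |     int main(void) {
        |         for (long n = 2;; n *= 2) {
        |             double quad  = (double)n * n;
        |             double nlogn = 1000.0 * n * log2((double)n);
        |             if (nlogn < quad) {  /* crossover reached */
        |                 printf("n log n wins from about n = %ld\n", n);
        |                 break;
        |             }
        |         }
        |         return 0;
        |     }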
       | 
       | Also, as I understand it, it was not actually proven (yet) that
       | O(n log(n)) is the fastest possible class.
       | 
        | You could also study the average runtime instead of the worst-
        | case upper bound. There are cases where the multiplication can
        | be done much faster than O(n log(n)). An algorithm could be
        | clever, detect many such cases, and then maybe end up in a
        | strictly better average runtime class. Or maybe it is proven
        | that this cannot happen; I'm not sure.
       | 
        | Similarly, sorting algorithms are still being optimized, even
        | though the worst-case complexity stays at O(n log(n)) (ignoring
        | integer sorting for now). But average- and best-case runtime
        | complexity are relevant, as are memory usage and other
        | properties. And of course the constants, which matter in
        | practice.
       | 
       | A big C2 basically corresponds to the galactic algorithms,
       | mentioned in another thread here.
        
       | mellosouls wrote:
       | While this approach may have practical uses, the "sufficiently
       | large numbers" bound got me reading about Galactic Algorithms...
       | 
       | https://en.wikipedia.org/wiki/Galactic_algorithm
        
         | adfsgkinio wrote:
         | I found this line amusing:
         | 
          | >This means it will not reach its stated efficiency until
          | the numbers have at least 2^(1729^12) bits (at least
          | 10^(10^38) digits), which is vastly larger than the number
          | of atoms in the known universe.
         | 
         | "Vastly larger" is one of the biggest understatements I've ever
         | heard. It's roughly 10^10^38 times larger.
         | 
         | I can't think of a better way to write that line. I just found
         | it funny.
        
       | nurettin wrote:
       | Previously on HN https://news.ycombinator.com/item?id=19474280
        
         | dang wrote:
         | Also https://news.ycombinator.com/item?id=19672835
         | 
         | and https://news.ycombinator.com/item?id=19644374
        
       | mikorym wrote:
       | Title, paper version:
       | 
       | > Integer multiplication in time O(n log n)
       | 
       | https://hal.archives-ouvertes.fr/hal-02070778/document
        
         | andyljones wrote:
          | For anyone else just looking for an outline of the new
          | algorithm: last two paragraphs of p. 4, first two paragraphs
          | of p. 5.
          | 
          | Coming from a background of a few abstract algebra classes in
          | college many years ago, all the words and notation are
          | familiar, but I am a long way from being able to follow it.
        
       | zitterbewegung wrote:
       | If the title had "almost the perfect way" it would be the perfect
       | way to write this title.
       | 
       | But, leaving that out increases your earnings by a constant
       | factor.
        
       | rdtsc wrote:
        | The traditional method in the article looks a bit different
        | from how I learned it, which was this way:
        | https://www.wikihow.com/Do-Double-Digit-Multiplication
        | 
        | The overall description of it looks right, but in the picture
        | the order of operations would be more like B, C, D, A, not A,
        | B, C, D.
        
         | Sharlin wrote:
          | Yeah, that's a bit of a nonstandard order, but
          | multiplication, of course, is associative and commutative, so
          | it doesn't really matter.
        
         | JadeNB wrote:
         | Indeed--the advantage of doing it as you describe is that each
         | new multiplication gives you a new digit that can be recorded
         | immediately, rather than having to record the '2' from the 2 x
         | 6 multiplication and later amend it to '5'.
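          | 
          | For reference, the whole schoolbook method as a minimal
          | sketch (base-10 digit arrays, least significant digit first;
          | the record-and-carry step is the % 10 / 10 pair):
          | 
          |     #include <stdio.h>
          |     
          |     /* O(n^2) single-digit operations; r must have room
          |        for na + nb digits. */
          |     void schoolbook_mul(const int *a, int na,
          |                         const int *b, int nb, int *r) {
          |         for (int i = 0; i < na + nb; i++) r[i] = 0;
          |         for (int i = 0; i < na; i++) {
          |             int carry = 0;
          |             for (int j = 0; j < nb; j++) {
          |                 int t = r[i + j] + a[i] * b[j] + carry;
          |                 r[i + j] = t % 10; /* record one digit */
          |                 carry = t / 10;    /* carry the rest   */
          |             }
          |             r[i + nb] += carry;
          |         }
          |     }
          |     
          |     int main(void) {
          |         int a[] = {5, 2}, b[] = {3, 6}, r[4]; /* 25 * 63 */
          |         schoolbook_mul(a, 2, b, 2, r);
          |         for (int i = 3; i >= 0; i--) printf("%d", r[i]);
          |         printf("\n"); /* prints 1575 */
          |         return 0;
          |     }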
        
       | thelazydogsback wrote:
        | This is amazing to me:
        | 
        | > "Their conjecture was based on a hunch that an operation as
        | fundamental as multiplication must have a limit more elegant
        | than n x log n x log(log n)"
       | 
       | Only in maths (and sometimes in its "implementation"/ expression
       | in nature in conjunction with physical systems) can such a
       | subjective viewpoint be so powerful.
       | 
        | It also blows me away, at least at first, that multiplying
        | using an _FFT_ would be the fastest method so far. (I guess I
        | have an older mindset based on how slow FFTs used to be.) I
        | wonder, in actual implementations, at how many digits the perf
        | curves break even compared to other methods.
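        | 
        | For the curious, the core idea as an untuned sketch: digits
        | are polynomial coefficients, multiplication is a convolution,
        | and the FFT does convolution in O(n log n). (Real libraries
        | use far more careful variants, e.g. number-theoretic
        | transforms, instead of raw complex doubles.)
        | 
        |     #include <complex.h>
        |     #include <math.h>
        |     #include <stdio.h>
        |     
        |     /* Recursive radix-2 FFT; invert = 1 for the inverse
        |        transform (the 1/n scaling is done by the caller). */
        |     static void fft(double complex *a, int n, int invert) {
        |         if (n == 1) return;
        |         double complex e[n / 2], o[n / 2];
        |         for (int i = 0; i < n / 2; i++) {
        |             e[i] = a[2 * i];
        |             o[i] = a[2 * i + 1];
        |         }
        |         fft(e, n / 2, invert);
        |         fft(o, n / 2, invert);
        |         double ang = (invert ? -2 : 2) * acos(-1.0) / n;
        |         for (int i = 0; i < n / 2; i++) {
        |             double complex w = cexp(I * (ang * i));
        |             a[i]         = e[i] + w * o[i];
        |             a[i + n / 2] = e[i] - w * o[i];
        |         }
        |     }
        |     
        |     /* x, y, r: base-10 digits, least significant first,
        |        zero-padded to n, a power of two >= total digits. */
        |     void fft_mul(const int *x, const int *y, int *r, int n) {
        |         double complex fx[n], fy[n];
        |         for (int i = 0; i < n; i++) {
        |             fx[i] = x[i];
        |             fy[i] = y[i];
        |         }
        |         fft(fx, n, 0);
        |         fft(fy, n, 0);
        |         for (int i = 0; i < n; i++) fx[i] *= fy[i];
        |         fft(fx, n, 1);
        |         long carry = 0;
        |         for (int i = 0; i < n; i++) {
        |             long t = lround(creal(fx[i]) / n) + carry;
        |             r[i] = (int)(t % 10);
        |             carry = t / 10;
        |         }
        |     }
        |     
        |     int main(void) {
        |         int x[8] = {5, 2}, y[8] = {3, 6}, r[8]; /* 25 * 63 */
        |         fft_mul(x, y, r, 8);
        |         for (int i = 7; i >= 0; i--) printf("%d", r[i]);
        |         printf("\n"); /* prints 00001575 */
        |         return 0;
        |     }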
        
         | gowld wrote:
          | When you say the FFT is slow, did you know of any way to
          | multiply large numbers that was fast? Or did you just assume
          | it was, without evidence?
        
           | thelazydogsback wrote:
           | Huh?
        
       | amelius wrote:
       | > Two decades ago, computers performed addition much faster than
       | multiplication. The speed gap between multiplication and addition
       | has narrowed considerably over the past 20 years to the point
       | where multiplication can be even faster than addition in some
       | chip architectures. With some hardware, "you could actually do
       | addition faster by telling the computer to do a multiplication
       | problem, which is just insane," Harvey said.
       | 
       | Is this true for popular architectures such as Intel and ARM?
        
         | corsix wrote:
          | An Intel Haswell chip, when doing floating-point vector
          | operations, can dispatch one addition per cycle, or two fused
          | multiply-adds. The latency is worse for the FMA (5 cycles
          | versus 3), but if you've got enough parallel accumulations to
          | hide that latency, then using FMAs for addition will be
          | faster than using the dedicated adder.
          | 
          | This is a consequence of Intel optimising for FLOPs (or,
          | equivalently, for matrix multiplication), as an FMA counts as
          | two FLOPs, versus addition counting as just one.
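          | 
          | Roughly like this (untested sketch, assumes AVX2+FMA; in
          | practice you would want more independent accumulators to
          | keep both FMA ports busy):
          | 
          |     #include <immintrin.h>
          |     #include <stddef.h>
          |     
          |     /* Sum an array by routing additions through the FMA
          |        units as x * 1.0 + acc. Four independent accumulator
          |        chains partially hide the ~5-cycle FMA latency.
          |        Assumes n is a multiple of 16. */
          |     double sum_via_fma(const double *a, size_t n) {
          |         __m256d one = _mm256_set1_pd(1.0);
          |         __m256d s0 = _mm256_setzero_pd();
          |         __m256d s1 = _mm256_setzero_pd();
          |         __m256d s2 = _mm256_setzero_pd();
          |         __m256d s3 = _mm256_setzero_pd();
          |         for (size_t i = 0; i < n; i += 16) {
          |             s0 = _mm256_fmadd_pd(_mm256_loadu_pd(a+i),    one, s0);
          |             s1 = _mm256_fmadd_pd(_mm256_loadu_pd(a+i+4),  one, s1);
          |             s2 = _mm256_fmadd_pd(_mm256_loadu_pd(a+i+8),  one, s2);
          |             s3 = _mm256_fmadd_pd(_mm256_loadu_pd(a+i+12), one, s3);
          |         }
          |         __m256d s = _mm256_add_pd(_mm256_add_pd(s0, s1),
          |                                   _mm256_add_pd(s2, s3));
          |         double lane[4];
          |         _mm256_storeu_pd(lane, s);
          |         return lane[0] + lane[1] + lane[2] + lane[3];
          |     }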
        
           | gowld wrote:
            | Is that just because multiplication is so commonly needed
            | that there is double the hardware allocated for FMA?
            | 
            | Like, if I had a special-purpose adding machine and a
            | special-purpose multiplying machine, and I bought a second
            | multiplying machine, then I could multiply large numbers
            | faster than I could add them. But that's not insane, that's
            | resource allocation.
        
             | jcranmer wrote:
             | A floating point add works roughly like this:
             | 
             | 0. Extract mantissa and exponents from the inputs
             | [basically free in a HW implementation].
             | 
              | 1. Combine the two exponents and shift the mantissas so
              | that they share the same scale.
             | 
             | 2. Add the two mantissas together.
             | 
             | 3. Normalize the result back into a floating-point number.
             | 
             | Now look at the process for a floating-point multiply
             | operation:
             | 
             | 0. Extract mantissa and exponents from the inputs
             | [basically free in a HW implementation].
             | 
             | 1. Add the two exponents together.
             | 
             | 2. Multiply the two mantissas together.
             | 
             | 3. Normalize the result back into a floating-point number.
             | 
             | Steps 0 and 3 are identical in both cases. I'm not really
             | knowledgeable about hardware multiplier implementations,
             | but doing an additional "add" operation can be pretty close
             | to free in some implementations. In any case, adding in an
             | extra adder to the process is going to be quite cheap, even
             | if not free.
             | 
              | What this means is that there is not much extra hardware
              | needed to turn an FMUL ALU into an FMA ALU... and if you
              | build just a single FMA ALU, you can use that hardware to
              | implement FADD, FMUL, and FMA with nothing more than
              | microcode.
             | 
             | In other words, instead of thinking of FMA as something
             | worth pouring more resources into, think of it as FADD as
             | not worth enough to be made its own independent unit.
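              | 
              | Steps 0-3 as a toy software sketch (ignores rounding,
              | subnormals, infinities and NaN; illustration only, not
              | how the hardware is actually wired):
              | 
              |     #include <stdint.h>
              |     #include <stdio.h>
              |     #include <string.h>
              |     
              |     float toy_fmul(float a, float b) {
              |         uint32_t ua, ub, ur;
              |         memcpy(&ua, &a, 4);
              |         memcpy(&ub, &b, 4);
              |         /* 0. extract signs, exponents, mantissas
              |            (with the implicit leading 1 bit) */
              |         uint32_t sign = (ua ^ ub) & 0x80000000u;
              |         int ea = (int)((ua >> 23) & 0xFF) - 127;
              |         int eb = (int)((ub >> 23) & 0xFF) - 127;
              |         uint64_t ma = (ua & 0x7FFFFF) | 0x800000;
              |         uint64_t mb = (ub & 0x7FFFFF) | 0x800000;
              |         /* 1. add the exponents */
              |         int e = ea + eb;
              |         /* 2. multiply the mantissas */
              |         uint64_t m = ma * mb;
              |         /* 3. normalize: [1,2) x [1,2) lies in [1,4),
              |            so shift once if the product reached 2 */
              |         if (m & (1ULL << 47)) { m >>= 1; e++; }
              |         ur = sign
              |            | (uint32_t)(e + 127) << 23
              |            | ((uint32_t)(m >> 23) & 0x7FFFFF);
              |         float r;
              |         memcpy(&r, &ur, 4);
              |         return r;
              |     }
              |     
              |     int main(void) {
              |         printf("%g\n", toy_fmul(2.5f, -3.0f)); /* -7.5 */
              |         return 0;
              |     }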
        
           | ummonk wrote:
            | Well, the algorithm for floating-point addition is very
            | complicated, so regardless of whether they were gaming the
            | FLOPs, you would expect floating-point multiplication to be
            | at least as fast as floating-point addition on most
            | architectures.
        
             | amelius wrote:
             | > the algorithm for floating point addition is very
             | complicated
             | 
              | Ah yes, of course. I think it is slightly misleading that
              | the article is mostly about large integer multiplication,
              | while that remark holds for floating-point operations.
        
         | ummonk wrote:
          | Not really. AFAIK integer multiplication still usually has
          | lower instructions-per-cycle throughput than addition, and
          | obviously, even if it were faster, that would only apply to
          | 32-bit, 64-bit, etc. integers, not big-integer
          | multiplication.
        
           | coliveira wrote:
            | I think the article is talking about floating-point
            | addition. Integer addition is very hard to beat.
        
       | mumbisChungo wrote:
       | I think I'm misunderstanding something about the article. In what
       | way is this method perfect, as is mentioned multiple times? Is it
       | not just an incremental improvement over standard practice?
        
         | ghastmaster wrote:
         | "Perfect" in this case is a qualitative exaggeration bordering
         | on hyperbole.
         | 
         | > At the end of February, a team of computer scientists at
         | Aarhus University posted a paper arguing that if another
         | unproven conjecture is also true, this is indeed the fastest
         | way multiplication can be done.
        
           | mumbisChungo wrote:
           | I would say it's firmly hyperbole/clickbait, upon further
           | investigation.
        
             | bo1024 wrote:
             | I think it's fair. Nobody thinks a faster algorithm than n
             | log n will ever be possible. It's like a mini P!=NP.
        
         | hinkley wrote:
         | In computational complexity, we have some proofs that a certain
         | cost is the absolute bare minimum that an optimal solution must
         | expend. We have the same for some mechanical systems too.
         | 
         | If you get the order down to the proven limit, especially for
         | the first time, people sometimes use the word 'perfect', even
         | if there are potentially other solutions with lower constant
         | overhead, or lower practical overhead (see also Timsort, which
         | uses insertion sort for leaf operations).
         | 
         | > Second, in that same paper Schonhage and Strassen conjectured
         | that there should be an even faster algorithm than the one they
         | found -- a method that needs only n x log n single-digit
         | operations -- and that such an algorithm would be the fastest
         | possible.
         | 
          | Is this saying that there is a proof in their paper that n
          | log n is the cheapest? If so, it certainly beats around the
          | bush.
        
           | remcob wrote:
            | A conjecture is a proposition that is suspected to be true.
            | So there would be no proof in the paper, but likely some
            | arguments for why the authors expect it to be true.
        
             | hinkley wrote:
              | If a scientist were writing the article about the paper,
              | yes. But as everyone has been quick to point out, that is
              | definitely not the case.
        
           | timwaagh wrote:
            | No (people think there isn't, but it hasn't been proven).
        
           | EvgeniyZh wrote:
            | There is a (conditional) proof that n log n is optimal:
            | https://arxiv.org/abs/1902.10935
        
         | ogogmad wrote:
         | It's perfect in the sense that its running time is O(n log n),
         | and there is evidence that this is the fastest running time
         | possible (at least up to a constant factor).
        
           | mumbisChungo wrote:
            | And that evidence is... an unproven conjecture, which is
            | itself heavily qualified?
        
             | waynecochran wrote:
              | I would say most mathematical conjectures are stronger
              | than most scientific theories. What I mean is that the
              | conjectures are easily falsifiable (i.e., counterexamples,
              | if found, can easily be verified) and have been shown to
              | be true for an enormous number of cases. So unproven
              | mathematical conjectures are not just "wild hypotheses."
        
             | ogogmad wrote:
              | According to the article, it rests on a plausible
              | conjecture. I need to read further.
              | 
              | [edit]
              | 
              | According to the abstract of the paper referred to in the
              | article (https://arxiv.org/abs/1902.10935), it rests on a
              | "_central conjecture in the area of network coding_".
              | Make of that what you will.
        
               | mumbisChungo wrote:
               | From the abstract:
               | 
                | >In this work, we prove that if a central conjecture
                | in the area of network coding is true, then any
                | constant degree boolean circuit for multiplication
                | must have size Ω(n lg n), thus almost completely
                | settling the complexity of multiplication circuits
               | 
               | So, given an unproven assumption and also given that we
               | are constrained to doing multiplication on a boolean
               | circuit, this may be "perfect" in a sense.
               | 
               | Seems like the title is mostly just clickbait.
        
               | hinkley wrote:
               | Yeah but in the backstory they also mentioned that
               | _Kolmogorov_ conjectured that the optimal solution was
               | O(n^2).
               | 
               | Conjectures aren't doing so hot in this area, it seems.
        
               | ChrisLomont wrote:
               | >Conjectures aren't doing so hot in this area, it seems.
               | 
               | For that one counter-example how many conjectures turned
               | out to be true?
        
               | timwaagh wrote:
                | Ah, but that's not what this game is about. It's about
                | what you can prove. And conjectures are, by definition,
                | things you can't prove yet. So the longer they stand
                | and the more paths they block, the more frustrating it
                | gets.
        
             | dogfoods wrote:
             | RTFA and find out!
        
         | dogfoods wrote:
         | It achieves a theoretical lower bound on the complexity. This
         | is what the article says in like 1000 words because it's
         | written for mathematical illiterates.
        
           | senkora wrote:
           | They do not prove a lower bound.
           | 
           | > Harvey and van der Hoeven's algorithm proves that
           | multiplication can be done in n x log n steps. However, it
           | doesn't prove that there's no faster way to do it.
           | Establishing that this is the best possible approach is much
           | more difficult. At the end of February, a team of computer
           | scientists at Aarhus University posted a paper arguing that
           | if another unproven conjecture is also true, this is indeed
           | the fastest way multiplication can be done.
        
           | mumbisChungo wrote:
           | Yes, a lower bound for multiplication using a boolean
           | circuit, resting on an unproven conjecture.
           | 
           | Even if the conjecture is proven to be true, we may find a
           | more efficient way to perform this operation than a boolean
           | circuit.
        
       | tdons wrote:
        | Respect :). I implemented Karatsuba a while ago and wondered
        | whether an even faster method was available. It is!
       | 
       | I admire the perseverance, grit, and intelligence needed to crack
       | hard nuts like this.
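        | 
        | For anyone who hasn't seen it, one level of the trick fits in
        | a few lines (sketch; recursing on the halves gives the full
        | O(n^1.585) algorithm):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        |     
        |     /* 32x32 -> 64-bit multiply using three 16x16 products
        |        instead of four. */
        |     uint64_t karatsuba32(uint32_t x, uint32_t y) {
        |         uint64_t x1 = x >> 16, x0 = x & 0xFFFF;
        |         uint64_t y1 = y >> 16, y0 = y & 0xFFFF;
        |         uint64_t z2 = x1 * y1;               /* high parts */
        |         uint64_t z0 = x0 * y0;               /* low parts  */
        |         /* middle term: x1*y0 + x0*y1, with one multiply */
        |         uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
        |         return (z2 << 32) + (z1 << 16) + z0;
        |     }
        |     
        |     int main(void) {
        |         uint32_t a = 123456789u, b = 987654321u;
        |         /* prints 1 */
        |         printf("%d\n", karatsuba32(a, b) == (uint64_t)a * b);
        |         return 0;
        |     }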
        
         | jacobolus wrote:
          | This new method will be marginally faster if you ever need to
          | multiply two numbers on a scale where writing down the bits
          | fills every Planck-length cube in the known universe.
          | 
          | Otherwise, the "practical" benefit is limited to saving a few
          | symbols when writing abstract math papers.
        
           | the8472 wrote:
            | > This new method will be marginally faster if you ever
            | need to multiply two numbers on a scale where writing down
            | the bits fills every Planck-length cube in the known
            | universe.
            | 
            | If we're talking about physical implementations, then you
            | would also have to take the delay between those Planck
            | volumes into account.
        
         | [deleted]
        
       | JakeStone wrote:
       | Gotta admit, I started chuckling at "We use [the fast Fourier
       | transform] in a much more violent way..." I've known some math
       | geeks in my day, and this sounded like them.
        
       ___________________________________________________________________
       (page generated 2020-07-22 23:00 UTC)