[HN Gopher] Discovering faster matrix multiplication algorithms ...
       ___________________________________________________________________
        
       Discovering faster matrix multiplication algorithms with
       reinforcement learning
        
       Author : shantanu_sharma
       Score  : 331 points
       Date   : 2022-10-05 15:12 UTC (7 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | [deleted]
        
       | didibus wrote:
        | Can someone knowledgeable tell me if it discovered an algorithm
        | we can understand and re-implement ourselves, like, is the
        | pseudo-code for it known? Or is it kind of stuck in the inferred
        | function of the ML model?
        
         | bugzz wrote:
         | yes, they implemented the resulting algorithms and have charts
         | showing real world performance on GPUs and TPUs
        
       | ckrapu wrote:
        | 50% fewer operations for the skew-symmetric matrix-vector product
        | is pretty big, IMO.
        
       | xpe wrote:
       | The paper is a great read -- an interesting approach, fun
       | theoretical comparisons, plus benchmarks for optimized algorithms
       | for specific hardware (CPU and GPU, allowing for differing
       | instruction sets).
       | 
       | But it doesn't stop there; from the "Discussion" section:
       | 
       | > One important strength of AlphaTensor is its flexibility to
       | support complex stochastic and non-differentiable rewards (from
       | the tensor rank to practical efficiency on specific hardware), in
       | addition to finding algorithms for custom operations in a wide
       | variety of spaces (such as finite fields). We believe this will
       | spur applications of AlphaTensor towards designing algorithms
       | that optimize metrics that we did not consider here, such as
       | numerical stability or energy usage.
        
       | machinekob wrote:
        | Wasn't this based on some other paper? I read something like this
        | a year or more ago, also about a deep-learning-based matrix
        | multiplication speedup.
        
       | KKKKkkkk1 wrote:
       | Reinforcement learning is usually introduced as a technique to
       | train an agent in simulation that would then be let loose in a
        | real environment. Interesting that they chose to treat it as a
        | search strategy.
        
       | mikewarot wrote:
       | I have a gut feeling that there is a faster way to compute
       | logarithms going from least to most significant bit. How would I
       | go about using ML to find it?
       | 
       | [Edit] I think Feynman's algorithm might do it:
       | 
       | "Consider the problem of finding the logarithm of a fractional
       | number between 1 and 2. (The algorithm can be generalized without
       | too much difficulty.) Feynman observed that any such number can
       | be uniquely represented as a product of numbers of the form 1 +
       | 2^(-k), where k is an integer. Testing for the presence of each
       | of these factors in a binary representation is simply a matter of
       | a shift and a subtraction. Once the factors are determined, the
       | logarithm can be computed by adding together the precomputed
       | logarithms of the factors. The algorithm fit the Connection
       | Machine especially well because the small table of the logarithms
       | of 1 + 2^(-k) could be shared by all the processors. The entire
       | computation took less time than doing a division."
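        | 
        | For illustration, here's a minimal Python sketch of that idea (a
        | floating-point version of the shift-and-subtract scheme; the
        | greedy factor selection and variable names are my reading of the
        | description, not Feynman's actual method):
        | 
        |     import math
        | 
        |     def feynman_log(x, bits=53):
        |         # ln(x) for 1 <= x < 2 via factors (1 + 2**-k)
        |         assert 1.0 <= x < 2.0
        |         table = [math.log(1 + 2.0**-k)
        |                  for k in range(1, bits + 1)]
        |         y, result = 1.0, 0.0
        |         for k in range(1, bits + 1):
        |             candidate = y * (1 + 2.0**-k)  # shift and add
        |             if candidate <= x:             # factor present?
        |                 y = candidate
        |                 result += table[k - 1]
        |         return result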
        
         | radford-neal wrote:
         | Seems similar to the CORDIC algorithm
         | (https://en.wikipedia.org/wiki/CORDIC).
        
         | adgjlsfhk1 wrote:
          | Note that this algorithm will no longer be good. For precisions
          | up to roughly 1000 bits, small tables combined with minimax
          | polynomials are optimal, and for higher precision you want to
          | use more complicated methods. If you're interested, the ARB
          | library likely implements the fastest currently known methods.
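          | 
          | A toy Python version of the table-plus-polynomial approach (a
          | sketch only; a real implementation would use a minimax fit
          | rather than the truncated series used here, and fixed-point
          | arithmetic at the target precision):
          | 
          |     import math
          | 
          |     TB = 6  # table index bits
          |     TABLE = [math.log(1 + i / 2**TB) for i in range(2**TB)]
          | 
          |     def table_log(x):
          |         # assumes x > 0
          |         m, e = math.frexp(x)   # x = m * 2**e, m in [0.5, 1)
          |         m, e = 2 * m, e - 1    # now m in [1, 2)
          |         i = int((m - 1) * 2**TB)
          |         r = m / (1 + i / 2**TB) - 1   # residual, |r| <= 2**-TB
          |         poly = r - r*r/2 + r**3/3 - r**4/4  # ~log(1 + r)
          |         return e * math.log(2) + TABLE[i] + poly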
        
         | xpe wrote:
         | Start with these questions:
         | 
         | 1. How big is the search space?
         | 
         | 2. What analysis approaches are likely to bear fruit for the
         | search space? (theoretical analysis? optimization?)
         | 
         | 3. If optimization is called for, what kind?
        
       | svnt wrote:
       | The big claim turns out to be a little overstated. The claim:
       | 
       | > AlphaTensor's algorithm improves on Strassen's two-level
       | algorithm for the first time, to our knowledge, since its
       | discovery 50 years ago.
       | 
       | reduces to:
       | 
        | > AlphaTensor discovers algorithms that outperform the Strassen-
        | square algorithm, which is a fast algorithm for large square
        | matrices [31,32]. Although the discovered algorithm has the same
       | theoretical complexity as Strassen-square, it outperforms it in
       | practice, as it is optimized for the considered hardware.
       | Interestingly, AlphaTensor finds algorithms with a larger number
       | of additions compared with Strassen-square (or equivalently,
       | denser decompositions), but the discovered algorithms generate
       | individual operations that can be efficiently fused by the
        | specific XLA [33] grouping procedure and thus are more tailored
       | towards the compiler stack we use. The algorithms found by
       | AlphaTensor also provide gains on matrix sizes larger than what
       | they were optimized for. Finally, Fig. 5c shows the importance of
       | tailoring to particular hardware, as algorithms optimized for one
       | hardware do not perform as well on other hardware.
       | 
       | So they improved on Strassen's 50-year-old algorithm by
       | optimizing it for hardware that has only existed for a few years
       | and de-optimizing it for human use? There is so much cool stuff
       | here without claiming to do something you really didn't.
        
         | atty wrote:
         | On the other hand, real world performance is really the only
         | useful metric for matrix multiplication. I don't really care
         | about the theoretical performance, or the number of operations.
         | Not disagreeing with your take that the claim is grandiose,
         | just pointing out that finding a generalizable way to automate
         | the improvement of what is almost certainly the most important
         | mathematical operation a computer can do is worth some
         | attention. It also suggests other mathematical operations could
         | be improved this way as well - potentially algorithms that
         | haven't gotten as much theoretical or practical attention as
         | matrix multiplication. With all that said, they even point out
         | that other people have worked on this before using other
         | optimization strategies. I'd guess they got into Nature either
         | by using the Deepmind name, because Nature really loves
         | reinforcement learning because it's a cool topic that draws
         | views, or because they're the first group to find their
         | algorithm leads to real improvements. (Probably a mixture of
         | all three, I'd guess)
        
           | throwawaymaths wrote:
           | > the number of operations
           | 
           | This is a very good proxy for actual real world speed. It's
           | pretty much "as good as it gets" for most straight
           | computational tasks, though sometimes memory movement is your
           | real bottleneck.
        
             | fulafel wrote:
             | It's become a much worse proxy since the memory wall.
        
             | sdenton4 wrote:
             | This is questionable...
             | 
             | The 'best' algorithms for matrix multiplication are
             | galactic algorithms that provide no actual benefit. Raw
             | operation counts are a good proxy for speed, but the big-O
             | complexity that people actually chase hasn't been
             | especially helpful for this problem in the last twenty+
             | years.
             | 
              | https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
        
               | throwawaymaths wrote:
                | Probably the most common matrix multiplication is (nx9) x
                | (9xm) (9 = 3x3 cells from a convnet). If you can optimize
                | the shit out of those you might be in business for
                | something interesting.
                | 
                | Though to be honest the real slow step in machine
                | learning is training, and the slow step in training is
                | the _outer product_ of two matrices... I don't believe
                | there is an algorithmic way out of that one.
               | 
               | For non-ml/non-GF purposes, you might also worry about
               | numerical stability of these matrix multiplications,
               | which is not addressed in this paper.
        
               | a1369209993 wrote:
          | > and the slow step in training is the _outer product_ of
          | two matrices... I don't believe there is an algorithmic
          | way out of that one.
               | 
               | Well, there are several, but the obvious ones tend to
               | require strange or unrealistic assumptions about the
               | hardware. The most obvious such assumption, IMO, being
               | that the hardware is arranged in 3D space in a manner
               | roughly analogous to a human brain, which tends to be at
               | odds with the common practice of mostly-planar
               | photolithography, and with the preference to be able to
               | change the network topology experimentally without
               | building new hardware.
        
           | dekhn wrote:
           | Science proceeds at the rate by which linearly larger
           | matrices can be decomposed in less than exponential time.
        
             | matheusd wrote:
             | This is excellent! You should make a T-Shirt with this
             | quote on it!
        
           | psychphysic wrote:
            | Surely theoretical improvement begets real-world improvement,
            | especially in the context of highly specialised hardware.
            | 
            | It was theoretical performance improvement that motivated
            | the creation of SIMD and led to real-world speedups.
        
             | fulafel wrote:
              | It's interesting how SIMD and big-O have had such
              | different reasons for being of limited but still
              | significant relevance.
              | 
              | SIMD, or vector processing as it was called in the 70s,
              | delivered practical speedups in simple benchmarks right
              | away, but most applications don't take advantage of it
              | for SW engineering reasons. Whereas big-O improvements
              | ignore important components of performance per unit of
              | time (memory access and constant factors) and are purely
              | theoretical in an essential sense.
        
             | taeric wrote:
             | Not necessarily. In particular, this could have slower
             | growth, but still higher cost on normal workloads.
        
             | chrchang523 wrote:
             | Sometimes. In the case of matrix multiplication, there is a
             | pretty large backlog of "galactic algorithms" going down to
             | ~O(n^2.373) that haven't yet led to real-world
             | improvements. Strassen's ~O(n^2.807) algorithm is only
             | considered over the basic O(n^3) strategy for n>1000.
        
               | HarHarVeryFunny wrote:
               | Winograd O(n^2.37) is a win for 3x3 convolutions in
               | cuDNN, so it can be implemented efficiently.
        
               | chrchang523 wrote:
               | My understanding is that the Winograd minimal filtering
               | algorithms used in cuDNN are different from the O(n^2.37)
               | Coppersmith-Winograd-descended matrix multiplication
               | algorithms. But I acknowledge that these can be
               | considered cousins, produced by the same line of
               | research.
        
               | HarHarVeryFunny wrote:
               | I'm pretty sure there's no difference. It does seem to be
               | pretty hard to turn the theoretical win into a practical
               | one - the GPU kernel needs to be coded extremely
               | efficiently to match the underlying hardware. AFAIK it's
               | only a win for 3x3 - maybe for one other size too.
               | Originally Winograd wasn't supported by cuDNN on NVidia's
               | Tensor Cores (matmul-specific hardware on more recentish
               | GPUs), vs CUDA cores, but a Google search seems to
               | indicate it can be done - not sure if that's in cuDNN
               | though.
        
         | letitgo12345 wrote:
          | I think they found an N^2.778 complexity algorithm (obtained by
          | applying their discovered 4x4 algorithm recursively), so this
          | is not correct.
        
         | Ar-Curunir wrote:
         | You're wrong. They discovered an algorithm with fewer
         | multiplications for 4x4 matrices.
         | 
         | They also discovered better algorithms for other dimensions.
        
         | dekhn wrote:
         | That's the DeepMind modus operandi- and why they publish in
         | Nature. It's in their best interest to claim as much covered
         | ground as possible so they can keep working towards their end-
         | goal. But most people who read Nature already know that results
         | in that journal are overhyped. You just have to apply a
         | specific prior/translator when you read their papers.
        
           | noobermin wrote:
            | This really is a depressingly true sentiment. Nature is the
            | most important publication, but so often the big shots
            | publish overhyped stuff there that doesn't deserve it and get
            | away with it since they are big shots. Chalk this up as
            | another piece of evidence of the dysfunction in science.
        
             | brutusborn wrote:
             | I don't doubt the sentiment is true, but hasn't this kind
             | of science always been this way? By that I mean authors
             | inflating the importance of their work; everyone wants to
             | be seen as having the biggest breakthroughs.
             | 
             | When I think of dysfunction in 'science' I usually think of
              | unfalsifiable hypotheses, the repeatability crisis in
             | Psychology, p-hacking in Medicine, misuse of statistical
             | methods in Economics and other epistemic issues, but I
             | don't think of exaggerations like this.
        
               | dekhn wrote:
               | Scientists have long been self-promoters who desire that
               | their theories become the dominant ones and they use many
               | techniques to achieve this.
               | 
               | However, the trend towards maximizing the predicted
               | outcomes of your research really took off during the
                | Human Genome Project.
        
               | adalacelove wrote:
               | > Scientists have long been self-promoters
               | 
                | The famous ones more than the rest, I guess. Nevertheless,
                | Cavendish, my hero, comes to mind.
        
               | YorkshireSeason wrote:
               | I would imagine that it's the metrics universities and
               | funding agencies apply in promotion decisions. For
               | example, my (well-known) university decided to measure
               | impact, and takes "Twitter engagement" as one proxy
               | metric for impact -- against my explicit recommendations.
               | I'll leave the consequences to everybody's imagination.
        
               | dekhn wrote:
               | Yes. In fact many scientists post to twitter but don't
               | look at the replies (Emily Bender is an example) or even
               | block you if you disagree with them. That's not
               | engagement and I wonder what the dean would do about a
               | promotion where the scientist just sort of blathered on
               | twitter and had lots of followers, but wasn't actually
               | providing any real scientific value (again, Emily Bender
               | is an example).
        
               | YorkshireSeason wrote:
               | Deans typically lack detailed technical knowledge to
               | evaluate candidates (not to mention time), they just go
               | with the flow, and want to run the department smoothly.
               | 
               | I think the real mechanism is this: those who make the
               | decision (those with the most power, i.e. those who bring
               | in the most funding and can threaten to leave if they
               | don't get their way) already know whom they want and they
               | cherry-pick data, eg "impact" figures, to bolster their
               | case. That enables the dean then to justify the decision
               | in public with those cherry-picked figures ... (the dean
               | can hardly say we are hiring X because otherwise top
               | funding getter Y will leave)
               | 
               | I think Twitter is less important in STEM subjects than
               | in social sciences or humanities, as STEM has more
               | clearcut results.
        
         | chabons wrote:
         | Maybe I'm mis-reading the paper, but my interpretation was that
         | the algorithm discovered using a reward which scales with HW
         | performance matches the theoretical complexity of Strassen, but
         | is more performant on the HW.
         | 
          | They _also_ identified algorithms which use fewer
          | multiplications than Strassen, improving on the best known
          | number of multiplications required (they highlight this in
          | Fig. 3).
         | 
         | In that light, I thought their claim was fair. They've
         | discovered (different) algorithms which are both theoretically
         | and practically better than Strassen's.
        
           | [deleted]
        
         | boole1854 wrote:
         | This is not a correct summary of the results of the paper.
         | 
         | First, you cut out the initial part of the sentence about
         | improving on Strassen's two-level algorithm. Here is the
         | complete sentence:
         | 
         | > Particularly relevant is the case of 4 x 4 matrices in a
         | finite field, where AlphaTensor's algorithm improves on
         | Strassen's two-level algorithm for the first time, to our
         | knowledge, since its discovery 50 years ago.
         | 
         | That is, note that the improvement cited in the latter part of
         | the sentence is for the _4 x 4 matrix case_.
         | 
         | But then your next quote is not in reference to 4 x 4 matrices.
         | That is, for _larger matrix sizes_ , the best algorithm
         | AlphaTensor discovered had the same theoretical complexity as
         | Strassen-square, but better performance.
         | 
         |  _EDIT: my next comments are confused and best ignored. See
         | reply from pxx below._
         | 
         | For the 4 x 4 matrix case, the theoretical complexity of
         | AlphaTensor's algorithm was O(N^2.778) compared with the
         | Strassen algorithm's complexity of O(N^2.8074), which _is_ an
         | improvement.
        
           | Nihmie wrote:
           | To be pedantic about this complexity estimate for the 4 x 4
            | case, both of them reduce to O(1). If N == 4, it's a
           | constant, so we have O(4^2.778) == O(4^2.8074) == O(1).
           | 
           | Talking about _scaling_ for a problem that has no scaling
           | factor is a bit odd.
        
             | vlovich123 wrote:
             | If you have to process many such matrices, you're not at
             | O(1). I would imagine that's what the O is referring to in
             | this case.
        
               | cochne wrote:
               | If you look at it that way then any algorithm is just
                | O(n), where n is the number of matrices. O does not care
                | about constant factors.
        
             | est31 wrote:
             | Yeah the paper uses O notation for complexity, but never
             | when N is constant. For constant N and M, the paper uses
             | the number of scalar multiplication operations performed as
             | complexity measure. According to Figure 3, their algorithm
             | has decreased the best known value from 49 to 47 for 4x4
             | matrices, and from 98 to 96 for 5x5 matrices (as well as
             | some further state of the art improvements).
        
           | dgacmu wrote:
           | In Z_2 - modular matrix multiplication, not conventional
           | matrix multiplication. And in this case, modular over what I
           | assume is GF(2), i.e., single bits.
        
             | aaaaaaaaaaab wrote:
             | Modular multiplication over GF(2)... also known as AND?
        
           | pxx wrote:
           | what, no, you have this all wrong. the complexity of
           | multiplying two 4x4 matrices is clearly constant. the
           | important thing about the constant, however, is that you can
           | use this primitive on the block matrices recursively to
           | possibly improve the asymptotic complexity.
           | 
           | the complexity improvement in this paper is for arithmetic in
           | Z_2 (modular arithmetic over single bits). in standard
           | arithmetic, there is no asymptotic complexity improvement but
           | the developed algorithms involve more efficiently fusible[0]
           | operations, which is useful for the target hardware.
           | 
           | [0] ed: this used to read 'with fewer multiplications,' which
           | is clearly wrong: fewer multiplications would lower
           | complexity. Interestingly, the discovered algorithm seems to
           | have a larger number of additions, but still runs faster.
        
             | ColinWright wrote:
             | I thought that savings on the number of multiplications
             | leads to savings in complexity. For example, multiplying
             | 2x2 matrices using 7 multiplications instead of 8 gives an
             | improvement in complexity.
             | 
             | See my comment here:
             | https://news.ycombinator.com/item?id=33098192
             | 
             | It's possible that my understanding is completely wrong,
             | but your comment is at odds with my other reading, so it
             | would be useful to get some clarification.
        
               | pxx wrote:
               | sorry; whoops, I don't know why I said multiplications
               | there. I somehow thought the "more additions" somehow
               | meant "fewer multiplications," but no, it's just "more
               | additions but faster performance on target." edited.
               | 
               | your other comment notes that for some shapes, even in
               | standard arithmetic, they have found algorithms with
               | fewer multiplications than best-known. but those don't
               | seem to extend to an algorithm with better asymptotic
               | complexity for multiplying general matrices. otherwise
               | I'd assume they'd claim them :)
        
               | sdenton4 wrote:
               | This is correct, as I understand it.
               | 
               | Most complexity measures are defined in terms of
               | asymptotic behavior, so any specific finite algorithm has
               | 'constant' complexity. This is an obviously unhelpful bit
                | of pedantry on the theory side.
               | 
               | In fact, it's as you say; we can compute a 'raw'
               | complexity for a finite operation (eg, raw count of
               | arithmetic ops), and then use that operation as a
               | primitive to create an algorithm with an asymptotic
               | complexity which depends directly on the finite
               | operation's 'raw' complexity.
               | 
               | (complexity theory, and specifically for matrix
               | multiplication, is a great example of metrics becoming
               | targets becoming bad incentives, a la Goodhart's Law. The
               | so-called best algorithms are 'galactic' and therefore of
               | no use to anyone. There's some fuzzy-headed hope that one
               | day the people chasing the big-O exponent metric will
               | come up with something practically useful, buuuuuut...
                | you actually gotta work the practical problem to solve the
               | practical problem, and that requires different metrics.
               | This seems to be what's motivating the work under
               | discussion here.)
        
             | boole1854 wrote:
             | You are correct. I will edit.
        
             | a-dub wrote:
             | if you're in gf2 on a traditional cpu and care about wall
             | clock time, i start to wonder how something like simd xors
             | and parity would compare...
        
           | skybrian wrote:
           | Isn't multiplying two 4x4 matrices a finite problem? What
           | does big O complexity mean in that case?
        
             | Rioghasarig wrote:
             | My understanding of this situation is that they found a way
             | to multiply 4x4 matrices that requires fewer
              | multiplications than Strassen's. This implies that it can
              | be generalized to a way to multiply n x n matrices with a
              | better theoretical complexity than Strassen's algorithm.
        
             | adgjlsfhk1 wrote:
              | Any fixed-size multiplication algorithm has a corresponding
              | general algorithm where you recursively divide a matrix
              | into blocks and treat each block as an element. For any nxn
              | base algorithm that takes k multiplications, this yields a
              | general algorithm with runtime O(N^(log_n k)) for NxN
              | matrices.
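              | 
              | A quick check of those exponents (a sketch, using the
              | usual O(N^(log_n k)) bound for an n x n base scheme with
              | k multiplications):
              | 
              |     import math
              | 
              |     def exponent(n, k):
              |         # asymptotic exponent of the recursive algorithm
              |         return math.log(k, n)
              | 
              |     print(exponent(2, 8))   # naive 2x2: 3.0
              |     print(exponent(2, 7))   # Strassen: ~2.807
              |     print(exponent(4, 49))  # Strassen squared: ~2.807
              |     print(exponent(4, 47))  # AlphaTensor 4x4, Z_2: ~2.778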
        
         | ColinWright wrote:
         | How can I reconcile your comment here with the comment in
         | https://news.ycombinator.com/item?id=33098192 ??
        
         | throwawaymaths wrote:
         | No, they count multiplications, 47 vs 49 in the 4x4 case. This
         | assumes that multiplications are the heavy lift relative to
          | additions, memory moves, etc. This is not a bad assumption, and
         | is hardware independent, although there are some proposed
         | 'exotic' hardware solutions where addition and multiplication
         | are equally difficult (via lookup table, limited to about 8-bit
          | x 8-bit) or where addition is harder than multiplication (log-
          | space number systems), but I don't think these have been
          | deployed.
        
           | lifthrasiir wrote:
           | That assumption didn't age very well. In many if not most
           | modern architectures multiplication still does take more
           | cycles than addition but can have a higher throughput if
           | scheduled well, and fused multiply-add can be as fast as a
           | single multiplication, essentially giving a free addition.
        
             | throwawaymaths wrote:
             | > but can have a higher throughput if scheduled well
             | 
             | Unlikely to be true for matrix multiplications, which have
             | well-defined data dependencies.
             | 
             | > and fused multiply-add can be as fast as a single
             | multiplication, essentially giving a free addition.
             | 
             | Yes, this supports the assumption that multiplication is
             | the heavy lift.
        
               | tlb wrote:
               | With sufficiently parallel hardware, you can do an entire
               | mat4 * mat4 multiplication in 3 cycles. First do all 64
               | multiplications in parallel, then do 32 adds, then 16
               | adds to get the final answer.
               | 
               | For operations in GF(2) where they claim a result, a
               | multiply is just an AND gate and an add is an XOR gate.
               | So the fully parallel hardware version is 64 AND gates
               | and 48 XOR gates, with a total gate delay of 3. This is a
               | trivial amount of hardware and could easily be an
               | instruction in some alternate universe where it was
               | useful.
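                | 
                | A minimal Python sketch of that GF(2) product (the naive
                | schedule, with multiply as AND and add as XOR; entries
                | are 0/1 ints):
                | 
                |     def matmul_gf2(A, B, n=4):
                |         C = [[0] * n for _ in range(n)]
                |         for i in range(n):
                |             for j in range(n):
                |                 acc = 0
                |                 for k in range(n):
                |                     # AND = multiply, XOR = add
                |                     acc ^= A[i][k] & B[k][j]
                |                 C[i][j] = acc
                |         return C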
        
               | klyrs wrote:
               | You missed the part where the "4x4 matrix multiplication"
               | refers to multiplication of NxN matrices logically
               | partitioned into sixteen blocks of size (N/4)x(N/4), done
               | recursively.
        
       | WhitneyLand wrote:
       | In general to speed up matrix operations how much is some sort of
       | result caching applied?
       | 
       | For example if you profile calculations done when training a
       | model, is there any significant repetition happening that would
       | allow some kind of benefit from table lookups of certain
       | solutions?
        
       | bitwize wrote:
       | It's begun. AI is improving itself.
        
       | chillee wrote:
       | Quoting myself on twitter:
       | 
       | https://twitter.com/cHHillee/status/1577713102434361344
       | 
       | I'm quite suspicious about their hardware benchmarks. They're not
       | writing custom kernels, they're relying on a graph compiler like
       | XLA to automatically fuse their decomposed matmuls (and my guess
       | is that XLA will not be very good at this).
       | 
       | Moreover, as far as I can tell, they don't report absolute
       | performance numbers anywhere. In other words, I suspect that a
       | naive N^3 matrix multiplication would absolutely smoke them in
       | performance.
        
         | [deleted]
        
       | puffoflogic wrote:
       | Why the heck is a math paper in Nature? I'd put Nature somewhere
       | just below vixra in terms of math credibility.
        
         | 323 wrote:
         | If you could find a way to train cats to solve integrals,
         | wouldn't that be a Nature paper and not a math one?
        
       | optimalsolver wrote:
       | Now, were these algorithms discovered, or invented?
       | 
       | I.e., have they always been there, just sitting in Platonic space
       | waiting for a conscious mind to stumble across them, or have they
       | just now popped into existence?
        
         | WFHRenaissance wrote:
         | Discovered. I'd go as far to say that all "invention" is
         | actually just discovery that we find useful.
        
       | bjourne wrote:
       | Never heard of the "matrix multiplication tensor" before. Is
       | there some digestible blog post or article somewhere explaining
       | what it is and why it is useful?
        
       | HarHarVeryFunny wrote:
       | Can someone explain the approach here?
       | 
       | I understand that they transform the problem into a gamified 3-D
       | matrix decomposition, but what exactly is the motivation for
       | using RL to beat this decomposition game ? Why not just use, for
       | example, an evolution algorithm to find incrementally better
       | decompositions ?
        
       | a-dub wrote:
       | this is cool. i suppose it's only a matter of time until we see
       | optimizing compilers that use transformer-rl style searches for
       | subsets of their optimization and codegen.
        
         | mhh__ wrote:
         | Generating pgo data with graph ML is already a thing
        
           | a-dub wrote:
           | i'm guessing most of that stuff is register allocation and
           | cache management.
           | 
           | rl-searches could potentially rewrite entire algorithms.
           | 
           | edit: (more musing) maybe the future of software engineering
           | will be authoring bullet proof acceptance tests.
        
         | Psyladine wrote:
         | Nevermind compilers, the possibility of updating architectures
         | & code bases by automation. A true feasible attack on tech debt
         | and legacy code.
        
         | danuker wrote:
         | Will also generate some fun bugs!
        
           | mirekrusin wrote:
           | Unless it provides or simply works within formal
           | verification?
        
             | f1shy wrote:
              | Formal verification does not prove the absence of bugs. In
              | the best case, it can only catch certain types of bugs.
             | 
             | https://smartech.gatech.edu/handle/1853/62855
        
               | adrianN wrote:
               | Formal verification is very good at proving that compiler
               | transformations preserve semantics. Programming language
               | semantics are pretty well specified (at least for some
               | programming languages...), so the chance of bugs in the
                | specification is low.
        
               | wokkel wrote:
                | It can catch all kinds of bugs, but you have to ask the
                | correct questions. So it comes down to defining what a bug
                | is and the assumptions you use for testing. And therein
                | lies the problem: what constitutes a bug? A function that
                | adds two numbers but never terminates might be considered
                | bug-free if you forget to include as a bug that not giving
                | an answer before the end of the universe is faulty
                | behaviour. We humans are terrible at writing these kinds
                | of specifications, so formal verification as a method
                | just pushes the correctness problem from code to
                | specification.
        
           | hdjjhhvvhga wrote:
           | Relax - we'll soon have self-learning debuggers, too.
        
           | iKlsR wrote:
           | Whenever I see stuff about automated algorithms it reminds me
           | of a movie, can't remember the name or plot exactly but they
            | create a being or upload someone's consciousness and the "ai"
           | is solving all the problems the world faces until it reaches
           | the last one, humans. It then begins to synthesize some
           | airborne virus or nukes that would scour us from the earth.
           | We basically created tech that would eventually optimize us
           | out of existence as a cancer to the earth.
           | 
            | Edit: found it, Transcendence (2014). Also it seems my brain
            | injected some random thoughts into that plot; it's not about
            | that at all, but the point stands!
        
             | haolez wrote:
              | Transcendence is one of the few movies that present an
             | almighty AI that's benevolent in nature.
        
             | ASalazarMX wrote:
             | Must have been an embarrassing incident for the engineer
             | that commented out the "Don't kill humans" constraint for a
             | quick debug session.
        
           | jeffbee wrote:
           | Superoptimizers are already extremely good at discovering
           | latent flaws in high level source code, and in compilers.
        
       | ColinWright wrote:
       | Quoting:
       | 
       | > _... AlphaTensor finds an algorithm for multiplying 4x4
        | matrices using 47 multiplications in Z_2, thereby outperforming
       | Strassen's two-level algorithm, which involves 7^2 = 49
       | multiplications. By applying this algorithm recursively, one
       | obtains a practical matrix multiplication algorithm in Z_2 with
       | complexity O(N^2.778)._
       | 
       | > _Moreover, AlphaTensor discovers efficient algorithms for
       | multiplying matrices in standard arithmetic; for example,
       | AlphaTensor finds a rank-76 decomposition of T_{4,5,5}, improving
       | over the previous state-of-the-art complexity of 80
       | multiplications._
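        | 
        | To make "applying this algorithm recursively" concrete, here is
        | a minimal Python/NumPy sketch of the block recursion using the
        | classic 2x2, 7-multiplication Strassen scheme; the 4x4, 47-
        | multiplication Z_2 decomposition from the paper plugs into the
        | same pattern, with its coefficients taken from the paper's
        | supplementary material.
        | 
        |     import numpy as np
        | 
        |     def strassen(A, B):
        |         # assumes square matrices whose size is a power of two
        |         n = A.shape[0]
        |         if n == 1:
        |             return A * B
        |         h = n // 2
        |         A11, A12 = A[:h, :h], A[:h, h:]
        |         A21, A22 = A[h:, :h], A[h:, h:]
        |         B11, B12 = B[:h, :h], B[:h, h:]
        |         B21, B22 = B[h:, :h], B[h:, h:]
        |         M1 = strassen(A11 + A22, B11 + B22)
        |         M2 = strassen(A21 + A22, B11)
        |         M3 = strassen(A11, B12 - B22)
        |         M4 = strassen(A22, B21 - B11)
        |         M5 = strassen(A11 + A12, B22)
        |         M6 = strassen(A21 - A11, B11 + B12)
        |         M7 = strassen(A12 - A22, B21 + B22)
        |         C = np.empty_like(A)
        |         C[:h, :h] = M1 + M4 - M5 + M7
        |         C[:h, h:] = M3 + M5
        |         C[h:, :h] = M2 + M4
        |         C[h:, h:] = M1 - M2 + M3 + M6
        |         return C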
        
       | loufe wrote:
       | This is completely besides the matter, but reading "provably" in
       | the abstract is a frank reminder of how terrible English
        | spelling/pronunciation is. I can't imagine I'm the only well-
       | read native English speaker who read this as "prov-ably" on first
       | take. I don't know about most languages, but you just don't get
       | nonsense like this in French, at least.
        
         | eigenschwarz wrote:
         | https://www.merriam-webster.com/dictionary/provably
         | 
         | Edit: Sorry, missed your point on first reading, which perhaps
         | reinforces your point. I read papers like this for a living so
         | the language is unambiguous but I see how it might not be for
         | others, even if English is their first language.
        
         | pwdisswordfish9 wrote:
         | I agree. This is completely besides the matter.
        
         | evmar wrote:
         | English is terrible for sure, but this comment is pretty funny
         | to compare it to the language that spells /kes k@ se/ as
         | "qu'est-ce que c'est".
        
         | elcomet wrote:
          | I don't understand your point; "provably" means "can be
          | proven". Is there another meaning I'm missing?
        
           | ogogmad wrote:
           | He's talking about the spelling not matching the
           | pronunciation
        
           | 613style wrote:
           | Think "probably" vs "provably" (it's about
           | pronunciation/spelling)
        
       ___________________________________________________________________
       (page generated 2022-10-05 23:00 UTC)