[HN Gopher] Matrix Multiplication
       ___________________________________________________________________
        
       Matrix Multiplication
        
       Author : jonbaer
       Score  : 105 points
       Date   : 2022-01-17 20:40 UTC (2 hours ago)
        
 (HTM) web link (matrixmultiplication.xyz)
 (TXT) w3m dump (matrixmultiplication.xyz)
        
       | d--b wrote:
        | Visually, the rotation part makes it much less intuitive than
        | writing the result matrix next to the first operand, with the
        | second operand above the result matrix.
        
         | drhodes wrote:
         | Denis Auroux explains that way of seeing it,
         | https://youtu.be/bHdzkFrgRcA?t=1404
        
           | neosat wrote:
           | so useful - thanks!
        
         | tinalumfoil wrote:
         | I agree. Matrix multiplication can pretty succinctly be defined
         | in terms of the dot product between the row and column of the
          | inputs at the corresponding spot in the output. When I first
          | learned it, I already knew a little about the properties of dot
         | products so it made the "magical" properties of matrix
         | multiplication more understandable. Maybe not the way a
          | mathematician would define it -- they might look at dot
          | products as a special case of matrix multiplication and thus
          | teach the latter first -- but it was at least intuitive.
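          | 
          | A minimal sketch of that definition in Python (numpy assumed;
          | the matrices are made-up examples): each output entry is the
          | dot product of a row of the first input with a column of the
          | second:
          | 
          |     import numpy as np
          | 
          |     def matmul_dot(A, B):
          |         # C[i, j] = dot(row i of A, column j of B)
          |         n, k = A.shape
          |         k2, m = B.shape
          |         assert k == k2, "inner dimensions must match"
          |         C = np.empty((n, m))
          |         for i in range(n):
          |             for j in range(m):
          |                 C[i, j] = np.dot(A[i, :], B[:, j])
          |         return C
          | 
          |     A = np.array([[1, 2], [3, 4]])
          |     B = np.array([[-1, 3], [4, -1]])
          |     assert np.allclose(matmul_dot(A, B), A @ B)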
        
       | axpy906 wrote:
       | I like the visual look of the sliding window.
        
         | phkahler wrote:
         | I did until I realized it's computing more than one dot product
         | at a time. Then it was suddenly not great. It falls somewhere
         | between standard matrix multiplication and what might
         | eventually be a good way to understand a systolic array
         | multiplier.
        
       | mrtksn wrote:
        | The process itself is visualised well here, but can someone
        | explain why anyone would want to multiply matrices? Yes, I know
        | where it is used and I did my fair share back in college, but it
        | never really clicked for me in an intuitive manner.
        | 
        | Why not simply go procedural and skip the matrix representation
        | altogether? Using matrices in physics calculations, for example,
        | feels like dissociating the intuitiveness of the physics for
        | the sake of the matrix calculation tools. When I look at a matrix
        | of values I don't see anything.
        
         | kzrdude wrote:
          | The benefit is reusing the matrix theory - matrix knowledge(!)
          | - in different contexts.
          | 
          | In some cases, making a matrix in physics is quite a sterile
          | representation, as you suggest: like plugging the numbers into
          | an otherwise generally stated question. The matrix values
          | depend on the choice of coordinate system and everything, which
          | is quite inelegant.
          | 
          | Why use matrices on the computer in practice? It's efficient to
          | compute in blocks instead of one by one.
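          | 
          | A small illustration of the blocking point, as a Python/numpy
          | sketch (block size and matrices are arbitrary): the same
          | arithmetic, grouped into sub-blocks that fit in cache:
          | 
          |     import numpy as np
          | 
          |     def blocked_matmul(A, B, bs=2):
          |         # Accumulate C block by block; same result as A @ B,
          |         # but operands are touched in cache-friendly tiles.
          |         n, k = A.shape
          |         k2, m = B.shape
          |         assert k == k2
          |         C = np.zeros((n, m))
          |         for i in range(0, n, bs):
          |             for j in range(0, m, bs):
          |                 for l in range(0, k, bs):
          |                     C[i:i+bs, j:j+bs] += (
          |                         A[i:i+bs, l:l+bs] @ B[l:l+bs, j:j+bs])
          |         return C
          | 
          |     A = np.random.rand(4, 6)
          |     B = np.random.rand(6, 4)
          |     assert np.allclose(blocked_matmul(A, B), A @ B)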
        
         | gizmo686 wrote:
          | Most of the time, the objects you are actually interested in
         | are not NxM grids of numbers, but linear functions. For
         | simplicity, I will assume we are working with real numbers.
         | 
          | Suppose you have two linear functions:
          | 
          |     f :: R3 -> R3
          |     h :: R3 -> R2
          | 
          | and a point:
          | 
          |     x :: R3
         | 
         | The first problem you have if you want to work with these
         | functions is how to represent them. As it turns out, assuming
         | you have a set of basis vectors, any linear function Rn -> Rm
         | can be represented by exactly n * m real numbers. Further, if
         | we arrange these numbers in a grid it is easy to fill in the
         | grid if you know how the function acts on the basis vectors,
         | and it is easy to read off how the function acts on the basis
         | vectors from the grid.
         | 
         | Similarly, a point in Rn can be represented by exactly n real
         | numbers. Again, this representation depends on a choice of
         | basis vectors.
         | 
          | For simplicity, let [f] denote the matrix representation of f,
          | and likewise for the other functions and points below.
         | 
         | The second problem you want to tackle is how to compute
         | function application y = f(x). Given the representations
         | defined above, y is the point in R3 that is represented by the
         | result of the matrix multiplication [f][x].
         | 
          | The third problem you want to solve is how to compute the
          | function composition g = h∘f (the composition that typechecks
          | with the signatures above). Again, g is the function given by
          | the matrix representation [g] = [h][f].
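          | 
          | A numpy sketch of these three steps (the particular f and h
          | below are made-up examples with the types given above):
          | 
          |     import numpy as np
          | 
          |     def f(v):  # R3 -> R3, linear
          |         x, y, z = v
          |         return np.array([x + y, y + z, z + x])
          | 
          |     def h(v):  # R3 -> R2, linear
          |         x, y, z = v
          |         return np.array([x + 2*y, 3*z])
          | 
          |     def matrix_of(fn, n):
          |         # Column j of [fn] is fn applied to basis vector e_j.
          |         return np.column_stack([fn(e) for e in np.eye(n)])
          | 
          |     F = matrix_of(f, 3)  # 3x3
          |     H = matrix_of(h, 3)  # 2x3
          |     x = np.array([1.0, 2.0, 3.0])
          | 
          |     assert np.allclose(F @ x, f(x))         # [f][x] = [f(x)]
          |     assert np.allclose(H @ F @ x, h(f(x)))  # [h][f] = [h∘f]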
        
         | paulpauper wrote:
          | a major application is Markov chains
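          | 
          | One way to see it (a minimal numpy sketch; the transition
          | probabilities are made up): multiplying transition matrices
          | composes steps of the chain, so powers of the matrix give
          | multi-step transition probabilities.
          | 
          |     import numpy as np
          | 
          |     # Row i is the distribution of the next state given state i.
          |     P = np.array([[0.9, 0.1],
          |                   [0.5, 0.5]])
          | 
          |     p0 = np.array([1.0, 0.0])  # start in state 0
          |     # P @ P is the two-step transition matrix.
          |     assert np.allclose(p0 @ P @ P, p0 @ (P @ P))
          |     print(p0 @ P @ P)          # [0.86 0.14]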
        
           | mrtksn wrote:
            | How do you think about it in an intuitive way?
        
       | max_ wrote:
        | This is what you should show students before sitting them through
        | a one-hour class lecture.
        
       | clircle wrote:
        | Is the implication that this is the most intuitive way to think
        | about or visualize matrix multiplication? If so, I reject it.
        
         | windsignaling wrote:
         | Agreed.
         | 
          | Posts like this seem to get upvoted on HN just because "oooh, a
          | pretty animation", and/or because people who struggled with
          | college math think "See? This is how they should've taught it!
          | Pictures are better than lectures!"
        
       | Twisol wrote:
       | I'd guess that the "sliding window" metaphor means you could
       | model matrix multiplication as a convolution. Does this give any
       | intuition for matrix multiplication via the Fast Fourier
       | Transform?
       | 
       | https://en.m.wikipedia.org/wiki/Convolution_theorem
       | 
       | > the convolution theorem states that under suitable conditions
       | the Fourier transform of a convolution of two functions (or
       | signals) is the pointwise product of their Fourier transforms.
        
         | amelius wrote:
          | The fastest known matrix multiplication algorithms are
          | refinements of Coppersmith-Winograd, with a complexity of about
          | O(n^2.373). FFT is O(n log n). So probably not.
        
           | Twisol wrote:
           | Apologies; apparently I was thinking of _integer_
           | multiplication using FFT:
           | 
           | https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strasse.
           | ..
           | 
           | But my question (which wasn't about asymptotic complexity)
           | stands: can we think about matrix multiplication as a
           | convolution? If so, we can do pointwise multiplication
           | sandwiched between Fourier transforms -- I don't expect it to
           | be _fast_ , I just expect it to be possible.
        
       | ogogmad wrote:
       | My advice:
       | 
       | - Learn how to multiply a matrix M by a column vector v.
       | 
       | - Generalise that by noticing that M[u|v|...] = [Mu|Mv|...].
       | 
       | That's matrix multiplication.
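        | 
        | A quick numpy check of that column-by-column view (example values
        | are arbitrary):
        | 
        |     import numpy as np
        | 
        |     M = np.array([[1, 2], [3, 4]])
        |     u = np.array([5, 6])
        |     v = np.array([7, 8])
        | 
        |     # M [u|v] = [Mu|Mv]: multiplying a matrix is just
        |     # multiplying each of its columns by M.
        |     assert np.allclose(M @ np.column_stack([u, v]),
        |                        np.column_stack([M @ u, M @ v]))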
        
         | madars wrote:
         | Yep! Wikipedia has a very useful visual
         | https://en.wikipedia.org/wiki/Matrix_multiplication#/media/F...
        
         | qsort wrote:
         | After all it's called _linear_ algebra for a reason :)
        
       | contravariant wrote:
       | It looks so innocuous when you write it as
       | 
       | (A B)_ij = A_ik B_kj.
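        | 
        | Spelled out with the sum over k explicit -- numpy's einsum takes
        | the same index notation directly (matrices here are arbitrary):
        | 
        |     import numpy as np
        | 
        |     A = np.array([[1, 2], [3, 4]])
        |     B = np.array([[-1, 3], [4, -1]])
        | 
        |     # (A B)_ij = sum_k A_ik B_kj
        |     C = np.einsum('ik,kj->ij', A, B)
        |     assert np.allclose(C, A @ B)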
        
         | ogogmad wrote:
         | Using the Einstein convention*. Otherwise that can be read
         | wrong.
         | 
         | * https://en.wikipedia.org/wiki/Einstein_notation
        
       | carlosgg wrote:
       | Discussion from 5 years ago:
       | https://news.ycombinator.com/item?id=13036386
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _Matrix Multiplication_ -
         | https://news.ycombinator.com/item?id=24141472 - Aug 2020 (1
         | comment)
         | 
         |  _Matrix Multiplication_ -
         | https://news.ycombinator.com/item?id=13036386 - Nov 2016 (131
         | comments)
         | 
         |  _Matrix Multiplication_ -
         | https://news.ycombinator.com/item?id=13033725 - Nov 2016 (2
         | comments)
        
       | jagrsw wrote:
       | I understand the rotation part and some uses, but what's the
       | intuitive explanation for why it's done this way? Convention?
       | 
        | IOW, what does matrix multiplication try to achieve by doing this
        | manipulation that looks visually like a rotation? Why not do it
        | w/o rotation instead (with the precondition that the numbers of
        | columns in the two matrices be equal, instead of columns vs
        | rows)?
        
         | ogogmad wrote:
         | A different convention would make matrix-matrix multiplication
          | no longer express composition of linear maps. So the geometric
          | -- or deeper -- meaning would be lost.
         | 
         | The operation you're describing is nevertheless equal to A^T B.
         | In other words, it can be expressed using a combination of
         | matrix multiplication and matrix transpose. I don't see what it
         | could be used for, though.
        
         | xyzzyz wrote:
         | When mathematicians think about matrix multiplication (and
         | matrices in general), they don't really think about "rotating"
         | matrices like in the animation above, but rather about
          | operators and their composition. Matrix multiplication is what
          | it is because that's how function composition works.
         | 
         | Look: consider two functions, f(x, y) = (x + 2y, 3x + 4y), and
         | g(x, y) = (-x + 3y, 4x - y). What's f(g(x, y))? Well, let's
         | work it out, it's simple algebra:
         | 
         | f(g(x, y)) = f(-x+3y, 4x-y) = ((-x+3y)+2(4x-y), 3(-x+3y) +
         | 4(4x-y)) = (-x + 3y + 8x - 2y, -3x + 9y + 16x - 4y) = (7x + y,
         | 13x + 5y).
         | 
         | Whew, that was some hassle to keep track of everything. Now,
         | here's what mathematicians typically do instead: they introduce
         | matrices to make it much easier to keep track of the
         | operations:
         | 
         | Let e_0 = (1, 0), and e_1 = (0, 1). Then f(e_0) = f(1, 0) = (1,
         | 3) = e_0 + 3 e_1, and f(e_1) = f(0, 1) = (2, 4) = 2 e_0 + 4
         | e_1. Thus, mathematicians would write that f in basis e_0, e_1
         | is represented by the matrix
         | 
         | [1 2] [3 4]
         | 
          | so that when you multiply it by the (column) vector [x, y], you
          | get
          | 
          |   [1 2]   [x]    [x + 2y ]
          |   [3 4] * [y]  = [3x + 4y]
         | 
         | Similarly, g(e_0) = (-1, 4) = -e_0 + 4e_1, g(e_1) = (3, -1) =
         | 3e_0 - e_1, so it's represented by the matrix:
         | 
         | [-1 3] [ 4 -1]
         | 
          | Now, let's multiply the matrix of f by the matrix of g:
          | 
          |   [1 2]   [-1  3]    [1*(-1)+2*4  1*3+2*(-1)]    [ 7 1]
          |   [3 4] * [ 4 -1]  = [3*(-1)+4*4  3*3+4*(-1)]  = [13 5]
          | 
          | and when we multiply the resulting matrix by the column vector
          | [x, y]:
          | 
          |   [ 7 1]   [x]    [7x + y  ]
          |   [13 5] * [y]  = [13x + 5y]
         | 
          | So what we got was in fact our original calculation:
          | f(g(x, y)) = (7x + y, 13x + 5y).
         | 
         | The conclusion here is that _matrix multiplication is what the
         | function composition forces it to be_.
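          | 
          | The whole worked example, checked with numpy (the test point is
          | arbitrary):
          | 
          |     import numpy as np
          | 
          |     def f(v):
          |         x, y = v
          |         return np.array([x + 2*y, 3*x + 4*y])
          | 
          |     def g(v):
          |         x, y = v
          |         return np.array([-x + 3*y, 4*x - y])
          | 
          |     F = np.array([[1, 2], [3, 4]])
          |     G = np.array([[-1, 3], [4, -1]])
          | 
          |     v = np.array([5.0, 7.0])
          |     # The matrix of f∘g is [f][g].
          |     assert np.allclose((F @ G) @ v, f(g(v)))
          |     print(F @ G)  # [[ 7  1], [13  5]]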
        
         | gizmo686 wrote:
         | The short answer is that it leads to a more consistent
         | notation.
         | 
         | The longer answer is that matrix multiplication is essentially
          | 2 different operations. Consider the linear functions:
          | 
          |     f :: R3 -> R3
          |     h :: R3 -> R3
          | 
          | as well as the point:
          | 
          |     x :: R3
         | 
          | If we fix a set of basis vectors, we can represent f and h as
         | 3x3 matrices, and x as a 3x1 matrix.
         | 
         | The product [f][x]=[f(x)] then represents the result of a
          | function application, while [f][h] = [f∘h] represents
         | function composition.
         | 
         | For the function application portion, there is no problem with
         | your proposal. We simply represent the point as a 1x3 vector
          | instead of a 3x1 vector (or similarly transpose the
         | convention for representing a function as a matrix).
         | 
         | The problem comes with the function composition use-case. For
         | your proposal to work, we would need to transpose only one of
         | the two matrices, which means that the matrix representation of
          | a function would be determined both by the basis vectors and by
          | a transposition parity bit, with multiplication as composition
          | only making sense when the transposition parities don't match.
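          | 
          | A numpy sketch of the two use-cases under the standard column-
          | vector convention (random matrices stand in for [f] and [h]);
          | switching to row vectors transposes everything and reverses the
          | order consistently:
          | 
          |     import numpy as np
          | 
          |     F = np.random.rand(3, 3)  # [f]
          |     H = np.random.rand(3, 3)  # [h]
          |     x = np.random.rand(3)
          | 
          |     y = F @ (H @ x)                     # f(h(x))
          |     assert np.allclose((F @ H) @ x, y)  # [f][h] = [f∘h]
          | 
          |     # Row-vector convention: transpose all, reverse the order.
          |     assert np.allclose(x @ H.T @ F.T, y)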
        
       | jmrm wrote:
        | I needed this a bit more than 10 years ago, but thanks anyway :-)
        
       | Rickthedick wrote:
        | Have you looked into Strassen multiplication
        | (https://en.wikipedia.org/wiki/Strassen_algorithm) and
        | Karatsuba's algorithm for integer multiplication
        | (https://en.wikipedia.org/wiki/Karatsuba_algorithm)? Apparently
        | the latter is the one Python uses. I just recently learned it and
        | I find it very interesting.
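        | 
        | For reference, a minimal Karatsuba sketch in Python (CPython uses
        | this idea internally for large ints; the threshold and splitting
        | here are simplified):
        | 
        |     def karatsuba(x, y):
        |         # Three recursive products instead of four.
        |         if x < 10 or y < 10:
        |             return x * y
        |         m = max(x.bit_length(), y.bit_length()) // 2
        |         hx, lx = x >> m, x & ((1 << m) - 1)  # x = hx*2^m + lx
        |         hy, ly = y >> m, y & ((1 << m) - 1)
        |         a = karatsuba(hx, hy)
        |         b = karatsuba(lx, ly)
        |         c = karatsuba(hx + lx, hy + ly) - a - b
        |         return (a << 2*m) + (c << m) + b
        | 
        |     assert karatsuba(1234567, 7654321) == 1234567 * 7654321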
        
       | civilized wrote:
       | A good example of using technology to make the simple
       | complicated.
        
       ___________________________________________________________________
       (page generated 2022-01-17 23:00 UTC)