[HN Gopher] The Big Six Matrix Factorizations
___________________________________________________________________

The Big Six Matrix Factorizations

Author : jjgreen
Score  : 212 points
Date   : 2022-05-18 12:07 UTC (10 hours ago)

(HTM) web link (nhigham.com)
(TXT) w3m dump (nhigham.com)

| shusaku wrote:
| I love Dr. Higham's blog. In case anyone missed it, the whole
| series is on GitHub: https://github.com/higham/what-is

| howlin wrote:
| Now apply them to the data used to identify the "Big Five"
| personality traits! This is an interesting application of factor
| analysis / matrix factorization:
| 
| https://en.wikipedia.org/wiki/Big_Five_personality_traits

| FabHK wrote:
| Yes, the Big Five were found by factor analysis, generally PCA,
| which is basically an eigenvalue (spectral) decomposition of the
| (symmetric positive semidefinite) covariance matrix of the data,
| and which coincides with the SVD of the (normalised) data matrix.
| 
| The SVD is everywhere. IIRC, the first Netflix Prize was won with
| the SVD:
| 
| https://en.wikipedia.org/wiki/Netflix_Prize
| 
| https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Progre...

| Royi wrote:
| Where's the data? It could indeed be interesting.

| jetbooster wrote:
| And then sell it to the Big Four!
| 
| https://en.m.wikipedia.org/wiki/Big_Four_accounting_firms

| rhyn00 wrote:
| Spectral decomposition is pretty cool.
| 
| Application 1 - spectral clustering - an alternative to k-means
| for nonlinear clusters. Get a distance matrix of your data, do a
| spectral decomposition, run k-means on your top k eigenvectors,
| and those are your clusters. (A code sketch of this recipe
| appears below, after this sub-thread.)
| 
| Application 2 - graph clustering - (run spectral clustering on
| the adjacency matrix!)
| 
| There are some tricks to getting it to work in practice, like
| normalizing, but it's a simple and powerful method. Also, the
| matrices can get big, so it helps a lot to use sparse matrix
| libraries for the computations.
| 
| [1] https://towardsdatascience.com/spectral-clustering-aba2640c0....
| 
| [2] https://www.hindawi.com/journals/ddns/2020/4540302/

| arbitrandomuser wrote:
| I've never given a second thought to the etymology of "spectral"
| in "spectral decomposition". Somewhere in the back of my mind
| (and I guess many students of physics have the same notion) I
| subconsciously assumed it originates from the eigenvalues of the
| Hamiltonian determining the atomic spectral lines. But I've never
| followed up on it and actually looked it up.

| the_svd_doctor wrote:
| I might be wrong about the exact historical reason.
| 
| But the way I see it, the "spectral decomposition of A" is a way
| to express A as a sum of orthogonal, rank-1 operators: A = \sum_i
| l_i u_i u_i^T. The l_i are the eigenvalues; the u_i are the
| eigenvectors.
| 
| The eigenvectors look a whole lot like the "modes" in a Fourier
| decomposition. And if you plot (i, l_i), the eigenvalues are a
| bit like the "spectrum" (the amplitude of each mode).
| 
| In fact, the complex exponentials (the modes in the Fourier
| decomposition) are also eigenvectors of a specific operator (the
| Laplacian).
| 
| Math people are good at finding connections between things.

| [deleted]

| throwawaymaths wrote:
| According to the almighty Wikipedia, the connection is correct,
| but it turned out to be an accident: David Hilbert, who coined
| the term "spectral theory", was surprised when it was later found
| to be applicable to quantum mechanical spectra.

| bloqs wrote:
| I love that on this website I can find comments that I simply
| don't understand a word of. Good on you for doing the stuff you
| do, sir.
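To make rhyn00's recipe above concrete, here is a minimal sketch of
the normalized variant (Ng-Jordan-Weiss style) in Python with
NumPy/SciPy. The Gaussian affinity, the sigma parameter, and the
function name are illustrative choices, not anything prescribed in
the thread:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.linalg import eigh
    from scipy.cluster.vq import kmeans2

    def spectral_clustering(X, k, sigma=1.0):
        # Distance matrix -> Gaussian affinity (one common choice).
        W = np.exp(-squareform(pdist(X, "sqeuclidean")) / (2 * sigma**2))
        np.fill_diagonal(W, 0.0)
        # The "normalizing trick": symmetric normalized Laplacian
        # L = I - D^{-1/2} W D^{-1/2}, with D = diag(row sums of W).
        dinv = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
        L = np.eye(len(X)) - dinv[:, None] * W * dinv[None, :]
        # Spectral decomposition; keep the eigenvectors of the k
        # smallest eigenvalues (eigh sorts them in ascending order).
        _, U = eigh(L)
        U = U[:, :k]
        U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
        # Ordinary k-means on the spectral embedding gives the clusters.
        _, labels = kmeans2(U, k, minit="++")
        return labels

For large problems one would build W sparse (e.g. k-nearest-neighbour
affinities) and use a sparse eigensolver such as
scipy.sparse.linalg.eigsh, as the comment's remark about sparse
matrix libraries suggests.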
| jdeaton wrote:
| This is a fantastically clear outline of this topic. Thank you!
| 
| > The terms "factorization" and "decomposition" are synonymous
| and it is a matter of convention which is used. Our list
| comprises three factorizations and three decompositions.
| 
| I can't tell if this is a joke: right after saying that these two
| words mean the same thing in this context, they are then used to
| categorize the methods.
| 
| Edit: This is the kind of content that proves the "dead internet
| theory" is wrong.

| FabHK wrote:
| It's not a categorisation, but some trivia regarding the names.
| For example: by convention, the SVD is called the SVD (singular
| value decomposition), not the SVF (singular value factorisation).
| That would be synonymous, but it's not what it's called.
| Similarly for the other 5 methods.

| tomnipotent wrote:
| > I can't tell if this is a joke
| 
| It makes sense: the author is just using the original name of
| each method to bucket them, e.g. "spectral decomposition".

| yuppiemephisto wrote:
| the words are used synonymously in practice, but i think there's
| an opportunity to distinguish them usefully.
| 
| suggestion:
| 
| - 'factoring' is a multiplicative breakdown, a 'composition',
|   like prime factorization.
| 
| - 'decomposition' could also be called 'partition', and is an
|   additive breakdown, like how 3 could be split into 2 + 1

| isaacfrond wrote:
| The article presents a note on the 6 well-known matrix
| decompositions. He states that all of them have cubic
| complexity, but practical algorithms with better exponents exist
| for all of them.

| oh_my_goodness wrote:
| Could you link a practical algorithm with an exponent lower than
| 3? (I think of these things
| https://en.wikipedia.org/wiki/Computational_complexity_of_ma...
| as not being practical, but I'd love to be wrong.)

| daniel-cussen wrote:
| https://www.fgemm.com, coming soon.

| gnufx wrote:
| For something Strassen-ish you could look at
| https://jianyuhuang.com/papers/sc16.pdf and the GPU
| implementation
| https://apps.cs.utexas.edu/apps/sites/default/files/tech_rep...

| martinhath wrote:
| The fact that they're all cubic isn't really the notable part of
| the runtime of computing the different decompositions, because
| the constants involved are really different. In practice, a
| common reason for computing many of these decompositions is to
| solve a linear system Ax = b, because with the decomposition in
| hand it is really easy to solve the whole system (using e.g.
| backsubstitution). For instance, with C++'s Eigen, look at the
| 100x100 column of [1]: there are orders of magnitude of
| difference between the fast and slow approaches. They're all
| still cubic, sure, but we're talking a 168x difference here.
| 
| (Of course, it's not so clear cut, since robustness varies, not
| all methods are applicable, and the benchmark is for solving the
| system, not computing the decomposition. But overall, knowing
| which decompositions are fast and which are not is absolutely
| crucial to practitioners.)
| 
| [1]:
| https://eigen.tuxfamily.org/dox/group__DenseDecompositionBen...

| adgjlsfhk1 wrote:
| They're all basically O(M(n)), where M(n) is your matrix
| multiplication time. Even though M(n) is O(n^2.3...), it's
| reasonable to say that it's n^3, because in practice no one uses
| the sub-cubic algorithms. Strassen is possibly workable, but it
| isn't widely used, and all of the sub-cubic algorithms have
| accuracy tradeoffs.
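martinhath's decompose-then-solve point is easy to demonstrate. A
minimal sketch using SciPy's LAPACK wrappers (the 100x100 size and
the number of right-hand sides are arbitrary): the O(n^3)
factorization is paid once, after which each solve costs only two
O(n^2) triangular substitutions.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100))

    # The O(n^3) work happens once: PA = LU with partial pivoting.
    lu, piv = lu_factor(A)

    # Each subsequent solve is O(n^2): forward + back substitution.
    for _ in range(10):
        b = rng.standard_normal(100)
        x = lu_solve((lu, piv), b)
        assert np.allclose(A @ x, b)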
| photon-torpedo wrote:
| > but practical algorithms with better exponents exist for all
| of them.
| 
| I'm aware of randomized algorithms with better complexity, which
| come at the cost of only giving approximate results (though the
| approximation may be perfectly good for practical purposes). See
| e.g. [1]. Are there other approaches?
| 
| [1] https://doi.org/10.1137/090771806

| owlbite wrote:
| If the data is sparse (which is not uncommon for large matrices
| in the real world), you can exploit the sparsity to do
| significantly better than O(n^3).

| akomtu wrote:
| IMO, a better way to present this is to draw a diagram in which
| the various decompositions are connected by arrows explaining
| how one decomposition can be turned into another.

| jdeaton wrote:
| That is fascinating. Do you have an example of this which you
| can point to?

| swframe2 wrote:
| See this playlist on semidefinite programming:
| https://www.youtube.com/playlist?list=PLqwozWPBo-FtOsjrla4jk...
| for the surprising way factorization can be used to partially
| solve an NP-hard problem.

| the_svd_doctor wrote:
| Higham is an outstanding researcher. One book of his I really
| enjoyed is
| https://epubs.siam.org/doi/book/10.1137/1.9780898717778. He has
| many, many other excellent ones.

| whatever1 wrote:
| Boo LU!!! Go with stable QR!

| the_svd_doctor wrote:
| But LU, when it works, takes fewer flops (about half as many,
| IIRC), and it doesn't require square roots. I think that's why
| it used to be preferred when compute was a very scarce resource.
| Clearly that's not really the case anymore today.
| 
| But otherwise I agree: QR is better on almost every front.

| heinrichhartman wrote:
| From a theoretical perspective, the most fundamental
| decomposition is the rank decomposition:
| 
|     A = X D Y
| 
| with X, Y invertible and D diagonal. It's called the rank
| decomposition because you can read off the rank of A by counting
| the non-zero entries in D. It's also useful for determining
| bases for the image and the kernel of A. Every math student
| learns a version of this in their first Linear Algebra course.
| 
| Curiously, the rank decomposition is not covered in the
| numerical literature. Also, it's not possible to derive a rank
| decomposition directly from the LU or QR decompositions,
| although the underlying algorithms (Gaussian elimination,
| Gram-Schmidt) could be used to do so.
| 
| It took me multiple weeks of work to understand what the problem
| with these algorithms is, and what the practical options for
| establishing a rank decomposition are. Full details are
| available on my blog:
| 
| https://www.heinrichhartmann.com/posts/2021-03-08-rank-decom...
| 
| TL;DR: Your options are (1) SVD, (2) QR factorization with
| column pivoting (part of LAPACK), (3) LDU factorization with
| total pivoting (not implemented in BLAS/LAPACK/etc.).

| whatshisface wrote:
| What makes it the most fundamental?

| edflsafoiewq wrote:
| It's basically the matrix form of dim(domain(f)) = dim(ker(f)) +
| dim(im(f)), or domain(f)/ker(f) ≅ im(f), with f (inducing) the
| isomorphism.

| whatshisface wrote:
| But the image and kernel are not the only two properties of a
| matrix.

| heinrichhartman wrote:
| Sure, but from an algebraic (categorical) perspective, taking
| kernels, co-kernels and images are the most fundamental
| operations you can think of.
| 
| The question is: how do you compute them in an effective
| (avoiding a full SVD) and stable way?
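For the curious, option (2) above can be made concrete in a few
lines. Below is a sketch using SciPy's column-pivoted QR (LAPACK's
xGEQP3); the function name and the tolerance are illustrative, and
column pivoting usually, though not provably always, pushes the
negligible part of R into the bottom rows:

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def rank_decomposition_bases(A, tol=1e-10):
        """Numerical rank plus bases for im(A) and ker(A), via
        column-pivoted QR (a sketch; tol is a crude relative cutoff)."""
        m, n = A.shape
        Q, R, piv = qr(A, pivoting=True)   # A[:, piv] = Q @ R
        d = np.abs(np.diag(R))
        r = int(np.sum(d > tol * d[0])) if d.size and d[0] > 0 else 0
        image = Q[:, :r]                   # orthonormal basis of im(A)
        # With R = [R11, R12; 0, ~0], solve R11 W = R12; the columns
        # of [-W; I] then span the kernel of the column-permuted matrix.
        W = (solve_triangular(R[:r, :r], R[:r, r:]) if r < n
             else np.empty((r, 0)))
        N = np.vstack([-W, np.eye(n - r)])
        kernel = np.zeros((n, n - r))
        kernel[piv] = N                    # undo the column permutation
        return r, image, kernel

One pivoted QR, at roughly the cost of an LU, thus answers the rank,
image, and kernel questions without a full SVD.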
| heinrichhartman wrote:
| As it turns out, the author does cover the rank decomposition
| (under the name "rank-revealing factorization") in his blog as
| well:
| 
| https://nhigham.com/2021/05/19/what-is-a-rank-revealing-fact...

| FabHK wrote:
| Higham there adds the condition that X, Y be well-conditioned
| (as they are in the SVD, for example). That's the problem with
| the Jordan decomposition A = X J X^-1: the X is non-singular in
| theory, but can be ill-conditioned, and thus the Jordan
| decomposition is not stable and not really used in practice (as
| highlighted in the original article).

| joppy wrote:
| This decomposition is not used much because it is not unique in
| any sense, it is not numerically stable, and it is fairly
| uninteresting: the matrix D can always be taken to consist of a
| diagonal of 1s of length rank(A) and the rest zeros. It is much
| more interesting once X and Y are constrained to something like
| unitary matrices (in which case we get the SVD).
| 
| This "rank decomposition" is a bit interesting algebraically
| rather than numerically, since then the numerical stability
| problems disappear. Also, if we take all matrices to be over the
| integers (and require that "invertible" means the inverse is
| also an integer matrix), then the rank decomposition is a lot
| like the Smith normal form of a matrix.

| heinrichhartman wrote:
| It's not unique, but it can be implemented in a way that is
| stable. E.g. the SVD does this (but is quite overkill for the
| requirements).
| 
| It's highly relevant from an algebraic perspective, hence it's
| curious that it's not covered (at all) in the numerical
| literature.

| bigbillheck wrote:
| Numerical linear algebra is very different from linear algebra.

| whatshisface wrote:
| I could be misunderstanding, but I think you'd get all the
| algebraic content out of it by computing the dimensions of the
| image and kernel, then working with them from there on. Why
| would you want a matrix decomposition that separates them?
| 
| Granted, computing the dimension of the kernel is not so easy,
| especially because a pair of vectors can be arbitrarily close
| without being linearly dependent. No wonder there is no stable
| way to do it; it technically exceeds the capability of
| finite-precision numbers. Multiplying a vector of unequal
| components by _most_ numbers, especially those with
| non-terminating base-two representations, will produce a vector
| that is linearly independent when rounded back to finite
| precision.
| 
| Clearly then, linear independence on computers has to be
| considered in the continuous sense in which singular values
| reveal it, where a very small singular value represents an
| "almost-kernel", which is the closest thing to a kernel you are
| likely to find outside of carefully constructed examples or
| integer matrices. (A small numerical illustration follows below,
| after this sub-thread.)

| magnio wrote:
| Isn't that just Gaussian elimination?

| leephillips wrote:
| Thank you for this wonderfully concise summary: it's convenient
| to have all this in one compact document.
| 
| I suppose "flops" means "floating-point operations" here?
| Heretofore I've always encountered it as an abbreviation for
| "floating-point operations per second".

| mvcalder wrote:
| When I used Matlab as an undergrad in the late '80s, it used to
| report the flop count after each command, and those were
| floating-point operations, no time units involved. I tried to
| find an image of the old command line, but found this historical
| note first [1] and thought it would be of interest to readers of
| the article.
| 
| [1]
| http://www.stat.uchicago.edu/~lekheng/courses/309f14/flops/v...
| and
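A small illustration of whatshisface's point, with the matrix sizes
and perturbation level chosen arbitrarily; the tolerance mirrors
numpy.linalg.matrix_rank's default. An exact zero singular value
evaporates under a tiny perturbation, but it stays small, so the
"almost-kernel" is still flagged:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 3))
    A = np.column_stack([A, A[:, 0] + A[:, 1]])  # col 4 dependent: rank 3

    s = np.linalg.svd(A, compute_uv=False)
    print(s)  # sigma_4 ~ 1e-15: the (near-)exact kernel direction

    # A tiny perturbation makes the columns linearly independent
    # again in exact arithmetic (rank 4)...
    E = 1e-15 * rng.standard_normal(A.shape)
    s_pert = np.linalg.svd(A + E, compute_uv=False)

    # ...but sigma_4 is still tiny, so the numerical rank stays 3.
    tol = max(A.shape) * np.finfo(A.dtype).eps * s_pert[0]
    print("numerical rank:", int(np.sum(s_pert > tol)))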
| [deleted]

| the_svd_doctor wrote:
| It's always been very ambiguous in practice. To avoid ambiguity
| I usually use `flop/s` but not everyone likes that :)

| MatteoFrigo wrote:
| Speaking of SVD doctors, I heard many years ago (from Alan
| Edelman) that Gene Golub's license plate used to be "DR SVD".
| Later he switched to "PROF SVD".
| 
| I couldn't confirm the DR SVD part, but the PROF SVD story
| appears to be real:
| https://www.mathworks.com/company/newsletters/articles/profe...

| the_svd_doctor wrote:
| I heard something very similar from some of his very close
| (ex-)colleagues, so it appears to be true :)

| arinlen wrote:
| > To avoid ambiguity I usually use `flop/s` but not everyone
| likes that :)
| 
| Flop/s makes no sense and is outright wrong. The whole point of
| flops is to express how many floating-point operations are
| required by an algorithm. How many operations are performed per
| second is a property of the hardware you're using to run an
| implementation of the algorithm.
| 
| You want to express computational complexity in terms of
| floating-point operations.

| the_svd_doctor wrote:
| Well, yeah, I use it in the context of floating-point operations
| per second. That's the most common use in my field. I was
| replying to the parent comment. No need to use this kind of
| tone.

| zodiac wrote:
| Maybe the_svd_doctor uses flop/s to describe hardware, like you
| said.

| gnufx wrote:
| The number of operations per second varies by at least an order
| of magnitude on the same hardware depending on the GEMM
| algorithm you use (reference or Goto), and you quote the
| performance of the implementation in terms of FLOP/s knowing the
| number of FLOPs required by the computation. That makes sense to
| people who implement and measure these things.

| hanche wrote:
| Indeed, the meaning of "flops" is ambiguous, but that seems hard
| to avoid. Fortunately, the ambiguity is easily resolved from
| context in most cases, as it is here.

| melling wrote:
| Any suggestions on what to learn in Linear Algebra after Gilbert
| Strang's 18.06SC?
| 
| https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011...
| 
| My goal is to learn the math behind machine learning.

| isaacfrond wrote:
| Here are some links from my favourite reference website:
| 
| Book: Mathematics for Machine Learning (mml-book.github.io)
| https://news.ycombinator.com/item?id=16750789
| 
| Mathematics for Machine Learning [pdf] (mml-book.com)
| https://news.ycombinator.com/item?id=21293132
| 
| Mathematics of Machine Learning (2016) (ibm.com)
| https://news.ycombinator.com/item?id=15146746

| [deleted]

| time_to_smile wrote:
| The vast majority of the mathematics required to really
| understand ML is just probability, calculus, and basic linear
| algebra. If you know these already and still struggle, it's
| because the notation is often terse and assumes a specific
| context. The only answer here is to keep reading key papers and
| working through the math, ideally in code as well.
| 
| For most current-gen deep learning there's not even that much
| math; it's mostly a growing library of what are basically
| engineering tricks. For example, an LSTM is almost exclusively
| very basic linear algebra with some calculus used to optimize
| it.
| But by its nature the calculus can't be done by hand, and the
| actual implementation of all that basic linear algebra is tricky
| and takes practice.
| 
| You'll learn more by implementing things from scratch based on
| the math than you will trying to read through all the background
| material hoping that one day it will all make sense. It only
| ever makes sense through implementation and continuous
| reading/practice.

| natly wrote:
| These lectures are fantastic after you've mastered the basics of
| linear algebra: https://www.youtube.com/watch?v=McLq1hEq3UY
| (convex optimization, by a very experienced and often funny
| lecturer)

| nightski wrote:
| Stephen Boyd is a very good lecturer! I watched his videos on
| linear dynamical systems almost a decade ago and thought he did
| a fantastic job. Would highly recommend.

| sjburt wrote:
| The problem set for the Stanford linear dynamical systems course
| that he taught is also fantastic. The level of difficulty and
| the breadth of applications leaves you with so many tools.

| screye wrote:
| I am a big fan of Murphy's ML book [1].
| 
| [1] https://smile.amazon.com/Probabilistic-Machine-Learning-Intr...
| 
| It covers almost all the math you'd need to start doing ML
| research. I find it to be the ideal 'one book to rule them all'
| book for CS people in ML, although pure math grads in ML might
| find that it doesn't go deep enough.

| ivad wrote:
| You might enjoy these notes:
| http://mlg.eng.cam.ac.uk/teaching/4f13/2122/ They give (I think)
| a good general overview, while also going a little more in-depth
| in a few areas (e.g., Gaussian processes).

| swframe2 wrote:
| The math used in ML papers is very diverse; there is too much to
| learn. It is easier if you pick a problem and learn the math for
| it. Find a group that is already working on that problem and ask
| them for the best way to learn the math for it.

| DavidSJ wrote:
| A strong foundation in linear algebra, multivariable calculus,
| probability, and statistics is going to be generally applicable
| almost no matter what problem you work on.

| DavidSJ wrote:
| If you want a beautiful abstract perspective on linear algebra
| to complement Strang's more down-to-earth, matrix- and
| linear-equation-oriented lectures, pick up Axler's _Linear
| Algebra Done Right_.

| natly wrote:
| He also released a lecture series recently (Axler himself!)
| which barely anyone seems to be talking about:
| https://www.youtube.com/playlist?list=PLGAnmvB9m7zOBVCZBUUmS...

| ngmc wrote:
| You could take Strang's follow-up course on learning from data.
| 
| https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-an...

| jstx1 wrote:
| I don't know if 18.065 was worth the time; it felt like a repeat
| of 18.06 with hardly anything new (and not nearly enough context
| around actual applications).

| mgaunard wrote:
| Missing: the LDL^T decomposition.

| the_svd_doctor wrote:
| It can't always be computed accurately; I suspect that's why
| it's not mentioned.
| 
| Bunch-Kaufman is the "right" factorization for indefinite
| Hermitian/symmetric matrices, but it's not as well known.
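For reference, this factorization is available in SciPy:
scipy.linalg.ldl wraps LAPACK's ?sytrf (Bunch-Kaufman), producing a
block-diagonal D with 1x1 and 2x2 pivot blocks, so that symmetric
indefinite matrices, on which Cholesky breaks down, still factor
stably. A minimal sketch with an arbitrary indefinite example:

    import numpy as np
    from scipy.linalg import ldl

    # Symmetric but indefinite, with a zero diagonal: Cholesky (and
    # an unpivoted LDL^T) would break down here.
    A = np.array([[0., 1., 2.],
                  [1., 0., 3.],
                  [2., 3., 0.]])

    L, D, perm = ldl(A, lower=True)  # Bunch-Kaufman pivoting
    assert np.allclose(L @ D @ L.T, A)
    print(D)  # note the 2x2 blocks handling the indefiniteness

___________________________________________________________________
(page generated 2022-05-18 23:00 UTC)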