[HN Gopher] Principal Component Analysis Explained Visually
       ___________________________________________________________________
        
       Principal Component Analysis Explained Visually
        
       Author : xk3
       Score  : 83 points
       Date   : 2021-05-02 18:37 UTC (4 hours ago)
        
 (HTM) web link (setosa.io)
 (TXT) w3m dump (setosa.io)
        
       | quantstats wrote:
       | This article is relatively popular here (considering the topic).
       | Two previous discussions about it:
       | 
       | From 2015: https://news.ycombinator.com/item?id=9040266.
       | 
       | From 2017: https://news.ycombinator.com/item?id=14405665.
       | 
       | A more recent approach to visualizing high-dimensional data is
       | the t-SNE algorithm, which I normally use together with PCA when
       | exploring big data sets. If you're interested in the differences
       | between both methods, here's a really good answer:
       | https://stats.stackexchange.com/a/249520.
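        | 
        | For example, the two-step workflow I mean looks roughly like
        | this in scikit-learn (the data and parameter choices here are
        | just placeholders):
        | 
        |     import numpy as np
        |     from sklearn.decomposition import PCA
        |     from sklearn.manifold import TSNE
        | 
        |     rng = np.random.default_rng(0)
        |     X = rng.normal(size=(1000, 100))  # stand-in data set
        | 
        |     # PCA first to denoise and speed things up,
        |     # then t-SNE for the 2-D map you actually look at
        |     X_pca = PCA(n_components=50).fit_transform(X)
        |     X_2d = TSNE(n_components=2, init="pca",
        |                 random_state=0).fit_transform(X_pca)
        |     print(X_2d.shape)  # (1000, 2)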
        
       | rcar wrote:
       | PCA is a cool technique mathematically, but in my many years of
       | building models, I've never seen it result in a more accurate
       | model. I could see it potentially being useful in situations
       | where you're forced to use a linear/logistic model since you're
       | going to have to do a lot of feature preprocessing, but tree
       | ensembles, NNs, etc. are all able to tease out pretty complicated
       | relationships among features on their own. Considering that PCA
       | also complicates things from a model interpretability point of
       | view, it feels to me like a method whose time has largely passed.
        
         | mcguire wrote:
         | By definition, it's going to result in a less accurate model,
         | unless you keep all of the dimensions or your data is very
         | weird, right? And NNs are going to complicate your
         | interpretability more?
        
         | ivalm wrote:
         | It is still a nice tool for projecting things (at least to
         | visualize) where you expect the data to be on a lower
         | dimensional hyperplane. I do agree in most cases t-SNE or UMAP
         | are better (esp if you don't care about distances).
        
         | a-dub wrote:
         | i can think of a few places where it's useful:
         | 
         | if you know that your data comes from a stationary
         | distribution, you can use it as a compression technique which
         | reduces the computational demands on your model. sure,
         | computing the initial svd or covariance matrix is expensive,
         | but once you have it, the projection is just a matrix multiply
         | and a vector subtraction. (with the reverse being the same)
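          | 
          | roughly, in numpy (toy data, arbitrary sizes):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     X = rng.normal(size=(10_000, 300))  # stand-in data
          |     mu = X.mean(axis=0)
          | 
          |     # expensive, done once
          |     _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
          |     W = Vt[:20].T            # keep the top 20 components
          | 
          |     # cheap, done per batch
          |     Z = (X - mu) @ W         # project: subtract, multiply
          |     X_hat = Z @ W.T + mu     # reverse: reconstruct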
         | 
          | if you have some high dimensional data and you just want to
          | look at it, it's a pretty good start. not only does it give you
          | a sense of whether the higher dimensions are just noise (by
          | looking at the eigenspectrum), it also makes low dimensional
          | plots possible.
         | 
         | pca, cca and ica have been around for a very long time. i doubt
         | "their time has passed."
         | 
         | but who knows, maybe i'm wrong.
        
         | baron_harkonnen wrote:
         | > Considering that PCA also complicates things from a model
         | interpretability point of view
         | 
          | This is a strange comment, since my primary usage of PCA/SVD
          | is as a first step in understanding the latent factors that
          | are driving the data. Latent factors typically cover the
          | important things anyone running a business or setting policy
          | cares about: customer engagement, patient well-being, employee
          | happiness, etc. are all latent factors.
         | 
          | If you have ever wanted to perform data analysis and gain some
          | exciting insight into explaining user behavior, PCA/SVD will
          | get you there pretty quickly. It is one of the most powerful
          | tools in my arsenal when I'm working on a project that requires
          | interpretability.
         | 
          | The "loadings" in PCA and the V matrix in SVD both contain
          | information about how the original feature space correlates
          | with the new projection. This can easily show things like
          | "Users who do X, Y, and NOT Z are more likely to purchase".
         | 
          | Likewise, running LSA (Latent Semantic Analysis/Indexing) on a
          | term-frequency matrix gives you a first pass at a semantic
          | embedding. You'll notice, for example, that "dog" and "cat"
          | project onto a common component in the new space, which can be
          | interpreted as "pets".
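          | 
          | A hypothetical mini-example of reading those loadings (toy
          | counts and a made-up vocabulary, just to show the mechanics):
          | 
          |     import numpy as np
          |     from sklearn.decomposition import TruncatedSVD
          | 
          |     terms = ["dog", "cat", "leash", "stock", "bond", "yield"]
          |     # rows = documents, columns = term counts (toy numbers)
          |     tf = np.array([
          |         [3, 2, 1, 0, 0, 0],
          |         [2, 3, 1, 0, 0, 0],
          |         [0, 0, 0, 2, 1, 1],
          |         [0, 0, 0, 1, 2, 1],
          |     ])
          | 
          |     svd = TruncatedSVD(n_components=2).fit(tf)
          |     # components_ is the V matrix: how strongly each original
          |     # term loads on each latent dimension
          |     for i, comp in enumerate(svd.components_):
          |         pairs = sorted(zip(terms, comp),
          |                        key=lambda p: -abs(p[1]))
          |         print(f"component {i}:", pairs[:3])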
         | 
         | > I've never seen it result in a more accurate model. I could
         | see it potentially being useful in situations where you're
         | forced to use a linear/logistic model
         | 
         | PCA/SVD are a linear transformation of the data and shouldn't
         | give you any performance increase on a linear model. However
         | they can be very helpful in transforming extremely high
         | dimensional, sparse vectors into lower dimensional, dense
         | representations. This can provide a lot of storage/performance
         | benefits.
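          | 
          | For example (shapes here are made up), going from a large
          | sparse matrix to a small dense one:
          | 
          |     from scipy.sparse import random as sparse_random
          |     from sklearn.decomposition import TruncatedSVD
          | 
          |     # a big, very sparse feature matrix (stand-in data)
          |     X_sparse = sparse_random(20_000, 10_000, density=1e-3,
          |                              format="csr", random_state=0)
          | 
          |     # a small, dense representation: cheaper to store and use
          |     svd = TruncatedSVD(n_components=100, random_state=0)
          |     X_dense = svd.fit_transform(X_sparse)
          |     print(X_dense.shape)  # (20000, 100)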
         | 
         | > NNs, etc. are all able to tease out pretty complicated
         | relationships among features on their own.
         | 
          | PCA is equivalent to an autoencoder with no non-linear layers
          | trained to minimize MSE: both recover the same subspace. It is
          | a very good first step towards understanding what your NN will
          | eventually do. After all, an NN is ultimately a stack of matrix
          | transformations with non-linearities, learned so that the final
          | vector space is linearly separable.
        
           | rcar wrote:
           | Sure, everyone wants to get to the latent factors that really
           | drive the outcome of interest, but I've never seen a
           | situation in which principal components _really_ represent
           | latent factors unless you squint hard at them and want to
           | believe. As for gaining insight and explaining user behavior,
           | I'd much rather just fit a decent model and share some SHAP
           | plots for understanding how your features relate to the
           | target and to each other.
           | 
           | If you like PCA and find it works in your particular domains,
           | all the more power to you. I just don't find it practically
           | useful for fitting better models and am generally suspicious
           | of the insights drawn from that and other unsupervised
           | techniques, especially given how much of the meaning of the
           | results gets imparted by the observer who often has a
           | particular story they'd like to tell.
        
             | fredophile wrote:
             | I've used PCA with good results in the past. My problem
             | essentially simplified down to trying to find nearest
             | neighbours in high dimensional spaces. Distance metrics in
             | high dimensional spaces don't behave nicely. Using PCA to
             | cut reduce the number of dimensions to something more
             | manageable made the problem much more tractable.
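              | 
              | Roughly what that looks like with scikit-learn (toy data,
              | arbitrary sizes):
              | 
              |     import numpy as np
              |     from sklearn.decomposition import PCA
              |     from sklearn.neighbors import NearestNeighbors
              | 
              |     rng = np.random.default_rng(0)
              |     X = rng.normal(size=(5000, 500))  # high-dim points
              | 
              |     # reduce to a manageable number of dimensions first
              |     X_low = PCA(n_components=20).fit_transform(X)
              | 
              |     nn = NearestNeighbors(n_neighbors=5).fit(X_low)
              |     dist, idx = nn.kneighbors(X_low[:1])  # point 0's NNs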
        
       | strontian wrote:
       | What was used to make the visualizations?
        
         | alexcnwy wrote:
         | https://d3js.org/ & https://threejs.org/
        
         | gentleman11 wrote:
         | I bet you could use three.js as well
        
       | onurcel wrote:
       | Here is the best PCA explanation I ever read on the web:
       | https://stats.stackexchange.com/questions/2691/making-sense-...
        
         | saeranv wrote:
         | Seconded. This is exactly the same stackexchange post I thought
         | of as well.
        
       | rsj_hn wrote:
       | I put the four dots on the corners of a square and the fifth in
       | the center. This results in the same square in the PCA pane but
       | rotated about 45 degrees. Then, if you take one of the dots on
       | the square corner and move it ever so sligthly in and out, you
       | see the PCA square wildly rotating. Pretty cool to demonstrate
       | sensitivity to small changes in the inputs.
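        | 
        | A rough numpy version of that experiment, for anyone who wants
        | to see the numbers (exact values don't matter):
        | 
        |     import numpy as np
        | 
        |     def first_pc_angle(points):
        |         centered = points - points.mean(axis=0)
        |         _, _, Vt = np.linalg.svd(centered)
        |         return np.degrees(np.arctan2(Vt[0, 1], Vt[0, 0]))
        | 
        |     square = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1],
        |                        [0, 0]], dtype=float)
        | 
        |     for eps in (0.01, -0.01):
        |         nudged = square.copy()
        |         nudged[0] += eps  # move one corner ever so slightly
        |         print(eps, round(first_pc_angle(nudged), 1))
        | 
        | The two tiny nudges give very different first-PC angles, because
        | the unperturbed square has no preferred direction.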
        
       | kwhitefoot wrote:
       | Very interesting. It would have been even better if there had
       | been a link to an explanation of how PCA is performed.
        
         | baron_harkonnen wrote:
         | If you know a bit of linear algebra the transformation is
         | surprisingly intuitive.
         | 
          | Your goal is to create a set of orthogonal vectors, each of
          | which captures as much of the remaining variance in the
          | original data as possible (the assumption is that variance is
          | where most of the information is).
         | 
          | This is achieved by performing an eigendecomposition of the
          | covariance matrix of the original data: the principal
          | components are the eigenvectors of the covariance matrix,
          | ordered by their eigenvalues.
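          | 
          | A bare-bones numpy sketch of exactly that recipe (random
          | stand-in data):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     X = rng.normal(size=(200, 5))     # samples x features
          | 
          |     Xc = X - X.mean(axis=0)           # center the data
          |     C = np.cov(Xc, rowvar=False)      # covariance matrix
          |     evals, evecs = np.linalg.eigh(C)  # eigh: symmetric case
          | 
          |     order = np.argsort(evals)[::-1]   # sort by variance
          |     components = evecs[:, order]      # principal axes
          |     scores = Xc @ components          # data in the new basis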
        
           | bigdict wrote:
           | Or the singular vectors of the zero-centered data, ordered by
           | singular values.
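            | 
            | e.g., a sketch of that route:
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     X = rng.normal(size=(200, 5))
            |     Xc = X - X.mean(axis=0)        # zero-center
            | 
            |     U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
            |     components = Vt.T              # right singular vectors
            |     # same axes as the covariance route (up to sign);
            |     # covariance eigenvalues are s**2 / (n - 1)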
        
       | gentleman11 wrote:
       | > a transformation no different than finding a camera angle
       | 
       | I've used PCA a bit in the past and it's so abstract that one
       | forgets how to conceptualize it shortly after finishing the task.
       | This is an interesting and memorable way to put it, I like that.
        
       | Sinidir wrote:
       | Question: is there any difference between the highest variance
       | dimension pca finds and a line that linear regression would find?
        
         | hsiang_jih_kueh wrote:
          | if i recall correctly, yeah, there probably will be. linear
          | regression minimises the vertical distance of a point to the
          | regression line, whereas PCA minimises the orthogonal distance
          | of the point to the line.
        
         | osipov wrote:
          | Linear regression uses a measure of "error" for every data
          | point. Visually, the error is the vertical distance between a
          | data point and the line/plane of the linear regression. In
          | contrast, PCA measures the perpendicular (orthogonal) distance
          | from the data point to the PCA axis; dropping that
          | perpendicular onto the axis is known as a "projection".
         | 
          | There is something known as orthogonal regression (total least
          | squares) which uses the same measure as PCA. Unfortunately it
          | doesn't work well when the variables are on incompatible
          | scales.
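          | 
          | A quick numpy comparison of the two slopes on correlated toy
          | data (all numbers here are arbitrary):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     x = rng.normal(size=500)
          |     y = 0.5 * x + rng.normal(scale=0.8, size=500)
          | 
          |     # regression: minimizes vertical distances
          |     slope_ols = np.polyfit(x, y, 1)[0]
          | 
          |     # PCA: minimizes orthogonal distances
          |     Xc = np.column_stack([x, y])
          |     Xc -= Xc.mean(axis=0)
          |     _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
          |     slope_pc1 = Vt[0, 1] / Vt[0, 0]
          | 
          |     print(slope_ols, slope_pc1)  # the PC1 slope is steeper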
        
       ___________________________________________________________________
       (page generated 2021-05-02 23:00 UTC)