[HN Gopher] Principal Component Analysis Explained Visually
___________________________________________________________________

Principal Component Analysis Explained Visually

Author : xk3
Score  : 83 points
Date   : 2021-05-02 18:37 UTC (4 hours ago)

(HTM) web link (setosa.io)
(TXT) w3m dump (setosa.io)

| quantstats wrote:
| This article is relatively popular here (considering the topic). Two previous discussions about it:
|
| From 2015: https://news.ycombinator.com/item?id=9040266
|
| From 2017: https://news.ycombinator.com/item?id=14405665
|
| A more recent approach to visualizing high-dimensional data is the t-SNE algorithm, which I normally use together with PCA when exploring big data sets. If you're interested in the differences between the two methods, here's a really good answer: https://stats.stackexchange.com/a/249520

| rcar wrote:
| PCA is a cool technique mathematically, but in my many years of building models, I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model, since you're going to have to do a lot of feature preprocessing, but tree ensembles, NNs, etc. are all able to tease out pretty complicated relationships among features on their own. Considering that PCA also complicates things from a model interpretability point of view, it feels to me like a method whose time has largely passed.

| mcguire wrote:
| By definition, it's going to result in a less accurate model unless you keep all of the dimensions or your data is very weird, right? And NNs are going to complicate your interpretability more?

| ivalm wrote:
| It is still a nice tool for projecting things (at least to visualize) where you expect the data to lie on a lower-dimensional hyperplane. I do agree that in most cases t-SNE or UMAP are better (especially if you don't care about distances).

| a-dub wrote:
| i can think of a few places where it's useful:
|
| if you know that your data comes from a stationary distribution, you can use it as a compression technique which reduces the computational demands on your model. sure, computing the initial svd or covariance matrix is expensive, but once you have it, the projection is just a matrix multiply and a vector subtraction (with the reverse being the same).
|
| if you have some high-dimensional data and you just want to look at it, it's a pretty good start. not only does it give you a sense for whether the higher dimensions are just noise (by looking at the eigenspectrum), it also makes low-dimensional plots possible.
|
| pca, cca and ica have been around for a very long time. i doubt "their time has passed."
|
| but who knows, maybe i'm wrong.

| baron_harkonnen wrote:
| > Considering that PCA also complicates things from a model interpretability point of view
|
| This is a strange comment, since my primary use of PCA/SVD is as a first step in understanding the latent factors that are driving the data. Latent factors typically include all of the important things that anyone running a business or deciding policy cares about: customer engagement, patient well-being, employee happiness, etc. all represent latent factors.
|
| If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interpretability.
|
| The "loadings" in PCA and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show things like "users who do X, Y and NOT Z are more likely to purchase".
|
| Likewise, in LSA (Latent Semantic Analysis/Indexing) on a term-frequency matrix you get a first pass at semantic embedding. You'll notice, for example, that "dog" and "cat" project onto a common component in the new space, which can be interpreted as "pets".
|
| > I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
|
| PCA/SVD is a linear transformation of the data and shouldn't give you any performance increase on a linear model. However, it can be very helpful in transforming extremely high-dimensional, sparse vectors into lower-dimensional, dense representations. This can provide a lot of storage/performance benefits.
|
| > NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
|
| PCA is essentially equivalent to an autoencoder with no non-linear layers trained to minimize MSE. It is a very good first step towards understanding what your NN will eventually do. After all, NNs apply a sequence of non-linear transformations so that the final vector space is ultimately linearly separable.
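As a rough illustration of the "loadings" idea mentioned above, here is a minimal numpy sketch; the data and feature names are invented purely for illustration and are not from the article or the thread:

    import numpy as np

    # Toy "user behavior" matrix: rows are users, columns are made-up features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    X[:, 3] = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=500)
    feature_names = ["did_X", "did_Y", "did_Z", "purchased"]

    # Center the data and take the SVD; the rows of Vt are the principal directions.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # "Loadings": how strongly each original feature contributes to each component.
    for i, direction in enumerate(Vt[:2]):
        weights = ", ".join(f"{name}={w:+.2f}" for name, w in zip(feature_names, direction))
        print(f"PC{i + 1}: {weights}")

Reading the signs and magnitudes of these weights is what lets you interpret a component as a latent factor such as "engaged users who purchase".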
| rcar wrote:
| Sure, everyone wants to get to the latent factors that really drive the outcome of interest, but I've never seen a situation in which principal components _really_ represent latent factors unless you squint hard at them and want to believe. As for gaining insight and explaining user behavior, I'd much rather just fit a decent model and share some SHAP plots for understanding how your features relate to the target and to each other.
|
| If you like PCA and find it works in your particular domains, all the more power to you. I just don't find it practically useful for fitting better models, and I am generally suspicious of the insights drawn from it and other unsupervised techniques, especially given how much of the meaning of the results gets imparted by the observer, who often has a particular story they'd like to tell.

| fredophile wrote:
| I've used PCA with good results in the past. My problem essentially simplified down to finding nearest neighbours in high-dimensional spaces. Distance metrics in high-dimensional spaces don't behave nicely. Using PCA to reduce the number of dimensions to something more manageable made the problem much more tractable.

| strontian wrote:
| What was used to make the visualizations?

| alexcnwy wrote:
| https://d3js.org/ & https://threejs.org/

| gentleman11 wrote:
| I bet you could use three.js as well

| onurcel wrote:
| Here is the best PCA explanation I ever read on the web: https://stats.stackexchange.com/questions/2691/making-sense-...

| saeranv wrote:
| Seconded. This is exactly the stackexchange post I thought of as well.

| rsj_hn wrote:
| I put four of the dots on the corners of a square and the fifth in the center. This results in the same square in the PCA pane, but rotated about 45 degrees. Then, if you take one of the dots on a square corner and move it ever so slightly in and out, you see the PCA square rotating wildly. Pretty cool to demonstrate sensitivity to small changes in the inputs.

| kwhitefoot wrote:
| Very interesting. It would have been even better if there had been a link to an explanation of how PCA is performed.

| baron_harkonnen wrote:
| If you know a bit of linear algebra, the transformation is surprisingly intuitive.
|
| Your goal is to create a set of orthogonal vectors, each of which captures the highest amount of remaining variance in the original data (the assumption is that variance is where most of the information is).
|
| This is achieved by performing an eigendecomposition of the covariance matrix of the original data. Essentially you are finding the eigenvectors of the covariance matrix, ordered by their eigenvalues.

| bigdict wrote:
| Or the singular vectors of the zero-centered data, ordered by singular values.
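A minimal numpy sketch of both routes described above, eigendecomposition of the covariance matrix and the SVD of the zero-centered data, using synthetic data made up for illustration:

    import numpy as np

    # Synthetic 2-feature data drawn from a correlated Gaussian.
    rng = np.random.default_rng(1)
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=1000)
    Xc = X - X.mean(axis=0)

    # Route 1: eigendecomposition of the covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-order by descending variance
    components = eigvecs[:, order].T            # rows are principal directions

    # Route 2: right singular vectors of the zero-centered data.
    _, singular_values, Vt = np.linalg.svd(Xc, full_matrices=False)

    print(components)   # same directions as Vt, possibly with flipped signs
    print(Vt)

    # Reduce to one dimension by projecting onto the first principal direction.
    scores = Xc @ components[0]
    print(scores[:5])

The two routes agree up to the sign of each direction; the SVD route is usually preferred numerically because it avoids forming the covariance matrix explicitly.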
| gentleman11 wrote:
| > a transformation no different than finding a camera angle
|
| I've used PCA a bit in the past, and it's so abstract that one forgets how to conceptualize it shortly after finishing the task. This is an interesting and memorable way to put it; I like that.

| Sinidir wrote:
| Question: is there any difference between the highest-variance dimension PCA finds and the line that linear regression would find?

| hsiang_jih_kueh wrote:
| if i recall correctly, yeah, there probably will be. linear regression minimises the vertical distance of a point to the regression line, whereas PCA minimises the orthogonal distance of the point to the line.

| osipov wrote:
| Linear regression uses a measure of "error" for every data point. Visually, the error is the vertical difference between a data point and the line/plane of the linear regression. In contrast, PCA measures the distance from the data point along the line perpendicular to the PCA axis. The PCA distance is also known as a "projection".
|
| There is something known as orthogonal regression (total least squares) which uses the same measure as PCA. Unfortunately it doesn't work well across variables with incompatible units.
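A small numpy sketch contrasting the two fits on the same synthetic 2-D data (the data is invented for illustration): the ordinary-least-squares slope minimizes vertical errors, while the first principal component direction minimizes orthogonal ones, so the two slopes generally differ.

    import numpy as np

    # Synthetic 2-D data: y depends on x plus noise.
    rng = np.random.default_rng(2)
    x = rng.normal(size=300)
    y = 0.5 * x + 0.3 * rng.normal(size=300)

    # Ordinary least squares: minimizes vertical distances to the line.
    ols_slope = np.polyfit(x, y, 1)[0]

    # First principal component: the direction minimizing orthogonal distances.
    Xc = np.column_stack([x, y])
    Xc = Xc - Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Vt[0]
    pca_slope = pc1[1] / pc1[0]

    print(f"OLS slope: {ols_slope:.3f}")
    print(f"PC1 slope: {pca_slope:.3f}")   # typically steeper than the OLS slope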