[HN Gopher] Gradient Descent Models Are Kernel Machines
___________________________________________________________________

Gradient Descent Models Are Kernel Machines

Author : dilap
Score  : 77 points
Date   : 2021-02-08 19:41 UTC (3 hours ago)

(HTM) web link (infoproc.blogspot.com)
(TXT) w3m dump (infoproc.blogspot.com)

| scythmic_waves wrote:
| The paper discussed here showed up on reddit a few months back
| [1]. Another paper showed up shortly after claiming the exact
| opposite [2]. Some discussion of this contradiction can be found
| in this part of the thread: [3].
|
| I myself am very interested in [2]. It's fairly dense, but I've
| been meaning to go through it and the larger Tensor Programs
| framework ever since.
|
| [1]
| https://www.reddit.com/r/MachineLearning/comments/k7wj5s/r_e...
|
| [2]
| https://www.reddit.com/r/MachineLearning/comments/k8h01q/r_w...
|
| [3]
| https://www.reddit.com/r/MachineLearning/comments/k8h01q/r_w...
| Grimm1 wrote:
| Thank you! Number 2 was exactly what I was looking for and I
| just couldn't find the link.
| ur-whale wrote:
| > Gradient Descent Models Are Kernel Machines
|
| ... that also happen to actually work.
| tlb wrote:
| If you're surprised by this result because you've used kernel
| machines and didn't find them very good at generalization, keep
| in mind that this assumes a kernel function that accurately
| reflects the similarity of input samples. Most work with kernel
| machines just uses Euclidean distance. For instance, in an image
| recognition model it would have to identify two images of dogs
| as more similar to each other than an image of a dog and an
| image of a cat.
|
| With a sufficiently magical kernel function, indeed you can get
| great results with a kernel machine. But it's not so easy to
| write a kernel function for a domain like image processing,
| where shifts, scales, and small rotations shouldn't affect
| similarity much. Let alone for text processing, where it should
| recognize two sentences with similar meaning as similar.
| yudlejoza wrote:
| I may or may not be surprised by the result, but I'm definitely
| not surprised by the 'thing A is thing B' in machine learning.
|
| Every Tom, machine-learner, and Harry is an expert at proving to
| the whole world that thing A is thing B. The only problem is
| that nobody hires them at a million dollars a year in total
| compensation.
| memming wrote:
| But the converse is not true in this case, so it's still
| interesting.
| mywittyname wrote:
| > With a sufficiently magical kernel function, indeed you can
| get great results with a kernel machine. But it's not so easy
| to write a kernel function for a domain like image processing,
| where shifts, scales, and small rotations shouldn't affect
| similarity much. Let alone for text processing, where it should
| recognize two sentences with similar meaning as similar.
|
| I think the key issue at hand is that training a model by
| gradient descent is easier than coming up with a good kernel
| function. Someone could absolutely devise a mechanism for
| backpropagation of errors through kernel functions, but at that
| point it is basically a neural network.
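To make the kernel-function point in the tlb/mywittyname subthread concrete, here is a minimal sketch of the kind of kernel machine the paper talks about, f(x) = sum_i a_i K(x, x_i) + b, fitted with kernel ridge regression and a plain Gaussian kernel on Euclidean distance (the "default" choice tlb mentions). This is an illustration, not code from the paper or the thread; the toy data, gamma, and lambda values are arbitrary, and the bias b is dropped for brevity.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # K[i, j] = exp(-gamma * ||A_i - B_j||^2): similarity decays with
        # Euclidean distance and has no built-in invariance to the shifts,
        # scales, or rotations a good image kernel would need.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-3, 3, size=(50, 1))            # toy 1-D inputs
    y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=50)

    lam = 1e-3                                            # ridge penalty
    K = rbf_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)  # the a_i

    X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
    y_pred = rbf_kernel(X_test, X_train) @ alpha          # sum_i a_i K(x, x_i)
    print(y_pred)

All of the modelling power sits in the kernel: the fit is a single linear solve, but a Euclidean kernel only generalizes well when Euclidean distance actually tracks the similarity that matters, which is exactly the difficulty tlb describes for images and text.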
| throwawaysea wrote:
| > This result makes it very clear that without regularity
| imposed by the ground truth mechanism which generates the actual
| data (e.g., some natural process), a neural net is unlikely to
| perform well on an example which deviates strongly (as defined
| by the kernel) from all training examples.
|
| Is this another way of saying that neural networks are just
| another statistical estimation method and not a path to general
| artificial intelligence? Or is it saying that problems like
| self-driving cars are not suited to the current state of the art
| in AI, since we have to ensure that reality doesn't deviate from
| the training examples? Or both?
|
| I'd love to understand the "real life" implications of this
| finding better.
| phreeza wrote:
| Is it not possible to achieve AGI with a good statistical
| estimation method?
| viraptor wrote:
| Depends on whether you think "a human is unlikely to perform
| well on an example which deviates strongly (as defined by
| experience) from all training examples" is true.
| Grimm1 wrote:
| When this was discussed about two months ago, the conclusion I
| took away was that there aren't many real-life implications at
| the moment beyond the somewhat formal equivalence.
|
| https://news.ycombinator.com/item?id=25314830
| 6gvONxR4sf7o wrote:
| Neither. It's more akin to the fundamental theorem of calculus.
| If you move from point A to point B along some differentiable
| path, you can sum up/integrate the steps and express the move as
| a single jump from point A to point B directly (with the jump
| written in terms of those integrals). It's a super interesting
| viewpoint on gradient descent, and on models that use it, that
| could be really useful for looking at and understanding those
| models abstractly, but it isn't saying anything about
| suitability for different tasks.
| SubiculumCode wrote:
| Some days I feel that, in the final analysis, everything is just
| linear regression.
| jtmcmc wrote:
| Given a properly transformed space, that may very well be
| true...
| wenc wrote:
| As someone who has studied nonlinear nonconvex optimization, I
| don't think linear regression is the final word here. In the
| universe of optimization problems for curve-fitting, the linear
| case is only one case (albeit a very useful one).
|
| Often, though, it is insufficient. The next step up is piecewise
| linearity, and then convexity. It is said that convexity is a
| way to weaken a linearity requirement while leaving a problem
| tractable.
|
| Many real-world systems are nonlinear (think physics models),
| and often nonconvex. You can approximate them using locally
| linear functions, to be sure, but you lose a lot of fidelity in
| the process. Sometimes this is ok, sometimes it is not, so it
| depends on the final application.
|
| It happens that linear regression is good enough for a lot of
| stuff out there, but there are many places where it doesn't
| work.
| nightcracker wrote:
| > Many real-world systems are nonlinear (think physics models)
|
| Technically, that's only true if you don't zoom in too far:
| quantum mechanics is linear.
| SubiculumCode wrote:
| Yes, there are non-linear functions... but often these result
| from combinations of linear functions.
| SubiculumCode wrote:
| Or rather, that all statistical methods are dressed-up
| regression.
| IdiocyInAction wrote:
| FC neural nets are iterated linear regression, in some sense.
| tobmlt wrote:
| Hey, hey! We can overfit higher-order approximations too. The
| nerve of some of ya.
|
| (Absolutely just kidding around here)
| tqi wrote:
| https://twitter.com/theotheredmund/status/134945323076219699...
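6gvONxR4sf7o's "integrate the steps" picture can be made exact in the simplest possible setting. The sketch below is an illustration (not code from the paper), under the assumption of a model that is linear in its parameters, f_w(x) = w . x, trained with a squared loss. Each gradient-descent step changes the prediction at a test point by -lr * sum_i residual_i * (x . x_i), so accumulating a per-example coefficient over the run rewrites the final prediction as the initial prediction plus a kernel-weighted sum over the training set, with tangent kernel K(x, x_i) = x . x_i. For a general network the same bookkeeping holds only in the small-step limit, with the tangent kernel averaged along the training trajectory, which is the paper's "path kernel".

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 3))            # training inputs
    y = X @ np.array([1.0, -2.0, 0.5])      # training targets
    x_test = rng.normal(size=3)             # a held-out point
    lr, steps = 0.01, 500

    w = rng.normal(size=3)                  # random initial weights
    f_init = w @ x_test                     # prediction before training
    a = np.zeros(len(X))                    # per-example coefficients a_i
    for _ in range(steps):
        residual = X @ w - y                # dL/df_i for the 1/2 squared loss
        w -= lr * X.T @ residual            # ordinary gradient-descent update
        a -= lr * residual                  # weight this step implicitly puts
                                            # on each training example

    K = X @ x_test                          # tangent kernel K(x_i, x_test)
    kernel_view = f_init + a @ K            # the "single jump" as a kernel sum
    direct_view = w @ x_test                # prediction from trained weights
    print(kernel_view, direct_view)         # identical up to float error

Nothing here helps with training; it only makes visible the rewrite 6gvONxR4sf7o describes, which is why the thread treats the result as a viewpoint on gradient-descent models rather than a new method.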
| 6gvONxR4sf7o wrote:
| Discussion of the actual paper here:
|
| https://news.ycombinator.com/item?id=25314830
|
| It's a really neat one for people who care about what's going on
| under the hood, but not immediately useful to the more applied
| folks. I saw some good quotes at the time to the tune of "I
| can't wait to see the papers citing this one in a year or two."
| kdisorte wrote:
| Quoting an expert from the last time this was posted:
|
| So in the end, it rephrases a statement from "Neural Tangent
| Kernel: Convergence and Generalization in Neural Networks"
| [https://arxiv.org/abs/1806.07572], and in a way that is kind of
| misleading.
|
| The assertion has been known to the community at least since
| 2018, if not well before.
|
| I find this article and the buzz around it a little awkward.
| robrenaud wrote:
| Yannic Kilcher's paper-explained series covers this paper pretty
| well. I feel like I have a decent understanding of it after just
| watching the video with a few pauses/rewinds.
|
| https://www.youtube.com/watch?v=ahRPdiCop3E
___________________________________________________________________
(page generated 2021-02-08 23:00 UTC)