[HN Gopher] Neural Network Loss Landscapes: What do we know? (2021)
___________________________________________________________________

Neural Network Loss Landscapes: What do we know? (2021)

Author : bitforger
Score  : 22 points
Date   : 2022-07-17 20:55 UTC (2 hours ago)

(HTM) web link (damueller.com)
(TXT) w3m dump (damueller.com)

| charleshmartin wrote:
| https://calculatedcontent.com/2015/03/25/why-does-deep-learn...

| evolvingstuff wrote:
| Here are some "animated" loss landscapes I made quite a long time
| ago:
| 
| http://evolvingstuff.blogspot.com/2011/02/animated-fractal-f...
| 
| These are related to recurrent neural networks evolved to
| maximize fitness whilst wandering through a randomly generated
| maze and picking up food pellets (the advantage being to remember
| not to revisit where you have already been).

| MauranKilom wrote:
| The "wedge" part under "3. Mode Connectivity" has at least one
| obvious component: neural networks tend to be invariant to
| permuting nodes (together with their connections) within a layer.
| Simply put, it doesn't matter in what order you number the K
| nodes of, e.g., a fully connected layer, but that alone already
| means there are K! different solutions with exactly the same
| behavior. Equivalently, the loss landscape is symmetric under
| certain permutations of its dimensions.
| 
| This means that, at the very least, there are _many_ global
| optima (well, unless all permutable weights end up with the same
| value, which is obviously not the case). The fact that different
| initializations/early training steps can end up in different but
| equivalent optima follows directly from this symmetry. But
| whether all their basins are connected, or whether there are just
| multiple equivalent basins, is much less clear. The "non-linear"
| connection results do seem to imply that they all lie in some
| (high-dimensional, non-linear) valley.
| 
| To be clear, this is just me looking at these results from the
| "permutation" perspective above, because it leads to a few
| obvious conclusions. But I am not qualified to judge which of
| these results are more or less profound.

| evolvingstuff wrote:
| Completely agree! Plus, less trivially, there can be a bunch of
| different link-weight settings (for an assumed distribution of
| inputs) that result in nearly symmetric behaviors, and that is
| then multiplied by the permutation count you just mentioned! So
| it's complicated...
___________________________________________________________________
(page generated 2022-07-17 23:00 UTC)
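___________________________________________________________________

The permutation symmetry in MauranKilom's comment is easy to check
numerically. Below is a minimal sketch (not from the thread; a toy
two-layer NumPy MLP with arbitrarily chosen sizes): permuting the K
hidden units, i.e. the rows of the first weight matrix and bias
together with the matching columns of the second weight matrix,
gives a different point in weight space but exactly the same
function, so all K! such relabelings have identical loss.

    import numpy as np

    rng = np.random.default_rng(0)
    D, K, O = 4, 5, 3   # input, hidden, output sizes (arbitrary)
    W1, b1 = rng.normal(size=(K, D)), rng.normal(size=K)
    W2, b2 = rng.normal(size=(O, K)), rng.normal(size=O)

    def forward(x, W1, b1, W2, b2):
        # Toy fully connected net: x -> h = tanh(W1 x + b1) -> W2 h + b2
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    # Relabel the K hidden units: permute the rows of W1 and entries
    # of b1, and the columns of W2, consistently. This is a distinct
    # point in weight space...
    perm = rng.permutation(K)
    W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

    # ...but the computed function (and hence the loss) is unchanged.
    x = rng.normal(size=D)
    assert np.allclose(forward(x, W1, b1, W2, b2),
                       forward(x, W1p, b1p, W2p, b2))

Since this holds within each layer, a trained network sits in an
equivalence class of weight vectors with identical loss, which is
the "many global optima" observation the comment builds on; whether
those equivalent basins are also connected is the open question it
raises.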