[HN Gopher] Neural Network Loss Landscapes: What do we know? (2021)
       ___________________________________________________________________
        
       Neural Network Loss Landscapes: What do we know? (2021)
        
       Author : bitforger
       Score  : 22 points
       Date   : 2022-07-17 20:55 UTC (2 hours ago)
        
 (HTM) web link (damueller.com)
 (TXT) w3m dump (damueller.com)
        
       | charleshmartin wrote:
       | https://calculatedcontent.com/2015/03/25/why-does-deep-learn...
        
       | evolvingstuff wrote:
       | Here are some "animated" loss landscapes I made quite a long time
       | ago:
       | 
       | http://evolvingstuff.blogspot.com/2011/02/animated-fractal-f...
       | 
        | These are related to recurrent neural networks evolved to
        | maximize fitness whilst wandering through a randomly generated
        | maze and picking up food pellets (the advantage being the
        | ability to remember not to revisit places already visited).
        
       | MauranKilom wrote:
       | The "wedge" part under "3. Mode Connectivity" has at least one
       | obvious component: Neural networks tend to be invariant to
       | permuting nodes (together with their connections) within a layer.
        | Simply put, it doesn't matter in what order you number the K
        | nodes of e.g. a fully connected layer, and that alone already
        | means there are K! different solutions with exactly the same
        | behavior. Equivalently, the loss landscape is symmetric under
        | certain permutations of its dimensions.
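        | 
        | (A quick numerical check of this symmetry; a minimal NumPy
        | sketch of my own, so the two-layer net and all names here are
        | illustrative, not from the article:)
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     d, K, m = 5, 8, 3   # input, hidden, output sizes
        |     W1, b1 = rng.normal(size=(K, d)), rng.normal(size=K)
        |     W2, b2 = rng.normal(size=(m, K)), rng.normal(size=m)
        | 
        |     def net(x, W1, b1, W2, b2):
        |         h = np.tanh(W1 @ x + b1)   # K hidden units
        |         return W2 @ h + b2
        | 
        |     p = rng.permutation(K)         # one of K! orderings
        |     x = rng.normal(size=d)
        |     y = net(x, W1, b1, W2, b2)
        |     # renumber the hidden units and the matching columns of W2:
        |     y_perm = net(x, W1[p], b1[p], W2[:, p], b2)
        |     assert np.allclose(y, y_perm)  # exactly the same behavior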
       | 
       | This means that, at the very least, there are _many_ global
       | optima (well, unless all permutable weights end up with the same
       | value, which is obviously not the case). The fact that different
        | initializations/early training steps can end up in different but
       | equivalent optima follows directly from this symmetry. But
       | whether all their basins are connected, or whether there are just
        | multiple equivalent basins, is much less clear. The "non-linear"
        | connectivity results do seem to imply that they all lie in some
        | (high-dimensional, non-linear) valley.
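        | 
        | (To see why linear connectivity is not automatic, here is
        | another sketch of my own, under the same toy-net assumptions
        | as above: the straight line in weight space between a net and
        | its permuted, functionally identical copy does not preserve
        | the function, so equivalent optima need not be linearly
        | connected:)
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     d, K = 5, 8
        |     W1, w2 = rng.normal(size=(K, d)), rng.normal(size=K)
        | 
        |     def f(x, W1, w2):
        |         return w2 @ np.tanh(W1 @ x)   # scalar-output net
        | 
        |     p = rng.permutation(K)
        |     Wm, wm = 0.5 * (W1 + W1[p]), 0.5 * (w2 + w2[p])  # midpoint
        |     x = rng.normal(size=d)
        |     print(f(x, W1, w2), f(x, W1[p], w2[p]))  # equal outputs
        |     print(f(x, Wm, wm))   # generally different: the linear
        |                           # path leaves the equivalence class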
       | 
       | To be clear, this is just me looking at these results from the
       | "permutation" perspective above, because it leads to a few
       | obvious conclusions. But I am not qualified to judge which of
       | these results are more or less profound.
        
         | evolvingstuff wrote:
          | Completely agree! Plus, less trivially, there can be many
          | different connection weight settings (for an assumed
          | distribution of inputs) that result in nearly identical
          | behavior, and those are then multiplied by the permutation
          | count you have just mentioned! So, it's complicated...
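          | 
          | (One exact instance of this, as a small NumPy sketch of my
          | own, assuming ReLU activations and no biases: scaling a
          | hidden unit's incoming weights by c > 0 and its outgoing
          | weights by 1/c leaves the function unchanged, so whole
          | continua of weight settings behave identically:)
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(1)
          |     d, K, m = 5, 8, 3
          |     W1 = rng.normal(size=(K, d))
          |     W2 = rng.normal(size=(m, K))
          | 
          |     def net(x, W1, W2):
          |         return W2 @ np.maximum(W1 @ x, 0.0)  # ReLU layer
          | 
          |     c = rng.uniform(0.5, 2.0, size=K)  # positive scales
          |     x = rng.normal(size=d)
          |     y = net(x, W1, W2)
          |     # ReLU(c*z) = c*ReLU(z) for c > 0, so the scales cancel:
          |     y_scaled = net(x, c[:, None] * W1, W2 / c)
          |     assert np.allclose(y, y_scaled)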
        
       ___________________________________________________________________
       (page generated 2022-07-17 23:00 UTC)