[HN Gopher] The convolution empire strikes back
       ___________________________________________________________________
        
       The convolution empire strikes back
        
       Author : che_shr_cat
       Score  : 63 points
       Date   : 2023-10-27 19:39 UTC (3 hours ago)
        
 (HTM) web link (gonzoml.substack.com)
 (TXT) w3m dump (gonzoml.substack.com)
        
       | adamnemecek wrote:
        | All machine learning is just convolution, in the sense of the
        | convolution product on a Hopf algebra.
        
         | mensetmanusman wrote:
         | Is this an intellectual leap aiming to make the field more
         | cohesive, like the quest for unifying theories in physics?
        
           | adamnemecek wrote:
            | That is one of the goals, yes. In addition, it seems like you
            | get neural architecture search (the architecture is optimized),
            | faster training and inference, and interpretability. I'm
            | working it out as we speak.
           | 
           | Ironically, convolution provides some unification in physics
           | too, e.g. renormalization is a convolution.
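            | 
            | For concreteness, a minimal sketch of the standard definition
            | being referenced (nothing model-specific assumed): for a Hopf
            | algebra H with product \mu, coproduct \Delta, unit \eta and
            | counit \epsilon, linear maps f, g : H -> H carry a convolution
            | product
            | 
            |     (f \star g) = \mu \circ (f \otimes g) \circ \Delta,
            |     \qquad (f \star g)(x) = \sum_{(x)} f(x_{(1)})\, g(x_{(2)}),
            | 
            | in Sweedler notation, and the antipode S is the convolution
            | inverse of the identity map:
            | \mathrm{id} \star S = S \star \mathrm{id} = \eta \circ \epsilon.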
        
         | dpflan wrote:
         | Interesting, please do elaborate...
        
           | cwillu wrote:
           | I read it as a riff on "monads are just monoids in the
           | category of endofunctors", but maybe that wasn't intended.
        
             | uoaei wrote:
              | It kind of is. The commenter has been working on this
              | formalism for a year or more. I'm sure he will soon come by
              | with a link to the Discord channel where he discusses it and
              | finds collaborators.
        
               | adamnemecek wrote:
                | It has been less than 9 months. But yeah, there is a
                | Discord if you want to follow progress:
                | https://discord.cofunctional.ai
        
       | visarga wrote:
        | My theory is that architecture doesn't matter - convolutional,
        | transformer or recurrent. As long as you can efficiently train
        | models of the same size, what counts is the dataset.
        | 
        | Similarly, humans achieve about the same results when they have
        | the same training, with small variations. What matters is not the
        | brain but the education they get.
       | 
       | Of course I am exaggerating a bit, just saying there are a
       | multitude of architectures of brain and neural nets with similar
       | abilities, and the differentiating factor is the data not the
       | model.
       | 
        | For years we have seen hundreds of papers proposing sub-quadratic
        | attention. They all failed to get traction; big labs still use an
        | almost vanilla transformer. At some point a paper essentially
        | declared "mixing is all you need" (MLP-Mixer) to replace
        | "attention is all you need". Just mixing - the optimiser adapts to
        | whatever it gets (a minimal sketch of a Mixer block is at the end
        | of this comment).
       | 
       | If you think about it, maybe language creates a virtual layer
       | where language operations are performed. And this works similarly
       | in humans and AIs. That's why the architecture doesn't matter,
       | because it is running the language-OS on top. Similarly for
       | vision.
       | 
        | I place 90% of the merit of AI on language and 10% on the model
        | architecture. Finding intelligence was inevitable; it was hiding
        | in language, which is also how we get to be intelligent. A human
        | raised without language is even worse off than a primitive one.
        | Intelligence is encoded in software, not hardware. Our language
        | software has more breadth and depth than any one of us can create
        | or contain.
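        | 
        | A minimal sketch of one Mixer block, assuming PyTorch (dimensions
        | are illustrative, not from the paper): two plain MLPs, one mixing
        | across tokens and one across channels, with skip connections - no
        | attention and no convolution.
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   class MixerBlock(nn.Module):
        |       def __init__(self, tokens, channels, tok_hidden, ch_hidden):
        |           super().__init__()
        |           self.norm1 = nn.LayerNorm(channels)
        |           # Token-mixing MLP: mixes information across positions.
        |           self.token_mlp = nn.Sequential(
        |               nn.Linear(tokens, tok_hidden), nn.GELU(),
        |               nn.Linear(tok_hidden, tokens))
        |           self.norm2 = nn.LayerNorm(channels)
        |           # Channel-mixing MLP: mixes information across features.
        |           self.channel_mlp = nn.Sequential(
        |               nn.Linear(channels, ch_hidden), nn.GELU(),
        |               nn.Linear(ch_hidden, channels))
        | 
        |       def forward(self, x):  # x: (batch, tokens, channels)
        |           y = self.norm1(x).transpose(1, 2)
        |           x = x + self.token_mlp(y).transpose(1, 2)
        |           return x + self.channel_mlp(self.norm2(x))
        | 
        |   x = torch.randn(2, 196, 512)  # e.g. 14x14 patches, 512 channels
        |   print(MixerBlock(196, 512, 256, 2048)(x).shape)  # (2, 196, 512)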
        
         | kookamamie wrote:
         | Dataset counts, but also the number of total parameters in the
         | network, i.e. capacity.
        
           | visarga wrote:
            | Agreed, it's in my first sentence: "as long as you can
            | efficiently train models of the same size, what counts is the
            | dataset". But the useful sizes are just a few - 7, 13, 35, 70,
            | 120B - because they are targeted at various families of GPUs
            | (see the rough memory arithmetic at the end of this comment).
            | A 2T model that I can't run, or that is too expensive to use
            | through an API, is of no use to me. And it's not just dataset
            | size: data quality and diversity matter just as much.
           | 
           | I believe LLMs will train mostly on synthetic data engineered
           | to have extreme diversity and very high quality. This kind of
           | data confers 5x gains in efficiency as demonstrated by
           | Microsoft in the Phi-1.5 paper.
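            | 
            | Rough weight-only arithmetic for why those sizes map to GPU
            | tiers (bytes-per-parameter values are assumptions for common
            | precisions; activations and KV cache excluded):
            | 
            |   # ~2 bytes/param at fp16, ~0.5 at 4-bit quantization
            |   for params_b in (7, 13, 35, 70, 120):
            |       for name, bpp in (("fp16", 2), ("int4", 0.5)):
            |           gib = params_b * 1e9 * bpp / 2**30
            |           print(f"{params_b:>4}B @ {name}: ~{gib:,.0f} GiB")
            | 
            | A 7B model at fp16 is ~13 GiB (one consumer GPU), while 70B at
            | fp16 is ~130 GiB (several 80 GB datacenter cards).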
        
         | rdedev wrote:
          | I wish someone would run a large-scale experiment to evaluate
          | all these alternative architectures. I feel they get drowned
          | out by new SOTA results from OpenAI and others. What I'd like
          | to see is a study of whether emergent behaviors pop up with
          | enough data and parameters.
         | 
          | Maybe vision is special enough that convnets can approach
          | transformer-level performance, or maybe that generalizes to any
          | modality. I haven't read enough papers to know if someone has
          | already done something like this, but everywhere I look on the
          | application side of things, vanilla transformers seem to be
          | dominating.
        
       | gradascent wrote:
        | This is great, but what is a possible use case for these massive
        | classifier models? I'm guessing they won't be running at the
        | edge, which precludes them from real-time applications like self-
        | driving cars, smartphones, or military systems. So then what?
        | Facial recognition for police/governments, or targeted
        | advertising based on your Instagram/Google photos? I'm genuinely
        | curious.
        
         | constantly wrote:
          | Hard-to-classify items: subclasses of subclasses that have
          | little to differentiate them, and possibly only a few pixels or
          | lots of noise in the data.
        
         | currymj wrote:
          | 1) It's basic research. 2) You can always chop off the last
          | layer and use the embeddings, which I guess might be useful for
          | something.
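          | 
          | A minimal sketch of that, assuming torchvision's ResNet-50 as
          | the backbone (any classifier works the same way); the batch of
          | images here is random data, just to show the shapes:
          | 
          |   import torch
          |   import torch.nn as nn
          |   from torchvision import models
          | 
          |   # Pretrained classifier; swap in any backbone of interest.
          |   model = models.resnet50(
          |       weights=models.ResNet50_Weights.DEFAULT)
          |   model.fc = nn.Identity()  # drop the 1000-way classifier head
          |   model.eval()
          | 
          |   with torch.no_grad():
          |       images = torch.randn(4, 3, 224, 224)
          |       embeddings = model(images)  # shape (4, 2048)
          |   print(embeddings.shape)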
        
       | pjs_ wrote:
       | https://external-preview.redd.it/du7KQXLvBmVqc5G0T3tIEbWsYn8...
        
       ___________________________________________________________________
       (page generated 2023-10-27 23:00 UTC)