[HN Gopher] The convolution empire strikes back
___________________________________________________________________
 
The convolution empire strikes back
 
Author : che_shr_cat
Score  : 63 points
Date   : 2023-10-27 19:39 UTC (3 hours ago)
 
(HTM) web link (gonzoml.substack.com)
(TXT) w3m dump (gonzoml.substack.com)
 
| adamnemecek wrote:
| All machine learning is just convolution, in the sense of
| Hopf algebra convolution.
|
| mensetmanusman wrote:
| Is this an intellectual leap aiming to make the field more
| cohesive, like the quest for unifying theories in physics?
|
| adamnemecek wrote:
| That is one of the goals, yes. In addition, it seems like you
| get neural architecture search (the architecture itself is
| optimized), faster training and inference, and
| interpretability. I'm working it out as we speak.
|
| Ironically, convolution provides some unification in physics
| too, e.g. renormalization is a convolution.
|
| dpflan wrote:
| Interesting, please do elaborate...
|
| cwillu wrote:
| I read it as a riff on "monads are just monoids in the
| category of endofunctors", but maybe that wasn't intended.
|
| uoaei wrote:
| It kind of is. The commenter has been working on this
| formalism for a year or more. I'm sure he will come by soon
| with the link to the Discord channel where he discusses the
| work and finds collaborators.
|
| adamnemecek wrote:
| It has been less than 9 months. But yeah, there is a Discord
| if you want to follow progress:
| https://discord.cofunctional.ai.
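(For reference: the Hopf algebra convolution alluded to above has a
standard textbook definition; the formula below is that definition,
not something spelled out by the commenter. For a Hopf algebra H
with multiplication \mu, comultiplication \Delta, unit \eta and
counit \varepsilon, the convolution of linear maps f, g : H -> H
is, in LaTeX:
 
    \[
      f \star g \;=\; \mu \circ (f \otimes g) \circ \Delta,
      \qquad
      (f \star g)(x) \;=\; \sum_{(x)} f\bigl(x_{(1)}\bigr)\,
                                      g\bigl(x_{(2)}\bigr),
    \]
 
with the sum written in Sweedler notation. The antipode S is the
\star-inverse of the identity map: S \star \mathrm{id} =
\mathrm{id} \star S = \eta \circ \varepsilon.)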
| visarga wrote:
| My theory is that architecture doesn't matter - convolutional,
| transformer or recurrent - as long as you can efficiently train
| models of the same size; what counts is the dataset.
|
| Similarly, humans achieve about the same results when they have
| the same training, with small variations. What matters is not
| the brain but the education they get.
|
| Of course I am exaggerating a bit; I'm just saying there is a
| multitude of architectures, of brains and of neural nets, with
| similar abilities, and the differentiating factor is the data,
| not the model.
|
| For years we have seen hundreds of papers trying to propose
| sub-quadratic attention. They all failed to get traction; big
| labs still use an almost vanilla transformer. At some point a
| paper declared "mixing is all you need" (MLP-Mixer) to replace
| "attention is all you need". Just mixing - the optimiser adapts
| to what it gets. [A minimal Mixer block is sketched at the end
| of the page.]
|
| If you think about it, maybe language creates a virtual layer
| where language operations are performed, and this works
| similarly in humans and AIs. That's why the architecture
| doesn't matter: it is running the language-OS on top. Similarly
| for vision.
|
| I place 90% of the merit of AI on language and 10% on the model
| architecture. Finding intelligence was inevitable; it was
| hiding in language, and that's how we get to be intelligent as
| well. A human raised without language is worse off than a
| primitive. Intelligence is encoded in software, not hardware.
| Our language software has more breadth and depth than any one
| of us can create or contain.
|
| kookamamie wrote:
| The dataset counts, but so does the total number of parameters
| in the network, i.e. its capacity.
|
| visarga wrote:
| Agreed - it's in my first sentence: "as long as you can
| efficiently train models of the same size, what counts is the
| dataset". But the useful sizes are just a few - 7, 13, 35, 70,
| 120B - because they are targeted at various families of GPUs.
| A 2T model that I can't run, or that is too expensive to use
| over an API, is of no use to me.
|
| And it's not just dataset size: data quality matters just as
| much, and so does diversity.
|
| I believe LLMs will come to train mostly on synthetic data
| engineered for extreme diversity and very high quality. This
| kind of data confers roughly 5x gains in efficiency, as
| demonstrated by Microsoft in the Phi-1.5 paper.
|
| rdedev wrote:
| I wish someone would perform a large-scale experiment to
| evaluate all these alternate architectures. I feel they get
| drowned out by new SOTA results from OpenAI and others. What
| I'd like to see is a study of whether emergent behaviors pop
| up given enough data and parameters.
|
| Maybe vision is special, in that convnets can approach
| transformer-level performance there, or maybe that generalizes
| to any modality. I haven't read enough papers to know if
| someone has already done something like this, but everywhere I
| look on the application side of things, vanilla transformers
| seem to be dominating.
|
| gradascent wrote:
| This is great, but what is a possible use case for these
| massive classifier models? I'm guessing they won't be running
| at the edge, which precludes them from real-time applications
| like self-driving cars, smartphones, or military systems. So
| then what? Facial recognition for police and governments, or
| targeted advertising based on your Instagram/Google photos?
| I'm genuinely curious.
|
| constantly wrote:
| Hard-to-classify items: subclasses of subclasses that have
| little to differentiate them, possibly with few pixels or lots
| of noise in the data.
|
| currymj wrote:
| 1) It's basic research. 2) You can always chop off the last
| layer and use the embeddings, which I guess might be useful
| for something. [A minimal version of 2) is sketched below.]
|
| pjs_ wrote:
| https://external-preview.redd.it/du7KQXLvBmVqc5G0T3tIEbWsYn8...
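A minimal sketch of the MLP-Mixer block that visarga mentions, in
PyTorch. The structure (a token-mixing MLP, then a channel-mixing
MLP, each behind a LayerNorm with a residual connection) follows
the MLP-Mixer paper; the shapes and hyperparameters below are
illustrative assumptions, not values from the article.
 
    import torch
    import torch.nn as nn
 
    class MixerBlock(nn.Module):
        """One Mixer block: mix across tokens, then across channels."""
        def __init__(self, num_tokens, dim, token_hidden, channel_hidden):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            # Token-mixing MLP: operates along the token (patch) axis.
            self.token_mlp = nn.Sequential(
                nn.Linear(num_tokens, token_hidden),
                nn.GELU(),
                nn.Linear(token_hidden, num_tokens),
            )
            self.norm2 = nn.LayerNorm(dim)
            # Channel-mixing MLP: operates along the feature axis.
            self.channel_mlp = nn.Sequential(
                nn.Linear(dim, channel_hidden),
                nn.GELU(),
                nn.Linear(channel_hidden, dim),
            )
 
        def forward(self, x):                  # x: (batch, tokens, dim)
            y = self.norm1(x).transpose(1, 2)  # -> (batch, dim, tokens)
            x = x + self.token_mlp(y).transpose(1, 2)  # token mixing
            x = x + self.channel_mlp(self.norm2(x))    # channel mixing
            return x
 
    x = torch.randn(8, 196, 512)              # e.g. 196 patches, 512 channels
    out = MixerBlock(196, 512, 256, 2048)(x)  # shape preserved: (8, 196, 512)
 
There is no attention and no convolution here: the only
interaction between patches is the small MLP applied along the
token axis, which is the "just mixing" point in the comment.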
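And a minimal sketch of currymj's "chop off the last layer" point,
also in PyTorch. The choice of torchvision's ResNet-50 is an
assumption here, a stand-in for whichever large classifier is at
hand:
 
    import torch
    import torch.nn as nn
    from torchvision.models import resnet50, ResNet50_Weights
 
    # Load a pretrained classifier, then replace its final
    # classification layer with the identity, so the forward pass
    # returns penultimate-layer embeddings instead of class logits.
    model = resnet50(weights=ResNet50_Weights.DEFAULT)
    model.fc = nn.Identity()
    model.eval()
 
    with torch.no_grad():
        images = torch.randn(4, 3, 224, 224)  # a dummy batch
        embeddings = model(images)            # shape: (4, 2048)
 
The embeddings can then feed retrieval, clustering, or a cheap
linear probe, even when the original 1000-way classification head
is of no direct use.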