[HN Gopher] A modern self-referential weight matrix that learns to modify itself
       ___________________________________________________________________
        
       A modern self-referential weight matrix that learns to modify
       itself
        
       Author : lnyan
       Score  : 113 points
       Date   : 2022-04-13 16:12 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | savant_penguin wrote:
       | Just skimmed the paper but the benchmarks are super weird
        
       | ricardobayes wrote:
       | It's a super weird feeling to click on a hacker news top post and
       | find out I know one of the authors. The world is a super small
       | place.
        
       | heyitsguay wrote:
        | I know Schmidhuber is famously miffed about missing out on the
        | AI revolution limelight, and even so he runs a pretty famous and
        | well-resourced group. So with a paper like this demonstrating a
        | new fundamental technique, you'd think they would eat the labor
        | and compute costs of running it through a full gauntlet of
        | high-profile benchmarks against existing SOTA methods, instead
        | of the sort of half-hearted benchmarking that happens in this
        | paper. It's a hassle, but all it would take for something like
        | this to catch the community's attention is a clear demonstration
        | of viability, in line with what the groups at other large
        | research institutions do.
       | 
       | The failure to put something like that front and center makes me
       | wonder how strong the method is, because you have to assume that
       | someone on the team has tried more benchmarks. Still, the idea of
       | learning a better update rule than gradient descent is
       | intriguing, so maybe something cool will come from this :)
        
         | nullc wrote:
         | Or they hurried the publication to avoid getting scooped and
         | will follow up with interesting benchmarks later.
        
       | mark_l_watson wrote:
        | I haven't really absorbed this paper yet, but my first thought
        | was of the Hopfield networks we used in the 1980s.
       | 
       | For unsupervised learning algorithms like masked models (BERT and
       | some other Transformers), it makes sense to train in parallel
       | with prediction. Why not?
       | 
        | I can't wrap my head around using this for supervised (labeled
        | data) learning.
        
       | jdeaton wrote:
       | I'm having a hard time reading this paper without hearing you-
       | again's voice in my head.
        
       | TekMol wrote:
       | I have been playing with alternative ways to do machine learning
       | on and off for a few years now. Some experiments went very well.
       | 
       | I am never sure if it is a waste of time or has some value.
       | 
        | If you guys had some unique ML technology that was different
        | from what all the others do, what would you do with it?
        
         | hwers wrote:
          | Write a paper about it. Post it on arxiv.org. Contact some
          | open-minded researchers on Twitter or here (Show HN) for
          | critique.
        
         | javajosh wrote:
         | Host it on a $5 VPS with full internet access and "see what
         | happens".
        
         | ggerganov wrote:
         | I would make a "Show HN" post
        
         | [deleted]
        
         | swagasaurus-rex wrote:
         | Create a demo of it doing -something-. Literally anything. Then
         | show it off and see where it goes.
        
         | daveguy wrote:
          | A demo speaks louder than words. If you don't want to go into
          | the details of how it works, it would still be interesting to
          | just see where it over- and under-performs compared to
          | existing systems.
        
           | mark_l_watson wrote:
            | Absolutely! Also, if possible, a Colab (or plain Jupyter
            | notebook) and data would be good.
        
         | nynx wrote:
         | I'd make a blog and post about my experiments.
        
           | andai wrote:
           | And a video too, please :)
        
         | Scene_Cast2 wrote:
         | If you do end up posting any sort of musings on this topic, I'd
         | be really interested in taking a look.
        
         | drewm1980 wrote:
         | Start with the assumption that someone has already done it...
         | Do a thorough literature survey... Ask experts working on the
         | most similar thing. Don't be disheartened if you weren't the
         | first; ideas don't have to be original to have value; some
         | ideas need reviving from time to time, or were ahead of their
         | time when first discovered.
        
       | codelord wrote:
        | I haven't read the paper yet, so no comment on the content. But
        | it's amusing that more than 30% of the references are self-
        | citations.
        
         | lol1lol wrote:
         | Hinton et al. self cite. Schmidhuber et al. self cite. One got
         | Turing, the other got angry.
        
       | nh23423fefe wrote:
       | It's only a matter of time until the technological singularity
       | 
       | > The WM of a self-referential NN, however, can keep rapidly
       | modifying all of itself during runtime. In principle, such NNs
        | can meta-learn to learn, and meta-meta-learn to meta-learn to
       | learn, and so on, in the sense of recursive self-improvement.
       | 
       | Everyone who doubts is hanging everything on "in principle" being
       | too hard. Seems ridiculous to me, a failure of imagination.
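        | 
        | For anyone curious, the self-modification being quoted is
        | concrete and fairly simple. As I read the paper, one matrix W
        | produces both its output and its own key, query and learning
        | rate, then rewrites itself with a delta rule. A minimal NumPy
        | sketch of that reading (not the authors' published code; the
        | names are mine):
        | 
        |     import numpy as np
        | 
        |     def phi(z):
        |         # softmax feature map (the paper normalizes with
        |         # something like this, if I'm reading it right)
        |         e = np.exp(z - z.max())
        |         return e / e.sum()
        | 
        |     def srwm_step(W, x, d_in, d_out):
        |         # W has shape (d_out + 2*d_in + 1, d_in): its rows emit
        |         # the output y, a key k, a query q and a rate beta,
        |         # all from the same input.
        |         out = W @ phi(x)
        |         y = out[:d_out]                       # step output
        |         k = out[d_out:d_out + d_in]           # self-gen. key
        |         q = out[d_out + d_in:d_out + 2*d_in]  # self-gen. query
        |         beta = out[-1]                        # self-gen. rate
        |         v = W @ phi(q)      # value retrieved for the query
        |         v_bar = W @ phi(k)  # value currently stored at the key
        |         gate = 1.0 / (1.0 + np.exp(-beta))    # sigmoid gate
        |         # delta-rule self-update: W modifies all of itself,
        |         # including the rows that generate k, q and beta
        |         W = W + gate * np.outer(v - v_bar, phi(k))
        |         return y, W
        | 
        |     rng = np.random.default_rng(0)
        |     d_in, d_out = 8, 4
        |     W = rng.normal(scale=0.1, size=(d_out + 2*d_in + 1, d_in))
        |     for _ in range(5):  # W keeps rewriting itself every step
        |         y, W = srwm_step(W, rng.normal(size=d_in), d_in, d_out)
        | 
        | The "meta" part is that the update rule's own parameters (k, q,
        | beta) come out of W, so when W changes, the way W changes itself
        | changes too.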
        
         | Tr3nton wrote:
          | There's a saying that goes: as soon as we can build it, it's
          | not AI any more.
        
         | godelski wrote:
          | I think there's some context missing when we're talking about
          | the singularity: this is the whole Marcus "AI is hitting a
          | wall" debate (maybe I'm reading that reference into your
          | comment and it isn't there). "Hitting the wall" means
          | different things to different people, and we're not really
          | communicating well with one another. Marcus is concerned about
          | AGI, while others are concerned about ML in general and what
          | it can do. So LLMs and LGMs (like Dall-E) are showing massive
          | improvements and seem to run counter to Marcus's claim. But
          | from the other side, there are still open problems on the way
          | to AGI, like causal learning and symbolic learning. What's
          | bugged me a bit about Marcus's claim is that those areas are
          | also rapidly improving. I just think it's silly to hold up
          | Dall-E as proof that Marcus is wrong rather than pointing to
          | our improvements in causal learning. But I guess few are
          | interested in CL and it isn't nearly as flashy. I know Marcus
          | reads HN, so maybe you don't think we've been making enough
          | strides in CL/SL? I can agree that it doesn't get enough
          | attention, but ML is very hyped at this point.
        
         | synquid wrote:
         | Schmidhuber has written about recursive self-improvement since
         | his diploma thesis in the 80s: "Evolutionary principles in
         | self-referential learning, or on learning how to learn: The
         | meta-meta-... hook".
         | 
         | Your quote sounds like it could just as well have been from
         | that thesis.
        
           | [deleted]
        
         | pizza wrote:
          | Singularity is not that important. Scale and locality are.
         | Information that has to travel across the world suffers from
         | misordering/relativity. Same for across the room or across a
         | single wire that is nondeterministically but carelessly left
         | unplugged. An oracle doesn't help in that case. Instead what
         | you want is a new kind of being.
        
         | jmmcd wrote:
         | "In principle" is not only (1) hard in practice but also even
         | in principle it is (2) limited by the capacity of the NN. It's
         | (2) which gives me some reassurance.
        
         | _Microft wrote:
         | _" It suddenly stopped self-improving."_ "What happened?" _" It
         | ... it looks like it found a way to autogenerate content that
         | is optimized for its reward function and now binges on it
         | 24/7..."_ ><
        
         | erdos4d wrote:
         | Even if the algorithms were here to do that job, where will the
         | hardware come from? I'm staring at almost a decade of
         | commercial processor (i7) lineage right now and the jump has
         | been from 4 to 6 cores with no change in clock speed (maybe the
         | newer one is even slower actually). There definitely won't be
         | any singularity jumping off unless today's hardware gets
         | another few decades of Moore's law, and that is not happening.
        
           | dotnet00 wrote:
            | It's a bit dishonest to look specifically at the
            | manufacturer who spent most of the past decade enjoying an
            | effective monopoly on desktop CPUs as a reference for how
            | computers have improved. Even more so since 4- and 6-core
            | CPUs are not representative of the high-end systems used to
            | train even current state-of-the-art ML models.
        
           | armchair_ wrote:
            | Well, the code that was published alongside the paper is
            | written in Python and CUDA, so you're not looking at the
            | right kind of processor to start with.
            | 
            | My 5-year-old, consumer-grade GPU does 1.5 GHz * 2300 cores,
            | whereas the equivalent released this year does 1.7 GHz *
            | 8900 cores. Granted, not the best way to measure GPU
            | performance, but it is roughly keeping pace with Moore's
            | law, and it's going to be a better indicator of the future
            | than Intel CPU capabilities, especially for machine learning
            | applications.
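            | 
            | Back-of-the-envelope on those numbers, treating cores *
            | clock as a crude throughput proxy (my arithmetic, not a
            | real benchmark):
            | 
            |     old = 2300 * 1.5e9  # ~2017 card: 2300 cores at 1.5 GHz
            |     new = 8900 * 1.7e9  # ~2022 card: 8900 cores at 1.7 GHz
            |     print(new / old)    # ~4.4x in ~5 years; doubling every
            |                         # 2 years would predict ~5.7x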
        
             | erdos4d wrote:
             | So you are saying that we can get another 3 decades of
             | Moore's law by switching to GPUs made of silicon rather
             | than CPUs made of silicon? Well fuck, problem solved then.
             | I was completely unaware it was so easy.
        
               | VyperCard wrote:
               | Yeah, but mostly for matrix multiplications
        
       ___________________________________________________________________
       (page generated 2022-04-13 23:00 UTC)