[HN Gopher] A modern self-referential weight matrix that learns ...
___________________________________________________________________

A modern self-referential weight matrix that learns to modify itself

Author : lnyan
Score  : 113 points
Date   : 2022-04-13 16:12 UTC (6 hours ago)

(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)

| savant_penguin wrote:
| Just skimmed the paper, but the benchmarks are super weird.

| ricardobayes wrote:
| It's a super weird feeling to click on a Hacker News top post and
| find out I know one of the authors. The world is a super small
| place.

| heyitsguay wrote:
| I know Schmidhuber is famously miffed about missing out on the
| AI-revolution limelight, and he nonetheless runs a pretty famous
| and well-resourced group. So with a paper like this demonstrating
| a new fundamental technique, you'd think they would eat the labor
| and compute costs of getting it up and running on a full gauntlet
| of high-profile benchmarks against existing SOTA methods, instead
| of the sort of half-hearted benchmarking that happens in this
| paper. It's a hassle, but all it would take for something like
| this to catch the community's attention is a clear demonstration
| of viability in line with what groups at any of the other large
| research institutions do.
|
| The failure to put something like that front and center makes me
| wonder how strong the method is, because you have to assume that
| someone on the team has tried more benchmarks. Still, the idea of
| learning a better update rule than gradient descent is
| intriguing, so maybe something cool will come from this :)

| nullc wrote:
| Or they hurried the publication to avoid getting scooped and
| will follow up with interesting benchmarks later.

| mark_l_watson wrote:
| I haven't really absorbed this paper yet, but my first thought
| was the Hopfield networks we used in the 1980s.
|
| For unsupervised learning algorithms like masked models (BERT
| and some other Transformers), it makes sense to train in
| parallel with prediction. Why not?
|
| My imagination can't wrap itself around using this for
| supervised (labeled-data) learning.

| jdeaton wrote:
| I'm having a hard time reading this paper without hearing
| you-again's voice in my head.

| TekMol wrote:
| I have been playing with alternative ways to do machine learning
| on and off for a few years now. Some experiments went very well.
|
| I am never sure if it is a waste of time or has some value.
|
| If you guys had some unique ML technology that is different from
| what all the others do, what would you do with it?

| hwers wrote:
| Write a paper about it. Post it on arxiv.org. Contact some
| open-minded researchers on Twitter or here (Show HN) for
| critique.

| javajosh wrote:
| Host it on a $5 VPS with full internet access and "see what
| happens".

| ggerganov wrote:
| I would make a "Show HN" post.

| [deleted]

| swagasaurus-rex wrote:
| Create a demo of it doing -something-. Literally anything. Then
| show it off and see where it goes.

| daveguy wrote:
| A demo speaks louder than words. If you don't want to go into
| the details of how it works, it would still be interesting just
| to see where it overperforms and underperforms compared to
| existing systems.

| mark_l_watson wrote:
| Absolutely! Also, if possible, a Colab (or plain Jupyter
| notebook) and data would be good.

| nynx wrote:
| I'd make a blog and post about my experiments.

| andai wrote:
| And a video too, please :)

| Scene_Cast2 wrote:
| If you do end up posting any sort of musings on this topic, I'd
| be really interested in taking a look.

| drewm1980 wrote:
| Start with the assumption that someone has already done it...
| Do a thorough literature survey... Ask experts working on the
| most similar thing. Don't be disheartened if you weren't the
| first; ideas don't have to be original to have value. Some
| ideas need reviving from time to time, or were ahead of their
| time when first discovered.

| codelord wrote:
| I haven't read the paper yet, so no comment on the content. But
| it's amusing that more than 30% of the references are
| self-citations.

| lol1lol wrote:
| Hinton et al. self-cite. Schmidhuber et al. self-cite. One got
| the Turing Award; the other got angry.

| nh23423fefe wrote:
| It's only a matter of time until the technological singularity.
|
| > The WM of a self-referential NN, however, can keep rapidly
| modifying all of itself during runtime. In principle, such NNs
| can meta-learn to learn, and meta-meta-learn to meta-learn to
| learn, and so on, in the sense of recursive self-improvement.
|
| Everyone who doubts this is hanging everything on "in principle"
| being too hard. That seems ridiculous to me, a failure of
| imagination.
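The passage quoted above describes the paper's core mechanism: one
weight matrix (WM) produces both the network's output and the signals
that rewrite the matrix itself. Below is a minimal PyTorch sketch of
that idea, using a delta-rule outer-product self-update; the slicing,
dimensions, and names are illustrative assumptions, not the paper's
actual parameterization or code.

    import torch

    def srwm_step(W, x):
        # One step of a self-referential weight matrix (sketch).
        # W maps the input both to the network output y and to its own
        # update signals: a query q, a key k, and a learning rate beta.
        d_in = x.shape[0]
        out = W @ x
        y    = out[:-2 * d_in - 1]           # network output
        q    = out[-2 * d_in - 1:-d_in - 1]  # self-generated query
        k    = out[-d_in - 1:-1]             # self-generated key
        beta = torch.sigmoid(out[-1])        # self-generated learning rate

        v_new = W @ q  # value the matrix wants to associate with key k
        v_old = W @ k  # value the matrix currently associates with k
        # Delta-rule self-modification: move the stored value toward v_new.
        W = W + beta * torch.outer(v_new - v_old, k)
        return y, W

    # Toy usage: 8 inputs, 4 outputs; the extra rows produce q, k, beta.
    d_in, d_out = 8, 4
    W = 0.01 * torch.randn(d_out + 2 * d_in + 1, d_in)
    x = torch.randn(d_in)
    y, W = srwm_step(W, x)

Because q, k, and beta are themselves read out of W, every update
changes how future updates are computed, which is what makes the
matrix self-referential and, in principle, lets it meta-learn its own
learning rule.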
| Tr3nton wrote:
| There's a saying: as soon as we can build it, it's not AI
| anymore.

| godelski wrote:
| I think there's some context missing when we talk about the
| singularity; this is the whole Marcus "AI is hitting a wall"
| debate (maybe I'm reading that reference into your comment and
| it isn't there). "Hitting the wall" means different things to
| different people, and we're not really communicating well with
| one another. Marcus is concerned with AGI, while others are
| concerned with ML in general and what it can do. So LLMs and
| LGMs (like DALL-E) are showing massive improvements and seem to
| run counter to Marcus's claim. But from the other side, there
| are still open problems on the road to AGI, such as causal
| learning and symbolic learning.
|
| What bugs me a bit about Marcus's claim is that those areas are
| also rapidly improving. I just think it's silly to hold up
| DALL-E as proof that Marcus is wrong rather than pointing to our
| improvements in causal learning. But I guess few people are
| interested in CL, and it isn't nearly as flashy. I know Marcus
| reads HN, so maybe you don't think we've been making enough
| strides in CL/SL? I can agree that the area doesn't get enough
| attention, but ML is very hyped at this point.

| synquid wrote:
| Schmidhuber has written about recursive self-improvement since
| his diploma thesis in the '80s: "Evolutionary principles in
| self-referential learning, or on learning how to learn: The
| meta-meta-... hook".
|
| Your quote sounds like it could just as well have been from
| that thesis.

| [deleted]

| pizza wrote:
| The singularity is not that important. Scale and locality are.
| Information that has to travel across the world suffers from
| misordering/relativity. The same goes for across the room, or
| across a single wire that is nondeterministically but
| carelessly left unplugged. An oracle doesn't help in that case.
| Instead, what you want is a new kind of being.

| jmmcd wrote:
| "In principle" is not only (1) hard in practice; even in
| principle it is (2) limited by the capacity of the NN. It's (2)
| that gives me some reassurance.

| _Microft wrote:
| _"It suddenly stopped self-improving."_ "What happened?" _"It
| ... it looks like it found a way to autogenerate content that
| is optimized for its reward function and now binges on it
| 24/7..."_ ><

| erdos4d wrote:
| Even if the algorithms were here to do that job, where would
| the hardware come from? I'm staring at almost a decade of
| commercial processor (i7) lineage right now, and the jump has
| been from 4 to 6 cores with no change in clock speed (maybe the
| newer one is actually even slower). There definitely won't be
| any singularity jumping off unless today's hardware gets
| another few decades of Moore's law, and that is not happening.

| dotnet00 wrote:
| It's a bit dishonest to use the manufacturer that spent most of
| the past decade enjoying an effective monopoly on desktop CPUs
| as a reference for how computers have improved. Even more so
| since 4- and 6-core CPUs are not representative of the high-end
| systems used to train even current state-of-the-art ML models.

| armchair_ wrote:
| Well, the code that was published alongside the article is
| written in Python and CUDA, so you're not looking at the right
| kind of processor to start with.
|
| My 5-year-old, consumer-grade GPU does 1.5 GHz * 2300 cores,
| whereas the equivalent released this year does 1.7 GHz * 8900
| cores. Granted, that's not the best way to measure GPU
| performance, but it is roughly keeping pace with Moore's law,
| and it's going to be a better indicator of the future than
| Intel CPU capabilities, especially for machine learning
| applications.
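Taking the comment's figures at face value, a quick back-of-the-
envelope check (treating cores x clock as a crude throughput proxy,
as the commenter concedes):

    # Crude check of the cores * clock figures quoted above.
    old = 2300 * 1.5     # ~3,450 core-GHz  (5-year-old card)
    new = 8900 * 1.7     # ~15,130 core-GHz (this year's card)
    print(new / old)     # ~4.4x over 5 years
    print(2 ** (5 / 2))  # ~5.7x if throughput doubled every 2 years

Roughly 4.4x in 5 years versus the ~5.7x a strict every-2-years
doubling would predict: the same ballpark, as the comment claims.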
| erdos4d wrote:
| So you are saying that we can get another 3 decades of Moore's
| law by switching to GPUs made of silicon rather than CPUs made
| of silicon? Well fuck, problem solved then. I was completely
| unaware it was so easy.

| VyperCard wrote:
| Yeah, but mostly for matrix multiplications.
___________________________________________________________________
(page generated 2022-04-13 23:00 UTC)