[HN Gopher] Socratic Models - Composing Zero-Shot Multimodal Reasoning with Language
___________________________________________________________________
Socratic Models - Composing Zero-Shot Multimodal Reasoning with
Language
Author : parsadotsh
Score  : 67 points
Date   : 2022-04-10 16:51 UTC (6 hours ago)
(HTM) web link (socraticmodels.github.io)
(TXT) w3m dump (socraticmodels.github.io)

| mountainriver wrote:
| This is really awesome. Multimodal is definitely where
| transformers are headed, and it holds the promise of solving a
| lot of the grounding issues we see with the current SOTA.
|
| robbedpeter wrote:
| Elon's robots might actually work out, at least in software.
|
| This type of methodology - doing meta-cognitive programming by
| linking together different models - is awesome. They're
| constructing low-resolution imitations of brains: GPT-3, BERT,
| and the like can do things that no individual model can achieve.
| A predicate-logic layer can document and explain decision
| history, and the other modules start to resemble something like
| the subconscious mind.
|
| nynx wrote:
| This is super impressive. Transformers have consistently done
| better than almost anyone thought.
|
| I still hold the opinion that we're going to need to move to
| spiking neural network (SNN) models in the future to keep
| growing the networks. Spiking networks require lots of storage,
| but far less compute. They also propagate additional information
| in the _timing_ of the spikes, not just the values. There is a
| lot of low-hanging fruit in SNNs, and I think people are still
| trying to copy biological systems too closely.
|
| Unfortunately, the main issue with SNNs is that no one has
| figured out a way to train them as effectively as ANNs.
|
| vagabund wrote:
| The comments on every ML paper posted on this site are dominated
| by people either baselessly discounting the results as a party
| trick or illusion, or shoehorning in their conjecture about what
| approach the field is overlooking.
| As someone just trying to learn more about the implications of
| new research, I find myself resorting to /r/machinelearning, or
| even Twitter threads, to get timely and informed discussion.
| That's a shame, given what HN sets out to be.
|
| ceeplusplus wrote:
| As a community grows, it attracts people who don't share the
| background that drew the original members together, so this kind
| of layman commentary becomes inevitable. I've seen it happen to
| r/hardware, which used to have a lot of knowledgeable commenters
| but has been taken over by gamers with no CS background and AMD
| shareholders.
|
| nynx wrote:
| I don't claim to be an expert, but I actually do undergraduate
| neuromorphic-computing research. So, I don't know much, but I do
| know a little about what I'm talking about.
|
| mountainriver wrote:
| As an ML engineer, I found the comment insightful. I agree HN
| takes a critical approach to most ML posts, but that's largely
| because there's been so much snake oil in the field.
|
| nynx wrote:
| I'm certainly not discounting the results, and I don't see
| anything wrong with suggesting what I think would generally be a
| good path to look at in the future.
|
| vagabund wrote:
| It's not wrong per se, and I'm obviously in no place to police
| the discussion, but it's only tangentially related to the post
| and often crowds out what would be a more pointed deliberation
| over this research.
|
| Maybe I'm expecting too much of HN, but I've seen these same two
| top-level comments under myriad ML posts.
|
| Sorry for the meta-discussion that's gotten us further away from
| this really remarkable paper.
|
| nynx wrote:
| Point taken. I do agree with you that it's probably best to stay
| on topic in these kinds of posts.
|
| gwern wrote:
| Don't forget /r/mlscaling!
|
| derefr wrote:
| > a lot of storage
|
| Is this fundamental, or just a problem with mapping these models
| to our current serially-bottlenecked compute architectures?
| Could a move to "hyperconverged infrastructure in-the-small" -
| striping DRAM or NVMe and tiny RISC cores together on a die,
| where each CPU gets its own storage (or, you might say, where
| each small cluster of storage cells has its own tiny CPU
| attached), such that one stick has millions of independent,
| concurrent (but slow and memory-constrained) processors -
| resolve these difficulties?
|
| nynx wrote:
| They require roughly the same amount of storage as modern ANNs,
| except that "neurons/synapses" may have some additional state
| that needs to be stored. But relative to the compute they need,
| which is far less than large-scale ANNs require, that storage is
| a lot.
|
| arjvik wrote:
| We've come to the consensus that large language models are just
| stochastic parrots... What makes us think that we can achieve a
| higher level of intelligence by putting them in conversation?
|
| I think the next step in NLP will be a drastic innovation on
| today's learning model.
|
| gjm11 wrote:
| "Stochastic parrots" - have you seen, e.g., the examples in the
| PaLM paper of how it does on chained-inference tasks? I don't
| see how you can classify that as mere parroting.
|
| robbedpeter wrote:
| There is no such consensus. Transformers navigate problem spaces
| with various mechanisms that include recursion, and multi-pass
| inference means the depth can be arbitrary. This means that
| models pick up on the functions that generate answers, not the
| simple statistical relationships you see in Markov chains.
|
| "Stochastic parrot" is a derogatory term, and I've never seen
| anyone who actually understands the technology use that phrase
| unironically. If anything, it's a shibboleth for bias or
| ignorance.
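[Editor's note] The spiking-neuron idea discussed upthread - information carried in *when* spikes occur rather than in continuous activation values - can be sketched with a minimal leaky integrate-and-fire (LIF) neuron. The parameters below (`tau`, `threshold`) are illustrative assumptions, not taken from any particular SNN paper:

```python
# Minimal leaky integrate-and-fire (LIF) neuron: a sketch of the
# spiking model described above. tau and threshold are arbitrary
# illustrative values, not from any specific paper.

def lif_spike_times(inputs, tau=0.9, threshold=1.0):
    """Return the time steps at which the neuron fires.

    The neuron integrates each input current into a leaky membrane
    potential; crossing the threshold emits a spike event and resets
    the potential. The output is a list of spike *times* - the timing
    itself carries the information, unlike an ANN's continuous value.
    """
    v = 0.0
    spikes = []
    for t, current in enumerate(inputs):
        v = tau * v + current    # leaky integration of input
        if v >= threshold:
            spikes.append(t)     # spike: only an event time is emitted
            v = 0.0              # reset membrane potential after firing
    return spikes

# A stronger input drives earlier and more frequent spikes:
weak = lif_spike_times([0.3] * 10)    # -> [3, 7]
strong = lif_spike_times([0.6] * 10)  # -> [1, 3, 5, 7, 9]
```

This also illustrates the compute/storage point: between spikes the neuron does almost no work (event-driven compute), but its membrane potential `v` is extra per-neuron state that must be stored.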
| mountainriver wrote:
| We have not come to that consensus, and large language models
| display really interesting capabilities like few-shot learning,
| which we previously thought would require a wildly different
| architecture.
|
| moconnor wrote:
| This is not the consensus among ML researchers. Transformers are
| showing strong generalisation [1], and their performance
| continues to surprise us as they scale [2].
|
| The Socratic paper is not about "higher intelligence"; it's
| about demonstrating useful behaviour purely by connecting
| several large models via language.
|
| [1] https://arxiv.org/abs/2201.02177
|
| [2] https://arxiv.org/abs/2204.02311
|
| exdsq wrote:
| I asked something similar previously on HN, and a researcher in
| the field said that scaling size/computation actually does keep
| showing significant improvements.
___________________________________________________________________
(page generated 2022-04-10 23:00 UTC)
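[Editor's note] The composition idea discussed in the thread - several pretrained models exchanging information purely as text, with the prompt as the only interface - can be sketched roughly as follows. The two model stubs (`describe_image`, `language_model`) are hypothetical placeholders standing in for a vision-language model and a large language model; they are not the paper's actual prompts or APIs:

```python
# Sketch of the "compose models via language" idea. Both model calls
# below are hypothetical stand-ins for real pretrained models; in the
# Socratic Models setting they would be a VLM and an LLM.

def describe_image(image):
    """Stand-in for a vision-language model (e.g. a captioner)."""
    return "Places: kitchen. Objects: coffee mug, laptop, notebook."

def language_model(prompt):
    """Stand-in for a large language model completing a prompt."""
    return "You were probably working on your laptop at the kitchen table."

def socratic_answer(image, question):
    # Step 1: the vision model turns perception into language.
    caption = describe_image(image)
    # Step 2: the language model reasons over that text. The prompt
    # string is the *only* interface between the two models - no
    # shared weights, no fine-tuning.
    prompt = (
        f"I see: {caption}\n"
        f"Q: {question}\n"
        f"A:"
    )
    return language_model(prompt)

print(socratic_answer(None, "What was I doing?"))
```

The point of the sketch is the plumbing, not the stubs: zero-shot behaviour emerges from prompt-level composition of frozen models.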