[HN Gopher] Socratic Models - Composing Zero-Shot Multimodal Rea...
       ___________________________________________________________________
        
       Socratic Models - Composing Zero-Shot Multimodal Reasoning with
       Language
        
       Author : parsadotsh
       Score  : 67 points
       Date   : 2022-04-10 16:51 UTC (6 hours ago)
        
 (HTM) web link (socraticmodels.github.io)
 (TXT) w3m dump (socraticmodels.github.io)
        
       | mountainriver wrote:
        | This is really awesome. Multimodal is definitely where
        | transformers are headed, and it holds the promise of solving
        | many of the grounding issues we see with the current SOTA.
        
         | robbedpeter wrote:
         | Elon's robots might actually work out, at least in software.
         | 
         | This type of methodology, doing meta-cognitive programming by
         | linking together different models, is awesome. They're
          | constructing low-resolution imitations of brains - GPT-3 and
          | BERT and the like can do things that no individual model can
         | achieve. A predicate logic layer can document and explain
         | decision history, and the other modules start to resemble
         | something like the subconscious mind.
        
       | nynx wrote:
       | This is super impressive. Transformers have consistently done
       | better than almost anyone thought.
       | 
        | I still hold the opinion that we're going to need to move to
        | spiking neural network (SNN) models in the future to keep
        | growing the networks. Spiking networks require lots of storage,
        | but far less compute. They also propagate additional information
        | in the _timing_ of the spikes, not just the values. There is a
        | lot of low-hanging fruit in SNNs, and I think people are still
        | trying to copy biological systems too much.
       | 
       | Unfortunately, the main issue with SNNs is that no one has
       | figured out a way to train them as effectively as ANNs.
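
The trade-off described above - sparse, event-driven activity with information carried in spike timing - can be seen in a minimal leaky integrate-and-fire (LIF) neuron, the simplest common SNN unit. This is an illustrative sketch with arbitrary parameter values, not any particular framework's API:

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron.

    Returns the list of spike times, illustrating that an SNN's
    output is a sparse event train rather than a dense value:
    downstream compute is only needed when a spike occurs.
    """
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        # Membrane potential leaks toward rest while integrating input.
        v += dt * (-(v - v_rest) + i_in) / tau
        if v >= v_thresh:               # threshold crossing -> spike
            spike_times.append(step * dt)
            v = v_reset                 # reset after spiking
    return spike_times

# A stronger constant drive produces earlier and denser spikes than a
# weak one, so information is encoded in spike timing, not just rate.
weak = simulate_lif(np.full(200, 1.2))
strong = simulate_lif(np.full(200, 2.0))
```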
        
         | vagabund wrote:
         | The comments of every ML paper posted on this site are
         | dominated by people either baselessly discounting the results
         | as a party trick or illusion, or shoehorning in their
         | conjecture about what approach the field is overlooking.
         | 
         | As someone just trying to learn more about the implications of
         | new research, I find myself resorting to /r/machinelearning, or
         | even twitter threads, to get timely and informed discussions.
         | That's a shame, given what HN sets out to be.
        
           | ceeplusplus wrote:
           | As a community grows it attracts people who don't have the
           | same background that drew the original members of the
           | community together, so it becomes inevitable to see this kind
           | of layman commentary. I've seen it happen to r/hardware which
           | has been taken over by gamers with no CS background and AMD
            | shareholders when it used to have a lot of knowledgeable
           | people commenting.
        
             | nynx wrote:
             | I don't claim to be an expert, but I actually do
             | undergraduate neuromorphic computing research. So, I don't
             | know much, but I do know a little about what I'm talking
             | about.
        
           | mountainriver wrote:
            | As an ML engineer I found the comment insightful. I agree HN
            | takes a critical approach to ML, but that's largely because
            | there's been so much snake oil around it.
        
           | nynx wrote:
           | I'm certainly not discounting the results and I don't see
           | anything wrong with suggesting what I think would generally
           | be a good path to look at in the future.
        
             | vagabund wrote:
             | It's not wrong per se, and I'm obviously in no place to
             | police the discussion, but it's only tangentially related
             | to the post and often clouds out what would be a more
             | pointed deliberation over this research.
             | 
             | Maybe I'm expecting too much of HN, but I've seen these
             | same two top level comments under myriad ML posts.
             | 
             | Sorry for the meta-discussion that's gotten us further away
             | from this really remarkable paper.
        
               | nynx wrote:
               | Point taken, I do agree with you that it's probably best
               | to stay on topic in these kinds of posts.
        
           | gwern wrote:
           | Don't forget /r/mlscaling!
        
         | derefr wrote:
         | > a lot of storage
         | 
         | Is this fundamental, or just a problem with mapping these
         | models to our current serially-bottlenecked compute
         | architectures? Could a move to "hyperconverged infrastructure
         | in-the-small" -- striping DRAM or NVMe and tiny RISC cores
         | together on a die, where each CPU gets its own storage (or, you
         | might say, where each small cluster of storage cells has its
         | own tiny CPU attached), such that one stick has millions of
         | independent+concurrent [+slow+memory-constrained] processors --
         | resolve these difficulties?
        
           | nynx wrote:
            | They require roughly the same amount of storage as modern
            | ANNs, except that "neurons/synapses" may have some additional
            | state that needs to be stored. Relative to the compute they
            | require, though, which is far less than what large-scale ANNs
            | need, the storage is a lot.
        
       | arjvik wrote:
       | We've come to the consensus that large language models are just
       | stochastic parrots... What makes us think that we can achieve a
       | higher level of intelligence by putting them in conversation?
       | 
       | I think the next step in NLP will be a drastic innovation on
       | today's learning model.
        
         | gjm11 wrote:
         | "Stochastic parrots" -- have you seen, e.g., the examples in
         | the PaLM paper of how it does on "chained inference" tasks? I
         | don't see how you can classify that as mere parroting.
        
         | robbedpeter wrote:
         | There is no such consensus. Transformers navigate problem
         | spaces with various mechanisms that include recursion, and
         | multi-pass inference means the depth can be arbitrary. This
         | means that models pick up on the functions that generate
         | answers, not simple statistical relationships you see in Markov
         | chains.
         | 
         | "Stochastic parrot" is a derogatory term and I've never seen
         | anyone who actually understands the technology use that phrase
         | unironically. If anything, it's a shibboleth for bias or
         | ignorance.
        
         | mountainriver wrote:
          | We have not come to that consensus, and large language models
          | display really interesting capabilities like few-shot
          | learning, which we previously thought would require a wildly
          | different architecture.
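
For readers unfamiliar with the term, "few-shot learning" here means GPT-3-style in-context learning: a handful of worked examples are placed directly in the prompt and the model continues the pattern, with no weight updates. A minimal sketch of such a prompt (the translation task and example pairs are illustrative; the model call itself is omitted):

```python
# Build a few-shot prompt: the "training data" is just text in the
# context window, consumed at inference time with no gradient updates.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("cat", "chat"),
]
query = "dog"

prompt = "Translate English to French.\n\n"
for en, fr in examples:
    prompt += f"English: {en}\nFrench: {fr}\n\n"
# The model is expected to complete the pattern after the final cue.
prompt += f"English: {query}\nFrench:"
```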
        
         | moconnor wrote:
         | This is not the consensus among ML researchers. Transformers
         | are showing strong generalisation[1] and their performance
         | continues to surprise us as they scale[2].
         | 
         | The Socratic paper is not about "higher intelligence", it's
         | about demonstrating useful behaviour purely by connecting
         | several large models via language.
         | 
         | [1] https://arxiv.org/abs/2201.02177
         | 
         | [2] https://arxiv.org/abs/2204.02311
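
The composition the paper demonstrates can be caricatured in a few lines: one pretrained model renders its percept as text, and a language model reasons over that text. The function bodies below are stand-in stubs (assumptions for illustration), not the paper's actual components, which include models along the lines of CLIP and GPT-3:

```python
def vision_language_model(image):
    # Stand-in for a pretrained captioner (e.g. CLIP-based); stubbed.
    return "a person places a carton of milk in the fridge"

def language_model(prompt):
    # Stand-in for a GPT-3-style completion endpoint; stubbed.
    return "They are putting away groceries."

def socratic_answer(image, question):
    # Models communicate purely through natural language: the caption
    # is the only channel between perception and reasoning.
    caption = vision_language_model(image)
    prompt = (f"Scene: {caption}\n"
              f"Question: {question}\n"
              "Answer:")
    return language_model(prompt)

answer = socratic_answer(image=None, question="What is the person doing?")
```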
        
         | exdsq wrote:
         | I asked something similar previously on HN and a researcher in
         | the field said that scaling size/computation actually does keep
          | showing significant improvements.
        
       ___________________________________________________________________
       (page generated 2022-04-10 23:00 UTC)