[HN Gopher] "Attention", "Transformers", in Neural Network "Large Language Models"
       ___________________________________________________________________
        
       "Attention", "Transformers", in Neural Network "Large Language
       Models"
        
       Author : macleginn
       Score  : 21 points
       Date   : 2023-12-24 21:10 UTC (1 hour ago)
        
 (HTM) web link (bactra.org)
 (TXT) w3m dump (bactra.org)
        
       | low_tech_love wrote:
       | Really interesting, I like the kind of "stream of consciousness"
       | approach to the content, it's refreshing. What's also interesting
       | is the fact that the author felt the need to apologize and
       | preface it with some forced deference due to some kind of
       | internet bashing he certainly received. I hope this doesn't
        | discourage him from publishing his notes (although I think it
       | will). Why are we getting so human-phobic?
        
         | defrost wrote:
          | It's an understandable deference when stumbling through a
          | huge new field and its freshly minted jargon, trying to tidy
          | up and tie the new terms to long-standing ones in older
          | fields.
         | 
         | "As near as I can tell when the new guard says X they're pretty
         | much talking about what we called Y"
         | 
          | Does 'attention' on the AI bleeding edge really correspond to
          | kernel smoothing | mapping attenuation | damping? (A rough
          | sketch of the kernel-smoothing reading follows this comment.)
         | 
         | This is (one of) the elephants in a darkened room that Cosma is
         | groping around and showing his thoughts as he goes.
         | 
         | > I hope this doesn't discourage him to keep publishing his
         | notes
         | 
          | Doubtful; aside from the inevitable attenuation with age, he's
          | been airing his thoughts for at least two decades, e.g. his
          | wonderful little:
         | 
         |  _A Rare Blend of Monster Raving Egomania and Utter Batshit
         | Insanity_ (2002)
         | 
         | http://bactra.org/reviews/wolfram/
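
          A rough sketch of that correspondence, assuming ordinary
          scaled dot-product attention on one side and a Gaussian-kernel
          Nadaraya-Watson smoother on the other; the names here are
          illustrative, not taken from the article:

              import numpy as np

              def softmax(z):
                  z = z - z.max()
                  e = np.exp(z)
                  return e / e.sum()

              def attention(query, keys, values):
                  # Scaled dot-product attention for a single query.
                  d = query.shape[-1]
                  weights = softmax(keys @ query / np.sqrt(d))
                  return weights @ values    # weighted average of values

              def kernel_smoother(x, xs, ys, bandwidth=1.0):
                  # Nadaraya-Watson smoothing with a Gaussian kernel.
                  dists = np.sum((xs - x) ** 2, axis=-1)
                  weights = softmax(-dists / (2 * bandwidth ** 2))
                  return weights @ ys        # weighted average of ys

          Both return a convex combination of the values/ys, with weights
          set by how close the query/x sits to each key/x_i; that is, as
          far as I can tell, the analogy the linked notes draw. Whether
          it deserves the name "attention" is the question above.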
        
         | panarchy wrote:
          | It is nice, and it's interesting that if you go read something
          | like Einstein's general relativity paper, you find (or at
          | least I did) that it's written in a quite similar style and
          | not so dense.
        
       | brcmthrowaway wrote:
       | Biggest takeaway: extraction of prompts seems to be complete
       | bullshit.
        
       | haltist wrote:
       | This person doesn't understand that large neural networks are
       | somewhat conscious and a stepping stone to AGI. Why else would
       | OpenAI be worth so much money if it wasn't a stepping stone to
       | AGI? No one can answer this without making it obvious they do not
       | understand that large numbers can be conscious and sentient.
       | Checkmate atheists.
        
         | ChainOfFools wrote:
         | I probably would agree with the unsnarkified version of what
         | you're saying to some extent, but I think it's worth mentioning
         | that the argument you seem to be dismissing can take a much
         | stronger form, questioning latent premises about free will by
         | proposing that _neither_ computers nor humans are sentient,
          | to be entirely deterministic and ultimately amount
         | to interference patterns of ancient thermodynamic gradients
         | created in the formation of the universe.
        
       | seydor wrote:
        | And what do the different heads represent? Why are query, key,
        | and value simply linear transforms of the input?
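
        On the second question, a minimal sketch of standard multi-head
        self-attention (nothing specific to the linked notes): Q, K, and
        V really are just learned linear maps of the same input, one set
        of weights per head, so each head compares tokens in its own
        learned subspace. Shapes and names below are illustrative:

            import numpy as np

            rng = np.random.default_rng(0)

            def softmax(z, axis=-1):
                z = z - z.max(axis=axis, keepdims=True)
                e = np.exp(z)
                return e / e.sum(axis=axis, keepdims=True)

            def multi_head_self_attention(X, n_heads=4, d_head=8):
                # X: (seq_len, d_model); each head owns its W_q, W_k, W_v,
                # all plain linear maps applied to the same input X.
                d_model = X.shape[1]
                heads = []
                for _ in range(n_heads):
                    W_q, W_k, W_v = (rng.normal(scale=d_model ** -0.5,
                                                size=(d_model, d_head))
                                     for _ in range(3))
                    Q, K, V = X @ W_q, X @ W_k, X @ W_v   # linear transforms
                    A = softmax(Q @ K.T / np.sqrt(d_head))  # (seq, seq)
                    heads.append(A @ V)                     # this head's view
                return np.concatenate(heads, axis=-1)  # then an output proj.

            out = multi_head_self_attention(rng.normal(size=(5, 16)))
            print(out.shape)   # (5, 32)

        As for what the different heads represent: nothing in the
        construction assigns them a meaning; each simply learns its own
        projections, and what a trained head ends up tracking is an
        empirical question.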
        
       ___________________________________________________________________
       (page generated 2023-12-24 23:00 UTC)