[HN Gopher] I disagree with Geoff Hinton regarding "glorified au...
       ___________________________________________________________________
        
       I disagree with Geoff Hinton regarding "glorified autocomplete"
        
       Author : magoghm
       Score  : 122 points
       Date   : 2023-11-18 15:39 UTC (7 hours ago)
        
 (HTM) web link (statmodeling.stat.columbia.edu)
 (TXT) w3m dump (statmodeling.stat.columbia.edu)
        
       | 11235813213455 wrote:
       | I have yet to read a ChatGPT text that doesn't look stupidly
       | verbose.
        
         | chpatrick wrote:
         | Why don't you ask it not to be?
        
           | coffeebeqn wrote:
           | The default style really annoys me with its verbosity and
            | overuse of numbered lists. Describing the tone and length
            | of responses and anything else you want - think step by
            | step, first principles, show don't tell - all work pretty
            | well.
           | 
           | You can even ask it to only reply with a Boolean true or
           | false - which certainly doesn't play to its strengths
        
             | satuke wrote:
             | Actually, that's a by-product of RLHF. A base model is
             | usually not that verbose.
        
         | JoshTko wrote:
         | Use custom instructions. I use "Answers should be very concise
         | by default. Never provide disclosures. Answers should get to
         | the point immediately." and get super concise answers. If I
         | want more detail I just follow up with can you expand on X.
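         | 
         | For illustration, the same effect over the API usually comes
         | from a system message; a minimal sketch using the OpenAI
         | Python client (model name and wording here are placeholders,
         | not a recommendation):
         | 
         |   from openai import OpenAI
         | 
         |   client = OpenAI()  # expects OPENAI_API_KEY in the env
         | 
         |   STYLE = ("Answers should be very concise by default. "
         |            "Never provide disclosures. Get to the point "
         |            "immediately.")
         | 
         |   resp = client.chat.completions.create(
         |       model="gpt-4",  # placeholder model name
         |       messages=[
         |           {"role": "system", "content": STYLE},
         |           {"role": "user", "content": "What is a monad?"},
         |       ],
         |   )
         |   print(resp.choices[0].message.content)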
        
         | xcv123 wrote:
         | Follow up with the prompt "rewrite that so that it doesn't look
         | stupidly verbose" and it will do exactly as requested.
        
         | WillPostForFood wrote:
         | Give this GPT called Briefly a try. Responses come back as
         | bullet point lists. I prefer more verbose responses, but it's
         | interesting to see the contrast.
         | 
         | https://chat.openai.com/g/g-LNsEQH5rz-briefly
        
       | mbrumlow wrote:
       | > If you want to be really good, you have to understand what's
       | being said. That's the only way.
       | 
       | This is simply not true. Predict the next letter or word, or,
       | if you abstract it away from things that mean something to you,
       | the next color of a block in a long chain of colored blocks,
       | and you would realize that all we are doing is using statistics
       | to predict what the next item might be. There is simply no need
       | or requirement for the system doing the prediction to
       | understand.
        
         | albertzeyer wrote:
         | > You would realize that all we are doing is using statistics
         | to predict what the next item might be.
         | 
         | So what does "understanding" really mean then?
         | 
         | "Understanding" is not really well defined. Either we (humans)
         | do it, but then LLMs might just do it as well, depending on the
         | definition of "understanding", or we both don't do it. But if
         | no-one is really "understanding" anything, then this definition
         | of "understanding" is maybe not useful.
        
           | dboreham wrote:
           | Bingo
        
           | Joeri wrote:
            | Nobody actually understands how humans understand
            | something, just like nobody understands how LLMs do what
            | they do.
           | 
           | Everybody opining about it is doing just that: offering an
           | opinion. Geoff Hinton's opinion is worth more than someone
           | else's, but it is still an opinion.
        
             | im_down_w_otp wrote:
             | I don't know about the human part, but we absolutely
             | understand how LLMs do what they do. They're not magic.
        
               | og_kalu wrote:
               | No we don't. No it's not "magic". No we don't understand
               | what the black box is doing.
        
               | brookst wrote:
               | For some values of "we"
        
               | og_kalu wrote:
               | For every value of we. "I understand the internals of
               | GPT" is the fastest way to demonstrate you have no idea
               | what you're talking about.
        
               | chpatrick wrote:
               | We understand the architecture but we don't understand
               | the weights.
        
               | usrbinbash wrote:
               | We also understand, down to a very very very microscopic
               | level, how neurons work. We also know a helluva lot about
               | the architecture of the brain. Does that mean we can
               | explain our own intelligence, how our minds actually
               | work? Nope.
        
           | mannykannot wrote:
           | I feel that LLMs raise some very interesting challenges for
           | anyone trying to figure out what it means to understand
           | something and how we do it, but I am not yet ready to agree
           | with Hinton.
           | 
           | For example, we are aware that some, but by no means all, of
           | what people say is about an external world that may or may
           | not conform to what the words say. We can also doubt that we
           | have understood things correctly, and take steps to either
           | confirm or refute our opinions. We see ourselves as entities
           | in an external reality containing other individuals who also
            | do this, and we recognize that we, and they, have a
            | limited ability to influence what happens in that world.
            | Do LLMs do these
           | things, or is what they produce a result of having a lot of
           | information about the purely formal properties of human
           | language use, independently of semantics?
        
             | hackinthebochs wrote:
             | >I feel that LLMs raise some very interesting challenges
             | for anyone trying to figure out what it means to understand
             | something and how we do it, but I am not yet ready to agree
             | with Hinton.
             | 
             | Agreed. What LLMs say about understanding deserves a lot
             | more attention than it has received. I wrote down some of
             | my thoughts on the matter:
             | 
             | https://www.reddit.com/r/naturalism/comments/1236vzf
             | 
             | >Do LLMs do these things, or is what they produce a result
             | of having a lot of information about the purely formal
             | properties of human language use, independently of
             | semantics?
             | 
             | These two points aren't necessarily in opposition, and
              | understanding why is, I think, key to solving a lot of
             | important problems around intelligence, sentience, etc. To
             | compute is to operate on formal properties. But this
             | doesn't exclude semantic properties from having causal
             | relevance to the behavior of the system. What we need is a
             | way to conceptualize how a system can have multiple related
             | descriptions at different levels. A description at the
             | level of semantics doesn't exclude a description in terms
             | of formal properties or vice versa. I think of it in terms
             | of constraints: the higher level descriptions constrain the
              | lower level behavior. What the computational description
              | does is ensure that the higher level semantic constraint
              | is maintained. It does this through the particular space
              | of computational dynamics it follows. Essentially, the
              | information that picks out this program's space of
              | branching dynamics embeds the semantic description in
              | question, and this description realizes the computational
              | dynamics necessary to maintain the higher level semantic
              | constraint.
             | Rather than semantics being in opposition to formal
             | properties, they are two sides of the same coin.
        
         | satuke wrote:
         | Isn't that exactly how humans learn to respond to stimuli?
         | Don't we just try to predict the best next response to
         | everything? Yes, it's statistics, but the fun part is that
         | nobody is writing this statistical function by hand.
        
           | civilized wrote:
           | LLMs don't have a concept of "best". Only most likely in what
           | they've been trained on.
           | 
           | I think LLMs ultimately just take imitation to a creative and
           | sophisticated extreme. And imitation simply doesn't comprise
           | the whole of human intelligence at all, no matter how much it
           | is scaled up.
           | 
           | The sophistication of the imitation has some people confused
           | and questioning whether everything can be reduced to
           | imitation. It can't.
           | 
           | The ability to imitate seeking a goal isn't identical to the
           | ability to seek a goal.
           | 
           | The ability to imitate solving a problem isn't identical to
           | the ability to solve a problem.
           | 
           | Imitation is very useful, and the reduction of everything to
           | imitation is an intriguing possibility to consider, but it's
           | ultimately just wrong.
        
             | lordnacho wrote:
             | Ok, so now we need an example that separates humans from
             | LLMs?
             | 
             | I struggle to think of one, maybe someone on HN has a good
             | example.
             | 
             | Eg if I'm in middle school and learning quadratic
             | equations, am I imitating solving the problem by plugging
             | in the coefficients? Or am I understanding it?
             | 
             | Most of what I see coming out of chatGPT and copilot could
             | be said to be either. If you're generous, it's
             | understanding. If not, it's imitation.
        
               | civilized wrote:
               | It is very easy to separate humans from LLMs. Humans
               | created math without being given all the answers
               | beforehand. LLMs can't do that yet.
               | 
               | When an LLM can create math to solve a problem, we will
               | be much closer to AGI.
        
               | abm53 wrote:
               | You can ask ChatGPT to solve maths problems which are not
                | in its training data, and it will answer an
                | astonishing number of them correctly.
               | 
               | The fact that we have trained it on examples of human-
               | produced maths texts (rather than through interacting
               | with the world over several millennia) seems like more of
                | an implementation detail and not a piece of evidence
                | about
               | whether it has "understood" or not.
        
               | IshKebab wrote:
               | _Some_ humans created maths. And it took thousands of
               | years of thinking and interaction with the real world.
               | 
               | Seems like goalpost moving to me.
               | 
               | I think the _real_ things that separate LLMs from humans
               | at the moment are:
               | 
               | * Humans can do online learning. They have long term
               | memory. I guess you could equate evolution to the
               | training phase of AI but it still seems like they don't
               | have quite the same on-line learning capabilities as us.
               | This is what probably prevents them from doing things
               | like inventing maths.
               | 
               | * They seem to be incapable of saying "I don't know". Ok
               | to be fair lots of humans struggle with this! I'm sure
               | this will be solved fairly soon though.
               | 
               | * They don't have a survival instinct that drives
               | proactive action. Sure you can _tell_ them what to do but
               | that doesn 't seem quite the same.
        
               | naveen99 wrote:
               | Interestingly some humans will admit to not knowing but
               | are allergic to admitting being wrong (and can get fairly
               | vindictive if forced to admit being wrong).
               | 
                | LLMs actually admit to being wrong easily, but they
                | aren't great at introspection and confabulate too
                | often. Also, their metacognition is still poor.
        
               | IshKebab wrote:
                | I guess LLMs don't have the social pressure to avoid
                | admitting errors. And those sorts of interactions
                | aren't common in text, so they don't learn them
                | strongly.
               | 
               | Also ChatGPT is trained specifically to be helpful and
               | subservient.
        
               | civilized wrote:
               | About this goalpost moving thing. It's become very
               | popular to say this, but I have no idea what it's
               | supposed to mean. It's like a metaphor with no underlying
               | reality.
               | 
               | Did a wise arbiter of truth set up goalposts that I
               | moved? I guess I didn't get the memo.
               | 
               | If the implied claim is "GPT would invent math too given
               | enough time", go ahead and make that claim.
        
               | IshKebab wrote:
               | > Did a wise arbiter of truth set up goalposts that I
               | moved?
               | 
                | Collectively, yes. The criticism of AI has always been
                | "well it isn't AI because it can't do [thing just
                | beyond its abilities]".
               | 
               | Maybe individually your goalpost hasn't moved, and as
               | soon as it invents some maths you'll say "yep, it's
               | intelligent" (though I strongly doubt it). But
               | collectively the naysayers in general will find another
               | reason why it's not _really_ intelligent. Not like us.
               | 
               | It's very tedious.
        
             | corethree wrote:
             | You need to think deeper.
             | 
             | There are levels of sophistication in "imitation". It
             | follows a gradient. At the low end of this gradient is a
             | bad imitation.
             | 
             | At the high end of this gradient is a perfect imitation.
             | Completely indistinguishable from what it's imitating.
             | 
              | If an imitation is perfect, then is it really an
              | imitation?
             | 
             | If I progressively make my imitation more and more accurate
             | am I progressively building an imitation or am I
             | progressively building the real thing?
             | 
             | See what's going on here? You fell for a play on words.
             | It's a common trope. Sometimes language and vocabulary
             | actually tricks the brain into thinking in a certain
             | direction. This word "imitation" is clouding your thoughts.
             | 
             | Think about it. A half built house can easily be called an
             | imitation of a real house.
        
             | canjobear wrote:
             | > LLMs don't have a concept of "best". Only most likely in
             | what they've been trained on.
             | 
             | At temperature 0 they are effectively producing the token
             | that maximizes a weighted sum of base LM probability and
             | model reward.
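              | 
              | For reference, a sketch of why that is roughly right,
              | assuming the standard KL-regularized RLHF objective (the
              | generic textbook result, nothing GPT-specific):
              | 
              |   \pi^*(y \mid x) \propto
              |       \pi_{\mathrm{ref}}(y \mid x)\,
              |       \exp\big(r(x, y) / \beta\big)
              | 
              |   \log \pi^*(y \mid x) =
              |       \log \pi_{\mathrm{ref}}(y \mid x)
              |       + \tfrac{1}{\beta}\, r(x, y) + \mathrm{const}
              | 
              | so greedy (temperature-0) decoding maximizes base-model
              | log-probability plus scaled reward.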
        
             | genman wrote:
              | I don't think that humans in general have this concept
              | of "best" either.
              | 
              | But humans are able to build certain routines within
              | their own systems to help them rationalize.
        
           | RandomLensman wrote:
           | Yes, if you are in the no free will school of thought, then
           | that would be what humans do.
        
           | usrbinbash wrote:
           | > Isn't that exactly how humans learn to respond to stimuli?
           | 
           | Maybe it is, maybe it isn't. Maybe we are "just" an
           | incredibly powerful prediction engine. Or maybe we work from
           | a completely different modus operandi, and our ability to
           | predict things is an emergent capability of it.
           | 
           | The thing is, no one actually knows what makes us
           | intelligent, or even how to define intelligence for that
           | matter.
        
         | randyrand wrote:
         | the human brain is just a statistical machine.
        
           | brookst wrote:
           | Odds are you're right
        
         | continuational wrote:
         | I think the most useful definition of understanding something
         | is that you can explain it and use it in context.
         | 
         | ChatGPT routinely does both.
        
           | jstummbillig wrote:
           | And while AI gets better and better and we will remain as
           | touchy as ever about abstract concepts that make us oh so
           | human, how about we say it just can't be understanding,
           | unless a human does, eh, it.
        
           | BobaFloutist wrote:
           | Someone sufficiently fast and skilled at googling can explain
           | and use in context a lot of things that they don't really
           | properly understand.
           | 
           | So unless you're saying that the _composite_ of the googler
           | and of google understand something that neither does
           | individually, your definition has some holes.
        
             | continuational wrote:
             | This is a variation of the Chinese room argument.
             | 
             | If you consider understanding an observable property, then
             | the Chinese room in aggregate displays understanding of
             | Chinese.
             | 
             | Would you say that humans understand nothing, because atoms
             | don't understand anything, and we're made up of atoms?
        
               | BobaFloutist wrote:
               | I would say that there is a stronger consensus that a
               | human being can be reasonably described as a single
               | entity than a human being using a reference resource.
               | 
                | A more apt comparison, to my mind, would be whether a
                | human being can be described as personally exerting
                | the strong nuclear force just because their subatomic
                | particles do, to which I would happily answer "no."
        
           | skepticATX wrote:
           | How about this: understanding is the ability to generalize
           | knowledge and apply it to novel scenarios.
           | 
           | This definition is something that humans, and animals for
           | that matter, do every day - both in small and large ways. And
           | this is something that current language models aren't very
           | good at.
        
             | continuational wrote:
             | What is the test for this?
             | 
              | I taught it Firefly, which is an undocumented programming
              | language I'm working on, through conversation.
             | 
             | I find it's a lot quicker than any human at picking up
             | syntax and semantics, both in real time and in number of
             | messages, and makes pretty good attempts at writing code in
             | it, as much as you could expect from a human programmer.
             | 
             | That is, until you run out of context - is this what you
             | mean?
        
               | skepticATX wrote:
               | There are plenty of results supporting my assertion; but
               | the tests must be carefully designed. Of course, LLMs are
               | not databases that store exact answers - so it's not
               | enough to ask it something that it hasn't seen, if it's
               | seen something similar (as is likely the case with your
               | programming language).
               | 
               | One benchmark that I track closely is ConceptARC, which
               | aims to test generalization and abstraction capabilities.
               | 
               | Here is a very recent result that uses the benchmark:
               | https://arxiv.org/abs/2311.09247. Humans correctly solved
               | 91% of the problems, GPT-4 solved 33%, and GPT-4V did
               | much worse than GPT-4.
        
               | continuational wrote:
               | I wouldn't be surprised if GPT-4 is not too good at
               | visual patterns, given that it's trained on text.
               | 
               | Look at the actual prompt in figure 2. I doubt humans
               | would get a 91% score on that.
        
               | stubybubs wrote:
               | I gave it the three lightbulbs in a closet riddle.
               | 
               | https://puzzles.nigelcoldwell.co.uk/seven.htm
               | 
               | The key complication is "once you've opened the door, you
               | may no longer touch a switch." It gets this. There are
                | many examples of it written out on the web. When I give
                | it a variation and say "you can open the door to look
                | at the bulbs and use the switches all you want", it is
                | absolutely unable to understand this. To a human it's
                | simple: look at the bulbs and flick the switches. It
                | kept giving me answers about using a special lens to
                | examine the bulbs, or using something to detect heat.
                | I explained it
               | in many ways and tried several times. I was paying for
               | GPT-4 at the time as well.
               | 
               | I would not consider this thinking. It's unable to make
               | this simple abstraction from its training data. I think 4
               | looks better than 3 simply because it's got more data,
               | but we're reaching diminishing returns on that, as has
               | been stated.
        
         | Kiro wrote:
         | How do you define understanding when predicting the next color?
        
         | karaterobot wrote:
         | > You would realize that all we are doing is using statistics
         | to predict what the next item might be.
         | 
         | I agree that Hinton's original quote doesn't make sense to me
         | either. I suspect he would leverage the phrase "really good" to
         | explain the difference between ChatGPT and, say, a Markov
         | chain. I think that's a little disingenuous, if that's how he
         | means it, but I don't know if I'm right about that.
         | 
         | But I also do not agree that humans use statistics to predict
         | what the next item in a series might be. As evidence, there is
         | the classic example of asking people to predict the next coin
         | toss in the series: "heads, heads, heads, heads, heads, heads,
         | heads...". They'll either guess heads because it's come up so
         | many times already, or because they assume the coin isn't fair,
         | or tails because it's "overdue" to come up, but none of those
         | are based on statistics _per se_.
        
         | troelsSteegin wrote:
         | From the article, Gelman's money quote is this: " So I'm not
         | knocking auto-complete; I'm just disagreeing with Hinton's
         | statement that "by training something to be really good at
         | predicting the next word, you're actually forcing it to
         | understand." As a person who does a lot of useful associative
         | reasoning and also a bit of logical understanding, I think
         | they're different, both in how they feel and also in what they
         | do."
        
         | hackerlight wrote:
         | Why is "using statistics" mutually exclusive with
         | "understanding"? It would help to carefully define terms. Note
         | that "understanding" doesn't mean it's necessarily conscious.
         | 
         | These systems learn high-level representations/abstractions of
         | concepts we humans also use as part of our cognition. The
         | concept of an object, an intuitive physics, the role of
         | specific objects. I don't criticize him for using the word
         | "understanding" to describe this.
        
         | porknubbins wrote:
         | As a translator I have come to think of it like this- (without
         | the need for defining what "understanding" means) the ability
         | to translate correctly depends on the size of your context.
         | So humans and LLMs are both on the same spectrum, with
         | different context sizes: the LLM's training set on one end
         | and, for a theoretical perfect human, all facts known in the
         | universe on the other.
         | 
         | Doing moderately technical translations, state of the art LLMs
         | may get it about 99% right but you find about 1 word per page
         | where they would have needed to "zoom out" to a wider context
         | to choose the word correctly. Often it's an important word
         | that is missed.
        
         | usrbinbash wrote:
         | > You would realize that all we are doing is using statistics
         | to predict what the next item might be.
         | 
         | Two questions:
         | 
         | 1. How can humans come up with novel ideas then? Statistically
         | predicting the next item means I am limited to the training
         | set. How do humans formulate new concepts in this scenario?
         | 
         | 2. Why can humans learn from their own output then? If all we
         | do is statistically predicting the next token, then, by
         | necessity, feeding our own output back into the model (aka.
         | letting humans learn from it) should degrade human capabilities
         | by overfitting to the training set. But we observe exactly the
         | opposite.
        
           | micromacrofoot wrote:
           | > How can humans come up with novel ideas then?
           | 
           | hallucinations, emotions
           | 
           | > Why can humans learn from their own output then?
           | 
           | if we don't, we suffer
        
           | jameshart wrote:
           | > Statistically predicting the next item means I am limited
           | to the training set
           | 
           | Not at all. Extrapolation is perfectly possible in a purely
           | predictive model. It's one of the things GPTs are _best_ at.
           | 
           | In the stream of tokens output by an LLM it's completely
           | possible for new concepts to emerge, and for it then to
           | continue to use and build on them in the remainder of the
           | stream. You see this simply executed in programming contexts
           | with LLMs where it is able to declare a novel function and
           | then use it.
        
             | usrbinbash wrote:
             | > with LLMs where it is able to declare a novel function
             | and then use it.
             | 
              | Novel as in "implements a new algorithm that has never
              | been seen in any form and is actually an improvement over
              | existing methodology"?
             | 
             | Here is a little thought experiment: If all the training
             | data in the set says that manned, powered flight is
             | impossible, is a statistical prediction engine trained on
             | that data capable of developing an airplane?
             | 
             | In other words, could a very powerful stochastic parrot do
             | what the Wright Brothers did?
             | 
             | My best guess is that no, it cannot.
        
               | lovecg wrote:
                | Now I'm wondering: if birds didn't exist, how much
                | longer would it have taken us to catch on that flight
                | is possible at all.
        
         | canjobear wrote:
         | The best "statistic" you could compute to predict the next
         | token would be the meaning of the prompt. A predictor computing
         | meaning would outperform one that is not.
        
         | xcv123 wrote:
         | By "understanding" he means it builds a model of higher order
         | abstractions over the text in order to model the meaning and
         | context of each token in relation to other tokens. This is more
         | sophisticated than a Markov process.
        
           | Jensson wrote:
           | > This is more sophisticated than a Markov process.
           | 
            | Nothing prevents a Markov process from having a world
            | model. A Markov process lacks state and generates a
            | sequence based on statistics and previous entries, so LLMs
            | as they are now are Markov processes.
           | 
            | Many of the dumb behaviors we see from LLMs today come
            | from their lack of internal state between tokens: they
            | don't remember what reason they had for generating the
            | previous token, which means they can easily generate
            | inconsistent answers. So LLMs being a Markov process is an
            | important point to highlight, since it makes their thinking
            | very different from how humans think.
        
             | xcv123 wrote:
             | The transformer architecture is NOT a Markov process, by
             | mathematical definition of a Markov process. This is not
             | even debatable. It's a mathematical fact.
             | 
             | > Many of the dumb behaviors we see from LLMs today comes
             | from their lack of internal state between tokens, so it
             | don't remember what reason it had for generating the
             | previous token and that means it can easily generate
             | inconsistent answers
             | 
             | The attention mechanism in the transformer architecture
             | models relations between tokens within the context window,
             | and does the exact opposite of what you are describing
             | here. This is one aspect of LLMs that violates the Markov
             | property.
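              | 
              | Concretely, that attention step is scaled dot-product
              | attention; a toy numpy sketch (single head, no masking,
              | made-up shapes):
              | 
              |   import numpy as np
              | 
              |   def softmax(x):
              |       e = np.exp(x - x.max(axis=-1, keepdims=True))
              |       return e / e.sum(axis=-1, keepdims=True)
              | 
              |   def attention(Q, K, V):
              |       # Every token's query scores against every key in
              |       # the window, so each output mixes information
              |       # from all tokens in the context, not just the
              |       # most recent one.
              |       scores = Q @ K.T / np.sqrt(K.shape[-1])
              |       return softmax(scores) @ V
              | 
              |   T, d = 5, 8  # 5 tokens, 8-dim vectors (toy sizes)
              |   Q = np.random.randn(T, d)
              |   K = np.random.randn(T, d)
              |   V = np.random.randn(T, d)
              |   print(attention(Q, K, V).shape)  # (5, 8)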
        
               | Jensson wrote:
               | > The transformer architecture is NOT a Markov process,
               | by mathematical definition of a Markov process. This is
               | not even debatable. It's a mathematical fact.
               | 
               | What, yes it is.
               | 
               | > The attention mechanism in the transformer architecture
               | models relations between tokens within the context
               | window, and does the exact opposite of what you are
               | describing here. This is one aspect of LLMs that violates
               | the Markov property.
               | 
                | The context window is finite, so that is the previous
                | step. You know the dumb Markov chains that are based on
                | bags of words? They also look several words back; they
                | don't just go based on a single word. LLMs are just
                | that, with a way larger lookback and some extra logic,
                | but none of that changes the fundamental parts to make
                | it not a Markov process.
               | 
                | With a large enough context size you could argue it is
                | now fundamentally different in practice, but in theory
                | it is the same. There is no "hidden state"; it's just
                | that the previous n words define the next word.
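                | 
                | For reference, the "dumb Markov chain with a
                | several-word lookback" is just a lookup table over the
                | last n words; a toy sketch:
                | 
                |   import random
                |   from collections import defaultdict
                | 
                |   def train(tokens, n=2):
                |       # Map each n-gram to the words seen after it.
                |       table = defaultdict(list)
                |       for i in range(len(tokens) - n):
                |           key = tuple(tokens[i:i + n])
                |           table[key].append(tokens[i + n])
                |       return table
                | 
                |   def generate(table, seed, steps=10, n=2):
                |       out = list(seed)
                |       for _ in range(steps):
                |           nxt = table.get(tuple(out[-n:]))
                |           if not nxt:
                |               break
                |           out.append(random.choice(nxt))
                |       return " ".join(out)
                | 
                |   corpus = ("the cat sat on the mat and "
                |             "the cat ate the rat").split()
                |   print(generate(train(corpus), ("the", "cat")))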
        
               | xcv123 wrote:
               | If you define the state broadly enough to include the
               | state of the entire machine itself, including all of its
               | internal representations, weights, activations, etc, then
               | you are playing a funny trick here.
               | 
                | By the same reasoning, a human brain is also a Markov
                | process.
               | 
               | What you are doing here is a vast oversimplification and
               | it is practically useless for understanding how LLMs
               | work.
        
       | tayo42 wrote:
       | This was kind of on my mind recently. I was wondering: when I
       | speak, am I just spitting out the next word that makes sense,
       | or is there more to it?
       | 
       | I think there is for people: I can think before I speak, and I
       | can plan out my thought entirely before turning it into words.
       | Each invocation of the chat bot/LLM is a new set of
       | probabilities. I can plan what my 2nd token output will be and
       | stick to it. LLMs don't have the ability to think past the
       | first.
       | 
       | To me that is autocompleting, just done better, unless models
       | really do have a sense of planning ahead. But I don't think the
       | really complex game-playing AIs do that either.
        
         | pixl97 wrote:
         | >I can think before I speak, I can plan out my thought entirely
         | before turning it into words.
         | 
         | In theory models can/could do the same thing. Think of the
         | current text output of a model as being those thoughts inside
         | your head: you have an internal scratch space where you work
         | on ideas, and after you perform 'chain of thought' on them,
         | you output to the world. What you're outputting to the world
         | isn't your thoughts directly, but more of a calculation of
         | what you wanted to achieve.
         | 
         | Really for LLMs you just need to have the model put its
         | output into an internal buffer, read that buffer and make sure
         | it makes sense, then output that to the end user.
         | 
         | It's probably not fast or compute efficient to do that at this
         | time, but at some point it likely will be.
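         | 
         | A rough sketch of that draft-then-check loop, with llm() as a
         | stand-in stub (hypothetical, just so the shape is clear) for
         | whatever completion call you actually use:
         | 
         |   def llm(prompt: str) -> str:
         |       """Hypothetical stub for a real completion call."""
         |       return "draft answer"
         | 
         |   def answer(question: str) -> str:
         |       # 1. "Think" into a scratch buffer the user never sees.
         |       scratch = llm(f"Think step by step: {question}")
         |       # 2. Check the scratch work before committing to it.
         |       review = llm(f"Does this reasoning hold up?\n{scratch}")
         |       # 3. Only the revised answer goes to the end user.
         |       return llm(f"Question: {question}\nNotes: {scratch}\n"
         |                  f"Review: {review}\nGive a final answer.")
         | 
         |   print(answer("How many eggs in 111 + 222?"))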
        
           | drwiggly wrote:
           | >Really for LLMs you just need to have the model put it's
           | output to an internal buffer, read that buffer and make sure
           | it makes sense, then output that to the end user.
           | 
            | Makes sense to what? The LLM doesn't have a goal, other
            | than to spew text that looks like it should be there.
        
             | stevenhuang wrote:
             | The analogy lies in the fact that, much like evolution
             | through natural selection, deliberate intelligence/ability
             | of organisms to comprehend reality is not the objective,
             | but something else entirely is.
             | 
             | For evolution, it's fitness. For LLMs, it's the next token.
             | 
             | Yet despite that, the ability to reason emerges as a means
             | to an end.
        
             | pixl97 wrote:
             | To the terminal or instrumental goal of the statement it is
             | working on.
             | 
             | Question to LLM, "I have one hundred and eleven eggs in the
             | store and another two hundred and twenty two are showing up
             | in an hour, how many eggs will I have in total"
             | 
              | Internal response: "this looks like a math problem that
              | requires addition. The answer is 333. Use a calculator to
              | validate 111 + 222. (send 111+222, receive 333). Tool
              | returns 333, validating the previous response."
             | 
             | External response: "The answer is 333"
             | 
             | This chain of logic is internally consistent, hence makes
             | sense.
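              | 
              | A toy sketch of that flow (llm() is a hypothetical stub
              | standing in for the model; the "calculator tool" is
              | plain Python):
              | 
              |   import re
              | 
              |   def llm(prompt: str) -> str:
              |       """Hypothetical stub for the internal response."""
              |       return "addition problem: 111 + 222 = 333"
              | 
              |   def calculator(a: int, b: int) -> int:
              |       return a + b  # the external "tool"
              | 
              |   draft = llm("111 eggs now, 222 arriving; total?")
              |   a, b, claimed = map(int, re.search(
              |       r"(\d+) \+ (\d+) = (\d+)", draft).groups())
              |   checked = calculator(a, b)
              |   if checked == claimed:
              |       print(f"The answer is {claimed}")  # tool agrees
              |   else:
              |       print(f"The answer is {checked}")  # tool wins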
        
           | tayo42 wrote:
           | Yeah thinking about it more, it does seem possible.
           | 
            | Like the attention masking done in transformers: I get why
            | it's there, but I also never understood why you wouldn't
            | want attention to happen, at least over concepts and
            | abstract ideas about what is coming up in the phrase,
            | before predicting a word.
        
         | TerrifiedMouse wrote:
         | > when I speak am I just spitting out the next word that makes
         | sense or is there more to it.
         | 
         | There is more to it. Specifically you are doing so to advance
         | towards a specific goal. LLMs don't have goals. They just pick
         | from a list of likely tokens - based on their training data -
         | at random to generate the next token.
        
           | brookst wrote:
           | So the externally provided metaprompt doesn't establish a
           | goal that generation works toward?
        
             | TerrifiedMouse wrote:
             | I don't think LLMs work towards anything. It just picks
             | from a list of likely "next tokens" at random.
        
               | brookst wrote:
               | "Likely" as defined by...?
        
               | TerrifiedMouse wrote:
               | Statistics as defined by training data.
               | 
               | https://writings.stephenwolfram.com/2023/02/what-is-
               | chatgpt-...
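                | 
                | As a toy illustration of "picking likely tokens at
                | random" (made-up scores, not real model output):
                | 
                |   import numpy as np
                | 
                |   rng = np.random.default_rng(0)
                |   vocab = ["cat", "dog", "the", "ran"]
                |   logits = np.array([2.0, 1.5, 0.2, -1.0])
                | 
                |   def sample(logits, temperature=1.0):
                |       # Softmax turns scores into probabilities;
                |       # lower temperature sharpens the choice
                |       # (T -> 0 approaches argmax).
                |       p = np.exp(logits / temperature)
                |       p /= p.sum()
                |       return rng.choice(len(logits), p=p)
                | 
                |   print(vocab[sample(logits)])       # random-ish
                |   print(vocab[sample(logits, 0.1)])  # near-greedy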
        
         | nkingsy wrote:
         | I think people get tricked by the forward pass mechanics into
         | thinking a single generation is comparable to a human thought
         | process.
         | 
         | I think we have a llm like mechanism we can employ and lean on
         | as heavily as we like, but we also have an executive function,
         | like thousands of specialized instant Boolean checks, which can
         | adjust and redirect the big talky model's context on the fly.
         | 
         | My hunch is it's turtles all the way down. "Gut feelings" are
         | hyper-optimized ASICs with tiny parameter counts, but all
         | the same intelligence mechanisms.
         | 
         | Extrapolating from that hunch, we are already witnessing AGI,
         | and in fact we've started at the top.
         | 
         | I believe that current llms are actually far far superior to a
         | human language center and current architectures are more than
         | sufficient in terms of raw intelligence.
         | 
         | The challenge that remains is to understand, train, integrate,
         | and orchestrate all the different flavors of intelligence that
         | humans wield so elegantly and naturally as to make them opaque
         | to our own understanding.
        
       | overactor wrote:
       | I think this misses the point made by Hinton. The point is that,
       | if a model is to reach a certain level of autocomplete, it needs
       | to understand what it is talking about. Imagine the result of
       | what required deep thought from you is included in the training
       | data, along with billions of other such exchanges. At some point
       | the only way to improve training score is to develop the ability
       | to "understand" things. (That is, if the model isn't big enough
       | to memorize the training data.)
       | 
       | Universal autocomplete implies understanding, and the only
       | remaining question is whether our current architectures and
       | model sizes are able to achieve any level of it.
        
         | hopfenspergerj wrote:
         | I agree. We should prompt the model with the statement of the
         | Riemann hypothesis. If the autocomplete is good, the model will
         | output a proof.
        
           | overactor wrote:
           | No human can solve the Riemann Hypothesis. Why do you expect
           | an AI to do it before you consider it able to understand
           | things?
        
             | scarmig wrote:
             | The key bit is constructing a hard, novel proof. The fact
             | that AI doesn't (yet) do this isn't evidence that it
             | doesn't reason, but if it did so it would be strong
             | evidence that it does reason.
             | 
             | (I also take the pessimistic point of view that most humans
             | don't reason, so YMMV.)
        
             | lordnacho wrote:
             | OTOH, doing something that only humans have done thus far
             | would be a huge step in demonstrating understanding.
             | 
             | Does that mean when a computer outputs a new proof it
             | understands?
        
               | corethree wrote:
               | You can do something similar to this without giving it a
               | problem that might be impossible.
               | 
                | Train the LLM on a bunch of things but avoid certain
                | things... things that humans already know about.
                | 
                | Then you query the model about that thing. See if the
               | model can come to the same conclusions humans do. You can
               | actually do this right now with chatGPT.
        
               | lordnacho wrote:
               | So what happens?
        
               | corethree wrote:
               | This is a response from chatGPT. I asked it to come up
               | with something novel and creative. I went through several
               | iterations and eventually it started doing math stuff and
               | I had it focus on that:
               | 
               | Understood, let's propose a distinct mathematical concept
               | within the realm of pure mathematics: "Infinite-
               | Dimensional Fractal Algebra." This new branch of algebra
               | focuses on operations, equations, and functions within an
               | infinitely dimensional fractal space.
               | 
               | In Infinite-Dimensional Fractal Algebra, traditional
               | algebraic operations like addition, multiplication, and
               | exponentiation are redefined and extended to work within
               | an infinite fractal structure. Each point in this fractal
               | space represents a unique algebraic entity, and the
               | relationships between these points follow complex, self-
               | repeating patterns characteristic of fractals.
               | 
               | The concept challenges and expands the traditional
               | boundaries of algebra by integrating the self-similarity
               | and scale invariance of fractals into algebraic
               | operations. For example, solving an equation in this
               | system would involve navigating through the infinitely
               | repeating patterns of the fractal, where each level of
               | the fractal represents a different dimension or layer of
               | the problem.
               | 
               | This approach could offer new insights into complex
               | systems and phenomena that exhibit fractal-like behavior,
               | and potentially lead to breakthroughs in areas like chaos
               | theory, dynamic systems, and perhaps even in
               | understanding the fabric of the universe itself.
               | 
               | To illustrate the concept of Infinite-Dimensional Fractal
               | Algebra, let's consider a simple algebraic operation:
               | addition.
               | 
               | In traditional algebra, adding two numbers is
                | straightforward: 3 + 4 = 7. In Infinite-Dimensional
               | Fractal Algebra, however, this operation takes place in a
               | fractal space with infinite dimensions.
               | 
               | 1. *Fractal Representation*: Instead of representing
               | numbers as points on a line, each number is represented
               | by a unique fractal structure. The fractal for '3' and
               | the fractal for '4' each have infinitely complex, self-
               | similar patterns that vary across infinite dimensions.
               | 
               | 2. *Adding Fractals*: The operation of addition involves
               | combining these fractal structures. This isn't a simple
               | sum but a fusion of their infinite patterns. The
               | resulting fractal for '7' contains elements of both the
               | '3' and '4' fractals, but in a way that the patterns and
               | complexities of each are preserved and merged.
               | 
               | 3. *Infinite Dimensions*: Each level of depth in the
               | fractals represents a different dimension of the problem.
               | As you dive deeper into the fractals, you encounter more
               | intricate patterns, representing more complex layers of
               | the addition.
               | 
               | 4. *Result Interpretation*: The outcome of the operation
               | is a new fractal representing '7'. But this '7' is not
               | just a number; it's a multi-dimensional entity with a
               | pattern that encodes not just the value '7', but also the
               | history of how it was derived from '3' and '4'.
               | 
               | This approach would require a rethinking of how we
               | visualize and understand mathematical operations. The
               | fractal patterns in Infinite-Dimensional Fractal Algebra
               | carry more information than just the numerical value;
               | they encode the relationships and interactions between
               | numbers in a deeply interconnected, infinite-dimensional
               | space.
               | 
               | Let's choose the Sierpinski Triangle, a well-known
               | fractal, to represent the identity element for addition
               | in Infinite-Dimensional Fractal Algebra. In traditional
               | algebra, the identity element for addition is '0', as
               | adding zero to any number doesn't change its value. Here,
               | we'll conceptualize this using the Sierpinski Triangle.
               | 
               | 1. *Sierpinski Triangle as the Identity*: In this
               | framework, the Sierpinski Triangle represents the '0'.
               | It's a fractal with the property of self-similarity: any
               | part of the triangle resembles the whole.
               | 
               | 2. *Adding to the Identity*: When another number
               | (fractal) is added to this Sierpinski Triangle, the
               | process involves integrating the fractal patterns of this
               | number with the Sierpinski Triangle. However, since the
               | Sierpinski Triangle represents '0', this integration
               | process must preserve the original number's fractal
               | structure intact.
               | 
               | 3. *Concrete Example*: Suppose we're adding the number
               | '2', represented by a different fractal structure, to '0'
               | (the Sierpinski Triangle). The addition operation would
               | overlay the '2' fractal onto the Sierpinski Triangle. The
               | key here is that the Sierpinski Triangle, with its
               | repetitive and self-similar gaps, would not alter the
               | structure of the '2' fractal. Instead, it would fill into
               | the gaps of the Sierpinski Triangle, maintaining its own
               | pattern.
               | 
               | 4. *Resulting Fractal*: The resulting fractal from this
               | addition is the '2' fractal itself, as the Sierpinski
               | Triangle (being the identity element) does not alter the
               | other number in the addition. The intricate details of
               | the '2' fractal remain unchanged, signifying that adding
               | '0' did not change its value.
               | 
               | This is a very abstract and conceptual way to represent
               | mathematical operations, and the choice of the Sierpinski
               | Triangle for the identity is primarily for its self-
               | similar and recursive properties, which align well with
               | the concepts in fractal algebra.
        
               | corethree wrote:
                | Basically it formed a fuzzy idea of an algebra using
                | different fractals as entities.
               | 
               | I'm sure this can be mapped out further into very
                | concrete detail. It's a highly realistic idea. We have
               | algebras for all kinds of things from complex numbers to
               | lists.
               | 
               | Choosing the triangle for identity is probably the wrong
               | choice though. The identity fractal should be zero
               | dimensional or nothing. I think that will in actuality
               | end up fitting the rules of the identity fractal if we
               | ever decided to map out this algebra.
               | 
               | If you're not familiar with abstract algebra basically
               | it's choosing some fractal that's equivalent to a zero
               | value and coming up with ways to combine fractals with
               | operations that hold the same properties of associativity
               | and commutativity that multiplication/addition does for
               | numbers.
        
         | dboreham wrote:
         | > it needs to understand what it is talking about
         | 
         | "to understand" is a concept invented by the human brain to
         | help it operate. It doesn't actually mean anything concrete in
         | the physical world. See: the Chinese room argument
         | (paradoxically it proves the converse of what Searle was
         | arguing -- that there is no such thing as understanding,
         | because you can emulate a system with identical outputs from
         | the same set of inputs without caring or labeling what happens
         | inside the room).
        
           | scudsworth wrote:
           | that's, uh, not the conclusion of the chinese room argument
        
           | ImHereToVote wrote:
           | The Chinese room doesn't have a model; it has an infinite
           | dictionary. A sufficiently large LLM isn't a dictionary, it
           | has to contain a model of some sort.
        
         | attilakun wrote:
         | I agree, Hinton's point is more about next-token-prediction
         | being a good training objective for understanding, and not
         | whether LLMs reproduce every aspect of a human's thought
         | processes. They clearly don't (at least not at present scale),
         | otherwise we'd have AGI now!
        
       | zeroonetwothree wrote:
       | We have quantum mechanics, a model of the universe that makes
       | predictions with absurd levels of accuracy. Yet we don't really
       | understand "why" it works---I don't think we can claim that we
       | really "understand" the universe.
       | 
       | This disproves the claim that making accurate predictions implies
       | understanding. Or perhaps a better way to put it is that the
       | model has some understanding but a model doesn't necessarily
       | reflect underlying truth.
        
         | raincom wrote:
         | Best examples come from medicine. There are many drugs that
         | work, but we don't understand how they work. For example:
         | https://en.wikipedia.org/wiki/Category:Drugs_with_unknown_me...
        
         | marginalia_nu wrote:
         | What are you even asking when you're asking why it works?
         | Ultimately, it works the way it does because that's the way
         | things are. I'm not sure what sort of answers you expect when
         | asking why physics acts in a particular fashion.
         | 
         | Also here's Feynman ranting about why questions:
         | https://www.youtube.com/watch?v=36GT2zI8lVA
        
           | firebaze wrote:
           | Asking "why" may lead to really new physics, though. I assume
           | the reason for discovering the theory of relativity was
           | asking "Why is the speed of light constant in all
           | directions".
        
             | marginalia_nu wrote:
             | My point is, why isn't really a question that makes sense.
             | 
             | There isn't a "why" the speed of light is constant and
              | isotropic. The answers you typically see for that
              | question are just different ways of saying the same
              | thing; there is no causal relationship.
             | 
             | In the end, it just is. There are things that must be true
             | if this is the case, and things that can't be true if this
             | is the case.
        
         | overactor wrote:
         | Making accurate predictions in a specific domain doesn't imply
         | understanding, sure. Universal autocomplete can make accurate
         | predictions in all domains. Apples to oranges.
        
           | xcv123 wrote:
           | By "understanding" he simply means it builds a model of
           | higher order abstractions over the text in order to model the
           | meaning and context of each token in relation to other
           | tokens. This is more sophisticated than a Markov process.
        
       | zeroonetwothree wrote:
       | I found this post to be overly long and unclear in its focus.
       | Also who is this person that only serves as an autocomplete in
       | meetings? Maybe don't go to those meetings then? In my meetings
       | I'm mainly having to decide what our approach will be in some
       | ambiguous problem space.
        
         | coldtea wrote:
         | > _In my meetings I'm mainly having to decide what our approach
         | will be in some ambiguous problem space._
         | 
         | Maybe that's because you are just some manager or exec in some
         | company, and not a professor that has to transfer his
         | knowledge, describe, and explain things for others in his
         | meetings?
         | 
         | The post has a very specific subject it discusses. It's also
         | not required reading; if someone finds it long, they can
         | it.
        
       | patcon wrote:
       | Yeah, I agree there are two types:
       | 
        | 1) repeating things he's reflected on or seen in the past (more
       | like autocomplete), or
       | 
       | 2) mixing two semantic concepts that he's seen in the past, and
       | using it as a bridge to walk over to get to the next word/concept
       | (which is back to autocomplete mode)
       | 
       | The second is like crafting in Minecraft, instead of just taking
       | things out of inventory (but all of that is happening in high
       | dimensional semantic space :) )
        
       | cmdli wrote:
       | The argument "a sufficiently capable autocomplete must contain a
       | level of general intelligence" is correct but also not very
       | useful. It is a lot like saying "a sufficiently fast horse can
       | fly".
       | 
       | It is technically correct that when you take things to the
       | extreme you can accomplish great things, but we may not reach
       | those levels. We may require completely different technology to
       | reach those levels of autocomplete, and we have simply reached a
       | new plateau at this point in time.
        
         | morkalork wrote:
         | Just need a jet engine strapped to your horse for that
        
         | wyago wrote:
         | Funnily enough, "a sufficiently fast horse can fly" sounds sort
         | of like a decent way to convey the idea of planes to a pre-car
         | world.
        
         | og_kalu wrote:
         | The argument is simpler than that. Prediction requires a model,
         | completely accurate or not. There's a projection of the world
         | in text. A model of the text data we feed it is a model of the
          | world as humans see it. The trend, as loss falls, is more and
          | more accurate models of the dataset. So it won't stop at any
          | arbitrary competency level. Indeed, there are already a few
          | abilities GPT possesses that are deemed superhuman. It's not a
          | distinction that matters to the machine. It's all just data to
          | be modelled.
         | 
         | We have reached those levels lol. That's why we're having this
         | argument.
        
           | cmdli wrote:
           | I think the trouble is that "model" is a very general term.
           | If you had a computer doing simulations of artillery shots
           | back in the 50s, then it would have a "model" of the world in
           | terms of variables tracking projectiles, but this model
           | doesn't generalize to anything else. If a computer does image
           | recognition from the 90s and 2000s to recognize faces, then
           | the computer has a "model" of visual information in the
           | world, but this model only lets it recognize faces.
           | 
           | ChatGPT has a model of all the text information on the
           | internet, but it remains to be seen what the hard limits of
           | this model are. Does this model let it do logic or predict
           | the future well, or will no amount of training give it those
           | abilities? Simply being good in one task doesn't imply a
           | general ability to do everything, or even most of everything.
            | LLMs would simply be the latest advancement in a field with
            | a lot of similar advancements.
        
             | og_kalu wrote:
             | >ChatGPT has a model of all the text information on the
             | internet, but it remains to be seen what the hard limits of
             | this model are.
             | 
              | Before training is complete and the loss has bottomed out,
              | there will be limits on what the "learned so far" model can
              | do that say absolutely nothing about the limits of a
              | perfect (or very close to it) model.
             | 
             | It really looks like anything will converge with enough
              | compute. I don't think architecture is particularly
              | important except as a "how much compute will this one
              | take?" question.
             | 
             | https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-
             | dat...
             | 
             | >Does this model let it do logic or predict the future
             | well, or will no amount of training give it those
             | abilities?
             | 
             | There's nothing special about logic. Basically, any
             | sequence is fair game. It literally does not matter to the
             | machine.
             | 
             | Boolformer: Symbolic Regression of Logic Functions with
             | Transformers(https://arxiv.org/abs/2309.12207)
             | 
              | That said, GPT-4 can already do logic. It's not perfect,
              | but if perfect logic were a requirement then humans
              | couldn't do logic either.
             | 
             | >Simply being good in one task doesn't imply a general
             | ability to do everything, or even most of everything.
             | 
             | It's not one task. It's one modality (text) that a plethora
             | of tasks could be learned in.
             | 
             | Coding and playing chess did not suddenly become a single
             | task just because we found the common ground that allows a
             | machine to learn both.
             | 
             | The text, image, video and audio data we could feed a
             | transformer will cover anything we care about.
        
           | ijidak wrote:
           | I've noticed that when I speak I really don't control each
           | word.
           | 
           | I have an idea that I want to convey, but how each word comes
           | to my mind as I form a sentence has always felt like it's
           | controlled by an unconscious algorithm.
           | 
           | So I don't understand why people find this prediction
           | mechanism so alien.
           | 
           | It isn't clear to me how much of communication is really in
           | our control.
           | 
           | With the current tools, it feels like we still provide the
           | ideas we want the AI to convey, and it may be using a nearly
           | identical mechanism to us to form the words.
           | 
           | Consciousness would be the computer being able to come up
           | with the ideas.
           | 
           | So, it seems to me we've gotten close enough on the
           | communication side of intelligence.
           | 
           | But the machine is not conscious. When it is, it seems like
           | it will generate its own ideas.
           | 
           | Are people debating whether the machine is conscious?
           | 
           | Otherwise, it feels very straightforward to grasp what we've
           | made up to now.
        
       | gmuslera wrote:
        | It's not black or white. Probably most of what we do is System
        | 1; most of the time we are mostly meme machines, and so is a
        | good part of the activity in some sectors.
        | 
        | But we are able to take steps forward: intuition, hard
        | step-by-step reasoning, finding connections between dots, etc.
        | GPT can do some of that, and at some point along the road
        | someone must decide whether we have reached somewhere else. Even
        | if travelling the full road may or may not be possible in the
        | foreseeable future.
        
         | arketyp wrote:
         | Indeed. What has caught researchers off guard is the way system
         | 2 properties seem to appear as emergent phenomena in LLMs. This
         | is also what has prompted people like Hinton and Sutskever to
          | make this condensed point about statistical modelling and
          | understanding (hypothetically) lying on the same spectrum.
        
       | Kiro wrote:
       | Reminds me of this scene from Westworld (slightly NSFW):
       | https://www.youtube.com/watch?v=ZnxJRYit44k
        
         | tester457 wrote:
         | Also spoilers
        
       | canjobear wrote:
       | The argument is: "deep thinking" isn't like autocomplete because
       | it feels different.
        
         | bananaflag wrote:
          | Deep thinking is what autocomplete feels like from the inside.
        
       | fritzo wrote:
       | Andrew's distinction between associative vs logical thinking
       | reminds me of two kinds of programming thinking. Half of my time
       | while programming is spent churning out glue code, satisfying a
       | type checker and test suite, and implementing the simple solution
       | that turns out to work. The other half of my time is spent slowly
       | thinking through why some simple solution is wrong, and step by
       | step proceeding towards the correct but complex solution. The
       | former phase is intuitive, the latter phase is scientific, where
       | I hypothesize, test, and repeat.
       | 
       | Reading through the code-as-transcript afterwards it's unclear
       | which bits of code required shallow associative vs deep rational
       | thinking, pure autocomplete vs latent chain of thought.
        
       | merizian wrote:
       | The fallacy being made in this argument is that computers need to
       | perform tasks the same way as humans to achieve equal or better
       | performance on them. While having better "system 2" abilities may
       | improve performance, it's plausible that scaled-up next-token
       | prediction along with a bit of scaffolding and finetuning could
       | match human performance on the same diversity of tasks while
       | doing them a completely different way.
       | 
       | If I had to critique Hinton's claims, I would say his usage of
       | the word "understand" can be vague and communicate assumptions
       | because it's from an ontology used for reasoning about human
       | reasoning, not this new alien form of reasoning which language
       | models embody.
        
         | BurningFrog wrote:
         | > _The fallacy being made in this argument is that computers
         | need to perform tasks the same way as humans to achieve equal
         | or better performance_
         | 
         | Especially since I don't think we know that much about how
         | human intelligence actually works.
        
         | edot wrote:
         | I believe it was Feynman who said something to the effect of
         | "airplanes do not fly like birds do, but they fly much faster
         | and can carry much more". So yes, we do not need to exactly
         | replicate how humans do things in order to do human-like things
         | in a useful manner. Planes do not flap their wings, but the jet
         | engine (which is completely unnatural) does a great job of
         | making things fly when paired with fixed wings of a certain
         | shape.
        
         | metanonsense wrote:
         | In addition to that, the "system 2" abilities might already be
         | there with "epi" strategies like chain-of-thought prompting.
         | Talking / writing to yourself might not be the most efficient
         | way to think but at least I do it often enough when pondering a
         | problem.
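          | 
          | As a rough illustration (not anyone's actual code -- `complete`
          | is a stand-in for whatever text-completion call you have), the
          | whole "epi" trick is just a change to the prompt:
          | 
          |     def chain_of_thought(question, complete):
          |         """complete(prompt) -> str is an assumed LLM call."""
          |         # Direct: ask for the answer straight away.
          |         direct = complete(f"Q: {question}\nA:")
          |         # Chain of thought: have the model write out its
          |         # reasoning first, i.e. "talk to itself" a bit before
          |         # committing to an answer.
          |         cot = complete(
          |             f"Q: {question}\nA: Let's think step by step.")
          |         return direct, cot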
        
       | francois14 wrote:
        | And what about "stochastic parrot"?
        
       | ffwd wrote:
       | I think there is another aspect to human thinking other than
       | system 1/system 2 though, which is the abstract world model
        | humans have. System 1/2 is more like the process, while the
       | world model is the actual data being 'processed'.
       | 
       | And I think basically, humans have a much simplified, 'low
       | dimensional' world model that consists of a set of objects (let's
       | call them patterns), and then a "list" of essential properties
       | that those objects have, that leads to a constraint on how each
       | object can behave in the world model as a whole.
       | 
       | And this is sort of hierarchical or at least, we can zoom in and
       | out in detail depending on the level of knowledge we have about a
       | particular pattern.
       | 
        | So problem 1 is: it's not clear to me that text or any sort of
        | data would contain all the necessary constraints so that any
        | particular prompt would result in a world model that exactly
        | takes into account the constraints of each object. And 2) even
        | if it did, I'm not sure the process of step-by-step thinking
        | (system 1/2) about each object and computing world states could
        | occur in current architectures. This is especially important for
        | computing a set of objects, then abstracting the result, then
        | doing another round of computing with that result, or something
        | like this.
       | 
       | I'm not hard set on this but this is my current thinking.
        
       | robbrown451 wrote:
       | I agree with Hinton, although a lot hinges on your definition of
       | "understand."
       | 
        | I think to best wrap your head around this stuff, you should look
        | to the commonalities of LLMs, image generators, and even things
        | like Alpha Zero and how it learned to play Go.
       | 
       | Alpha Zero is kind of the extreme in terms of not imitating
       | anything that humans have done. It learns to play the game simply
       | by playing itself -- and what they found is that there isn't
       | really a limit to how good it can get. There may be some
       | theoretical limit of a "perfect" Go player, or maybe not, but it
       | will continue to converge towards perfection by continuing to
       | train. And it can go far beyond what the best human Go player can
       | ever do. Even though very smart humans have spent their lifetimes
       | deeply studying the game, and Alpha Zero had to learn everything
       | from scratch.
       | 
       | One other thing to take into consideration, is that to play the
       | game of Go you can't just think of the next move. You have to
       | think far forward in the game -- even though technically all it's
       | doing is picking the next move, it is doing so using a model that
       | has obviously looked forward more than just one move. And that
       | model is obviously very sophisticated, and if you are going to
       | say that it doesn't understand the game of Go, I would argue that
        | you have a very oddly restricted definition of the word
        | "understand", and one that isn't particularly useful.
       | 
       | Likewise, with large language models, while on the surface, they
       | may be just predicting the next word one after another, to do so
       | effectively they have to be planning ahead. As Hinton says, there
       | is no real limit to how sophisticated they can get. When
       | training, it is never going to be 100% accurate in predicting
       | text it hasn't trained on, but it can continue to get closer and
       | closer to 100% the more it trains. And the closer it gets, the
        | more sophisticated a model it needs. In the sense that Alpha Zero
       | needs to "understand" the game of Go to play effectively, the
       | large language model needs to understand "the world" to get
       | better at predicting.
        
         | jon_richards wrote:
         | > to play the game of Go you can't just think of the next move.
         | You have to think far forward in the game -- even though
         | technically all it's doing is picking the next move, it is
         | doing so using a model that has obviously looked forward more
         | than just one move.
         | 
         | While I imagine alpha go does some brute force and some tree
         | exploration, I think the main "intelligent" component of alpha
         | go is the ability to recognize a "good" game state from a "bad"
          | game state based on _that moment in time_, not any future
         | plans or possibilities. That pattern recognition is all it has
         | once its planning algorithm has reached the leaves of the
         | trees. Correct me if I'm wrong, but I doubt alpha go has a
         | neural net evaluating an entire tree of moves all at once to
         | discover meta strategies like "the opponent focusing on this
         | area" or "the opponent feeling on the back foot."
         | 
         | You can therefore imagine a pattern recognition algorithm _so
         | good_ that it is able to pick a move by only looking 1 move
         | into the future, based solely on local stone densities and
         | structures. Just play wherever improves the board state the
          | most. It does not even need to "understand" that a game is
         | being played.
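          | 
          | To make that concrete, a one-move-lookahead (greedy) player is
          | nothing more than the following sketch. `value` stands in for
          | whatever learned board evaluator you have -- this is an
          | illustration, not AlphaGo's actual code:
          | 
          |     def greedy_move(state, legal_moves, apply_move, value):
          |         """Pick the move whose resulting position scores best.
          | 
          |         value(state) -> float is an assumed learned evaluator.
          |         No tree search, no notion of 'playing a game' -- just
          |         pattern recognition on positions one move ahead.
          |         """
          |         return max(legal_moves(state),
          |                    key=lambda m: value(apply_move(state, m)))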
         | 
         | > while on the surface, they may be just predicting the next
         | word one after another, to do so effectively they have to be
         | planning ahead.
         | 
         | So I don't think this statement is necessarily true.
         | "Understanding" is a major achievement, but I don't think it
         | requires planning. A computer can understand that 2+2=4 or
         | where to play in tic-tac-toe without any "planning".
         | 
         | That said, there's probably not much special about the concept
         | of planning either. If it's just simulating a tree of future
         | possibilities and pruning it based on evaluation, then many
         | algorithms have already achieved that.
        
         | Jensson wrote:
         | A big difference between a game like Go and writing text is
         | that text is single player. I can write out the entire text,
         | look at it and see where I made mistakes on the whole and edit
         | those. I can't go back in a game of Go and change one of my
         | moves that turned out to be a mistake.
         | 
          | So trying to make an AI that solves the entire problem before
          | writing the first letter will likely not result in a good
          | solution, while also making it compute way too much, since it
          | solves the entire problem for every token generated. That is
          | the kind of AI we know how to train, so for now that is what
          | we have to live with, but it isn't the kind of AI that would
          | be efficient or smart.
        
           | bytefactory wrote:
           | This doesn't seem like a major difference, since LLMs are
           | also choosing from a probability distribution of tokens for
           | the most likely one, which is why they respond a token at a
            | time. They can't "write out" the entire text at once, which
           | is why fascinating methods like "think step by step" work at
           | all.
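            | 
            | A minimal sketch of that decoding loop (illustrative;
            | `next_token_dist` stands in for a real model's forward pass
            | and is not a real library call):
            | 
            |     import random
            | 
            |     def generate(prompt, next_token_dist, max_new=50,
            |                  eos="<eos>"):
            |         """Autoregressive decoding: one token at a time,
            |         each conditioned on everything emitted so far.
            |         next_token_dist(tokens) -> {token: prob} is assumed.
            |         """
            |         tokens = list(prompt)
            |         for _ in range(max_new):
            |             probs = next_token_dist(tokens)
            |             tok = random.choices(list(probs),
            |                                  weights=list(probs.values()))[0]
            |             if tok == eos:
            |                 break
            |             tokens.append(tok)
            |         return tokens
            | 
            | The model never gets to revise earlier tokens; "think step by
            | step" works by making the useful intermediate text part of
            | the context it conditions on.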
        
             | Jensson wrote:
              | But it can't improve its answer after it has written it;
              | that is a major limitation. When a human writes an article
              | or response or solution, that is likely not the first thing
              | the human thought of; instead they write something down and
              | work on it until it is tight and neat and communicates
              | just what the human wants to communicate.
              | 
              | Such answers will be very hard for an LLM to find; instead
              | you mostly get very verbose messages, since that is how our
              | current LLMs think.
        
               | bytefactory wrote:
               | Completely agree. The System 1/System 2 distinction seems
               | relevant here. As powerful as transformers are with just
               | next-token generation and context, which can be hacked to
                | form a sort of short-term memory, some kind of real-time
                | learning + long-term memory storage seems like an
                | important research direction.
        
         | klodolph wrote:
         | > As Hinton says, there is no real limit to how sophisticated
         | they can get.
         | 
         | There's no limit to how sophisticated a model can get, but,
         | 
         | 1. That's a property shared with many architectures, and not
         | really that interesting,
         | 
         | 2. There _are_ limits to the specific ways that we train
         | models,
         | 
         | 3. We care about the relative improvement that these models
         | deliver, for a given investment of time and money.
         | 
         | From a mathematical perspective, you can just kind of keep
         | multiplying the size of your model, and you can prove that it
         | can represent arbitrary complicated structures (like, internal
         | mental models of the world). That doesn't mean that your
         | training methods will produce those complicated structures.
         | 
         | With Go, I can see how the model itself can be used to generate
         | new, useful training data. How such a technique could be
         | applied to LLMs is less clear, and its benefits are more
         | dubious.
        
         | lsy wrote:
         | The difference is that "the world" is not exhaustible in the
         | same way as Go is. While it's surely true that the number of
         | possible overall Go game states is extremely large, the game
         | itself is trivially representable as a set of legal moves and
         | rules. The "world model" of the Go board is actually just
         | already exhaustive and finite, and the computer's work in
         | playing against itself is to generate more varied data within
         | that model rather than to develop that model itself. We know
         | that when Alpha Zero plays a game against itself it is valuable
         | data because it is a legitimate game which most likely
         | represents a new situation it hasn't seen before and thus
         | expands its capacity.
         | 
         | For an LLM, this is not even close to being the case. The sum
         | of all human artifacts ever made (or yet to be made) doesn't
         | exhaust the description of a rock in your front yard, let alone
         | the world in all its varied possibility. And we certainly
         | haven't figured out a "model" which would let a computer
         | generate new and valid data that expands its understanding of
         | the world beyond its inputs, so self-training is a non-starter
         | for LLMs. What the LLM is "understanding", and what it is
         | reinforced to "understand" is not the world but the format of
         | texts, and while it may get very good at understanding the
         | format of texts, that isn't equivalent to an understanding of
         | the world.
        
           | og_kalu wrote:
           | >The sum of all human artifacts ever made (or yet to be made)
           | doesn't exhaust the description of a rock in your front yard,
           | let alone the world in all its varied possibility.
           | 
           | No human or creature we know of has a "true" world model so
           | this is irrelevant. You don't experience the "real world".
            | You experience a tiny slice of it, a few senses that are
            | further slimmed down and even fabricated in parts.
           | 
           | To the bird who can intuitively sense and use electromagnetic
           | waves for motion and guidance, your model of the world is
           | fundamentally incomplete.
           | 
           | There is a projection of the world in text. Moreover training
           | on additional modalities is trivial for a transformer. That's
           | all that matters.
        
             | lsy wrote:
             | That's the difference though. I _know_ my world model is
             | fundamentally incomplete. Even more foundationally, I know
             | that there _is_ a world, and when my world model and the
             | world disagree, the world wins. To a neural network there
             | is no distinction. The closest the entire dynamic comes is
             | the very basic annotation of RLHF which itself is done by
             | an external human who is providing the value judgment, but
             | even that is absent once training is over.
             | 
             | Despite not having the bird's sense for electromagnetic
             | waves, I have an understanding that they are there, because
             | humans saw behavior they couldn't describe and
             | investigated, in a back-and-forth with a world that has
             | some capacity to disprove hypotheses.
             | 
             | Additional modalities are really just reducible to more
             | kinds of text. That still doesn't exhaust the world, and
             | unless a machine has some ability to integrate new data in
             | real time alongside a meaningful commitment and
             | accountability to the world as a world, it won't be able to
             | cope with the real world in a way that would constitute
             | genuine intelligence.
        
               | og_kalu wrote:
               | >I know my world model is fundamentally incomplete. Even
               | more foundationally, I know that there is a world, and
               | when my world model and the world disagree, the world
               | wins.
               | 
                | Yeah, this isn't really true. That's not how humans work.
                | For a variety of reasons, plenty of people stick with
                | their incorrect model despite the world indicating
                | otherwise. In fact, this seems to be normal enough human
                | behaviour. Everyone does it, for something or other. You
                | are no exception.
               | 
               | And yes LLMs can in fact tell truth from fiction.
               | 
               | GPT-4 logits calibration pre RLHF -
               | https://imgur.com/a/3gYel9r
               | 
               | Just Ask for Calibration: Strategies for Eliciting
               | Calibrated Confidence Scores from Language Models Fine-
               | Tuned with Human Feedback -
               | https://arxiv.org/abs/2305.14975
               | 
               | Teaching Models to Express Their Uncertainty in Words -
               | https://arxiv.org/abs/2205.14334
               | 
               | Language Models (Mostly) Know What They Know -
               | https://arxiv.org/abs/2207.05221
               | 
               | The Geometry of Truth: Emergent Linear Structure in Large
               | Language Model Representations of True/False Datasets -
               | https://arxiv.org/abs/2310.06824
               | 
               | Your argument seems to boil down to "they can't perform
               | experiments" but that isn't true either.
        
               | astrange wrote:
               | There is no such thing as a world model, and you don't
               | have one of them. This is a leftover bad psychological
               | concept from the 70s AI researchers who never got
                | anywhere. People and other creatures do very little
                | modeling of things; they mostly just do stuff.
        
           | tazjin wrote:
           | The world on our plane of existence absolutely is
           | exhaustible, just on a much, much larger scale. Doesn't mean
            | that the process is fundamentally different, and from the
            | human perspective there might be diminishing returns.
        
         | huytersd wrote:
         | Next word generation is one way to put it. The key point here
         | is we have no idea what's happening in the black box that is
         | the neural network. It could be forming very strong connections
          | between concepts in there with multi-tiered abstractions.
        
         | SkiFire13 wrote:
         | > One other thing to take into consideration, is that to play
         | the game of Go you can't just think of the next move. You have
         | to think far forward in the game -- even though technically all
         | it's doing is picking the next move, it is doing so using a
         | model that has obviously looked forward more than just one
         | move.
         | 
         | It doesn't necessarily have to look ahead. Since Go is a
         | deterministic game there is always a best move (or moves that
         | are better than others) and hence a function that goes from the
         | state of the game to the best move. We just don't have a way to
         | compute this function, but it exists. And that function doesn't
          | need the concept of lookahead; that's just an intuitive way in
          | which one could find some of its values. Likewise, ML algorithms
          | don't necessarily need lookahead; they can just try to approximate
         | that function with enough precision by exploiting patterns in
         | it. And that's why we can still craft puzzles that some AIs
         | can't solve but humans can, by exploiting edge cases in that
         | function that the ML algorithm didn't notice but are solvable
         | with understanding of the game.
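          | 
          | For a small deterministic game that state-to-best-move function
          | can actually be written down by brute force. A minimal minimax
          | sketch for tic-tac-toe (illustrative only; the Go version exists
          | in principle but is hopelessly out of reach):
          | 
          |     LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
          |              (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
          | 
          |     def winner(b):              # b: tuple of 9 cells 'X'/'O'/' '
          |         for i, j, k in LINES:
          |             if b[i] != ' ' and b[i] == b[j] == b[k]:
          |                 return b[i]
          |         return None
          | 
          |     def best_move(board, me):
          |         """State -> best move, found by exhaustive search."""
          |         opp = 'O' if me == 'X' else 'X'
          | 
          |         def score(b, to_move):
          |             w = winner(b)
          |             if w:
          |                 return 1 if w == me else -1
          |             empty = [i for i, c in enumerate(b) if c == ' ']
          |             if not empty:
          |                 return 0        # draw
          |             nxt = opp if to_move == me else me
          |             vals = [score(b[:i] + (to_move,) + b[i + 1:], nxt)
          |                     for i in empty]
          |             return max(vals) if to_move == me else min(vals)
          | 
          |         def my_score(i):
          |             return score(board[:i] + (me,) + board[i + 1:], opp)
          | 
          |         empty = [i for i, c in enumerate(board) if c == ' ']
          |         return max(empty, key=my_score)
          | 
          |     # X to move; index 2 completes the top row, so it scores
          |     # highest and is chosen.
          |     board = ('X', 'X', ' ', 'O', 'O', ' ', ' ', ' ', ' ')
          |     print(best_move(board, 'X'))    # -> 2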
         | 
         | The thing is though, does this really matter if eventually we
         | won't be able to notice the difference?
        
           | bytefactory wrote:
           | > It doesn't necessarily have to look ahead. Since Go is a
           | deterministic game there is always a best move
           | 
           | Is there really a difference between the two? If a certain
           | move shapes the opponent's remaining possible moves into a
           | smaller subset, hasn't AlphaGo "looked ahead"? In other
           | words, when humans strategize and predict what happens in the
           | real world, aren't they doing the same thing?
           | 
           | I suppose you could argue that humans also include additional
           | world models in their planning, but it's not clear to me that
           | these models are missing and impossible for machine learning
           | models to generate during training.
        
       | Joel_Mckay wrote:
        | Well-structured nonsense is indistinguishable from the assumption
        | of sentience for the undisciplined. This means a 95% LLM-
        | generated article is indistinguishable from an illogical,
        | contradictory, chaotic rant.
       | 
       | Best of luck, and remember to tip your bot on the way out =)
        
       | breadwinner wrote:
       | There is evidence that the human brain is also doing
       | "autocomplete" (prediction). The human brain uses predictive
       | mechanisms when processing language, and these mechanisms play an
       | important role in forming thoughts.
       | 
       | When we hear or read a word, our brain quickly generates a set of
       | predictions about what word might come next, based on the context
       | of the sentence and our past experiences with language. These
       | predictions are constantly updated as we receive new information,
       | and they help us to process language more efficiently and
       | accurately.
       | 
       | In addition, research has shown that the brain engages in similar
       | predictive processes when we are forming thoughts or planning
       | actions. For example, when we plan a complex movement, such as
       | reaching for a cup, our brain generates a set of predictions
       | about the movements required to complete the action. These
       | predictions are constantly updated as we receive feedback from
       | our muscles and our environment, allowing us to make adjustments
       | and achieve our goal.
       | 
       | See links below for additional details:
       | 
       | https://www.earth.com/news/our-brains-are-constantly-working...
       | 
       | https://www.psycholinguistics.com/gerry_altmann/research/pap...
       | 
       | https://www.tandfonline.com/doi/pdf/10.1080/23273798.2020.18...
       | 
       | https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009...
        
         | Probiotic6081 wrote:
         | And curiously, those predictions that are made during language
         | comprehension are made by the language production system
         | itself!
        
         | lsy wrote:
         | This is of course sometimes true, we take shortcuts to minimize
         | cognitive effort. However, when the situation warrants it we
         | think more carefully about which words to use to achieve goals,
         | or to reach correspondence with the situation at hand. Or we
         | move more precisely and carefully to do something we haven't
         | done before. I've no doubt that an LLM can approximate whatever
         | model of language a person has from their life experience, but
         | I don't think this type of model is capable of active coping,
         | making judgments, or of having accountability to the world it's
         | meant to operate in.
        
         | lacrimacida wrote:
         | > When we hear or read a word, our brain quickly generates a
         | set of predictions about what word might come next, based on
         | the context of the sentence
         | 
         | Yes a big part of it is prediction but the brain also does
         | something else which LLMs by themselves completely eschew. The
          | human brain imagines in pictures, uses abstractions, studies
         | things and produces new knowledge.
        
       | Eggpants wrote:
       | As long as it's just returning the tokens of the statistical mean
       | of previous tokens, it is just a clever autocomplete.
       | 
       | A somewhat useful internet search engine without all the ads/seo
       | garbage. Of course, the first rule of the internet is don't
       | believe everything on the internet.
       | 
        | I believe AI won't overcome its statistical mask until it can
        | self-tune its coefficients in real time. That requires an error
        | function, not yet invented, that can mimic animals' pain-feedback
        | error function.
        | 
        | Baby steps can be taken by attempting to run GPT-generated code
        | and then adjusting coefficients based on the returned errors. Aka
        | compiler and unit test failures are basic "pain" functions, which
        | is pretty much how humans learn to code.
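        | 
        | A rough, prompt-level sketch of that loop (it feeds errors back
        | as text rather than tuning coefficients; `generate_code` is a
        | placeholder for whatever model call you use, not an existing
        | tool):
        | 
        |     import subprocess
        |     import sys
        |     import tempfile
        | 
        |     def run_and_collect_pain(source):
        |         """Run candidate code; return stderr (the 'pain'), or
        |         None if it exited cleanly."""
        |         with tempfile.NamedTemporaryFile(
        |                 "w", suffix=".py", delete=False) as f:
        |             f.write(source)
        |             path = f.name
        |         result = subprocess.run([sys.executable, path],
        |                                 capture_output=True,
        |                                 text=True, timeout=10)
        |         return result.stderr if result.returncode != 0 else None
        | 
        |     def generate_fix_loop(task, generate_code, max_rounds=3):
        |         """generate_code(prompt) -> source: assumed LLM call."""
        |         prompt = task
        |         for _ in range(max_rounds):
        |             source = generate_code(prompt)
        |             pain = run_and_collect_pain(source)
        |             if pain is None:
        |                 return source  # ran cleanly
        |             # Feed the error back as the corrective signal.
        |             prompt = (task + "\n\nPrevious attempt failed "
        |                       "with:\n" + pain + "\nFix it.")
        |         return source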
        
       | esafak wrote:
       | Autocompletion is a red herring of a dismissal. Once you have a
       | general purpose model, you can do many things with it, including
       | next token prediction. And in that respect we are no different.
        
       | eviks wrote:
       | > Suppose you want to be really good at predicting the next word.
       | If you want to be really good, you have to understand what's
       | being said. That's the only way. So by training something to be
       | really good at predicting the next word, you're actually forcing
       | it to understand.
       | 
       | Ok, so in this framing you'll just get stuck below the "really
       | good" level because the training isn't based on understanding
        
       | Falkon1313 wrote:
       | I think that insight is an important feature that GPT doesn't
       | seem to have, at least not yet.
       | 
       | For instance, I've seen people saying they can ask it for help
       | with how to code something and it will help them. Although the
       | results aren't perfect, they can be helpful.
       | 
       | However, I recall years ago asking a more senior developer how to
       | do something. They could've just told me how, but instead they
       | asked why I was doing that. Then they told me not to bother with
       | writing code at all, and to instead do this other, much simpler
       | thing which would have the same effect at solving the client's
       | request. ChatGPT wouldn't have had that insight.
       | 
       | Years later, a junior dev came to me asking for help with his
       | code because it wasn't working. We went over the code line by
       | line and he explained what it was trying to do and it all looked
       | good. But when he ran it, the web server crashed. I told him to
        | try adding a comment:
        | 
        |     // Do not remove this comment.
       | 
       | Sure enough, it worked perfectly and the server no longer
       | crashed. Why? Because I realized that if his code was wrong he
       | should get an error message, not crash the server. But sometime
       | back I had read about a bug in a version of the interpreter that
       | would crash when parsing a file that was an exact multiple of
        | 4096 bytes. Would ChatGPT have thought of that? Probably not. It
       | would've just talked about the code.
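        | 
        | The hunch itself is trivial to check -- the comment "fixes" the
        | crash purely by nudging the file size off the bad boundary (a
        | sketch, assuming the interpreter bug exactly as described above):
        | 
        |     import os
        | 
        |     def hits_bad_boundary(path, block=4096):
        |         """True if the file is an exact multiple of `block`
        |         bytes -- the condition that tripped the bug. Adding
        |         '// Do not remove this comment.' changes the size, so
        |         the check stops firing."""
        |         return os.path.getsize(path) % block == 0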
       | 
       | Which is not to say that it's useless. But it lacks greater
       | context and the insight to go beyond the parameters of the
       | question on its own initiative. Then again, so do humans most of
       | the time.
        
         | bytefactory wrote:
         | > I think that insight is an important feature that GPT doesn't
         | seem to have, at least not yet.
         | 
         | I actually think this is a limitation of the RLHF that GPT has
         | been put through. With open-ended questions, I've seen GPT4
         | come up with reasonable alternatives instead of just answering
          | the question I've asked. This often shows up as the infamous
          | "however, please consider..." bits that it tacks on, which
          | occasionally do contain actual insights into the problem I'm
          | trying to solve.
         | 
         | In most cases it seems to try very hard to mold the answer into
         | what I want to hear, which in many cases isn't necessarily the
         | best answer. A more powerful version of GPT with a less-
         | restrictive RLHF seems like it would be more open to suggesting
         | novel solutions, although this is just my speculation.
        
       | Animats wrote:
       | The big insight we've obtained from LLMs is how much of human
       | discourse is glorified autocomplete. Humans are usually not very
       | profound.
        
         | lacrimacida wrote:
         | But some are. Could we have that as AI? Would LLMs ever have
         | it?
        
       | 1vuio0pswjnm7 wrote:
       | Here is a question: What is the practical significance of viewing
       | "AI" as autocomplete versus some other idea. Why try to influence
       | how others view using a computer. Why anthromorphise. These are
       | questions for which I have answers, but of course they are
       | personal opinions. Historically, programmers often like to refer
       | to programming as "magic". But magic is illusion,
       | entertaintainment, tricks. Believing in "magic" is a personal
       | choice.
       | 
       | Why not describe things in terms of what they do instead of what
       | they "are". The latter is highly subjective and open to abuse.
        
       ___________________________________________________________________
       (page generated 2023-11-18 23:01 UTC)