[HN Gopher] I disagree with Geoff Hinton regarding "glorified autocomplete"
___________________________________________________________________
I disagree with Geoff Hinton regarding "glorified autocomplete"
Author : magoghm
Score : 122 points
Date : 2023-11-18 15:39 UTC (7 hours ago)
(HTM) web link (statmodeling.stat.columbia.edu)
(TXT) w3m dump (statmodeling.stat.columbia.edu)
| 11235813213455 wrote:
| I have yet to read a ChatGPT text that doesn't look stupidly verbose.
| chpatrick wrote:
| Why don't you ask it not to be?
| coffeebeqn wrote:
| The default style really annoys me with its verbosity and overuse of numbered lists. Describing the tone and length of responses, and anything else you want (think step by step, first principles, show don't tell), all work pretty well.
|
| You can even ask it to only reply with a Boolean true or false - which certainly doesn't play to its strengths.
| satuke wrote:
| Actually, that's a by-product of RLHF. A base model is usually not that verbose.
| JoshTko wrote:
| Use custom instructions. I use "Answers should be very concise by default. Never provide disclosures. Answers should get to the point immediately." and get super concise answers. If I want more detail I just follow up with "can you expand on X".
| xcv123 wrote:
| Follow up with the prompt "rewrite that so that it doesn't look stupidly verbose" and it will do exactly as requested.
| WillPostForFood wrote:
| Give this GPT called Briefly a try. Responses come back as bullet point lists. I prefer more verbose, but interesting to see this in contrast.
|
| https://chat.openai.com/g/g-LNsEQH5rz-briefly
| mbrumlow wrote:
| > If you want to be really good, you have to understand what's being said. That's the only way.
|
| This is simply not true. Consider predicting the next letter or word, or, if you abstract it away from things that mean something to you, the next color of a block in a long chain of colored blocks. You would realize that all we are doing is using statistics to predict what the next item might be. There simply is no need or requirement for the system doing the prediction to understand.
| albertzeyer wrote:
| > You would realize that all we are doing is using statistics to predict what the next item might be.
|
| So what does "understanding" really mean then?
|
| "Understanding" is not really well defined. Either we (humans) do it, in which case LLMs might just do it as well, depending on the definition of "understanding", or we both don't do it. But if no one is really "understanding" anything, then this definition of "understanding" is maybe not useful.
| dboreham wrote:
| Bingo
| Joeri wrote:
| Nobody actually understands how humans understand something, just like nobody actually understands how LLMs do what they do.
|
| Everybody opining about it is doing just that: offering an opinion. Geoff Hinton's opinion is worth more than someone else's, but it is still an opinion.
| im_down_w_otp wrote:
| I don't know about the human part, but we absolutely understand how LLMs do what they do. They're not magic.
| og_kalu wrote:
| No we don't. No it's not "magic". No we don't understand what the black box is doing.
| brookst wrote:
| For some values of "we"
| og_kalu wrote:
| For every value of we. "I understand the internals of GPT" is the fastest way to demonstrate you have no idea what you're talking about.
| chpatrick wrote:
| We understand the architecture but we don't understand the weights.
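A minimal sketch of the architecture-versus-weights point just above, in Python using the Hugging Face transformers library (the model choice is illustrative, not something any commenter claims to have run): the architecture is fully legible, while the learned parameters are opaque numbers.

    # Sketch: the architecture is inspectable, the weights are not interpretable.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    print(model)  # prints every layer by name: embeddings, attention blocks, ...

    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params:,} parameters")  # ~124 million for GPT-2 small

    # The learned "knowledge" lives in tensors like this one: perfectly
    # inspectable, almost entirely uninterpretable.
    w = model.transformer.h[0].attn.c_attn.weight
    print(w.shape, w.flatten()[:5])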
| usrbinbash wrote:
| We also understand, down to a very very very microscopic level, how neurons work. We also know a helluva lot about the architecture of the brain. Does that mean we can explain our own intelligence, how our minds actually work? Nope.
| mannykannot wrote:
| I feel that LLMs raise some very interesting challenges for anyone trying to figure out what it means to understand something and how we do it, but I am not yet ready to agree with Hinton.
|
| For example, we are aware that some, but by no means all, of what people say is about an external world that may or may not conform to what the words say. We can also doubt that we have understood things correctly, and take steps to either confirm or refute our opinions. We see ourselves as entities in an external reality containing other individuals who also do this, and that we, and they, have a limited ability to influence what happens in that world. Do LLMs do these things, or is what they produce a result of having a lot of information about the purely formal properties of human language use, independently of semantics?
| hackinthebochs wrote:
| > I feel that LLMs raise some very interesting challenges for anyone trying to figure out what it means to understand something and how we do it, but I am not yet ready to agree with Hinton.
|
| Agreed. What LLMs say about understanding deserves a lot more attention than it has received. I wrote down some of my thoughts on the matter:
|
| https://www.reddit.com/r/naturalism/comments/1236vzf
|
| > Do LLMs do these things, or is what they produce a result of having a lot of information about the purely formal properties of human language use, independently of semantics?
|
| These two points aren't necessarily in opposition, and understanding why is, I think, key to solving a lot of important problems around intelligence, sentience, etc. To compute is to operate on formal properties. But this doesn't exclude semantic properties from having causal relevance to the behavior of the system. What we need is a way to conceptualize how a system can have multiple related descriptions at different levels. A description at the level of semantics doesn't exclude a description in terms of formal properties or vice versa. I think of it in terms of constraints: the higher level descriptions constrain the lower level behavior. What the computational description does is ensure the higher level semantic constraint is maintained. It does this by the particular space of computational dynamics it follows. Essentially, the information that picks out this program's space of branching dynamics embeds the semantic description in question, and this description realizes the computational dynamic necessary to maintain the higher level semantic constraint. Rather than semantics being in opposition to formal properties, they are two sides of the same coin.
| satuke wrote:
| Isn't that exactly how humans learn to respond to stimuli? Don't we just try to predict the best next response to everything? Yes, it's statistics, but the fun part is nobody is writing this statistical function by hand.
| civilized wrote:
| LLMs don't have a concept of "best". Only most likely in what they've been trained on.
|
| I think LLMs ultimately just take imitation to a creative and sophisticated extreme. And imitation simply doesn't comprise the whole of human intelligence at all, no matter how much it is scaled up.
|
| The sophistication of the imitation has some people confused and questioning whether everything can be reduced to imitation. It can't.
|
| The ability to imitate seeking a goal isn't identical to the ability to seek a goal.
|
| The ability to imitate solving a problem isn't identical to the ability to solve a problem.
|
| Imitation is very useful, and the reduction of everything to imitation is an intriguing possibility to consider, but it's ultimately just wrong.
| lordnacho wrote:
| Ok, so now we need an example that separates humans from LLMs?
|
| I struggle to think of one, maybe someone on HN has a good example.
|
| E.g. if I'm in middle school and learning quadratic equations, am I imitating solving the problem by plugging in the coefficients? Or am I understanding it?
|
| Most of what I see coming out of ChatGPT and Copilot could be said to be either. If you're generous, it's understanding. If not, it's imitation.
| civilized wrote:
| It is very easy to separate humans from LLMs. Humans created math without being given all the answers beforehand. LLMs can't do that yet.
|
| When an LLM can create math to solve a problem, we will be much closer to AGI.
| abm53 wrote:
| You can ask ChatGPT to solve maths problems which are not in its training data, and it will answer an astonishing amount of them correctly.
|
| The fact that we have trained it on examples of human-produced maths texts (rather than through interacting with the world over several millennia) seems like more of an implementation detail and not a piece of evidence about whether it has "understood" or not.
| IshKebab wrote:
| _Some_ humans created maths. And it took thousands of years of thinking and interaction with the real world.
|
| Seems like goalpost moving to me.
|
| I think the _real_ things that separate LLMs from humans at the moment are:
|
| * Humans can do online learning. They have long term memory. I guess you could equate evolution to the training phase of AI but it still seems like they don't have quite the same online learning capabilities as us. This is what probably prevents them from doing things like inventing maths.
|
| * They seem to be incapable of saying "I don't know". Ok, to be fair lots of humans struggle with this! I'm sure this will be solved fairly soon though.
|
| * They don't have a survival instinct that drives proactive action. Sure you can _tell_ them what to do but that doesn't seem quite the same.
| naveen99 wrote:
| Interestingly some humans will admit to not knowing but are allergic to admitting being wrong (and can get fairly vindictive if forced to admit being wrong).
|
| LLMs actually admit to being wrong easily, but aren't great at introspection and confabulate too often. Also their metacognition is still poor.
| IshKebab wrote:
| I guess LLMs don't have the social pressure to avoid admitting errors. And those sorts of interactions aren't common in text so they don't learn them strongly.
|
| Also ChatGPT is trained specifically to be helpful and subservient.
| civilized wrote:
| About this goalpost moving thing. It's become very popular to say this, but I have no idea what it's supposed to mean. It's like a metaphor with no underlying reality.
|
| Did a wise arbiter of truth set up goalposts that I moved? I guess I didn't get the memo.
|
| If the implied claim is "GPT would invent math too given enough time", go ahead and make that claim.
| IshKebab wrote:
| > Did a wise arbiter of truth set up goalposts that I moved?
|
| Collectively, yes. The criticism of AI has always been "well it isn't AI because it can't do [thing just beyond its abilities]".
|
| Maybe individually your goalpost hasn't moved, and as soon as it invents some maths you'll say "yep, it's intelligent" (though I strongly doubt it). But collectively the naysayers in general will find another reason why it's not _really_ intelligent. Not like us.
|
| It's very tedious.
| corethree wrote:
| You need to think deeper.
|
| There are levels of sophistication in "imitation". It follows a gradient. At the low end of this gradient is a bad imitation.
|
| At the high end of this gradient is a perfect imitation. Completely indistinguishable from what it's imitating.
|
| If an imitation is perfect, then is it really an imitation?
|
| If I progressively make my imitation more and more accurate, am I progressively building an imitation or am I progressively building the real thing?
|
| See what's going on here? You fell for a play on words. It's a common trope. Sometimes language and vocabulary actually trick the brain into thinking in a certain direction. This word "imitation" is clouding your thoughts.
|
| Think about it. A half built house can easily be called an imitation of a real house.
| canjobear wrote:
| > LLMs don't have a concept of "best". Only most likely in what they've been trained on.
|
| At temperature 0 they are effectively producing the token that maximizes a weighted sum of base LM probability and model reward. (There is a toy sketch of this a few comments further down.)
| genman wrote:
| I don't think that humans in general have this concept of "best" either.
|
| But humans are able to build certain routines within their own system to help them to rationalize.
| RandomLensman wrote:
| Yes, if you are in the no-free-will school of thought, then that would be what humans do.
| usrbinbash wrote:
| > Isn't that exactly how humans learn to respond to stimuli?
|
| Maybe it is, maybe it isn't. Maybe we are "just" an incredibly powerful prediction engine. Or maybe we work from a completely different modus operandi, and our ability to predict things is an emergent capability of it.
|
| The thing is, no one actually knows what makes us intelligent, or even how to define intelligence for that matter.
| randyrand wrote:
| the human brain is just a statistical machine.
| brookst wrote:
| Odds are you're right
| continuational wrote:
| I think the most useful definition of understanding something is that you can explain it and use it in context.
|
| ChatGPT routinely does both.
| jstummbillig wrote:
| And while AI gets better and better and we will remain as touchy as ever about abstract concepts that make us oh so human, how about we say it just can't be understanding, unless a human does, eh, it.
| BobaFloutist wrote:
| Someone sufficiently fast and skilled at googling can explain and use in context a lot of things that they don't really properly understand.
|
| So unless you're saying that the _composite_ of the googler and of google understand something that neither does individually, your definition has some holes.
| continuational wrote:
| This is a variation of the Chinese room argument.
|
| If you consider understanding an observable property, then the Chinese room in aggregate displays understanding of Chinese.
|
| Would you say that humans understand nothing, because atoms don't understand anything, and we're made up of atoms?
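A toy sketch of the temperature-0 point referenced above (all tokens, scores, and the reward weight are invented for illustration; this is not any particular model's decoding code): at temperature 0, sampling collapses to an argmax over blended scores.

    import math, random

    # Invented next-token scores: base LM log-probabilities plus an
    # RLHF-style reward term, blended with a hypothetical weight beta.
    base_logprob = {"the": -1.2, "a": -1.5, "banana": -6.0}
    reward = {"the": 0.1, "a": 0.3, "banana": -2.0}
    beta = 0.5

    def score(tok):
        return base_logprob[tok] + beta * reward[tok]

    def sample(temperature):
        if temperature == 0:
            # Temperature 0: no randomness, just the single best-scoring token.
            return max(base_logprob, key=score)
        # Temperature > 0: softmax over scaled scores, then a weighted draw.
        toks = list(base_logprob)
        weights = [math.exp(score(t) / temperature) for t in toks]
        return random.choices(toks, weights=weights)[0]

    print(sample(0))    # always "the"
    print(sample(1.0))  # usually "the", occasionally "a", rarely "banana"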
| BobaFloutist wrote:
| I would say that there is a stronger consensus that a human being can be reasonably described as a single entity than a human being using a reference resource.
|
| A more apt comparison to my mind would be if a human being can be described as personally exerting strong nuclear force, just because their subatomic particles do, which I would happily answer "no."
| skepticATX wrote:
| How about this: understanding is the ability to generalize knowledge and apply it to novel scenarios.
|
| This is something that humans, and animals for that matter, do every day - both in small and large ways. And this is something that current language models aren't very good at.
| continuational wrote:
| What is the test for this?
|
| I taught it Firefly, which is an undocumented programming language I'm working on, through conversation.
|
| I find it's a lot quicker than any human at picking up syntax and semantics, both in real time and in number of messages, and makes pretty good attempts at writing code in it, as much as you could expect from a human programmer.
|
| That is, until you run out of context - is this what you mean?
| skepticATX wrote:
| There are plenty of results supporting my assertion; but the tests must be carefully designed. Of course, LLMs are not databases that store exact answers - so it's not enough to ask it something that it hasn't seen, if it's seen something similar (as is likely the case with your programming language).
|
| One benchmark that I track closely is ConceptARC, which aims to test generalization and abstraction capabilities.
|
| Here is a very recent result that uses the benchmark: https://arxiv.org/abs/2311.09247. Humans correctly solved 91% of the problems, GPT-4 solved 33%, and GPT-4V did much worse than GPT-4.
| continuational wrote:
| I wouldn't be surprised if GPT-4 is not too good at visual patterns, given that it's trained on text.
|
| Look at the actual prompt in figure 2. I doubt humans would get a 91% score on that.
| stubybubs wrote:
| I gave it the three lightbulbs in a closet riddle.
|
| https://puzzles.nigelcoldwell.co.uk/seven.htm
|
| The key complication is "once you've opened the door, you may no longer touch a switch." It gets this. There are many examples of it written out on the web. But when I give it a variation and say "you can open the door to look at the bulbs and use the switches all you want", it is absolutely unable to understand this. To a human it's simple: look at the bulbs and flick the switches. It kept giving me answers about using a special lens to examine the bulbs, or using something to detect heat. I explained it in many ways and tried several times. I was paying for GPT-4 at the time as well.
|
| I would not consider this thinking. It's unable to make this simple abstraction from its training data. I think 4 looks better than 3 simply because it's got more data, but we're reaching diminishing returns on that, as has been stated.
| Kiro wrote:
| How do you define understanding when predicting the next color?
| karaterobot wrote:
| > You would realize that all we are doing is using statistics to predict what the next item might be.
|
| I agree that Hinton's original quote doesn't make sense to me either. I suspect he would leverage the phrase "really good" to explain the difference between ChatGPT and, say, a Markov chain.
| I think that's a little disingenuous, if that's how he means it, but I don't know if I'm right about that.
|
| But I also do not agree that humans use statistics to predict what the next item in a series might be. As evidence, there is the classic example of asking people to predict the next coin toss in the series: "heads, heads, heads, heads, heads, heads, heads...". They'll either guess heads because it's come up so many times already, or because they assume the coin isn't fair, or tails because it's "overdue" to come up, but none of those are based on statistics _per se_.
| troelsSteegin wrote:
| From the article, Gelman's money quote is this: "So I'm not knocking auto-complete; I'm just disagreeing with Hinton's statement that 'by training something to be really good at predicting the next word, you're actually forcing it to understand.' As a person who does a lot of useful associative reasoning and also a bit of logical understanding, I think they're different, both in how they feel and also in what they do."
| hackerlight wrote:
| Why is "using statistics" mutually exclusive with "understanding"? It would help to carefully define terms. Note that "understanding" doesn't mean it's necessarily conscious.
|
| These systems learn high-level representations/abstractions of concepts we humans also use as part of our cognition. The concept of an object, an intuitive physics, the role of specific objects. I don't criticize him for using the word "understanding" to describe this.
| porknubbins wrote:
| As a translator I have come to think of it like this (without the need for defining what "understanding" means): the ability to translate correctly depends on the size of your context. So humans and LLMs are both on the same spectrum, with different context sizes: the training set for an LLM, and all facts known in the universe for a theoretical perfect human.
|
| Doing moderately technical translations, state of the art LLMs may get it about 99% right, but you find about 1 word per page where they would have needed to "zoom out" to a wider context to choose the word correctly. Often it's an important word that is missed.
| usrbinbash wrote:
| > You would realize that all we are doing is using statistics to predict what the next item might be.
|
| Two questions:
|
| 1. How can humans come up with novel ideas then? Statistically predicting the next item means I am limited to the training set. How do humans formulate new concepts in this scenario?
|
| 2. Why can humans learn from their own output then? If all we do is statistically predicting the next token, then, by necessity, feeding our own output back into the model (aka. letting humans learn from it) should degrade human capabilities by overfitting to the training set. But we observe exactly the opposite.
| micromacrofoot wrote:
| > How can humans come up with novel ideas then?
|
| hallucinations, emotions
|
| > Why can humans learn from their own output then?
|
| if we don't, we suffer
| jameshart wrote:
| > Statistically predicting the next item means I am limited to the training set
|
| Not at all. Extrapolation is perfectly possible in a purely predictive model. It's one of the things GPTs are _best_ at.
|
| In the stream of tokens output by an LLM it's completely possible for new concepts to emerge, and for it then to continue to use and build on them in the remainder of the stream.
| You see this simply executed in programming contexts with LLMs where it is able to declare a novel function and then use it.
| usrbinbash wrote:
| > with LLMs where it is able to declare a novel function and then use it.
|
| Novel as in "implements a new algorithm that has never been seen in any form and is actually an improvement over existing methodology"?
|
| Here is a little thought experiment: If all the training data in the set says that manned, powered flight is impossible, is a statistical prediction engine trained on that data capable of developing an airplane?
|
| In other words, could a very powerful stochastic parrot do what the Wright Brothers did?
|
| My best guess is that no, it cannot.
| lovecg wrote:
| Now I'm wondering, if birds didn't exist, how much longer would it take us to catch on that flight is possible at all.
| canjobear wrote:
| The best "statistic" you could compute to predict the next token would be the meaning of the prompt. A predictor computing meaning would outperform one that is not.
| xcv123 wrote:
| By "understanding" he means it builds a model of higher order abstractions over the text in order to model the meaning and context of each token in relation to other tokens. This is more sophisticated than a Markov process.
| Jensson wrote:
| > This is more sophisticated than a Markov process.
|
| Nothing prevents a Markov process from having a world model. A Markov process lacks state and generates a sequence based on statistics and previous entries, so LLMs as they are now are Markov processes.
|
| Many of the dumb behaviors we see from LLMs today come from their lack of internal state between tokens: the model doesn't remember what reason it had for generating the previous token, and that means it can easily generate inconsistent answers. So LLMs being a Markov process is an important point to highlight, since it makes their thinking very different from how humans think.
| xcv123 wrote:
| The transformer architecture is NOT a Markov process, by the mathematical definition of a Markov process. This is not even debatable. It's a mathematical fact.
|
| > Many of the dumb behaviors we see from LLMs today come from their lack of internal state between tokens: the model doesn't remember what reason it had for generating the previous token, and that means it can easily generate inconsistent answers
|
| The attention mechanism in the transformer architecture models relations between tokens within the context window, and does the exact opposite of what you are describing here. This is one aspect of LLMs that violates the Markov property.
| Jensson wrote:
| > The transformer architecture is NOT a Markov process, by the mathematical definition of a Markov process. This is not even debatable. It's a mathematical fact.
|
| What? Yes it is.
|
| > The attention mechanism in the transformer architecture models relations between tokens within the context window, and does the exact opposite of what you are describing here. This is one aspect of LLMs that violates the Markov property.
|
| The context window is finite, so that is the previous step. You know the dumb Markov chains that are based on bags of words? They also look several words back, they don't just go based on a single word. LLMs are just that but with a way larger lookback and some extra logic there, but none of that changes the fundamental parts so as to make it not be a Markov process.
|
| With a large enough context size you could argue it is now fundamentally different in practice, but in theory it is the same. There is no "hidden state"; it's just that the previous n words define the next word.
| xcv123 wrote:
| If you define the state broadly enough to include the state of the entire machine itself, including all of its internal representations, weights, activations, etc, then you are playing a funny trick here.
|
| By the same reasoning, a human brain is also a Markov process.
|
| What you are doing here is a vast oversimplification and it is practically useless for understanding how LLMs work.
| tayo42 wrote:
| This kind of thing was on my mind recently. I was wondering, when I speak am I just spitting out the next word that makes sense, or is there more to it?
|
| I think there is for people. I can think before I speak, I can plan out my thought entirely before turning it into words. Each invocation of the chat bot/LLM is a new set of probabilities. I can plan what my 2nd token output will be and stick to it. LLMs don't have the ability to think past the first.
|
| To me that is autocompleting, just done better, unless models really do have a sense of planning ahead. But I don't think the really complex game-playing AIs do that either.
| pixl97 wrote:
| > I can think before I speak, I can plan out my thought entirely before turning it into words.
|
| In theory models can/could do the same thing. Think of the current text output of a model as being those thoughts inside your head: you have an internal scratch space where you work on ideas, and after you perform 'chain of thought' on them, you output to the world. What you're outputting to the world isn't directly your thoughts, but more of a calculation of what you wanted to achieve.
|
| Really, for LLMs you just need to have the model put its output into an internal buffer, read that buffer and make sure it makes sense, then output that to the end user.
|
| It's probably not fast or compute efficient to do that at this time, but at some point it likely will be.
| drwiggly wrote:
| > Really for LLMs you just need to have the model put its output into an internal buffer, read that buffer and make sure it makes sense, then output that to the end user.
|
| Makes sense to what? The LLM doesn't have a goal, other than to spew text that looks like it should be there.
| stevenhuang wrote:
| The analogy lies in the fact that, much like evolution through natural selection, deliberate intelligence/ability of organisms to comprehend reality is not the objective, but something else entirely is.
|
| For evolution, it's fitness. For LLMs, it's the next token.
|
| Yet despite that, the ability to reason emerges as a means to an end.
| pixl97 wrote:
| To the terminal or instrumental goal of the statement it is working on.
|
| Question to LLM: "I have one hundred and eleven eggs in the store and another two hundred and twenty two are showing up in an hour, how many eggs will I have in total?"
|
| Internal response: "This looks like a math problem that requires addition. The answer is 333. Use a calculator to validate 111 + 222. (send 111+222, receive 333). Tool returns 333, validating previous response."
|
| External response: "The answer is 333"
|
| This chain of logic is internally consistent, hence makes sense.
| tayo42 wrote:
| Yeah, thinking about it more, it does seem possible.
|
| Like the attention masking done in transformers: I get why it's there, but I also never understood why you wouldn't want attention to happen, at least over concepts and abstract ideas about what is coming up in the phrase, before predicting a word. (A small sketch of such a mask appears a few comments further down.)
| TerrifiedMouse wrote:
| > when I speak am I just spitting out the next word that makes sense or is there more to it.
|
| There is more to it. Specifically, you are doing so to advance towards a specific goal. LLMs don't have goals. They just pick from a list of likely tokens - based on their training data - at random to generate the next token.
| brookst wrote:
| So the externally provided metaprompt doesn't establish a goal that generation works toward?
| TerrifiedMouse wrote:
| I don't think LLMs work towards anything. It just picks from a list of likely "next tokens" at random.
| brookst wrote:
| "Likely" as defined by...?
| TerrifiedMouse wrote:
| Statistics as defined by training data.
|
| https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...
| nkingsy wrote:
| I think people get tricked by the forward pass mechanics into thinking a single generation is comparable to a human thought process.
|
| I think we have an LLM-like mechanism we can employ and lean on as heavily as we like, but we also have an executive function, like thousands of specialized instant Boolean checks, which can adjust and redirect the big talky model's context on the fly.
|
| My hunch is it's turtles all the way down. "Gut feelings" are hyper-optimized ASICs with tiny parameter counts, but all using the same intelligence mechanisms.
|
| Extrapolating from that hunch, we are already witnessing AGI, and in fact we've started at the top.
|
| I believe that current LLMs are actually far far superior to a human language center and current architectures are more than sufficient in terms of raw intelligence.
|
| The challenge that remains is to understand, train, integrate, and orchestrate all the different flavors of intelligence that humans wield so elegantly and naturally as to make them opaque to our own understanding.
| overactor wrote:
| I think this misses the point made by Hinton. The point is that, if a model is to reach a certain level of autocomplete, it needs to understand what it is talking about. Imagine the result of what required deep thought from you is included in the training data, along with billions of other such exchanges. At some point the only way to improve the training score is to develop the ability to "understand" things. (That is, if the model isn't big enough to memorize the training data.)
|
| Universal autocomplete implies understanding, and the only remaining question is whether our current architectures and model sizes are able to achieve any level of it.
| hopfenspergerj wrote:
| I agree. We should prompt the model with the statement of the Riemann hypothesis. If the autocomplete is good, the model will output a proof.
| overactor wrote:
| No human can solve the Riemann Hypothesis. Why do you expect an AI to do it before you consider it able to understand things?
| scarmig wrote:
| The key bit is constructing a hard, novel proof. The fact that AI doesn't (yet) do this isn't evidence that it doesn't reason, but if it did so it would be strong evidence that it does reason.
|
| (I also take the pessimistic point of view that most humans don't reason, so YMMV.)
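On the attention-masking question raised further up, a minimal numpy sketch of the causal mask used in decoder-only transformers (toy sizes, illustrative only): attention over earlier concepts is exactly what the mask allows; it only hides tokens that come later in the sequence, because those are what the model is being trained to predict.

    import numpy as np

    seq_len = 5

    # Causal mask: position i may attend to positions 0..i, never the future.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    print(mask.astype(int))
    # [[1 0 0 0 0]
    #  [1 1 0 0 0]
    #  [1 1 1 0 0]
    #  [1 1 1 1 0]
    #  [1 1 1 1 1]]

    # Masked positions get -inf before the softmax, so they receive zero
    # attention weight; row i is a distribution over positions <= i only.
    scores = np.random.randn(seq_len, seq_len)
    scores[~mask] = -np.inf
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

Without the mask, predicting token i+1 from a window that already contains token i+1 would be trivial and would teach the model nothing.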
| lordnacho wrote:
| OTOH, doing something that only humans have done thus far would be a huge step in demonstrating understanding.
|
| Does that mean when a computer outputs a new proof it understands?
| corethree wrote:
| You can do something similar to this without giving it a problem that might be impossible.
|
| Train the LLM on a bunch of things but avoid certain things... Things that humans already know about.
|
| Then you query the model about that thing. See if the model can come to the same conclusions humans do. You can actually do this right now with ChatGPT.
| lordnacho wrote:
| So what happens?
| corethree wrote:
| This is a response from ChatGPT. I asked it to come up with something novel and creative. I went through several iterations and eventually it started doing math stuff and I had it focus on that:
|
| Understood, let's propose a distinct mathematical concept within the realm of pure mathematics: "Infinite-Dimensional Fractal Algebra." This new branch of algebra focuses on operations, equations, and functions within an infinitely dimensional fractal space.
|
| In Infinite-Dimensional Fractal Algebra, traditional algebraic operations like addition, multiplication, and exponentiation are redefined and extended to work within an infinite fractal structure. Each point in this fractal space represents a unique algebraic entity, and the relationships between these points follow complex, self-repeating patterns characteristic of fractals.
|
| The concept challenges and expands the traditional boundaries of algebra by integrating the self-similarity and scale invariance of fractals into algebraic operations. For example, solving an equation in this system would involve navigating through the infinitely repeating patterns of the fractal, where each level of the fractal represents a different dimension or layer of the problem.
|
| This approach could offer new insights into complex systems and phenomena that exhibit fractal-like behavior, and potentially lead to breakthroughs in areas like chaos theory, dynamic systems, and perhaps even in understanding the fabric of the universe itself.
|
| To illustrate the concept of Infinite-Dimensional Fractal Algebra, let's consider a simple algebraic operation: addition.
|
| In traditional algebra, adding two numbers is straightforward: \(3 + 4 = 7\). In Infinite-Dimensional Fractal Algebra, however, this operation takes place in a fractal space with infinite dimensions.
|
| 1. *Fractal Representation*: Instead of representing numbers as points on a line, each number is represented by a unique fractal structure. The fractal for '3' and the fractal for '4' each have infinitely complex, self-similar patterns that vary across infinite dimensions.
|
| 2. *Adding Fractals*: The operation of addition involves combining these fractal structures. This isn't a simple sum but a fusion of their infinite patterns. The resulting fractal for '7' contains elements of both the '3' and '4' fractals, but in a way that the patterns and complexities of each are preserved and merged.
|
| 3. *Infinite Dimensions*: Each level of depth in the fractals represents a different dimension of the problem. As you dive deeper into the fractals, you encounter more intricate patterns, representing more complex layers of the addition.
|
| 4. *Result Interpretation*: The outcome of the operation is a new fractal representing '7'.
| But this '7' is not just a number; it's a multi-dimensional entity with a pattern that encodes not just the value '7', but also the history of how it was derived from '3' and '4'.
|
| This approach would require a rethinking of how we visualize and understand mathematical operations. The fractal patterns in Infinite-Dimensional Fractal Algebra carry more information than just the numerical value; they encode the relationships and interactions between numbers in a deeply interconnected, infinite-dimensional space.
|
| Let's choose the Sierpinski Triangle, a well-known fractal, to represent the identity element for addition in Infinite-Dimensional Fractal Algebra. In traditional algebra, the identity element for addition is '0', as adding zero to any number doesn't change its value. Here, we'll conceptualize this using the Sierpinski Triangle.
|
| 1. *Sierpinski Triangle as the Identity*: In this framework, the Sierpinski Triangle represents the '0'. It's a fractal with the property of self-similarity: any part of the triangle resembles the whole.
|
| 2. *Adding to the Identity*: When another number (fractal) is added to this Sierpinski Triangle, the process involves integrating the fractal patterns of this number with the Sierpinski Triangle. However, since the Sierpinski Triangle represents '0', this integration process must preserve the original number's fractal structure intact.
|
| 3. *Concrete Example*: Suppose we're adding the number '2', represented by a different fractal structure, to '0' (the Sierpinski Triangle). The addition operation would overlay the '2' fractal onto the Sierpinski Triangle. The key here is that the Sierpinski Triangle, with its repetitive and self-similar gaps, would not alter the structure of the '2' fractal. Instead, it would fill into the gaps of the Sierpinski Triangle, maintaining its own pattern.
|
| 4. *Resulting Fractal*: The resulting fractal from this addition is the '2' fractal itself, as the Sierpinski Triangle (being the identity element) does not alter the other number in the addition. The intricate details of the '2' fractal remain unchanged, signifying that adding '0' did not change its value.
|
| This is a very abstract and conceptual way to represent mathematical operations, and the choice of the Sierpinski Triangle for the identity is primarily for its self-similar and recursive properties, which align well with the concepts in fractal algebra.
| corethree wrote:
| Basically it formed a fuzzy idea of an algebra using different fractals as entities.
|
| I'm sure this can be mapped out further into very concrete detail. It's a highly realistic idea. We have algebras for all kinds of things, from complex numbers to lists.
|
| Choosing the triangle for the identity is probably the wrong choice though. The identity fractal should be zero dimensional, or nothing. I think that will in actuality end up fitting the rules of the identity fractal if we ever decide to map out this algebra.
|
| If you're not familiar with abstract algebra, basically it's choosing some fractal that's equivalent to a zero value and coming up with ways to combine fractals with operations that hold the same properties of associativity and commutativity that multiplication/addition do for numbers.
| dboreham wrote:
| > it needs to understand what it is talking about
|
| "to understand" is a concept invented by the human brain to help it operate.
| It doesn't actually mean anything concrete in the physical world. See: the Chinese room argument (paradoxically it proves the converse of what Searle was arguing -- that there is no such thing as understanding, because you can emulate a system with identical outputs from the same set of inputs without caring or labeling what happens inside the room).
| scudsworth wrote:
| that's, uh, not the conclusion of the chinese room argument
| ImHereToVote wrote:
| The Chinese room doesn't have a model; it has an infinite dictionary. A sufficiently large LLM isn't a dictionary, it has to contain a model of some sort.
| attilakun wrote:
| I agree, Hinton's point is more about next-token-prediction being a good training objective for understanding, and not whether LLMs reproduce every aspect of a human's thought processes. They clearly don't (at least not at present scale), otherwise we'd have AGI now!
| zeroonetwothree wrote:
| We have quantum mechanics, a model of the universe that makes predictions with absurd levels of accuracy. Yet we don't really understand "why" it works -- I don't think we can claim that we really "understand" the universe.
|
| This disproves the claim that making accurate predictions implies understanding. Or perhaps a better way to put it is that the model has some understanding, but a model doesn't necessarily reflect underlying truth.
| raincom wrote:
| The best examples come from medicine. There are many drugs that work, but we don't understand how they work. For example: https://en.wikipedia.org/wiki/Category:Drugs_with_unknown_me...
| marginalia_nu wrote:
| What are you even asking when you're asking why it works? Ultimately, it works the way it does because that's the way things are. I'm not sure what sort of answers you expect when asking why physics acts in a particular fashion.
|
| Also here's Feynman ranting about why questions: https://www.youtube.com/watch?v=36GT2zI8lVA
| firebaze wrote:
| Asking "why" may lead to really new physics, though. I assume the reason for discovering the theory of relativity was asking "Why is the speed of light constant in all directions?"
| marginalia_nu wrote:
| My point is, "why" isn't really a question that makes sense.
|
| There isn't a "why" the speed of light is constant and isotropic. The answers you typically see for that question are just different ways of saying the same thing; there is no causal relationship.
|
| In the end, it just is. There are things that must be true if this is the case, and things that can't be true if this is the case.
| overactor wrote:
| Making accurate predictions in a specific domain doesn't imply understanding, sure. Universal autocomplete can make accurate predictions in all domains. Apples to oranges.
| xcv123 wrote:
| By "understanding" he simply means it builds a model of higher order abstractions over the text in order to model the meaning and context of each token in relation to other tokens. This is more sophisticated than a Markov process. (For what "Markov" means concretely here, see the n-gram sketch a little further down.)
| zeroonetwothree wrote:
| I found this post to be overly long and unclear in its focus. Also, who is this person that only serves as an autocomplete in meetings? Maybe don't go to those meetings then? In my meetings I'm mainly having to decide what our approach will be in some ambiguous problem space.
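For concreteness on the "Markov process" dispute in this thread, here is a toy n-gram chain of the kind Jensson describes above (the corpus and chain order are made up for illustration): the next word is a function of the last n words and nothing else.

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()
    n = 2  # order of the chain: the next word depends only on the last 2 words

    # "Training" is just counting the continuations of each n-word context.
    table = defaultdict(list)
    for i in range(len(corpus) - n):
        context = tuple(corpus[i:i + n])
        table[context].append(corpus[i + n])

    def next_word(context):
        # The Markov property: nothing outside the current context matters.
        return random.choice(table[tuple(context)])

    print(next_word(["the", "cat"]))  # "sat" or "ate", with equal odds

An LLM with a k-token context window is Markov in this formal sense too (the next token is a function of the window alone), but the lookup table is replaced by a learned function with billions of parameters; that gap is the substance of the disagreement above.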
| coldtea wrote:
| > _In my meetings I'm mainly having to decide what our approach will be in some ambiguous problem space._
|
| Maybe that's because you are just some manager or exec in some company, and not a professor that has to transfer his knowledge, describe, and explain things for others in his meetings?
|
| The post has a very specific subject it discusses. It's also not required reading; if someone finds it long, they can skip it.
| patcon wrote:
| Yeah, I agree there are two types:
|
| 1) repeating things he's reflected on or seen in the past (more like autocomplete), or
|
| 2) mixing two semantic concepts that he's seen in the past, and using it as a bridge to walk over to get to the next word/concept (which is back to autocomplete mode)
|
| The second is like crafting in Minecraft, instead of just taking things out of inventory (but all of that is happening in high dimensional semantic space :) )
| cmdli wrote:
| The argument "a sufficiently capable autocomplete must contain a level of general intelligence" is correct but also not very useful. It is a lot like saying "a sufficiently fast horse can fly".
|
| It is technically correct that when you take things to the extreme you can accomplish great things, but we may not reach those levels. We may require completely different technology to reach those levels of autocomplete, and we have simply reached a new plateau at this point in time.
| morkalork wrote:
| Just need a jet engine strapped to your horse for that
| wyago wrote:
| Funnily enough, "a sufficiently fast horse can fly" sounds sort of like a decent way to convey the idea of planes to a pre-car world.
| og_kalu wrote:
| The argument is simpler than that. Prediction requires a model, completely accurate or not. There's a projection of the world in text. A model of the text data we feed it is a model of the world as humans see it. The trend of falling loss is more and more accurate models of the dataset. So it won't stop at any arbitrary competency level. Indeed, there are already a few abilities GPT possesses that are deemed superhuman. It's not a distinction that matters to the machine. It's all just data to be modelled.
|
| We have reached those levels lol. That's why we're having this argument.
| cmdli wrote:
| I think the trouble is that "model" is a very general term. If you had a computer doing simulations of artillery shots back in the 50s, then it would have a "model" of the world in terms of variables tracking projectiles, but this model doesn't generalize to anything else. If a computer from the 90s or 2000s does image recognition to recognize faces, then the computer has a "model" of visual information in the world, but this model only lets it recognize faces.
|
| ChatGPT has a model of all the text information on the internet, but it remains to be seen what the hard limits of this model are. Does this model let it do logic or predict the future well, or will no amount of training give it those abilities? Simply being good at one task doesn't imply a general ability to do everything, or even most of everything. LLMs would simply be the latest advancement in a field with a lot of similar advancements.
| og_kalu wrote:
| > ChatGPT has a model of all the text information on the internet, but it remains to be seen what the hard limits of this model are.
|
| Before training is complete and the loss is fully optimized, there will be limits on what the "learned so far" model can do that say absolutely nothing about the limits of a perfect (or very close to it) model.
|
| It really looks like anything will converge with enough compute. I don't think architecture is particularly important except as a "how much compute will this one take?" question.
|
| https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dat...
|
| > Does this model let it do logic or predict the future well, or will no amount of training give it those abilities?
|
| There's nothing special about logic. Basically, any sequence is fair game. It literally does not matter to the machine.
|
| Boolformer: Symbolic Regression of Logic Functions with Transformers (https://arxiv.org/abs/2309.12207)
|
| That said, GPT-4 can already do logic. It's not perfect, but if perfect logic were a requirement then humans cannot do logic either.
|
| > Simply being good at one task doesn't imply a general ability to do everything, or even most of everything.
|
| It's not one task. It's one modality (text) that a plethora of tasks could be learned in.
|
| Coding and playing chess did not suddenly become a single task just because we found the common ground that allows a machine to learn both.
|
| The text, image, video and audio data we could feed a transformer will cover anything we care about.
| ijidak wrote:
| I've noticed that when I speak I really don't control each word.
|
| I have an idea that I want to convey, but how each word comes to my mind as I form a sentence has always felt like it's controlled by an unconscious algorithm.
|
| So I don't understand why people find this prediction mechanism so alien.
|
| It isn't clear to me how much of communication is really in our control.
|
| With the current tools, it feels like we still provide the ideas we want the AI to convey, and it may be using a nearly identical mechanism to us to form the words.
|
| Consciousness would be the computer being able to come up with the ideas.
|
| So, it seems to me we've gotten close enough on the communication side of intelligence.
|
| But the machine is not conscious. When it is, it seems like it will generate its own ideas.
|
| Are people debating whether the machine is conscious?
|
| Otherwise, it feels very straightforward to grasp what we've made up to now.
| gmuslera wrote:
| It's not black or white. Probably most of what we do is System 1; most of the time we are mostly meme machines, and so is a good part of the activity in some sectors.
|
| But we are able to take steps forward: intuition, hard step-by-step reasoning, finding connections between dots, etc. GPT can do some of that, and at some point on the road someone must decide if we have reached somewhere else. Even if making the full road may or may not be possible in a foreseeable future.
| arketyp wrote:
| Indeed. What has caught researchers off guard is the way System 2 properties seem to appear as emergent phenomena in LLMs. This is also what has prompted people like Hinton and Sutskever to make this condensed point about statistical modelling and understanding (hypothetically) being simply a matter of a spectrum.
| Kiro wrote:
| Reminds me of this scene from Westworld (slightly NSFW): https://www.youtube.com/watch?v=ZnxJRYit44k
| tester457 wrote:
| Also spoilers
| canjobear wrote:
| The argument is: "deep thinking" isn't like autocomplete because it feels different.
| bananaflag wrote:
| Deep thinking is what autocomplete feels like from the inside.
| fritzo wrote:
| Andrew's distinction between associative vs logical thinking reminds me of two kinds of programming thinking. Half of my time while programming is spent churning out glue code, satisfying a type checker and test suite, and implementing the simple solution that turns out to work. The other half of my time is spent slowly thinking through why some simple solution is wrong, and step by step proceeding towards the correct but complex solution. The former phase is intuitive, the latter phase is scientific, where I hypothesize, test, and repeat.
|
| Reading through the code-as-transcript afterwards, it's unclear which bits of code required shallow associative vs deep rational thinking, pure autocomplete vs latent chain of thought.
| merizian wrote:
| The fallacy being made in this argument is that computers need to perform tasks the same way as humans to achieve equal or better performance on them. While having better "system 2" abilities may improve performance, it's plausible that scaled-up next-token prediction along with a bit of scaffolding and finetuning could match human performance on the same diversity of tasks while doing them a completely different way.
|
| If I had to critique Hinton's claims, I would say his usage of the word "understand" can be vague and communicate assumptions, because it's from an ontology used for reasoning about human reasoning, not this new alien form of reasoning which language models embody.
| BurningFrog wrote:
| > _The fallacy being made in this argument is that computers need to perform tasks the same way as humans to achieve equal or better performance_
|
| Especially since I don't think we know that much about how human intelligence actually works.
| edot wrote:
| I believe it was Feynman who said something to the effect of "airplanes do not fly like birds do, but they fly much faster and can carry much more". So yes, we do not need to exactly replicate how humans do things in order to do human-like things in a useful manner. Planes do not flap their wings, but the jet engine (which is completely unnatural) does a great job of making things fly when paired with fixed wings of a certain shape.
| metanonsense wrote:
| In addition to that, the "system 2" abilities might already be there with "epi" strategies like chain-of-thought prompting. Talking / writing to yourself might not be the most efficient way to think, but at least I do it often enough when pondering a problem.
| francois14 wrote:
| And what about "stochastic parrot"?
| ffwd wrote:
| I think there is another aspect to human thinking other than System 1/System 2 though, which is the abstract world model humans have. System 1/2 is more like the process, while the world model is the actual data being 'processed'.
|
| And I think basically, humans have a much simplified, 'low dimensional' world model that consists of a set of objects (let's call them patterns), and then a "list" of essential properties that those objects have, that leads to a constraint on how each object can behave in the world model as a whole.
|
| And this is sort of hierarchical, or at least we can zoom in and out in detail depending on the level of knowledge we have about a particular pattern.
|
| So problem 1 is: It's not clear to me that text or any sort of data would contain all the necessary constraints, so that any particular prompt would result in a world model that exactly takes into account the constraints of each object. And 2) even if it was, I'm not sure the process of step by step thinking (System 1/2) about each object and computing world states could occur in current architectures. This is especially important for computing a set of objects, then abstracting the result, then doing another round of computing with that result, or something like this.
|
| I'm not hard set on this but this is my current thinking.
| robbrown451 wrote:
| I agree with Hinton, although a lot hinges on your definition of "understand."
|
| I think to best wrap your head around this stuff, you should look to the commonalities of LLMs, image generators, and even things like Alpha Zero and how it learned to play Go.
|
| Alpha Zero is kind of the extreme in terms of not imitating anything that humans have done. It learns to play the game simply by playing itself -- and what they found is that there isn't really a limit to how good it can get. There may be some theoretical limit of a "perfect" Go player, or maybe not, but it will continue to converge towards perfection by continuing to train. And it can go far beyond what the best human Go player can ever do. Even though very smart humans have spent their lifetimes deeply studying the game, while Alpha Zero had to learn everything from scratch.
|
| One other thing to take into consideration is that to play the game of Go you can't just think of the next move. You have to think far forward in the game -- even though technically all it's doing is picking the next move, it is doing so using a model that has obviously looked forward more than just one move. And that model is obviously very sophisticated, and if you are going to say that it doesn't understand the game of Go, I would argue that you have a very oddly restricted definition of the word "understand", and one that isn't particularly useful.
|
| Likewise, with large language models, while on the surface they may be just predicting the next word one after another, to do so effectively they have to be planning ahead. As Hinton says, there is no real limit to how sophisticated they can get. When training, it is never going to be 100% accurate in predicting text it hasn't trained on, but it can continue to get closer and closer to 100% the more it trains. And the closer it gets, the more sophisticated a model it needs. In the sense that Alpha Zero needs to "understand" the game of Go to play effectively, the large language model needs to understand "the world" to get better at predicting.
| jon_richards wrote:
| > to play the game of Go you can't just think of the next move. You have to think far forward in the game -- even though technically all it's doing is picking the next move, it is doing so using a model that has obviously looked forward more than just one move.
|
| While I imagine AlphaGo does some brute force and some tree exploration, I think the main "intelligent" component of AlphaGo is the ability to recognize a "good" game state from a "bad" game state based on _that moment in time_, not any future plans or possibilities. That pattern recognition is all it has once its planning algorithm has reached the leaves of the trees.
| Correct me if I'm wrong, but I doubt AlphaGo has a neural net evaluating an entire tree of moves all at once to discover meta strategies like "the opponent focusing on this area" or "the opponent feeling on the back foot."
|
| You can therefore imagine a pattern recognition algorithm _so good_ that it is able to pick a move by only looking 1 move into the future, based solely on local stone densities and structures. Just play wherever improves the board state the most. It does not even need to "understand" that a game is being played.
|
| > while on the surface, they may be just predicting the next word one after another, to do so effectively they have to be planning ahead.
|
| So I don't think this statement is necessarily true. "Understanding" is a major achievement, but I don't think it requires planning. A computer can understand that 2+2=4 or where to play in tic-tac-toe without any "planning".
|
| That said, there's probably not much special about the concept of planning either. If it's just simulating a tree of future possibilities and pruning it based on evaluation, then many algorithms have already achieved that.
| Jensson wrote:
| A big difference between a game like Go and writing text is that text is single player. I can write out the entire text, look at it and see where I made mistakes on the whole, and edit those. I can't go back in a game of Go and change one of my moves that turned out to be a mistake.
|
| So trying to make an AI that solves the entire problem before writing the first letter will likely not result in a good solution, while also making it compute way too much, since it solves the entire problem for every token generated. That is the kind of AI we know how to train, so for now that is what we have to live with, but it isn't the kind of AI that would be efficient or smart.
| bytefactory wrote:
| This doesn't seem like a major difference, since LLMs are also choosing from a probability distribution of tokens for the most likely one, which is why they respond a token at a time. They can't "write out" the entire text all at once, which is why fascinating methods like "think step by step" work at all.
| Jensson wrote:
| But it can't improve its answer after it has written it; that is a major limitation. When a human writes an article or response or solution, that is likely not the first thing the human thought of. Instead they write something down and work on it until it is tight and neat and communicates just what the human wants to communicate.
|
| Such answers will be very hard for an LLM to find; instead you mostly get very verbose messages, since that is how our current LLMs think.
| bytefactory wrote:
| Completely agree. The System 1/System 2 distinction seems relevant here. As powerful as transformers are with just next-token generation and context, which can be hacked to form a sort of short-term memory, some form of real-time learning + long-term memory storage seems like an important research direction.
| klodolph wrote:
| > As Hinton says, there is no real limit to how sophisticated they can get.
|
| There's no limit to how sophisticated a model can get, but:
|
| 1. That's a property shared with many architectures, and not really that interesting.
|
| 2. There _are_ limits to the specific ways that we train models.
|
| 3. We care about the relative improvement that these models deliver, for a given investment of time and money.
| klodolph wrote:
| > As Hinton says, there is no real limit to how sophisticated
| they can get.
|
| There's no limit to how sophisticated a model can get, but:
|
| 1. That's a property shared with many architectures, and not
| really that interesting,
|
| 2. There _are_ limits to the specific ways that we train
| models,
|
| 3. We care about the relative improvement that these models
| deliver, for a given investment of time and money.
|
| From a mathematical perspective, you can just keep multiplying
| the size of your model, and you can prove that it can represent
| arbitrarily complicated structures (like internal mental models
| of the world). That doesn't mean that your training methods
| will produce those complicated structures.
|
| With Go, I can see how the model itself can be used to generate
| new, useful training data. How such a technique could be
| applied to LLMs is less clear, and its benefits are more
| dubious.
| lsy wrote:
| The difference is that "the world" is not exhaustible in the
| same way as Go is. While it's surely true that the number of
| possible overall Go game states is extremely large, the game
| itself is trivially representable as a set of legal moves and
| rules. The "world model" of the Go board is already exhaustive
| and finite, and the computer's work in playing against itself
| is to generate more varied data within that model rather than
| to develop that model itself. We know that when Alpha Zero
| plays a game against itself, the result is valuable data,
| because it is a legitimate game which most likely represents a
| new situation it hasn't seen before and thus expands its
| capacity.
|
| For an LLM, this is not even close to being the case. The sum
| of all human artifacts ever made (or yet to be made) doesn't
| exhaust the description of a rock in your front yard, let alone
| the world in all its varied possibility. And we certainly
| haven't figured out a "model" which would let a computer
| generate new and valid data that expands its understanding of
| the world beyond its inputs, so self-training is a non-starter
| for LLMs. What the LLM is "understanding", and what it is
| reinforced to "understand", is not the world but the format of
| texts, and while it may get very good at understanding the
| format of texts, that isn't equivalent to an understanding of
| the world.
| og_kalu wrote:
| > The sum of all human artifacts ever made (or yet to be made)
| doesn't exhaust the description of a rock in your front yard,
| let alone the world in all its varied possibility.
|
| No human or creature we know of has a "true" world model, so
| this is irrelevant. You don't experience the "real world".
| You experience a tiny slice of it, a few senses that are
| further slimmed down and even fabricated in parts.
|
| To the bird who can intuitively sense and use electromagnetic
| fields for motion and guidance, your model of the world is
| fundamentally incomplete.
|
| There is a projection of the world in text. Moreover, training
| on additional modalities is trivial for a transformer. That's
| all that matters.
| lsy wrote:
| That's the difference, though. I _know_ my world model is
| fundamentally incomplete. Even more foundationally, I know
| that there _is_ a world, and when my world model and the
| world disagree, the world wins. To a neural network there
| is no such distinction. The closest the entire dynamic comes
| is the very basic annotation of RLHF, which is itself done by
| an external human who is providing the value judgment, and
| even that is absent once training is over.
|
| Despite not having the bird's sense for electromagnetic
| fields, I have an understanding that they are there, because
| humans saw behavior they couldn't describe and investigated,
| in a back-and-forth with a world that has some capacity to
| disprove hypotheses.
|
| Additional modalities are really just reducible to more kinds
| of text.
| That still doesn't exhaust the world, and unless a machine has
| some ability to integrate new data in real time, alongside a
| meaningful commitment and accountability to the world as a
| world, it won't be able to cope with the real world in a way
| that would constitute genuine intelligence.
| og_kalu wrote:
| > I know my world model is fundamentally incomplete. Even
| more foundationally, I know that there is a world, and
| when my world model and the world disagree, the world
| wins.
|
| Yeah, this isn't really true. That's not how humans work.
| For a variety of reasons, plenty of people stick with their
| incorrect model despite the world indicating otherwise.
| In fact, this seems to be normal enough human behaviour.
| Everyone does it, for something or other. You are no
| exception.
|
| And yes, LLMs can in fact tell truth from fiction:
|
| GPT-4 logits calibration pre-RLHF -
| https://imgur.com/a/3gYel9r
|
| Just Ask for Calibration: Strategies for Eliciting
| Calibrated Confidence Scores from Language Models Fine-
| Tuned with Human Feedback -
| https://arxiv.org/abs/2305.14975
|
| Teaching Models to Express Their Uncertainty in Words -
| https://arxiv.org/abs/2205.14334
|
| Language Models (Mostly) Know What They Know -
| https://arxiv.org/abs/2207.05221
|
| The Geometry of Truth: Emergent Linear Structure in Large
| Language Model Representations of True/False Datasets -
| https://arxiv.org/abs/2310.06824
|
| Your argument seems to boil down to "they can't perform
| experiments", but that isn't true either.
| astrange wrote:
| There is no such thing as a world model, and you don't
| have one of them. This is a leftover bad psychological
| concept from the 70s AI researchers who never got
| anywhere. People and other creatures do very little
| modeling of things; they mostly just do stuff.
| tazjin wrote:
| The world on our plane of existence absolutely is
| exhaustible, just on a much, much larger scale. That doesn't
| mean the process is fundamentally different, though from the
| human perspective there might be diminishing returns.
| huytersd wrote:
| Next-word generation is one way to put it. The key point here
| is that we have no idea what's happening in the black box that
| is the neural network. It could be forming very strong
| connections between concepts in there, with multi-tiered
| abstractions.
| SkiFire13 wrote:
| > One other thing to take into consideration is that to play
| the game of Go you can't just think of the next move. You have
| to think far forward in the game -- even though technically
| all it's doing is picking the next move, it is doing so using
| a model that has obviously looked forward more than just one
| move.
|
| It doesn't necessarily have to look ahead. Since Go is a
| deterministic game, there is always a best move (or moves that
| are better than others), and hence a function that goes from
| the state of the game to the best move. We just don't have a
| way to compute this function, but it exists. And that function
| doesn't need the concept of lookahead; that's just an intuitive
| way one could find some of its values. Likewise, ML algorithms
| don't necessarily need lookahead; they can just try to
| approximate that function with enough precision by exploiting
| patterns in it. And that's why we can still craft puzzles that
| some AIs can't solve but humans can, by exploiting edge cases
| in that function that the ML algorithm didn't notice but that
| are solvable with an understanding of the game.
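| To make the existence claim concrete, here is a toy version
| for tic-tac-toe (a complete but deliberately naive sketch;
| Go is the same idea at an astronomically larger, intractable
| scale):
|
|     def winner(board):
|         # board is a 9-tuple of "x", "o", or None.
|         lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),
|                  (1,4,7),(2,5,8),(0,4,8),(2,4,6)]
|         for i, j, k in lines:
|             if board[i] and board[i] == board[j] == board[k]:
|                 return board[i]
|         return None
|
|     def place(board, i, player):
|         b = list(board); b[i] = player; return tuple(b)
|
|     def negamax(board, player):
|         # Value of the position for the player about to
|         # move: +1 win, -1 loss, 0 draw.
|         w = winner(board)
|         if w is not None:
|             return 1 if w == player else -1
|         moves = [i for i, c in enumerate(board) if c is None]
|         if not moves:
|             return 0
|         other = "o" if player == "x" else "x"
|         return max(-negamax(place(board, i, player), other)
|                    for i in moves)
|
|     def best_move(board, player):
|         # The state-to-best-move function itself is just an
|         # argmax; the lookahead above is merely one way of
|         # computing its values.
|         moves = [i for i, c in enumerate(board) if c is None]
|         other = "o" if player == "x" else "x"
|         return max(moves, key=lambda i:
|                    -negamax(place(board, i, player), other))
|
| A learned player approximates best_move directly from
| patterns in positions, without ever running the recursion.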
|
| The thing is, though: does this really matter if eventually we
| won't be able to notice the difference?
| bytefactory wrote:
| > It doesn't necessarily have to look ahead. Since Go is a
| deterministic game, there is always a best move
|
| Is there really a difference between the two? If a certain
| move shapes the opponent's remaining possible moves into a
| smaller subset, hasn't AlphaGo "looked ahead"? In other
| words, when humans strategize and predict what happens in the
| real world, aren't they doing the same thing?
|
| I suppose you could argue that humans also include additional
| world models in their planning, but it's not clear to me that
| these models are absent, or impossible for machine learning
| models to generate during training.
| Joel_Mckay wrote:
| Well-structured nonsense is indistinguishable from the assumption
| of sentience for the undisciplined. This means a 95% LLM-
| generated article is indistinguishable from an illogical,
| contradictory, chaotic rant.
|
| Best of luck, and remember to tip your bot on the way out =)
| breadwinner wrote:
| There is evidence that the human brain is also doing
| "autocomplete" (prediction). The human brain uses predictive
| mechanisms when processing language, and these mechanisms play an
| important role in forming thoughts.
|
| When we hear or read a word, our brain quickly generates a set of
| predictions about what word might come next, based on the context
| of the sentence and our past experiences with language. These
| predictions are constantly updated as we receive new information,
| and they help us to process language more efficiently and
| accurately.
|
| In addition, research has shown that the brain engages in similar
| predictive processes when we are forming thoughts or planning
| actions. For example, when we plan a complex movement, such as
| reaching for a cup, our brain generates a set of predictions
| about the movements required to complete the action. These
| predictions are constantly updated as we receive feedback from
| our muscles and our environment, allowing us to make adjustments
| and achieve our goal.
|
| See the links below for additional details:
|
| https://www.earth.com/news/our-brains-are-constantly-working...
|
| https://www.psycholinguistics.com/gerry_altmann/research/pap...
|
| https://www.tandfonline.com/doi/pdf/10.1080/23273798.2020.18...
|
| https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009...
| Probiotic6081 wrote:
| And curiously, those predictions that are made during language
| comprehension are made by the language production system
| itself!
| lsy wrote:
| This is of course sometimes true; we take shortcuts to minimize
| cognitive effort. However, when the situation warrants it we
| think more carefully about which words to use to achieve goals,
| or to reach correspondence with the situation at hand. Or we
| move more precisely and carefully to do something we haven't
| done before. I've no doubt that an LLM can approximate whatever
| model of language a person has from their life experience, but
| I don't think this type of model is capable of active coping,
| of making judgments, or of having accountability to the world
| it's meant to operate in.
| lacrimacida wrote:
| > When we hear or read a word, our brain quickly generates a
| set of predictions about what word might come next, based on
| the context of the sentence
|
| Yes, a big part of it is prediction, but the brain also does
| something else which LLMs by themselves completely eschew. The
| human brain imagines in pictures, uses abstractions, studies
| things, and produces new knowledge.
| Eggpants wrote:
| As long as it's just returning the tokens of the statistical mean
| of previous tokens, it is just a clever autocomplete.
|
| A somewhat useful internet search engine without all the ads/SEO
| garbage. Of course, the first rule of the internet is don't
| believe everything on the internet.
|
| I believe AI won't overcome its statistical mask until it can
| self-tune its coefficients in real time. That requires an error
| function, not yet invented, that can mimic animals' pain-feedback
| error function.
|
| Baby steps can be taken by attempting to run GPT-generated code
| and then adjusting coefficients based on the returned errors.
| Compiler and unit-test failures are basic "pain" functions, which
| is pretty much how humans learn to code.
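| As a toy sketch of that loop (nothing here is a real API;
| the two callables are invented stand-ins you would have to
| supply):
|
|     def code_with_pain_feedback(generate, run_tests, task,
|                                 max_rounds=5):
|         # generate(task, feedback) -> candidate source code
|         # run_tests(code) -> list of compiler/test errors
|         code = generate(task, feedback=None)
|         for _ in range(max_rounds):
|             errors = run_tests(code)
|             if not errors:
|                 return code  # no "pain" left
|             # The errors are the pain signal. In the
|             # speculative version above they would also
|             # nudge the model's weights in real time, not
|             # just its prompt.
|             code = generate(task, feedback=errors)
|         return code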
| esafak wrote:
| Autocompletion is a red herring of a dismissal. Once you have a
| general-purpose model, you can do many things with it, including
| next-token prediction. And in that respect we are no different.
| eviks wrote:
| > Suppose you want to be really good at predicting the next word.
| If you want to be really good, you have to understand what's
| being said. That's the only way. So by training something to be
| really good at predicting the next word, you're actually forcing
| it to understand.
|
| OK, so in this framing you'll just get stuck below the "really
| good" level, because the training isn't based on understanding.
| Falkon1313 wrote:
| I think that insight is an important feature that GPT doesn't
| seem to have, at least not yet.
|
| For instance, I've seen people saying they can ask it for help
| with how to code something and it will help them. Although the
| results aren't perfect, they can be helpful.
|
| However, I recall years ago asking a more senior developer how to
| do something. They could've just told me how, but instead they
| asked why I was doing that. Then they told me not to bother with
| writing code at all, and to instead do this other, much simpler
| thing which would have the same effect of solving the client's
| request. ChatGPT wouldn't have had that insight.
|
| Years later, a junior dev came to me asking for help with his
| code because it wasn't working. We went over the code line by
| line and he explained what it was trying to do and it all looked
| good. But when he ran it, the web server crashed. I told him to
| try adding a comment:
|
|     // Do not remove this comment.
|
| Sure enough, it worked perfectly and the server no longer
| crashed. Why? Because I had realized that if his code was wrong
| he should get an error message, not crash the server. But some
| time back I had read about a bug in a version of the interpreter
| that would crash when parsing a file that was an exact multiple
| of 4096 bytes. Would ChatGPT have thought of that? Probably not.
| It would've just talked about the code.
|
| Which is not to say that it's useless. But it lacks greater
| context and the insight to go beyond the parameters of the
| question on its own initiative. Then again, so do humans most of
| the time.
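| The fix works by pure arithmetic: any added byte moves the
| file length off the pathological boundary. A toy check (the
| details of the real bug are hypothetical here, reconstructed
| only from the story above):
|
|     import os
|
|     def risks_parser_crash(path, block=4096):
|         # The rumored bug fired only when the source file
|         # length was an exact multiple of the block size,
|         # so appending even a one-line comment is a
|         # workaround.
|         return os.path.getsize(path) % block == 0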
| bytefactory wrote:
| > I think that insight is an important feature that GPT doesn't
| seem to have, at least not yet.
|
| I actually think this is a limitation of the RLHF that GPT has
| been put through. With open-ended questions, I've seen GPT-4
| come up with reasonable alternatives instead of just answering
| the question I've asked. This often shows up as the infamous
| "however, please consider..." bits that it tacks on, which
| occasionally do contain actual insights into the problem I'm
| trying to solve.
|
| In most cases it seems to try very hard to mold the answer into
| what I want to hear, which in many cases isn't necessarily the
| best answer. A more powerful version of GPT with a less-
| restrictive RLHF seems like it would be more open to suggesting
| novel solutions, although this is just my speculation.
| Animats wrote:
| The big insight we've obtained from LLMs is how much of human
| discourse is glorified autocomplete. Humans are usually not very
| profound.
| lacrimacida wrote:
| But some are. Could we have that as AI? Would LLMs ever have
| it?
| 1vuio0pswjnm7 wrote:
| Here is a question: What is the practical significance of viewing
| "AI" as autocomplete versus some other idea? Why try to influence
| how others view using a computer? Why anthropomorphise? These are
| questions for which I have answers, but of course they are
| personal opinions. Historically, programmers often like to refer
| to programming as "magic". But magic is illusion, entertainment,
| tricks. Believing in "magic" is a personal choice.
|
| Why not describe things in terms of what they do instead of what
| they "are"? The latter is highly subjective and open to abuse.
___________________________________________________________________
(page generated 2023-11-18 23:01 UTC)