[HN Gopher] AI language models are struggling to "get" math
       ___________________________________________________________________
        
       AI language models are struggling to "get" math
        
       Author : rbanffy
       Score  : 61 points
       Date   : 2022-10-12 13:53 UTC (9 hours ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | blueprint wrote:
       | Maybe because it's not actual AI.
        
       | PaulHoule wrote:
       | Ashby strikes again.
       | 
       | Current sequence models don't have the right structures to
       | represent math. Even if they use floating point internally, they
       | can't really float the point because the nonlinearity in the
       | model has a certain scale.
       | 
       | A system that processes language can take advantage of the human
       | desire for closure
       | 
       | https://www.eurogamer.net/blood-in-the-gutter
       | 
       | to fool people into thinking it is more capable than it really
       | is. Math isn't like that.
        
         | hey_over_here wrote:
         | Mwell, the article claims, and points to work that also claims,
          | that large language models can actually be made to perform
          | arithmetic well. They need fine-tuning, verification, chain-of-
          | thought prompting and majority voting all combined, but with
          | that the linked Google blog says that Minerva hit 78.5%
          | accuracy on the GSM8K benchmark.
         | 
         | For me the problem is that we can look at the output and say if
         | it's right or wrong, but we know what language models do,
         | internally: they predict the next token in a sequence. And we
         | know that this is no way to do arithmetic, in the long run,
         | even though it might well work over finite domains.
         | 
         | Which is to say, I'm just as skeptical as you are, and probably
         | even more, but I think it's useful to separate the claim from
         | what has actually been demonstrated. Google claims its Minerva
         | model is "solving maths problems" but what it's really doing is
         | predicting solutions to problems like the ones it's been fine-
         | tuned on, and those problems are problems stated at least
         | partly in natural language, not "naked" arithmetic operations.
         | In the latter, language models are still crap because they
         | can't use the context of the natural language problem statement
         | to help them predict the solution.
         | 
         | Btw, "chain of thought prompting" if I remember correctly is a
         | process by which an experimenter prompts the language model
         | with a sequence of intermediary problems. So it's not so much
         | the model's chain of thought, as the experimenter's chain of
         | thought and the experimenter is asking the model to help him or
         | her complete their chain of thought. I have a fuzzy
         | recollection of that though.
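          | 
          | For concreteness, a rough sketch of just the majority-voting
          | part, where sample_solution is a hypothetical stand-in for
          | whatever model call returns one chain-of-thought answer (not
          | any real API):
          | 
          |   from collections import Counter
          | 
          |   def majority_vote(sample_solution, prompt, n=40):
          |       # Sample several chain-of-thought answers and
          |       # keep the most common final answer.
          |       answers = [sample_solution(prompt) for _ in range(n)]
          |       return Counter(answers).most_common(1)[0][0]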
        
         | sharemywin wrote:
         | computers already do math. language models just need to
         | translate problems into code of some kind that can be run to
         | get the answer.
         | 
         | executive function/planning is probably the biggest problem at
         | this point for ai.
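          | 
          | A minimal sketch of that glue role; translate_to_python and
          | run_sandboxed are hypothetical stand-ins for a model call and
          | a restricted evaluator, not any particular library:
          | 
          |   def solve(problem, translate_to_python, run_sandboxed):
          |       # The model only writes a tiny program, e.g.
          |       # "answer = (12 * 7) - 5"; ordinary code does the math.
          |       code = translate_to_python(problem)
          |       scope = run_sandboxed(code)  # dict of resulting variables
          |       return scope["answer"]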
        
           | sharemywin wrote:
            | The point I'm trying to make is LLMs don't need to do
            | everything, just be the glue to other systems.
        
             | enord wrote:
             | Wait what? Glue as in extract high level semantic
              | representations from _syntactic probabilities_ and pass on
             | to appropriate domain specific tools?
             | 
             | This is the glaring hole in LLMs, a paradoxical semantic
              | incoherence despite impressive sentential and grammatical
             | coherence.
             | 
             | As glue it is so thin as to be potable.
        
           | zackmorris wrote:
           | That's interesting, I hadn't made the connection between
           | executive function and intelligence.
           | 
           | I went through a burnout in 2019 that felt like having a
           | stroke. My brain finally reached such a level of negative
           | reinforcement after years of failure that it wouldn't let me
           | work anymore. I'd go to do very simple tasks, everything from
            | brushing my teeth to writing a TODO list, and it was like the
           | part of my brain that performed those tasks wasn't there
           | anymore. Or at least, it no longer obeyed if it perceived a
           | potential reward involved. It was like my motivation got
           | reversed. I had to relearn how to do everything, despite
           | knowing that no reward might come for a very long time, which
           | took at least 6 months before I began recovering. The closest
           | answer I have is that my brain healed through faith.
           | 
           | I only bring it up because executive function may be
           | associated with a subjective experience of meaning. If
           | there's truly no point to anything, then it's hard to summon
           | the motivation to string together a sequence of AI tasks into
           | something more like AGI.
           | 
           | I guess that's another way of saying that nihilism could be
           | the final hurdle for AGI to overcome. It's like the human
           | philosophical question of why there's something instead of
           | nothing. Or why angels would choose to be incarnate on Earth
           | to experience a life of suffering when it's so much easier to
           | remain dissociated.
        
           | the_af wrote:
           | > _language models just need to translate problems into code
           | of some kind that can be run to get the answer_
           | 
           | A huge "just"! Isn't this the magic step? Translating
           | ambiguous symbols to meaning and combining them in meaningful
           | ways is a big deal which, apparently, these AI models cannot
           | do. They can just parrot things.
        
             | gamegoblin wrote:
              | It's already being done and will only get better:
              | https://twitter.com/sergeykarayev/status/1569377881440276481
        
               | the_af wrote:
               | I suspect it's not solved, because solving this (beyond
               | some trick/toy examples) is essentially solving General
               | AI.
        
           | JacobiX wrote:
           | I'm not so sure about that. Of course computers can do
           | arithmetic operations, but this is not the same as solving
           | math problems, proving theorems, etc. Even mathematical
           | objects are approximated up to an approximation error in a
           | computer (like a differentiable manifold or a real number).
        
             | PaulHoule wrote:
             | There has been big progress in automated theorem proving
             | lately
             | 
             | https://en.wikipedia.org/wiki/Automated_theorem_proving
             | 
             | you just don't hear about it much because the technology is
              | not so fashionable today. Also, it is clearer what the
              | limits are: Turing, Gödel, Tarski and all of those apply to
              | neural networks as well as to any other formal system, but
              | people mostly forget it.
             | 
             | Knuth wrote a really fun volume of _The Art of Computer
             | Programming_ about advances in SAT solvers which are the
             | foundation for theorem provers
             | 
             | https://www.amazon.com/Art-Computer-Programming-Fascicle-
             | Sat...
             | 
             | Everybody is aware that neural network techniques have
             | improved drastically in performance, it's much more obscure
             | that the toolbox of symbolic A.I. has improved greatly.
             | Back in the 1980s production rules engines struggled to
             | handle 10,000 rules, now Drools can handle 1,000,000+ rules
             | with no problems.
        
               | sva_ wrote:
               | > There has been big progress in automated theorem
               | proving lately
               | 
               | It doesn't seem like there has been much progress for
               | anything but FOL?
        
               | thwayunion wrote:
               | The wiki article on automated theorem proving is quite
               | bad as an overview of the active field; it's more a
               | historical article about the mid to late 20th century.
               | Most of the interesting things in automated reasoning
                | have happened since the aughts, and that article kind of
                | stops in the 90s.
               | 
               | SMT solvers have gotten quite good over the past couple
               | decades, there are tons of domain-specific tools (eg in
               | software and hardware verification), tons of niche
               | applied decidable or semi-decidable theories (eg various
               | modal and description logics), a lot of progress on the
               | proof assistant ("non-fully-automated theorem proving")
               | paradigm, and so on.
        
               | PaulHoule wrote:
               | It's clear that commonsense reasoning needs to deal with
               | modals, counterfactuals, defaults, temporal logic, etc.
               | 
               | It's not hard to add some extensions to logic for a
               | particular application but a very hard problem to develop
               | a general purpose extended logic.
               | 
               | I look at the logic-adjacent production rules systems
               | which never really standardized some of the commonly
               | necessary things such as agendas, priorities, defaults,
               | etc.
        
             | IshKebab wrote:
             | Computers are much much better at all that stuff than
             | almost everyone too. Try asking Wolfram Alpha to solve
             | something. Computers have gotten really good at proving
             | things in the last couple of decades and formal
             | verification methods are becoming increasingly popular.
             | 
             | I think sharemywin is probably on to something. It's going
              | to be _really_ hard for an AI to prove that e.g. x > 0 &&
              | x + y <= 1 && y > 1 is unsatisfiable, but it's trivial for
              | an SMT solver. On the other hand it probably isn't that much
             | of a leap to make an AI that can feed that problem _into_
             | an SMT solver.
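              | 
              | For what it's worth, a minimal sketch with the z3-solver
              | Python bindings (assuming they're installed):
              | 
              |   from z3 import Reals, Solver, unsat
              | 
              |   x, y = Reals("x y")
              |   s = Solver()
              |   s.add(x > 0, x + y <= 1, y > 1)
              |   print(s.check() == unsat)  # prints True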
        
             | thwayunion wrote:
             | _> Of course computers can do arithmetic operations, but
             | this is not the same as solving math problems, proving
             | theorems, etc. _
             | 
             | Computers can solve math problems and prove theorems; this
             | remains a significant subfield of Computer Science with
             | lots of industrial use cases. However, pure machine
             | learning based approaches toward these problems remain
             | subpar.
             | 
             |  _> Even mathematical objects are approximated up to an
             | approximation error in a computer (like a differentiable
             | manifold or a real number)._
             | 
             | Only because it caught on (and in the case of non-
             | computationally-intensive applications, for purely
             | historical reasons). For example, Mathematica has Reals and
             | even functionality for Reals that is literally impossible
             | to implement for integers [1,2]. There are also precise
             | characterizations of objects in differential geometry [3].
             | You could imagine applying LLMs to these types of programs
             | a la Copilot, but when you do this you will find yourself
             | agreeing with Paul Houle's observation that math is harder
             | to fake than eg art, language, or even glue code for web
             | apps.
             | 
             | [1] https://reference.wolfram.com/language/ref/Reduce.html
             | 
             | [2] https://en.wikipedia.org/wiki/G%C3%B6del%27s_incomplete
             | ness_...
             | 
             | [3] https://github.com/bollu/diffgeo
        
               | the_af wrote:
               | > _Computers can solve math problems and prove theorems_
               | 
               | But the specification of the problem must be done by a
               | human, translating to a formalized system that the
               | software can understand. And if there's a problem in the
               | formal specification, it's mostly up to the human to
               | notice and fix; the computer will happily output garbage
               | or crash or enter an infinite loop.
               | 
               | So it seems this translation, going from an exploration
               | of the problem statement, usually in ambiguous terms, to
               | a formal specification, _and the awareness to possibly
               | detect whether the answers make sense and the specs were
               | right_ , is uniquely human.
        
             | casey2 wrote:
             | Counterexample: Shalosh B. Ekhad is a computer who is also
             | a mathematician.
        
             | Sharlin wrote:
              | Well, you don't _need_ anything other than basic arithmetic
             | to encode the entirety of, say, ZFC, enumerate every
             | proposition in it, and halt iff you find a proof of
              | whatever theorem you're after. It just might take a
             | while...
        
             | sharemywin wrote:
             | Online Integral Calculator Solve integrals with
             | Wolfram|Alpha
             | 
             | https://www.wolframalpha.com/calculators/integral-
             | calculator...
        
               | sva_ wrote:
               | Now try to make a computer prove that there are no
                | positive integers a, b, c such that a^n + b^n = c^n for
                | any n > 2.
        
               | Sharlin wrote:
               | Shifting the goal posts a bit, aren't we?
        
               | sharemywin wrote:
                | I guess it depends on the outcome you're worried about:
                | superintelligence, or machines that replace the average
               | office worker.
        
           | PaulHoule wrote:
           | That's not a bad approach, necessarily.
           | 
           | There is a fairly simple program in
           | 
           | https://www.amazon.com/Paradigms-Artificial-Intelligence-
           | Pro...
           | 
           | that solves word problems using the methods of the old AI.
            | The point is that it is efficient and effective to use real
           | math operators and not expect to fit numbers through the
           | mysterious bottleneck of neural encoding.
        
         | lupire wrote:
         | Floating point isn't relevant here.
         | 
         | The problem is that human language is approximate and correct
         | math is not, so pattern matching on prose text is doomed. AI
         | trained on exact math does a lot better. But that's not fully
         | generic so fails the weird GPT goal of modeling all of human
         | intelligence through prose. That's not how people solve math at
         | all.
         | 
         | GPT's "Superficially plausible but wrong" math is actually
          | a pretty good match for non-expert bad-at-math average human
         | behavior.
        
           | zozbot234 wrote:
           | > GPT's "Superficially plausible but wrong" math is actually
            | a pretty good match for non-expert bad-at-math average human
           | behavior.
           | 
           | Relevant blog post: https://www.greaterwrong.com/posts/YhgjmC
           | xcQXixStWMC/artific... "The best experts in the field
           | estimate it will be at least a hundred years before
           | calculators can add as well as a human twelve-year-old."
        
             | PaulHoule wrote:
              | I like Yudkowsky parodying himself there, although I still
             | don't know if he has a sense of humor or not.
        
       | CommieBobDole wrote:
       | Also Excel is terrible at encoding MP3s.
       | 
        | It's a language model; why would we expect it to do math or try to
       | somehow shoehorn math into the model? Do the language centers of
       | our brain do math?
       | 
       | If something approximating AGI is going to happen, it's going to
       | be a lot of models tied together with an executive function to
       | recognize and send things to the area that's good at working with
       | them.
        
         | dr_dshiv wrote:
         | Well, because we want rational language models. Something with
         | a sense of truth.
         | 
         | Math is not irrelevant--and I'm sure it's a solvable problem
         | with language models.
        
           | CommieBobDole wrote:
           | But if it's rational and has a sense of truth, then it's AGI.
           | Which I don't think is impossible or even unattainable within
           | a reasonable amount of time, but we're .001% of the way
           | there, not 50% or 75%.
           | 
           | These models are fascinating, but the problem 'a lot of the
           | things this model generates lack any semantic meaning' is
           | inherent and likely insurmountable without connecting the
           | model to other, far more complex models that haven't been
           | built yet.
           | 
           | We are at the level where our models can consistently
           | generate blocks of text with full sentences in them that make
           | grammatical sense. Which is pretty cool.
           | 
           | But the next step is being able to consistently generate full
           | sentences that make grammatical sense and usefully convey
           | information. And while the current models do that a lot of
           | the time, they don't do that all of the time because they
           | don't and can't know the difference without essentially being
           | a different thing. Because to do that consistently, we need
           | an "understanding what things mean" model. Which is many
           | orders of magnitude larger and more difficult than a text
           | generator.
        
         | [deleted]
        
         | thwayunion wrote:
         | What are some (non-nefarious) applications of generative
         | language models that produce language which isn't constrained
         | by some sort of rationality or directed by some sort of high-
         | level goal?
         | 
         | The point isn't the math. The point is that, in math and
         | similar disciplines, it's harder to get away with producing
         | mostly undirected gibberish that happens to have some imputed
         | meaning. The point is "use language to do something where it's
         | easy to verify correctness and generating infinite amounts of
         | synthetic data is trivial"
         | 
         | If a language model can't even do high school algebra, then I
         | have a lot less confidence that it will ever be useful for
         | customer service applications or any other number of potential
         | applications outside of propaganda, advertising, and spam.
        
         | hey_over_here wrote:
          | > It's a language model; why would we expect it to do math or try
         | to somehow shoehorn math into the model?
         | 
         | Language models can do math, or anyway arithmetic. That's
         | because language models are trained to predict the next token
         | in a sequence and an arithmetic operation can be represented as
         | a sequence of tokens.
         | 
         | For example, see Figure 3.10 on page 22, here:
         | 
         | https://arxiv.org/abs/2005.14165
         | 
         | The only problem is that language models are crap at arithmetic
         | because they can only predict the next token in a sequence.
         | That's enough to guess at the answer of an arithmetic problem
         | some of the time but not enough to solve any arithmetic problem
         | all of the time.
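          | 
          | The framing is easiest to see as a toy few-shot prompt (no
          | particular model or API assumed), where the arithmetic is
          | literally just a string to be continued:
          | 
          |   prompt = ("Q: 17 + 28 = A: 45\n"
          |             "Q: 96 + 35 = A: 131\n"
          |             "Q: 48 + 76 = A:")
          |   # The model merely scores continuations of this text;
          |   # "124" comes out only if it happens to be the most
          |   # probable next-token sequence.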
         | 
         | More generally, the answer to your question is in the same
         | Figure 3.10 I've referenced above. OpenAI (and others) have
         | claimed that their large language models can do arithmetic. So
         | then people tested the claim and found it to be a bag of old
         | cobblers.
         | 
         | Hence the article above. Nobody's trying to "shoehorn" anything
         | anywhere. It's just something that language models can do,
         | albeit badly.
        
           | CommieBobDole wrote:
           | Right, but what you're describing is 'not being able to do
           | math'. Like, if I've memorized a multiplication table and can
           | give you any result that's on the table but can't multiply
           | anything that wasn't on the table, I can't do multiplication.
        
             | hey_over_here wrote:
             | It depends on how you see it. I agree with you, generally,
             | but in the limit, if you memorised all possible instances
             | of multiplication, then yes, you could certainly be said to
             | know multiplication.
             | 
             | I've not just come up with that off the top of my head,
             | either. In PAC-Learning (what we have in terms of theory,
             | in machine learning) a "concept" (e.g. multiplication) is a
             | set of instances and a learning system is said to learn a
             | concept if it can correctly label each of a set of testing
              | instances by membership to the target concept with an
              | arbitrarily low probability of error. Trivially, a learner
              | that has memorised every instance of a target concept can be
             | said to have learned the concept. All this is playing fast
             | and loose with PAC-Learning terminology for the sake of
             | simplification.
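              | 
              | In symbols, roughly (a sketch of the usual PAC criterion,
              | with c the target concept, D the distribution, and h the
              | learned hypothesis):
              | 
              |   \Pr[\mathrm{err}_D(h) \le \varepsilon] \ge 1 - \delta,
              |   \text{ where } \mathrm{err}_D(h) =
              |     \Pr_{x \sim D}[h(x) \ne c(x)]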
             | 
             | The problem of course is that some concepts have infinite
             | sets of instances, and that is the case with arithmetic. On
             | the other hand, it's maybe a little disingenuous to require
             | a machine learning system to be able to represent infinite
             | arithmetic since there is no physical computer that can do
             | that, either.
             | 
             | Anyway that's how the debate goes on these things. I'm on
             | the side that says that if you want to claim your system
             | can do arithmetic, you have to demonstrate that it has
             | something that we can all agree is a recognisable
             | representation of the rules of arithmetic, as we understand
             | them. For instance, the axioms of Peano arithmetic. Which
             | though is a bit unfair for deep learning systems that can't
             | "show their work" in this way.
        
       | abrax3141 wrote:
       | The situation is actually much worse for science, or any moving
        | field. These models are by design and necessity historical. So
        | that if, for example, the FDA issues a drug approval overnight,
        | the model can't follow sudden changes in a "reasoned" way.
        
       | make3 wrote:
       | The article is actually about how they are getting good at it :)
        
       | mavu wrote:
       | Talking about this stuff would be so much easier if we stopped
        | calling this software "AI".
       | 
       | It is a machine learning algorithm. It is an electronic Parrot.
       | 
        | That's it. And suddenly no one will wonder "OH MY WHY CANN IT NOT
       | DO MATH< IT SMART?!?!"
        
       | mjburgess wrote:
       | How much of this is just "AI is bad at everything", but in the
       | math case, it's easier for the lay person _to tell_.
       | 
        | It's all just passable garbled nonsense that the reader (goes to
        | lengths) to interpret based on _their_ prior knowledge, which is
       | not expressed in the syntax of what these systems output.
       | 
       | In the case of mathematics, we're far less willing to "BS away"
       | the interpretive failures. But if we were equally demanding,
        | likewise, all prose generated by these systems isn't AI "getting"
       | anything either.
       | 
       | Pass a film reel thru' a shredder and an art student would still
        | call it a film. Pass math thru' and a mathematician won't. This
        | says more about our ability and inclination to make sense out of
        | nonsense when in apparent communicative situations (since, when
       | speaking to a person, this actually improves our mutual
       | understanding).
       | 
       | So, how much of AI is just hacking people's cognitive failures:
       | (1) people's willingness to attribute intention; (2) people's
       | willingness to impart sense "at all costs" to apparent
       | communication; and (3) "hopeium".
        
         | woah wrote:
         | Have you ever used Github CoPilot? It does a lot of useful
         | work, automating away rote typing in programming. Have you
         | tried Dall-E or Stable Diffusion? They make good looking
         | images. This comment seems completely unmoored from where the
         | state of the art is right now.
        
           | civilized wrote:
           | I agree. It's possible to point out the clear limitations of
           | current AI without being oblivious to the huge, indisputable
           | advances that have occurred.
           | 
           | People thought it might take centuries for a computer to
           | defeat a top human in Go. Then deep learning showed up and a
           | few years later it's the opposite.
           | 
           | A lot of the things deep learning methods are doing now are
           | things no one had any idea how long research would take to
           | achieve, or if they were even possible.
           | 
           | Personally, I think we are currently hitting some walls that
           | might take a while to climb before we get to AGI, but I am
           | _very_ impressed at the recent progress.
        
           | TuringTest wrote:
           | Math follows a completely different approach with respect to
           | how machine-learning AIs do their thing.
           | 
            | Reason derives its strength from having a few primitives and
           | creating new assertions through the transformation of symbols
           | by following precise rules (which is how algorithms work).
           | 
           | In ML-based AIs, everything is imprecise and probabilistic,
            | and this kind of generation gets its strength from building
            | recognizable output from utterly imprecise inputs and training
            | - quite the opposite of how logic and reason evolve. Now,
           | "classic" AI was a powerful way to derive new knowledge, and
           | automatic theorem proving is a strong discipline; but the
           | recent breakthroughs in AI are not directly applicable to
           | classic techniques.
           | 
           | Do you know what machine-learning AIs could be good for?
           | Generating "insight" in problem solvers for guiding the
           | theorem demonstrations through the proof search space, trying
           | to find the best sub-spaces to explore. If there's a way to
           | create human-like general AI, it will likely combine both
           | kinds of generation - the rational methods of symbolic logic
           | and the "irrational" statistical methods of ML.
        
             | zmgsabst wrote:
             | Automated theorem proving is the same problem as "complete
             | and label the diagram", which image generation is okay at.
             | 
             | Work in progress for sure, though.
        
           | mjburgess wrote:
           | sure, but co-pilot is mostly just copying code (see, for
           | example, the issue with it producing quake source code).
           | 
           | If you think of AI as a dial from sample(data) to mean(data),
           | then as the dial is turned towards the mean() you get more
           | "generic" results, but also more garbled ones.
           | 
           | Copilot is more like a search engine, having turned the dial
           | more towards sample().
           | 
           | The real invention of the NN is simply to provide that dial
           | in a trainable way.
           | 
           | The only change to the "state of the art" is the size of the
           | weights, and how long they take to train. This "advancement"
           | is no more impressive than google indexing more webpages.
           | 
           | There has been no step-change advancement in AI in, perhaps,
           | 50 years. All we see today is a product of hardware, in
           | GPU/CPUs able to compress TBs of data into c. 300GB of
           | weights. And likewise, the internet to provide it and SSDs to
           | hold it.
           | 
           | The "magic" of AI is no more the magic of wikipida, here:
           | copilot is good only because million+ programmers made github
           | good.
           | 
           | It's still little more than a fancy search.
        
             | woah wrote:
             | > It's all just passable garbled nonesense that the reader
             | (goes to lengths) to interept based on their prior
             | knowledge, which is not expressed in the syntax of what
             | these systems output.
             | 
             | > It's still little more than a fancy search.
             | 
             | I feel like the goalposts have been moved between your two
             | comments. CoPilot is obviously not producing garbled
             | nonsense, and it's also not just printing the top result
             | from StackOverflow. It is producing code that references my
             | variables, does the right thing 50% of the time, and
             | usually compiles.
             | 
             | One of the nice little things is error messages- when I
             | type `if (!foo) { throw ... ` CoPilot is able to complete a
             | nicely formatted and descriptive error message from its
             | understanding of my code. It's not garbled nonsense, and
             | it's not just a search engine.
             | 
             | Does AI deserve the hype it sometimes gets? Not yet. But I
             | think you're going to have to start digging a little deeper
             | for your commentary.
        
             | planetsprite wrote:
             | Even if AI got to the point of perfectly passing every
              | expert-level Turing test, your degree of rigor as to what
             | "thinking" is would never truly permit any belief of AI
             | having struck the golden nugget of intelligence.
             | 
             | Imagine if we were all self-replicating computers, and
             | certain members of this silicon race began experimenting
             | with making creatures with carbon macro-molecules to create
             | organic intelligence, you could make the same claim in the
             | other direction:
             | 
             | "There has been no step-change advancement in Organic
             | Intelligence in, perhaps, 50 years. All we see today is a
             | product of cell count, in neurotransmitter chemistry able
             | to compress TBs of experiences into c. 300B neurons."
        
           | Marazan wrote:
           | Dall-E produces good looking images within certain
           | parameters.
           | 
           | When you are in its bounds it seems magical, once you go
           | outside it seems like a weak joke.
           | 
           | And many of the reasons it is bad outside its sweet spot are
           | fundamental to how it works not a flaw that can be iterated
           | away.
        
         | dimmuborgir wrote:
         | AI is bad at music also. Even the state of the art transformer
         | models can't produce more than a few seconds of coherent
         | melodic phrases.
        
           | [deleted]
        
           | vladf wrote:
           | Have you heard the piano continuations of AudioLM?
           | 
           | https://google-research.github.io/seanet/audiolm/examples/
        
             | bloep wrote:
             | Indeed, there is lots of denial or ignorance in this thread
             | (ignorance in the technical sense). AudioLM already
             | produced impressive results and it's a tiny fraction of
             | what is already possible because performance simply
             | improves with scale. One can probably solve music
             | generation today with a ~$1B budget for most purposes like
             | film or game music, or personalized soundtracks. This is
             | not science fiction.
        
               | p1esk wrote:
               | I don't see a lot of progress in AudioLM compared to
                | results from 2018:
                | https://storage.googleapis.com/magentadata/papers/maestro/in...
               | 
               | What's more interesting and concerning - listen carefully
               | to the first piano continuation example from AudioLM,
               | notice the similarity of the last 7 seconds to Moonlight
               | sonata: https://youtu.be/4Tr0otuiQuU?t=516
               | 
               | I'm afraid we will see a lot of this with music
               | generation models in the near future.
        
               | bloep wrote:
               | There are quite simple tricks to avoid repetition/copying
               | in NNs, e.g. by (1) training a model to predict the
               | "popularity" of the main model's outputs and penalizing
               | popular/copied productions by backpropping through that
               | model so as to decrease the predicted popularity, or (2)
               | by conditioning on random inputs (LLMs can be prompted
               | with imaginary "ID XXX" prefixes before each example to
               | mitigate repetitions), or (3) by increasing temperature
               | or optimizing for higher entropy. LLM outputs are already
               | extremely diverse and verbatim copying is not a huge
               | issue at all. The point being, all evidence points to
               | this not being a show stopper if you massage these
               | evolutionary methods for long enough in one or more of
               | the various right ways.
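                | 
                | For (3), a minimal numpy sketch of temperature
                | sampling over a single step's logits:
                | 
                |   import numpy as np
                | 
                |   def sample(logits, temperature=1.0, rng=None):
                |       # Higher temperature flattens the
                |       # distribution, so rarer tokens (and less
                |       # verbatim copying) become more likely.
                |       rng = rng or np.random.default_rng()
                |       z = np.asarray(logits) / temperature
                |       z = z - z.max()  # numerical stability
                |       p = np.exp(z) / np.exp(z).sum()
                |       return rng.choice(len(p), p=p)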
        
               | p1esk wrote:
               | I'm not sure what you mean by "backpropping through that
               | model so as to decrease the predicted popularity". During
               | training, we train a model to literally reproduce famous
               | chunks of music exactly as they are in the training set.
               | We can also learn to predict popularity at the same time,
               | but we can't backpropagate anything that will reduce
                | popularity, because this would directly contradict the
                | main loss objective of exact reproduction.
               | 
               | Having said that, I think the idea of predicting
               | popularity is good - we can use it for filtering already
               | generated chunks during post-training evaluation phase.
               | 
               | I don't think the other two methods you suggest would
               | help here, we want to generate while conditioning on
               | famous pieces, and we don't want to increase temperature
               | if we want to generate conservative, but still high
               | quality pieces.
               | 
               | It's true that we (humans) are less sensitive to
               | plagiarism in the text output, but even for LLMs it is a
               | problem when it tries to generate something highly
               | creative, such as poetry. I personally noticed multiple
                | times particularly beautiful poetry phrases generated by
                | GPT-2, only to google them and find out they were copied
               | verbatim from a human poem.
        
             | phillipharr1s wrote:
             | Pretty sure the first continuation is a famous piece with a
             | few notes messed up. Can't remember the name. Honestly it
             | only sounds marginally better than the old markov chain
             | continuations.
        
               | macrolocal wrote:
               | Yep, Moonlight Sonata (mov. 3) no less. Talk about over-
               | fitting!
        
               | vladf wrote:
               | Isn't that as good as it gets? The whole point of the
               | continuations is that given a short leading prompt from a
               | real piece that it should continue it realistically.
               | 
               | It didn't get to train on the test set, if that's what
               | you're implying, and I find it hard to believe the
               | assertion that continuations are copies of the train set
               | (if that's your claim).
        
               | p1esk wrote:
               | It definitely copied a piece of Moonlight sonata in the
               | last 7 seconds of the first continuation sample:
               | https://youtu.be/4Tr0otuiQuU?t=516
        
               | holub008 wrote:
               | Interestingly, the original piece is a later Beethoven
               | Sonata, Op. 31 No. 3. The model has its styles down!
               | https://youtu.be/P-Q5aBAw-T4?t=78
        
           | Der_Einzige wrote:
           | That's wrong, and shows how ignorant you are of SOTA
           | techniques for music generation. They are far ahead of that.
        
           | denton-scratch wrote:
           | It doesn't surprise me that an AI model for language can't
           | grok maths or music. I can't see how a language model can map
           | to maths. Hell, I don't even know how to describe music in
           | words. It's possible to articulate _some_ maths in words, but
           | that often involves using words with unexpected definitions.
        
           | aaroninsf wrote:
           | AI can be quite good at music,
           | 
            | but yes, there is not yet an on-demand button for rendering,
            | from a text prompt, bitstreams encoding composed, performed,
            | and mastered music.
        
           | CactusOnFire wrote:
           | AI is bad at Audio. AI can do MIDI fine.
        
             | dwringer wrote:
             | MIDI is extraordinarily expressive and is likely used to
             | sequence a large majority of music produced within the last
             | three decades. A lot of the instruments you hear are
             | synthesizers or samplers running directly from MIDI. There
             | is a lot more to what MIDI can do, and is used for, than
             | the conception most people have from "canyon.mid" or old
             | website background music. If an AI can do MIDI just fine
             | then it's an extremely small leap to doing audio just fine.
        
               | p1esk wrote:
               | _If an AI can do MIDI just fine then it 's an extremely
               | small leap to doing audio just fine._
               | 
               | Unfortunately this is not true. It takes a huge amount of
               | human effort to make MIDI encoded music sound good. The
               | difference between MIDI and raw audio music generation is
               | the same as the difference between drawing a cartoon and
               | producing a photograph.
               | 
               | To clarify, yes MIDI can be expressive, but what's being
               | generated when people say "AI generates MIDI music" is
               | basically a piano roll.
        
             | causi wrote:
             | Which is a real shame. AI-powered restoration of poor-
             | quality audio would be highly useful.
        
               | aaroninsf wrote:
               | That particular niche has had some pretty amazing
               | successes already. It's coming.
               | 
               | We can't produce arbitrary media streams with many "stack
               | layers" of meaning and detail yet, but we can do a lot of
               | specific instrumental transformations...
               | 
               | Vaguely relevant: https://koe.ai/recast/
        
           | stephencanon wrote:
           | Which is extra funny, because GOFAI models (e.g. David Cope's
           | work) were doing a pretty OK job back in the 1990s!
        
           | mjburgess wrote:
           | I think if we replaced "AI" with "taking averages over
           | subsets of historical examples", then there'd be no mystery
           | for when "AI" will be good or bad at anything.
           | 
           | Would we expect a discrete melodic structure to be
           | expressible as averages of prior music? No.
        
           | yeasurebut wrote:
           | That's what a musician does. They make short loops and loop
           | them.
           | 
           | This reads like someone who knows sheet music and theory but
           | does not listen to music. It's repetition of short phrases
           | over and over.
           | 
           | I'm not really sure what people expect of general AI trained
            | on human-generated outputs. It can't make up anything
            | "net new", only compose based upon what we feed it.
           | 
           | I like to think AI is just showing us how simple minded we
           | really are and how our habit of sharing vain fairy tales
           | about history makes us believe we're masters of the universe.
        
             | dimmuborgir wrote:
             | Those models are not trained on short loops. They are
             | trained on whole songs just like image generation models
             | are trained on whole images. And yet they struggle to
             | repeat sections, modulate to a different key, create
             | bridges, intros and outros. After a few seconds of
             | hallucinating a melodic line they simply abandon the idea
             | and migrate to another one. There is no global structure
             | whatsoever.
        
               | yeasurebut wrote:
               | Musicians don't spit out an album in one sitting and
               | they're highly trained in theory. They get bored and
               | tired of a process and take breaks. They come up with an
               | album of loops composed together over time.
               | 
                | AI's state will forever be constrained to the limits of
               | human cognition and behavior as that's what it's trained
               | on.
               | 
               | I read published research all year. Circular reasoning.
                | Tautology. It's all over PhD theses.
               | 
               | There's no "global structure" to humanity. Relativity is
               | a bitch.
               | 
               | Seeing the world through the vacuum of embedded inner
               | monologue ignores the constraints of the physical one.
                | It's exhausting dealing with the mentality that some
                | clean room idea we imagine in a hammock can actually
                | exist in a universe being ripped asunder by entropy.
               | 
               | It's living in memory of what we were sold; some ideal
               | state. Very akin to religious and nation state idealism.
        
               | mjburgess wrote:
               | I think it's deeply depressing that AI has been sold as
               | something even capable of modelling anything humans do;
               | and quite depressing that this comment exists.
               | 
               | "AI" is just taking `mean()` over our choice of encodings
               | of our choice of measurements of our selection of things
               | we've created.
               | 
               | There is as much "alike humans" in patterns in tree bark.
               | 
               | AI is an embarrassingly dumb procedure, incapable of the
               | most basic homology with anything any animal has ever
               | done; us especially.
               | 
               | We are embedded in our environments, on which we act, and
               | which act on us. In doing so we physically grow, mould
               | our structure and that of our environment, and develop
               | sensory-motor conceptualisations of the world. Everything
               | we do, every act of the imagination or of movement of our
               | limbs, is preconditioned-on and symptomatic-of our
               | profound understanding of the world and how we are in it.
               | 
               | The idea that `mean(424,34324,223123,3424,....)` even has
                | any relevance to us at all is quite absurd. The idea that
               | such a thing might sound pleasant thru' a speaker,
               | _irrelevant_.
               | 
                | This is a product of I don't know what. On the optimist
                | side, a cultish desire to see Science produce a new
                | utopia. On the pessimist side, a likewise delusional
               | desire to see Humans as dumb machines.
               | 
               | What a sad state!
        
               | pessimizer wrote:
               | I lack your confidence, and find it a bit religious.
               | 
               | > The idea that `mean(424,34324,223123,3424,....)` even
                | has any relevance to us at all is quite absurd.
               | 
               | Most of what I say to anyone is exactly this.
               | 
               | When I'm about to give anyone any information, I look
               | back at all of the relevant past information that I can
               | recall (through word and sensory association, not by
               | logic, unless I have a recollection of an associated
               | internal or external dialog that also used logical
               | rules.) I multiply those by strength of recollection and
               | similarity of situation (e.g. can I create a metaphor for
               | the current situation from the recalled one?). I take the
               | mean, then I share it, along with caveats about the
               | aforementioned strength of recollection and similarity of
               | situation.
               | 
               | This is what it feels like I actually do. Any of these
               | steps can be either taken consciously or by reflex. It's
               | not hidden.
               | 
               | > I think it's deeply depressing that AI has been sold as
               | something even capable of modelling anything humans do
               | 
               | This is a bizarre position. All computers ever do is
               | model things that humans do. All a computer consists of
               | is a receptacle for placing human will that will continue
               | to apply that will after the human is removed. They are a
               | way of crystallizing will in a way that you can sustain
               | it with things (like electricity) other than the
               | particular combination of air, water, food, space,
               | pressure, temperature, etc. that is a person. An overflow
               | drain is a computer that models the human will. An
               | automatic switch/regulator is the basic electrical model
               | of human will, and a computer is just a bunch of those
               | stitched together in a complementary way.
        
               | mjburgess wrote:
               | You're an animal. You've no idea what you do, and you're
               | using machines as a model. Likewise, in the 16th C. it
                | was brass cogs; and in ancient Greece, air/fire/etc.
               | 
                | You're no more made of clay & god's breath than you are
                | sand and electricity.
               | 
                | You're an oozing, growing, malleable organic organism
               | being physiologically dynamically shaped by your sensory-
               | motor oozing. You're a mystery to yourself, and these
               | self-reports, heavily coloured by the in-vogue tech _are
                | not science_, they're pseudoscience.
               | 
               | If you want to study how animals work, you'd need to
               | study _that_. Not these impoverished metaphors that
               | mystify both machines and men. No machine has ever
               | acquired a concept through sensory-motor action, nor used
               | one to imagine, nor thereby planned its actions. No
               | machine is ever at play, nor has grown its muscles to be
               | better at-play. No machine has, therefore, learned to
               | play the piano. No machine has thought about food,
               | because no machine has been hungry; no machine has cared,
               | nor been motivated to care by a harsh environment.
               | 
               | An inorganic mechanism is nothing at all like an animal,
               | and an algorithm over a discrete sequence of numbers with
               | electronic semantics, is nothing like tissue development.
               | 
               | What you are doing is not something you can introspect.
                | And you aren't really doing that. Rather, you've learned a
               | "way of speaking" about machine action and are back-
               | projecting that onto yourself. In this way, you're
               | obliterating 95% of the things you are.
        
         | saghm wrote:
         | > How much of this is just "AI is bad at everything", but in
         | the math case, it's easier for the lay person to tell
         | 
         | Honestly, even as someone generally pretty dismissive of the AI
         | hype, I'm not sure you can go that far. The whole reason we
         | have specific mathematical notation is that human languages
         | often are not super great at dealing with it, and English in
         | particular is pretty abysmal for being both unambiguous and
         | precise (and I'd be surprised if language models didn't end up
         | suffering from biases analogous to how many image recognition
         | AI models have been found to not deal well with a diverse set
         | of human appearances). We don't teach math the same way we
         | teach English, and we certainly don't expect people to be
         | experts at teaching both, so why would we expect an AI model
         | designed for language to be able to do math?
        
         | planetsprite wrote:
         | Language models aren't built for math. Their
         | improvement/training cycles aren't sensitive to the exactness
         | and rule-based nature of mathematical language, plus there are
         | probably a lot of bad/misleading examples of math in the source
         | data.
         | 
         | You'd have to be unrealistically pessimistic to call what GPT-3
         | and other huge language models produce "nonsense".
        
           | visarga wrote:
           | It's not that they were not built for math, but more like
           | verification is hard. But it's hard for humans as well. A
           | large generative model + a fast verifier could do wonders.
           | 
           | AlphaGo was built on that - the model can propose moves, but
           | you can verify who won in the end. There are some code
           | generation models that write their own tests as well, or use
           | externally provided tests to verify their solutions. The
           | DeepMind matrix multiplication algorithm was also "learning
           | from verification" of generated solutions, because it's
           | trivial to do that. In general verification remains an open
           | problem.
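            | 
            | The generate-and-verify loop itself is trivial; the hard
            | part is the verifier. A sketch, with propose and verify as
            | hypothetical stand-ins for a generative model and a cheap
            | checker (a unit test, a won/lost game, a check that the
            | multiplied matrices really match):
            | 
            |   def generate_and_verify(propose, verify, n=100):
            |       candidates = (propose() for _ in range(n))
            |       return [c for c in candidates if verify(c)]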
        
             | spywaregorilla wrote:
             | I disagree. It is that they were not built for math. While
             | brain analogies are shittier than most people assume, this
             | is like trying to do math in your head without being
             | allowed to think through calculations.
        
       | burlesona wrote:
       | I genuinely wonder if we will find there are some inherent
       | tradeoffs to knowledge and understanding such that if we ever
       | have machines that can "think like humans" they would in practice
       | run into human-like cognition limits: ie such machines would be
       | "bad at math" in the same way humans are "bat at math" compared
       | to conventional computers.
        
         | ryandvm wrote:
         | Indeed. I posit that as we get closer and closer to simulating
         | how the human brain works in the pursuit of artificial
         | intelligence, we're going to start seeing more and more of the
         | same "bugs" that humans have (logical fallacies, susceptibility
         | to illusions, mental illness, etc.)
         | 
         | You think your job sucks now, just wait until you're dealing
         | with the general AI over on the UX team that's trying to get
         | your ass fired because it's fostering a 3 year old grudge over
         | that time you said Chappie was stupid.
        
           | Der_Einzige wrote:
           | At first, I thought it was surprising that a language model
           | with a restricted vocabulary (e.g. banning the letter "E")
           | acts significantly more "mentally ill", and then I thought
           | about how I would come across if forced to use that
           | constraint all the time, and I realized that maybe I'd appear
           | mentally ill too!
           | 
           | You can play with LMs with constrained vocabularies here:
           | https://huggingface.co/spaces/Hellisotherpeople/Gadsby
        
         | blackbear_ wrote:
         | That's an interesting thought. However it's not cognitive
         | limits that make humans bad at math, it's just a "hardware"
         | issue: a human with a piece of paper is much better at math.
        
         | aaaaaaaaaaab wrote:
         | Even if neural networks were fundamentally incompatible with
         | conventional computation, I don't see why you couldn't augment
         | a neural network with a conventional ALU to do the numerical
         | computations. This is exactly what humans do with pencil and
         | paper - it's just a bit too slow.
        
           | auganov wrote:
           | Either the language model would need to know what it's doing
           | or the host program would have to know what the AI is doing.
           | Both seem out of reach. The latter seems more doable since
           | you could hack something up for simple scenarios, but you'd
           | effectively have to match the capabilities of the neural
           | network in a classical way to handle every case (which would
           | render using a neural net moot).
        
       | WalterBright wrote:
       | People struggle to get math, too.
        
       | _0ffh wrote:
       | And no wonder, as they correspond much closer to a Kahneman
       | system 1 than system 2, where _we_ do most of our math.
        
       | [deleted]
        
       | bionhoward wrote:
       | I bet vision transformers understand math better because it's
       | somewhat artistic
        
       | abrax3141 wrote:
       | More generally, they struggle to get things right. They're great
       | at grammatical confabulation, but when you need a correct answer,
       | or a correct drug recommendation, ask an expert.
        
       | pessimizer wrote:
       | That's because they're not modelling anything. The shocking thing
       | about current AI models is that just sort of repeating and
       | copying from memory what you've heard and seen gets you 97% of
       | the way to imitating a person.* They still need to generate
       | actual models somewhere to create consistency; hence all the
       | generated images with one eye completely different from the
       | other, or three arms, or fingers that grow into their
       | cellphones.
       | 
       | If you solve this, you've probably solved almost everything in
       | the simulation field. I have no confidence that the solution
       | will even be complicated. Information consumed needs to be used
       | to add to some sort of model, and that model always needs to be
       | used as part of the input. The complicated part would be to
       | make that base model able to modify itself reasonably based on
       | input, to tolerate constant inconsistency, and to constantly
       | refine itself towards consistency, i.e. to ruminate.
       | 
       | I think a huge difference (which I think was approached through
       | theories of embodied cognition) is that people start with a model
       | (or the ability to create a model) of themselves. We can apply
       | that model to other things and use it both to change how we
       | ourselves behave, and how we speculate about the invisible states
       | of other things. It's not for nothing that we can (and must)
       | anthropomorphize anything.
       | 
       | -----
       | 
       | * Which went a long way towards confirming my belief that this
       | is all people do 97% of the time.
        
         | mgraczyk wrote:
         | This is factually wrong, both in terms of quantity and quality.
         | 
         | Current AI models are not "just sort of repeating and copying
         | from memory". This is just an incorrect characterization of how
         | they work and how they perform.
         | 
         | AI skeptics often say things like this then backpedal with
         | something like "Well they aren't really repeating what they
         | heard, but their generative model is just a slightly more
         | sophisticated version of repeating what they've heard." But
         | this weaker claim is also true of humans. It's certainly the
         | case that >97% of what humans say is "just repeating and
         | copying" in the same sense.
        
           | pessimizer wrote:
           | > Current AI models are not "just sort of repeating and
           | copying from memory". This is just an incorrect
           | characterization of how they work and how they perform.
           | 
           | You say this, but don't explain how. Because this is exactly
           | what they are doing.
           | 
           | > AI skeptics often say things like this
           | 
           | I'm not really an AI skeptic. I think that we're very close
           | to AI being indistinguishable from people. There are clearly
           | problems that need to be solved, but I think the hardest
           | problem was _accepting the fact that humans are largely just
           | copying_ and realizing that would be enough to get you 97% of
           | the way there, especially if you gave a machine far more to
           | copy than a human could consume.
           | 
           | > then backpedal with something like "Well they aren't really
           | repeating what they heard, but their generative model is just
           | a slightly more sophisticated version of repeating what
           | they've heard." But this weaker claim is also true of humans.
           | It's certainly the case that >97% percent of what humans say
           | is "just repeating and copying" in the same sense.
           | 
           | Maybe I'm not expressing myself clearly, but it seems that
           | you're just repeating my comment with a sneer. Agreeing
           | angrily?
        
             | mgraczyk wrote:
             | I'm disagreeing with the language you are using to
             | characterize models. "copying from memory" implies that
             | there is something being copied, and a memory that you are
             | copying it from. I am pointing out that LLMs do not do
             | this. It's not how they work.
             | 
             | If you polled 1M random English speakers and asked them
             | whether a system that is "just sort of repeating and
             | copying from memory" could produce completely novel
             | answers in response to completely novel questions, I
             | suspect that the overwhelming majority would say no.
             | 
             | Similarly if you asked 1000 people working on LLMs whether
             | they work by "copying from memory", I suspect nearly all
             | would say no. It would be accurate to say they are
             | "generating text via a probabilistic model of language,
             | which is encoded in the weights of a neural network", but
             | there really is just no sense in which the models are
             | "copying" anything.
             | 
             | That being said, these models do "copy" some text in the
             | sense that they can reconstruct some strings from their
             | training input. For example every LLM I have played with
             | can recite the first few paragraphs of A Tale of Two Cities
             | verbatim. But that's a capability they have _in spite of_
             | their actual design, not because of it.
        
               | pessimizer wrote:
               | > I'm disagreeing with the language you are using to
               | characterize models. "copying from memory" implies that
               | there is something being copied, and a memory that you
               | are copying it from. I am pointing out that LLMs do not
               | do this. It's not how they work.
               | 
               | Then we're arguing about the semantics of the word
               | "copy." That is not an interesting argument when you know
               | exactly what I mean and can express it clearly.
               | 
               | edit: If it helps, either substitute your description in
               | whenever I say 'pretty much copy' or change the word
               | "copy" to whatever word you want to use. But even though
               | I can't reproduce the opening paragraph to A Tale of Two
               | Cities verbatim, I can certainly write something that is
               | "copying" it without doing that, and anyone who was
               | familiar with the book and read my paragraph would agree
               | with me.
        
               | mgraczyk wrote:
                | It is semantics, but that was your whole point, no?
               | 
               | > That's because they're not modelling anything
               | 
               | If we agree on "how LLMs work", then how can you claim
               | that they aren't modeling anything? They are modeling
               | language, and while it's unlikely current paradigms will
               | be proving new mathematical truths, it's completely
               | plausible to me that bigger models will be able to handle
               | simple math word problems like those in the article,
               | precisely because LLMs can model the "Alice", "Apple",
               | and "Bob" entities.
        
           | sebastialonso wrote:
           | Can you actually share what "current AI models" are, then?
           | Not trying to be rude, but you just said "nuh-uh" and then
           | refused to argue any position.
        
             | mgraczyk wrote:
             | Current LLMs are "modeling" something according to pretty
             | much any sense of the word "model".
             | 
             | In the technical, computational linguistics sense, LLMs are
             | language models that give a conditional posterior
             | distribution over sentences. Given some (constrained)
             | context, the model tells you the posterior distribution
             | over sentences in or around that context.
             | 
             | In the nontechnical, layman sense of the word, they are a
             | system that is used as an example of language. LLMs imitate
             | language by generating new sentences. They are a "model" in
             | the same way that an architectural model is a model, or in
             | the same way that a statue is a model of a human.
             | 
             | The other point I disagreed with is the characterization
             | that LLMs "just sort of repeat and copy from memory". I
             | went into more detail about that in other replies.
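             | 
             | To make that concrete: the "conditional posterior" is
             | built up token by token via the chain rule. A toy
             | sketch, where next_token_probs is a stand-in for
             | whatever the real network computes:
             | 
             |   import math
             | 
             |   def sentence_logprob(tokens, next_token_probs):
             |       # next_token_probs(prefix) -> dict of token: prob
             |       # log P(sentence) = sum_i log P(token_i | prefix)
             |       total = 0.0
             |       for i, tok in enumerate(tokens):
             |           probs = next_token_probs(tokens[:i])
             |           total += math.log(probs.get(tok, 1e-12))
             |       return total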
        
       | [deleted]
        
       | jxy wrote:
       | > "When multiplying really large numbers together ... they'll
       | forget to carry somewhere and be off by one," says Vineet
       | Kosaraju, a machine learning expert at OpenAI. Other mistakes
       | made by language models are less human, such as misinterpreting
       | 10 as 1 and 0, not ten.
       | 
       | So the expert has never seen a seven-year-old struggling to
       | add two single-digit numbers together? Did the expert learn
       | that 10 is a 1 followed by a 0 first, and learn to speak
       | second?
       | 
       | > The MATH group found just how challenging quantitative
       | reasoning is for top-of-the-line language models, which scored
       | less than 7 percent. (A human grad student scored 40 percent,
       | while a math olympiad champ scored 90 percent.)
       | 
       | Is this that surprising? How would our IEEE editor score on the
       | same problem set?
        
       | Buttons840 wrote:
       | Are there any general purpose models that are good at learning
       | math? I mainly know basic feed-forward neural nets, but I don't
       | think they do well outside their training region. Math, of
       | course, has an infinite training region.
        
         | alan-crowe wrote:
         | I attempted to create a general-purpose model for the exact
         | version of the "what comes next" problem. It enumerated
         | primitive recursive functions, trying them out as it went. The
         | limitation to primitive recursive functions was convenient
         | because they always terminate. I didn't have to filter out the
         | functions that ran for too long (or did I?).
         | 
         | The enumeration inherently includes functions of several
         | variables, so I wasn't restricted to examples such as 1->1,
         | 2->4, 3->9, 4->16 etc.
         | 
         | I could try it out on examples such as (1,2)->3 (2,1)->3
         | (0,2)->2, etc. Perhaps with enough it would "learn to add" =
         | find a primitive recursive function that did addition.
         | 
         | I got as far as finding the first problem. The enumeration
         | technique that I used was effectively doing a tree recursion,
         | like that function for computing Fibonacci numbers that bogs
         | down because Fib(10) is computing Fib(5) lots of times. I had a
         | lot of numbers that coded for the identity function, lots of
         | numbers that coded for the first few functions, making the
         | whole thing bog down, trying the same few functions over and
         | over under different numerical disguises.
         | 
         | I thought that I could see my way to fixing this first
         | problem: have some way of recognizing numbers whose forms code
         | for the same function. I guessed that I could approximate this
         | by saying that if two functions give the same value on a
         | variety of arguments they are probably the same. Then I
         | parameterise this criterion and tune it. That opens the way to
         | creating a consolidated enumeration, analogous to fixing the
         | tree-recursive Fibonacci function by memoization, except
         | trickier.
         | 
         | But my health is poor and I ran out of energy.
         | 
         | Also, I have a guess for the second problem: what happens if I
         | fix the first problem and my enumeration reaches decently
         | complicated primitive recursive functions? While they will all
         | terminate, some might run for far too long, causing the
         | process to bog down. Rejecting them by limiting the run time
         | might work well. We are happy to only learn reasonably
         | efficient functions for doing maths.
         | 
         | It is a fun idea and I encourage others to have a go.
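         | 
         | For anyone who wants the Fibonacci analogy spelled out, this
         | is the kind of blow-up and fix I have in mind (plain Python,
         | not my actual enumerator):
         | 
         |   from functools import lru_cache
         | 
         |   def fib_slow(n):
         |       # Tree recursion: fib_slow(10) recomputes fib_slow(5)
         |       # many times, just as my enumeration kept re-trying the
         |       # same functions under different numerical disguises.
         |       return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)
         | 
         |   @lru_cache(maxsize=None)
         |   def fib_fast(n):
         |       # Memoization collapses the duplicates; the consolidated
         |       # enumeration I wanted is the analogous, trickier fix.
         |       return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)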
        
         | geoduck14 wrote:
         | From my (limited) experience with the advanced ML models, they
         | can "do basic math", but they make amateur mistakes with basic
         | things - which indicates they _don't actually know addition_,
         | but they are good at looking at patterns in existing language.
         | 
         | I would assume that state-of-the-art ML models could "convert
         | a word problem into an equation", then feed _that_ equation
         | into a 30-year-old graphing calculator to "do the math".
         | 
         | The fact that no one has done this is an indicator that "there
         | are more important things to work on", and it is just a matter
         | of time before someone connects the two together.
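         | 
         | A back-of-the-envelope version of that glue (the
         | model_to_equation call is a hypothetical placeholder for the
         | LLM step; the "calculator" half is plain Python and only
         | handles +, -, *, /):
         | 
         |   import ast
         |   import operator
         | 
         |   OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
         |          ast.Mult: operator.mul, ast.Div: operator.truediv}
         | 
         |   def calculate(expr):
         |       # Safely evaluate an arithmetic expression string
         |       # such as "3 * (17 + 5)".
         |       def walk(node):
         |           if isinstance(node, ast.Expression):
         |               return walk(node.body)
         |           if isinstance(node, ast.BinOp):
         |               op = OPS[type(node.op)]
         |               return op(walk(node.left), walk(node.right))
         |           if isinstance(node, ast.Constant):
         |               return node.value
         |           raise ValueError("unsupported expression")
         |       return walk(ast.parse(expr, mode="eval"))
         | 
         |   # equation = model_to_equation(word_problem)  # LLM step
         |   print(calculate("3 * (17 + 5)"))  # -> 66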
        
           | MarkPNeyer wrote:
           | This seems so much like humans that it makes me think lots of
           | people are learning math with an ML-like approach instead
           | of... whatever the heck people like engineers and
           | mathematicians are doing.
        
             | vidarh wrote:
             | I wonder how these language models would do if we tried
             | to teach them maths the way schools do: feed them
             | explanations first, then endless sequences of toy
             | problems, see which ones they get wrong, and feed
             | corrected examples back in.
             | 
             | I'm not at all surprised they don't do well at maths,
             | because while there are maths texts online, I doubt there
             | is _enough_ material to give these models the same
             | experience of repetition and reinforcement to help them
             | sufficiently generalise an understanding of the
             | underlying rules.
        
               | lupire wrote:
               | Generating solved math problems is trivial, like making
               | AlphaZero play itself in chess. Sparse data is not the
               | problem. Refusing to use it is.
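               | 
               | For instance, a toy generator (word problems would
               | need templates on top of this, but the arithmetic
               | core really is a few lines):
               | 
               |   import random
               | 
               |   def make_examples(n, max_val=999):
               |       # Emit (prompt, answer) text pairs.
               |       out = []
               |       for _ in range(n):
               |           a = random.randint(0, max_val)
               |           b = random.randint(0, max_val)
               |           op = random.choice("+-*")
               |           ans = {"+": a + b, "-": a - b,
               |                  "*": a * b}[op]
               |           out.append(
               |               (f"What is {a} {op} {b}?", str(ans)))
               |       return out
               | 
               |   print(make_examples(3))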
        
               | vidarh wrote:
               | I don't think it's so much a refusal as that it's not
               | been a sufficient priority for anyone before. As the
               | article points out, there are now a few training sets
               | which include math problems, and models which do well
               | on them. But the remaining problems seem to be with
               | basics which humans tend to learn to do consistently
               | with a lot of repetition, and it'd be interesting to
               | see those datasets extended to the very simple.
        
             | idealmedtech wrote:
             | Anyone can do higher-level math; the problem is that math
             | education is generally done by people who see math as a
             | tool for computation, rather than a study of deep
             | connections bordering on philosophy, and beautiful insights
             | resembling poetry. I've been in arguments before where
             | someone didn't believe me that the underpinnings of modern
             | philosophy are essentially the same as math!
             | 
             | If the teachers don't love math, how can we expect students
             | to?
        
           | lupire wrote:
           | What you describe is exactly what the state of the art has
           | done.
           | They even lied and said it was "solving math problems" by
           | calling numpy methods.
        
           | the_af wrote:
           | > _" convert a word problem into an equation"_
           | 
           | Isn't this a huge step? It's not a minor detail remaining to
           | be solved, but possibly the largest step!
        
         | neoneye2 wrote:
         | There is "LODA", which uses genetic algorithms, that
         | continuously mutates existing math programs until discovering
         | something new. It uses OEIS as training data, around 350k known
         | integer sequences, such as primes/fibonacci. Around 100k
         | programs have been mined so far.
         | 
         | https://loda-lang.org/
         | 
         | I'm a contributor to LODA.
         | 
         | LODA runs on CPU; it doesn't use the GPU. If you have a spare
         | computer, please consider contributing to the mining. Your
         | contribution helps.
         | 
         | https://boinc.loda-lang.org/loda/
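         | 
         | The core loop is simpler than it might sound. This isn't
         | LODA's actual code (LODA mutates real programs and checks
         | them against OEIS sequences), just a toy showing the
         | mutate-and-keep-if-better idea on a single target sequence:
         | 
         |   import random
         | 
         |   TARGET = [n * n for n in range(10)]  # toy target: squares
         | 
         |   def run(prog, n):
         |       # A "program" here is just (a, b, c) in a*n^2 + b*n + c.
         |       a, b, c = prog
         |       return a * n * n + b * n + c
         | 
         |   def score(prog):
         |       return sum(run(prog, n) == TARGET[n] for n in range(10))
         | 
         |   best = (0, 0, 0)
         |   for _ in range(10000):
         |       step = tuple(x + random.choice([-1, 0, 1]) for x in best)
         |       if score(step) >= score(best):
         |           best = step  # keep improvements (and ties)
         |   print(best, score(best))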
        
       | xiphias2 wrote:
       | It is a great sign that we are building AI in the right
       | direction. Before building artificial human intelligence, it
       | makes sense to get to the intelligence level of a mosquito or
       | fly, then go to more intelligent animals in later iterations.
       | 
       | As most human knowledge is encoded in videos, getting better at
       | understanding and generating videos will clearly get us closer
       | to making computers understand the world.
        
       | yshrestha wrote:
       | Language models can generate a Python function that does the math
       | perfectly.
       | 
       | I bet you would get better results if you tweaked the prompt to
       | say "Generate a Python program that solves X math problem" and
       | then just ran the resulting Python script.
       | 
       | It does not need to be AGI to be useful.
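       | 
       | Something like this is all the glue that's needed (ask_model is
       | a placeholder for whichever LLM API you use, and running
       | untrusted generated code obviously wants a sandbox in real
       | life):
       | 
       |   import subprocess
       |   import sys
       |   import tempfile
       | 
       |   def solve(problem, ask_model):
       |       prompt = ("Generate a Python program that prints the "
       |                 f"answer to this math problem:\n{problem}")
       |       code = ask_model(prompt)  # LLM call (placeholder)
       |       with tempfile.NamedTemporaryFile(
       |               "w", suffix=".py", delete=False) as f:
       |           f.write(code)
       |           path = f.name
       |       # Run the generated script, capture what it prints.
       |       out = subprocess.run([sys.executable, path],
       |                            capture_output=True,
       |                            text=True, timeout=10)
       |       return out.stdout.strip()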
        
         | lupire wrote:
         | You mean "generate a Python function that _calls a library_
         | that does math perfectly", right?
        
           | hgomersall wrote:
           | In the limit, it's going to design an AI to write some python
           | to call a library that does the math perfectly.
        
           | thwayunion wrote:
           | Unlike 99.99% of human programmers, who can and often do
           | implement everything in sympy/numpy from scratch ;-)
        
           | yshrestha wrote:
           | Exactly! Hey it gets the job done :)
           | 
           | Software is just a tall wedding cake of abstractions built on
           | top of abstractions.
        
         | swyx wrote:
         | you can also tell the model that it doesn't know how to do math,
         | and _it respects that_
         | 
         | https://twitter.com/goodside/status/1568448128495534081
        
         | Kim_Bruning wrote:
         | That is also a very valid and interesting thing to do.
         | 
         | But it's also quite interesting to see how the model would do
         | "by itself". All kinds of interesting lessons to be learned!
        
           | yshrestha wrote:
           | Yeah! It is interesting to try and figure out "what" the
           | model is actually learning. It is a valid thread of
           | scientific inquiry.
        
         | mlajtos wrote:
         | Exactly, we need computer-equipped neural nets. Models need to
         | use traditional UIs (including programming languages) and then
         | we can talk about how to stop them. :)
        
         | lairv wrote:
         | That could only generate constructivist [0] proofs, and there
         | are many things done in modern maths which are not
         | constructivist. Maybe a better approach would be to use the
         | Curry-Howard [1] correspondence to directly get proofs from
         | generated programs.
         | 
         | [0]
         | https://en.wikipedia.org/wiki/Constructivism_(philosophy_of_...
         | 
         | [1]
         | https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...
        
       ___________________________________________________________________
       (page generated 2022-10-12 23:01 UTC)