[HN Gopher] AI language models are struggling to "get" math
___________________________________________________________________

AI language models are struggling to "get" math

Author : rbanffy
Score  : 61 points
Date   : 2022-10-12 13:53 UTC (9 hours ago)

(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)

| blueprint wrote:
| Maybe because it's not actual AI.
| PaulHoule wrote:
| Ashby strikes again.
|
| Current sequence models don't have the right structures to
| represent math. Even if they use floating point internally,
| they can't really float the point because the nonlinearity in
| the model has a certain scale.
|
| A system that processes language can take advantage of the
| human desire for closure
|
| https://www.eurogamer.net/blood-in-the-gutter
|
| to fool people into thinking it is more capable than it really
| is. Math isn't like that.
| hey_over_here wrote:
| Mwell, the article claims, and points to work that also claims,
| that large language models can actually be made to perform
| arithmetic well. They need fine-tuning, verification, chain-of-
| thought prompting and majority voting to be combined, but the
| linked Google blog says that Minerva hit 78.5% accuracy (on the
| GSM8K benchmark).
|
| For me the problem is that we can look at the output and say if
| it's right or wrong, but we know what language models do,
| internally: they predict the next token in a sequence. And we
| know that this is no way to do arithmetic, in the long run,
| even though it might well work over finite domains.
|
| Which is to say, I'm just as skeptical as you are, and probably
| even more, but I think it's useful to separate the claim from
| what has actually been demonstrated. Google claims its Minerva
| model is "solving maths problems" but what it's really doing is
| predicting solutions to problems like the ones it's been fine-
| tuned on, and those problems are stated at least partly in
| natural language, not as "naked" arithmetic operations. On the
| latter, language models are still crap because they can't use
| the context of the natural language problem statement to help
| them predict the solution.
|
| Btw, "chain of thought prompting", if I remember correctly, is
| a process by which an experimenter prompts the language model
| with a sequence of intermediate problems. So it's not so much
| the model's chain of thought as the experimenter's chain of
| thought, and the experimenter is asking the model to help him
| or her complete their chain of thought. I have a fuzzy
| recollection of that, though.
| sharemywin wrote:
| Computers already do math. Language models just need to
| translate problems into code of some kind that can be run to
| get the answer.
|
| Executive function/planning is probably the biggest problem at
| this point for AI.
| sharemywin wrote:
| The point I'm trying to make is LLMs don't need to do
| everything, just be the glue to other systems.
| enord wrote:
| Wait what? Glue as in extract high-level semantic
| representations from _syntactic probabilities_ and pass them
| on to appropriate domain-specific tools?
|
| This is the glaring hole in LLMs, a paradoxical semantic
| incoherence despite impressive sentential and grammatical
| coherence.
|
| As glue it is so thin as to be potable.
| zackmorris wrote:
| That's interesting, I hadn't made the connection between
| executive function and intelligence.
|
| I went through a burnout in 2019 that felt like having a
| stroke.
| My brain finally reached such a level of negative
| reinforcement after years of failure that it wouldn't let me
| work anymore. I'd go to do very simple tasks, everything from
| brushing my teeth to writing a TODO list, and it was like the
| part of my brain that performed those tasks wasn't there
| anymore. Or at least, it no longer obeyed if it perceived a
| potential reward involved. It was like my motivation got
| reversed. I had to relearn how to do everything, despite
| knowing that no reward might come for a very long time, which
| took at least 6 months before I began recovering. The closest
| answer I have is that my brain healed through faith.
|
| I only bring it up because executive function may be
| associated with a subjective experience of meaning. If
| there's truly no point to anything, then it's hard to summon
| the motivation to string together a sequence of AI tasks into
| something more like AGI.
|
| I guess that's another way of saying that nihilism could be
| the final hurdle for AGI to overcome. It's like the human
| philosophical question of why there's something instead of
| nothing. Or why angels would choose to be incarnate on Earth
| to experience a life of suffering when it's so much easier to
| remain dissociated.
| the_af wrote:
| > _language models just need to translate problems into code
| of some kind that can be run to get the answer_
|
| A huge "just"! Isn't this the magic step? Translating
| ambiguous symbols to meaning and combining them in meaningful
| ways is a big deal which, apparently, these AI models cannot
| do. They can just parrot things.
| gamegoblin wrote:
| It's already being done and will only get better:
| https://twitter.com/sergeykarayev/status/1569377881440276481
| the_af wrote:
| I suspect it's not solved, because solving this (beyond
| some trick/toy examples) is essentially solving General
| AI.
| JacobiX wrote:
| I'm not so sure about that. Of course computers can do
| arithmetic operations, but this is not the same as solving
| math problems, proving theorems, etc. Even mathematical
| objects are approximated up to an approximation error in a
| computer (like a differentiable manifold or a real number).
| PaulHoule wrote:
| There has been big progress in automated theorem proving
| lately
|
| https://en.wikipedia.org/wiki/Automated_theorem_proving
|
| you just don't hear about it much because the technology is
| not so fashionable today. Also it is more clear what the
| limits are; I mean, Turing, Gödel, Tarski and all of those
| apply to neural networks as well as to any other formal
| system, but people mostly forget it.
|
| Knuth wrote a really fun volume of _The Art of Computer
| Programming_ about advances in SAT solvers, which are the
| foundation for theorem provers
|
| https://www.amazon.com/Art-Computer-Programming-Fascicle-Sat...
|
| Everybody is aware that neural network techniques have
| improved drastically in performance; it's much more obscure
| that the toolbox of symbolic A.I. has improved greatly.
| Back in the 1980s production rule engines struggled to
| handle 10,000 rules; now Drools can handle 1,000,000+ rules
| with no problems.
| sva_ wrote:
| > There has been big progress in automated theorem
| proving lately
|
| It doesn't seem like there has been much progress for
| anything but FOL?
| thwayunion wrote:
| The wiki article on automated theorem proving is quite
| bad as an overview of the active field; it's more a
| historical article about the mid to late 20th century.
| Most of the interesting things in automated reasoning have
| happened since the aughts, and that article kind of stops
| in the 90s.
|
| SMT solvers have gotten quite good over the past couple of
| decades, there are tons of domain-specific tools (e.g. in
| software and hardware verification), tons of niche applied
| decidable or semi-decidable theories (e.g. various modal
| and description logics), a lot of progress on the proof
| assistant ("non-fully-automated theorem proving")
| paradigm, and so on.
| PaulHoule wrote:
| It's clear that commonsense reasoning needs to deal with
| modals, counterfactuals, defaults, temporal logic, etc.
|
| It's not hard to add some extensions to logic for a
| particular application, but it is a very hard problem to
| develop a general-purpose extended logic.
|
| I look at the logic-adjacent production rule systems,
| which never really standardized some of the commonly
| necessary things such as agendas, priorities, defaults,
| etc.
| IshKebab wrote:
| Computers are much, much better at all that stuff than
| almost everyone too. Try asking Wolfram Alpha to solve
| something. Computers have gotten really good at proving
| things in the last couple of decades, and formal
| verification methods are becoming increasingly popular.
|
| I think sharemywin is probably on to something. It's going
| to be _really_ hard for an AI to prove that e.g. x > 0 &&
| x + y <= 1 && y > 1 is unsatisfiable, but it's trivial for
| an SMT solver. On the other hand it probably isn't that
| much of a leap to make an AI that can feed that problem
| _into_ an SMT solver.
| thwayunion wrote:
| _> Of course computers can do arithmetic operations, but
| this is not the same as solving math problems, proving
| theorems, etc._
|
| Computers can solve math problems and prove theorems; this
| remains a significant subfield of Computer Science with
| lots of industrial use cases. However, pure machine
| learning based approaches toward these problems remain
| subpar.
|
| _> Even mathematical objects are approximated up to an
| approximation error in a computer (like a differentiable
| manifold or a real number)._
|
| Only because it caught on (and in the case of non-
| computationally-intensive applications, for purely
| historical reasons). For example, Mathematica has Reals
| and even functionality for Reals that is literally
| impossible to implement for integers [1,2]. There are also
| precise characterizations of objects in differential
| geometry [3]. You could imagine applying LLMs to these
| types of programs a la Copilot, but when you do this you
| will find yourself agreeing with Paul Houle's observation
| that math is harder to fake than e.g. art, language, or
| even glue code for web apps.
|
| [1] https://reference.wolfram.com/language/ref/Reduce.html
|
| [2] https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...
|
| [3] https://github.com/bollu/diffgeo
| the_af wrote:
| > _Computers can solve math problems and prove theorems_
|
| But the specification of the problem must be done by a
| human, translating to a formalized system that the
| software can understand. And if there's a problem in the
| formal specification, it's mostly up to the human to
| notice and fix; the computer will happily output garbage
| or crash or enter an infinite loop.
|
| So it seems this translation, going from an exploration
| of the problem statement, usually in ambiguous terms, to
| a formal specification, _and the awareness to possibly
| detect whether the answers make sense and the specs were
| right_, is uniquely human.
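
A minimal sketch of the check IshKebab describes above, using Z3's
Python bindings (assuming the z3-solver package is installed; the
snippet is illustrative, not from the thread). The constraints go
straight to the solver, which reports unsatisfiability with no
learning involved:

    # IshKebab's example: x > 0 and y > 1 force x + y > 1,
    # which contradicts x + y <= 1, so the conjunction is unsat.
    from z3 import Real, Solver, unsat

    x, y = Real('x'), Real('y')
    s = Solver()
    s.add(x > 0, x + y <= 1, y > 1)

    print(s.check())            # prints: unsat
    assert s.check() == unsat
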
| casey2 wrote:
| Counterexample: Shalosh B. Ekhad is a computer who is also
| a mathematician.
| Sharlin wrote:
| Well, you don't _need_ anything more than basic arithmetic
| to encode the entirety of, say, ZFC, enumerate every
| proposition in it, and halt iff you find a proof of
| whatever theorem you're after. It just might take a
| while...
| sharemywin wrote:
| Online Integral Calculator: solve integrals with
| Wolfram|Alpha
|
| https://www.wolframalpha.com/calculators/integral-calculator...
| sva_ wrote:
| Now try to make a computer prove that there are no
| positive integers a, b, c such that a^n + b^n = c^n for
| any n > 2.
| Sharlin wrote:
| Shifting the goalposts a bit, aren't we?
| sharemywin wrote:
| I guess it depends on the outcome you're worried about:
| superintelligence, or machines that replace the average
| office worker.
| PaulHoule wrote:
| That's not a bad approach, necessarily.
|
| There is a fairly simple program in
|
| https://www.amazon.com/Paradigms-Artificial-Intelligence-Pro...
|
| that solves word problems using the methods of the old AI.
| The point is that it is efficient and effective to use real
| math operators and not expect to fit numbers through the
| mysterious bottleneck of neural encoding.
| lupire wrote:
| Floating point isn't relevant here.
|
| The problem is that human language is approximate and correct
| math is not, so pattern matching on prose text is doomed. AI
| trained on exact math does a lot better. But that's not fully
| generic, so it fails the weird GPT goal of modeling all of
| human intelligence through prose. That's not how people solve
| math at all.
|
| GPT's "Superficially plausible but wrong" math is actually a
| pretty good match for non-expert, bad-at-math average human
| behavior.
| zozbot234 wrote:
| > GPT's "Superficially plausible but wrong" math is actually
| a pretty good match for non-expert, bad-at-math average human
| behavior.
|
| Relevant blog post:
| https://www.greaterwrong.com/posts/YhgjmCxcQXixStWMC/artific...
| "The best experts in the field estimate it will be at least a
| hundred years before calculators can add as well as a human
| twelve-year-old."
| PaulHoule wrote:
| I like Yudkowsky parodying himself there, although I still
| don't know if he has a sense of humor or not.
| CommieBobDole wrote:
| Also, Excel is terrible at encoding MP3s.
|
| It's a language model; why would we expect it to do math, or
| try to somehow shoehorn math into the model? Do the language
| centers of our brain do math?
|
| If something approximating AGI is going to happen, it's going
| to be a lot of models tied together with an executive function
| to recognize and send things to the area that's good at working
| with them.
| dr_dshiv wrote:
| Well, because we want rational language models. Something with
| a sense of truth.
|
| Math is not irrelevant--and I'm sure it's a solvable problem
| with language models.
| CommieBobDole wrote:
| But if it's rational and has a sense of truth, then it's AGI.
| Which I don't think is impossible or even unattainable within
| a reasonable amount of time, but we're .001% of the way
| there, not 50% or 75%.
|
| These models are fascinating, but the problem 'a lot of the
| things this model generates lack any semantic meaning' is
| inherent and likely insurmountable without connecting the
| model to other, far more complex models that haven't been
| built yet.
|
| We are at the level where our models can consistently
| generate blocks of text with full sentences in them that make
| grammatical sense. Which is pretty cool.
|
| But the next step is being able to consistently generate full
| sentences that make grammatical sense and usefully convey
| information. And while the current models do that a lot of
| the time, they don't do it all of the time, because they
| don't and can't know the difference without essentially being
| a different thing. Because to do that consistently, we need
| an "understanding what things mean" model. Which is many
| orders of magnitude larger and more difficult than a text
| generator.
| [deleted]
| thwayunion wrote:
| What are some (non-nefarious) applications of generative
| language models that produce language which isn't constrained
| by some sort of rationality or directed by some sort of high-
| level goal?
|
| The point isn't the math. The point is that, in math and
| similar disciplines, it's harder to get away with producing
| mostly undirected gibberish that happens to have some imputed
| meaning. The point is "use language to do something where it's
| easy to verify correctness and generating infinite amounts of
| synthetic data is trivial".
|
| If a language model can't even do high school algebra, then I
| have a lot less confidence that it will ever be useful for
| customer service applications or any number of other potential
| applications outside of propaganda, advertising, and spam.
| hey_over_here wrote:
| > It's a language model; why would we expect it to do math or
| try to somehow shoehorn math into the model?
|
| Language models can do math, or anyway arithmetic. That's
| because language models are trained to predict the next token
| in a sequence, and an arithmetic operation can be represented
| as a sequence of tokens.
|
| For example, see Figure 3.10 on page 22, here:
|
| https://arxiv.org/abs/2005.14165
|
| The only problem is that language models are crap at arithmetic
| because they can only predict the next token in a sequence.
| That's enough to guess at the answer of an arithmetic problem
| some of the time, but not enough to solve any arithmetic
| problem all of the time.
|
| More generally, the answer to your question is in the same
| Figure 3.10 I've referenced above. OpenAI (and others) have
| claimed that their large language models can do arithmetic. So
| then people tested the claim and found it to be a bag of old
| cobblers.
|
| Hence the article above. Nobody's trying to "shoehorn" anything
| anywhere. It's just something that language models can do,
| albeit badly.
| CommieBobDole wrote:
| Right, but what you're describing is 'not being able to do
| math'. Like, if I've memorized a multiplication table and can
| give you any result that's on the table but can't multiply
| anything that wasn't on the table, I can't do multiplication.
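
CommieBobDole's memorized-table point can be made concrete with a
toy sketch (hypothetical code, not any actual model): a "model"
that has stored every product in its training range looks
competent there and has nothing at all one step outside it.

    # A toy "model" of multiplication: a memorized 10x10 table.
    table = {(a, b): a * b for a in range(10) for b in range(10)}

    def predict(a, b):
        return table.get((a, b))  # None means "no idea"

    print(predict(7, 8))   # 56 -- looks like it can multiply
    print(predict(12, 3))  # None -- it never could
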
| hey_over_here wrote:
| It depends on how you see it. I agree with you, generally,
| but in the limit, if you memorised all possible instances
| of multiplication, then yes, you could certainly be said to
| know multiplication.
|
| I've not just come up with that off the top of my head,
| either. In PAC-Learning (what we have in terms of theory,
| in machine learning) a "concept" (e.g. multiplication) is a
| set of instances, and a learning system is said to learn a
| concept if it can correctly label each of a set of testing
| instances by membership in the target concept with
| arbitrary probability of error. Trivially, a learner that
| has memorised every instance of a target concept can be
| said to have learned the concept. All this is playing fast
| and loose with PAC-Learning terminology for the sake of
| simplification.
|
| The problem of course is that some concepts have infinite
| sets of instances, and that is the case with arithmetic. On
| the other hand, it's maybe a little disingenuous to require
| a machine learning system to be able to represent infinite
| arithmetic, since there is no physical computer that can do
| that, either.
|
| Anyway, that's how the debate goes on these things. I'm on
| the side that says that if you want to claim your system
| can do arithmetic, you have to demonstrate that it has
| something that we can all agree is a recognisable
| representation of the rules of arithmetic as we understand
| them, for instance the axioms of Peano arithmetic. Though
| that is a bit unfair to deep learning systems, which can't
| "show their work" in this way.
| abrax3141 wrote:
| The situation is actually much worse for science, or any moving
| field. These models are by design and necessity historical. So
| that if, for example, the FDA issues a drug approval overnight,
| the model can't follow sudden changes in a "reasoned" way.
| make3 wrote:
| The article is actually about how they are getting good at it :)
| mavu wrote:
| Talking about this stuff would be so much easier if we stopped
| calling this software "AI".
|
| It is a machine learning algorithm. It is an electronic parrot.
|
| That's it. And suddenly no one will wonder "OH MY WHY CANN IT
| NOT DO MATH< IT SMART?!?!"
| mjburgess wrote:
| How much of this is just "AI is bad at everything", but in the
| math case, it's easier for the lay person _to tell_?
|
| It's all just passable garbled nonsense that the reader goes to
| lengths to interpret based on _their_ prior knowledge, which is
| not expressed in the syntax of what these systems output.
|
| In the case of mathematics, we're far less willing to "BS away"
| the interpretive failures. But if we were equally demanding,
| likewise, all prose generated by these systems isn't AI
| "getting" anything either.
|
| Pass a film reel thru' a shredder and an art student would
| still call it a film. Pass math thru' and a mathematician
| won't. This says more about our ability and inclination to make
| sense out of nonsense when in apparent communicative situations
| (since, when speaking to a person, this actually improves our
| mutual understanding).
|
| So, how much of AI is just hacking people's cognitive failures:
| (1) people's willingness to attribute intention; (2) people's
| willingness to impart sense "at all costs" to apparent
| communication; and (3) "hopeium"?
| woah wrote:
| Have you ever used GitHub CoPilot? It does a lot of useful
| work, automating away rote typing in programming. Have you
| tried Dall-E or Stable Diffusion? They make good-looking
| images. This comment seems completely unmoored from where the
| state of the art is right now.
| civilized wrote:
| I agree. It's possible to point out the clear limitations of
| current AI without being oblivious to the huge, indisputable
| advances that have occurred.
|
| People thought it might take centuries for a computer to
| defeat a top human in Go.
| Then deep learning showed up, and a few years later it's
| the opposite.
|
| A lot of the things deep learning methods are doing now are
| things no one had any idea how long research would take to
| achieve, or if they were even possible.
|
| Personally, I think we are currently hitting some walls that
| might take a while to climb before we get to AGI, but I am
| _very_ impressed at the recent progress.
| TuringTest wrote:
| Math follows a completely different approach with respect to
| how machine-learning AIs do their thing.
|
| Reason derives its strength from having a few primitives and
| creating new assertions through the transformation of symbols
| by following precise rules (which is how algorithms work).
|
| In ML-based AIs, everything is imprecise and probabilistic,
| and this kind of generation gets its strength from building
| recognizable patterns from utterly imprecise inputs and
| training - quite the opposite of how logic and reason evolve.
| Now, "classic" AI was a powerful way to derive new knowledge,
| and automatic theorem proving is a strong discipline; but the
| recent breakthroughs in AI are not directly applicable to
| classic techniques.
|
| Do you know what machine-learning AIs could be good for?
| Generating "insight" in problem solvers, guiding theorem
| provers through the proof search space by trying to find the
| best sub-spaces to explore. If there's a way to create
| human-like general AI, it will likely combine both kinds of
| generation - the rational methods of symbolic logic and the
| "irrational" statistical methods of ML.
| zmgsabst wrote:
| Automated theorem proving is the same problem as "complete
| and label the diagram", which image generation is okay at.
|
| Work in progress for sure, though.
| mjburgess wrote:
| Sure, but Copilot is mostly just copying code (see, for
| example, the issue with it producing Quake source code).
|
| If you think of AI as a dial from sample(data) to mean(data),
| then as the dial is turned towards the mean() you get more
| "generic" results, but also more garbled ones.
|
| Copilot is more like a search engine, having turned the dial
| more towards sample().
|
| The real invention of the NN is simply to provide that dial
| in a trainable way.
|
| The only change to the "state of the art" is the size of the
| weights, and how long they take to train. This "advancement"
| is no more impressive than Google indexing more webpages.
|
| There has been no step-change advancement in AI in, perhaps,
| 50 years. All we see today is a product of hardware: GPUs and
| CPUs able to compress TBs of data into c. 300GB of weights.
| And likewise, the internet to provide it and SSDs to hold it.
|
| The "magic" of AI is no more than the magic of Wikipedia,
| here: Copilot is good only because a million-plus programmers
| made GitHub good.
|
| It's still little more than a fancy search.
| woah wrote:
| > It's all just passable garbled nonsense that the reader
| goes to lengths to interpret based on their prior
| knowledge, which is not expressed in the syntax of what
| these systems output.
|
| > It's still little more than a fancy search.
|
| I feel like the goalposts have been moved between your two
| comments. CoPilot is obviously not producing garbled
| nonsense, and it's also not just printing the top result
| from StackOverflow. It is producing code that references my
| variables, does the right thing 50% of the time, and
| usually compiles.
|
| One of the nice little things is error messages: when I
| type `if (!foo) { throw ...` CoPilot is able to complete a
| nicely formatted and descriptive error message from its
| understanding of my code.
| It's not garbled nonsense, and it's not just a search
| engine.
|
| Does AI deserve the hype it sometimes gets? Not yet. But I
| think you're going to have to start digging a little deeper
| for your commentary.
| planetsprite wrote:
| Even if AI got to the point of perfectly passing every
| expert-level Turing test, your degree of rigor as to what
| "thinking" is would never truly permit any belief of AI
| having struck the golden nugget of intelligence.
|
| Imagine if we were all self-replicating computers, and
| certain members of this silicon race began experimenting
| with making creatures with carbon macro-molecules to create
| organic intelligence. You could make the same claim in the
| other direction:
|
| "There has been no step-change advancement in Organic
| Intelligence in, perhaps, 50 years. All we see today is a
| product of cell count, in neurotransmitter chemistry able
| to compress TBs of experiences into c. 300B neurons."
| Marazan wrote:
| Dall-E produces good-looking images within certain
| parameters.
|
| When you are in its bounds it seems magical; once you go
| outside, it seems like a weak joke.
|
| And many of the reasons it is bad outside its sweet spot are
| fundamental to how it works, not a flaw that can be iterated
| away.
| dimmuborgir wrote:
| AI is bad at music also. Even the state-of-the-art transformer
| models can't produce more than a few seconds of coherent
| melodic phrases.
| [deleted]
| vladf wrote:
| Have you heard the piano continuations of AudioLM?
|
| https://google-research.github.io/seanet/audiolm/examples/
| bloep wrote:
| Indeed, there is lots of denial or ignorance in this thread
| (ignorance in the technical sense). AudioLM already
| produced impressive results, and it's a tiny fraction of
| what is already possible, because performance simply
| improves with scale. One can probably solve music
| generation today with a ~$1B budget for most purposes like
| film or game music, or personalized soundtracks. This is
| not science fiction.
| p1esk wrote:
| I don't see a lot of progress in AudioLM compared to
| results from 2018:
| https://storage.googleapis.com/magentadata/papers/maestro/in...
|
| What's more interesting and concerning - listen carefully
| to the first piano continuation example from AudioLM, and
| notice the similarity of the last 7 seconds to the
| Moonlight Sonata: https://youtu.be/4Tr0otuiQuU?t=516
|
| I'm afraid we will see a lot of this with music
| generation models in the near future.
| bloep wrote:
| There are quite simple tricks to avoid repetition/copying
| in NNs, e.g. (1) training a model to predict the
| "popularity" of the main model's outputs and penalizing
| popular/copied productions by backpropping through that
| model so as to decrease the predicted popularity, or (2)
| conditioning on random inputs (LLMs can be prompted
| with imaginary "ID XXX" prefixes before each example to
| mitigate repetitions), or (3) increasing temperature
| or optimizing for higher entropy. LLM outputs are already
| extremely diverse, and verbatim copying is not a huge
| issue at all. The point being, all evidence points to
| this not being a showstopper if you massage these
| evolutionary methods for long enough in one or more of
| the various right ways.
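
Of the three tricks listed, (3) is the simplest to show. A minimal
sketch of temperature sampling on toy next-token scores (made-up
numbers, not a real model): raising the temperature flattens the
distribution, trading fidelity to the training data for diversity.

    import numpy as np

    def sample_probs(logits, temperature):
        """Softmax with temperature: T -> 0 approaches argmax
        (more copying); large T approaches uniform (more novelty)."""
        z = np.asarray(logits, dtype=float) / temperature
        z -= z.max()               # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = [4.0, 2.0, 1.0, 0.5]              # toy next-token scores
    print(sample_probs(logits, 0.5).round(3))  # sharply peaked
    print(sample_probs(logits, 2.0).round(3))  # much flatter
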
| p1esk wrote:
| I'm not sure what you mean by "backpropping through that
| model so as to decrease the predicted popularity". During
| training, we train a model to literally reproduce famous
| chunks of music exactly as they are in the training set.
| We can also learn to predict popularity at the same time,
| but we can't backpropagate anything that will reduce
| popularity, because this would directly contradict the
| main loss objective of exact reproduction.
|
| Having said that, I think the idea of predicting
| popularity is good - we can use it for filtering already-
| generated chunks during the post-training evaluation
| phase.
|
| I don't think the other two methods you suggest would
| help here: we want to generate while conditioning on
| famous pieces, and we don't want to increase temperature
| if we want to generate conservative but still high-
| quality pieces.
|
| It's true that we (humans) are less sensitive to
| plagiarism in text output, but even for LLMs it is a
| problem when they try to generate something highly
| creative, such as poetry. I've personally noticed,
| multiple times, a particularly beautiful poetry phrase
| generated by GPT-2, only to google it and find out it was
| copied verbatim from a human poem.
| phillipharr1s wrote:
| Pretty sure the first continuation is a famous piece with a
| few notes messed up. Can't remember the name. Honestly it
| only sounds marginally better than the old Markov chain
| continuations.
| macrolocal wrote:
| Yep, Moonlight Sonata (mov. 3), no less. Talk about
| overfitting!
| vladf wrote:
| Isn't that as good as it gets? The whole point of the
| continuations is that, given a short leading prompt from a
| real piece, it should continue it realistically.
|
| It didn't get to train on the test set, if that's what
| you're implying, and I find it hard to believe the
| assertion that continuations are copies of the train set
| (if that's your claim).
| p1esk wrote:
| It definitely copied a piece of the Moonlight Sonata in
| the last 7 seconds of the first continuation sample:
| https://youtu.be/4Tr0otuiQuU?t=516
| holub008 wrote:
| Interestingly, the original piece is a later Beethoven
| sonata, Op. 31 No. 3. The model has its styles down!
| https://youtu.be/P-Q5aBAw-T4?t=78
| Der_Einzige wrote:
| That's wrong, and shows how ignorant you are of SOTA
| techniques for music generation. They are far ahead of that.
| denton-scratch wrote:
| It doesn't surprise me that an AI model for language can't
| grok maths or music. I can't see how a language model can map
| to maths. Hell, I don't even know how to describe music in
| words. It's possible to articulate _some_ maths in words, but
| that often involves using words with unexpected definitions.
| aaroninsf wrote:
| AI can be quite good at music,
|
| but yes, there is not yet an on-demand button for rendering,
| from a text prompt, bitstreams encoding composed, performed,
| and mastered music.
| CactusOnFire wrote:
| AI is bad at audio. AI can do MIDI fine.
| dwringer wrote:
| MIDI is extraordinarily expressive and is likely used to
| sequence a large majority of music produced within the last
| three decades. A lot of the instruments you hear are
| synthesizers or samplers running directly from MIDI. There
| is a lot more to what MIDI can do, and is used for, than
| the conception most people have from "canyon.mid" or old
| website background music. If an AI can do MIDI just fine
| then it's an extremely small leap to doing audio just fine.
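
For readers who haven't worked with it: a symbolic sequence of the
kind discussed here is just a list of note events. A minimal sketch
with the mido library (the hand-written event list is a hypothetical
stand-in for a model's output, not anything from the thread):

    # Serialize a toy note sequence to a standard MIDI file.
    import mido

    events = [(60, 480), (64, 480), (67, 480), (72, 960)]  # C-E-G-C, in ticks

    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for note, ticks in events:
        track.append(mido.Message('note_on', note=note, velocity=64, time=0))
        track.append(mido.Message('note_off', note=note, velocity=64, time=ticks))
    mid.save('roll.mid')
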
| p1esk wrote:
| _If an AI can do MIDI just fine then it's an extremely
| small leap to doing audio just fine._
|
| Unfortunately this is not true. It takes a huge amount of
| human effort to make MIDI-encoded music sound good. The
| difference between MIDI and raw audio music generation is
| the same as the difference between drawing a cartoon and
| producing a photograph.
|
| To clarify: yes, MIDI can be expressive, but what's being
| generated when people say "AI generates MIDI music" is
| basically a piano roll.
| causi wrote:
| Which is a real shame. AI-powered restoration of poor-
| quality audio would be highly useful.
| aaroninsf wrote:
| That particular niche has had some pretty amazing
| successes already. It's coming.
|
| We can't produce arbitrary media streams with many "stack
| layers" of meaning and detail yet, but we can do a lot of
| specific instrumental transformations...
|
| Vaguely relevant: https://koe.ai/recast/
| stephencanon wrote:
| Which is extra funny, because GOFAI models (e.g. David Cope's
| work) were doing a pretty OK job back in the 1990s!
| mjburgess wrote:
| I think if we replaced "AI" with "taking averages over
| subsets of historical examples", then there'd be no mystery
| about when "AI" will be good or bad at anything.
|
| Would we expect a discrete melodic structure to be
| expressible as averages of prior music? No.
| yeasurebut wrote:
| That's what a musician does. They make short loops and loop
| them.
|
| This reads like someone who knows sheet music and theory but
| does not listen to music. It's repetition of short phrases
| over and over.
|
| I'm not really sure what people expect of general AI trained
| on human-generated outputs. It can't make up anything "net
| new", only compose based upon what we feed it.
|
| I like to think AI is just showing us how simple-minded we
| really are, and how our habit of sharing vain fairy tales
| about history makes us believe we're masters of the universe.
| dimmuborgir wrote:
| Those models are not trained on short loops. They are
| trained on whole songs, just like image generation models
| are trained on whole images. And yet they struggle to
| repeat sections, modulate to a different key, create
| bridges, intros and outros. After a few seconds of
| hallucinating a melodic line they simply abandon the idea
| and migrate to another one. There is no global structure
| whatsoever.
| yeasurebut wrote:
| Musicians don't spit out an album in one sitting, and
| they're highly trained in theory. They get bored and
| tired of a process and take breaks. They come up with an
| album of loops composed together over time.
|
| AI's state will forever be constrained to the limits of
| human cognition and behavior, as that's what it's trained
| on.
|
| I read published research all year. Circular reasoning.
| Tautology. It's all over PhD theses.
|
| There's no "global structure" to humanity. Relativity is
| a bitch.
|
| Seeing the world through the vacuum of embedded inner
| monologue ignores the constraints of the physical one.
| It's exhausting dealing with the mentality that some
| clean-room idea we imagine in a hammock can actually
| exist in a universe being ripped asunder by entropy.
|
| It's living in memory of what we were sold: some ideal
| state. Very akin to religious and nation-state idealism.
| mjburgess wrote:
| I think it's deeply depressing that AI has been sold as
| something even capable of modelling anything humans do;
| and quite depressing that this comment exists.
|
| "AI" is just taking `mean()` over our choice of encodings
| of our choice of measurements of our selection of things
| we've created.
|
| There is as much "alike humans" in the patterns in tree
| bark.
|
| AI is an embarrassingly dumb procedure, incapable of the
| most basic homology with anything any animal has ever
| done; us especially.
|
| We are embedded in our environments, on which we act, and
| which act on us. In doing so we physically grow, mould
| our structure and that of our environment, and develop
| sensory-motor conceptualisations of the world. Everything
| we do, every act of the imagination or movement of our
| limbs, is preconditioned-on and symptomatic-of our
| profound understanding of the world and how we are in it.
|
| The idea that `mean(424,34324,223123,3424,....)` even has
| any relevance to us at all is quite absurd. The idea that
| such a thing might sound pleasant thru' a speaker,
| _irrelevant_.
|
| This is a product of I don't know what. On the optimist
| side, a cultish desire to see Science produce a new
| utopia. On the pessimist side, a likewise delusional
| desire to see Humans as dumb machines.
|
| What a sad state!
| pessimizer wrote:
| I lack your confidence, and find it a bit religious.
|
| > The idea that `mean(424,34324,223123,3424,....)` even
| has any relevance to us at all is quite absurd.
|
| Most of what I say to anyone is exactly this.
|
| When I'm about to give anyone any information, I look
| back at all of the relevant past information that I can
| recall (through word and sensory association, not by
| logic, unless I have a recollection of an associated
| internal or external dialog that also used logical
| rules). I multiply those by strength of recollection and
| similarity of situation (e.g. can I create a metaphor for
| the current situation from the recalled one?). I take the
| mean, then I share it, along with caveats about the
| aforementioned strength of recollection and similarity of
| situation.
|
| This is what it feels like I actually do. Any of these
| steps can be taken either consciously or by reflex. It's
| not hidden.
|
| > I think it's deeply depressing that AI has been sold as
| something even capable of modelling anything humans do
|
| This is a bizarre position. All computers ever do is
| model things that humans do. All a computer consists of
| is a receptacle for placing human will, one that will
| continue to apply that will after the human is removed.
| They are a way of crystallizing will in a way that you
| can sustain it with things (like electricity) other than
| the particular combination of air, water, food, space,
| pressure, temperature, etc. that is a person. An overflow
| drain is a computer that models the human will. An
| automatic switch/regulator is the basic electrical model
| of human will, and a computer is just a bunch of those
| stitched together in a complementary way.
| mjburgess wrote:
| You're an animal. You've no idea what you do, and you're
| using machines as a model. Likewise, in the 16th C. it
| was brass cogs; and in ancient Greece, air/fire/etc.
|
| You're no more made of clay & god's breath than you are
| of sand and electricity.
|
| You're an oozing, growing, malleable organic organism
| being physiologically, dynamically shaped by your sensory-
| motor oozing. You're a mystery to yourself, and these
| self-reports, heavily coloured by the in-vogue tech, _are
| not science_; they're pseudoscience.
|
| If you want to study how animals work, you'd need to
| study _that_. Not these impoverished metaphors that
| mystify both machines and men.
|
| No machine has ever acquired a concept through sensory-
| motor action, nor used one to imagine, nor thereby
| planned its actions. No machine is ever at play, nor has
| grown its muscles to be better at play. No machine has,
| therefore, learned to play the piano. No machine has
| thought about food, because no machine has been hungry;
| no machine has cared, nor been motivated to care by a
| harsh environment.
|
| An inorganic mechanism is nothing at all like an animal,
| and an algorithm over a discrete sequence of numbers with
| electronic semantics is nothing like tissue development.
|
| What you are doing is not something you can introspect.
| And you aren't really doing that. Rather, you've learned
| a "way of speaking" about machine action and are back-
| projecting that onto yourself. In this way, you're
| obliterating 95% of the things you are.
| saghm wrote:
| > How much of this is just "AI is bad at everything", but in
| the math case, it's easier for the lay person to tell
|
| Honestly, even as someone generally pretty dismissive of the
| AI hype, I'm not sure you can go that far. The whole reason
| we have specific mathematical notation is that human
| languages often are not super great at dealing with it, and
| English in particular is pretty abysmal at being both
| unambiguous and precise (and I'd be surprised if language
| models didn't end up suffering from biases analogous to how
| many image recognition AI models have been found to not deal
| well with a diverse set of human appearances). We don't teach
| math the same way we teach English, and we certainly don't
| expect people to be experts at teaching both, so why would we
| expect an AI model designed for language to be able to do
| math?
| planetsprite wrote:
| Language models aren't built for math. Their
| improvement/training cycles aren't sensitive to the exactness
| and rule-based nature of mathematical language, plus there
| are probably a lot of bad/misleading examples of math in the
| source data.
|
| You'd have to be unrealistically pessimistic to call what
| GPT-3 and other huge language models produce "nonsense".
| visarga wrote:
| It's not that they were not built for math, but more that
| verification is hard. But it's hard for humans as well. A
| large generative model + a fast verifier could do wonders.
|
| AlphaGo was built on that - the model can propose moves, but
| you can verify who won in the end. There are some code
| generation models that write their own tests as well, or use
| externally provided tests to verify their solutions. The
| DeepMind matrix multiplication algorithm was also "learning
| from verification" of generated solutions, because it's
| trivial to do that. In general, verification remains an open
| problem.
| spywaregorilla wrote:
| I disagree. It is that they were not built for math. While
| brain analogies are shittier than most people assume, this
| is like trying to do math in your head without being
| allowed to think through calculations.
| burlesona wrote:
| I genuinely wonder if we will find there are some inherent
| tradeoffs to knowledge and understanding, such that if we ever
| have machines that can "think like humans" they would in
| practice run into human-like cognition limits: i.e. such
| machines would be "bad at math" in the same way humans are
| "bad at math" compared to conventional computers.
| ryandvm wrote:
| Indeed.
| I posit that as we get closer and closer to simulating how
| the human brain works in the pursuit of artificial
| intelligence, we're going to start seeing more and more of
| the same "bugs" that humans have (logical fallacies,
| susceptibility to illusions, mental illness, etc.).
|
| You think your job sucks now? Just wait until you're dealing
| with the general AI over on the UX team that's trying to get
| your ass fired because it's fostering a 3-year-old grudge
| over that time you said Chappie was stupid.
| Der_Einzige wrote:
| At first, I thought it was surprising that a language model
| with a restricted vocabulary (e.g. banning the letter "E")
| acts significantly more "mentally ill", and then I thought
| about how I would come across if forced to use that
| constraint all the time, and I realized that maybe I'd appear
| mentally ill too!
|
| You can play with LMs with constrained vocabularies here:
| https://huggingface.co/spaces/Hellisotherpeople/Gadsby
| blackbear_ wrote:
| That's an interesting thought. However, it's not cognitive
| limits that make humans bad at math; it's just a "hardware"
| issue: a human with a piece of paper is much better at math.
| aaaaaaaaaaab wrote:
| Even if neural networks were fundamentally incompatible with
| conventional computation, I don't see why you couldn't augment
| a neural network with a conventional ALU to do the numerical
| computations. This is exactly what humans do with pencil and
| paper - it's just a bit too slow.
| auganov wrote:
| Either the language model would need to know what it's doing
| or the host program would have to know what the AI is doing.
| Both seem out of reach. The latter seems more doable, since
| you could hack something up for simple scenarios, but you'd
| effectively have to match the capabilities of the neural
| network in a classical way to handle every case (which would
| render using a neural net moot).
| WalterBright wrote:
| People struggle to get math, too.
| _0ffh wrote:
| And no wonder, as they correspond much closer to a Kahneman
| system 1 than system 2, where _we_ do most of our math.
| [deleted]
| bionhoward wrote:
| I bet vision transformers understand math better because it's
| somewhat artistic
| abrax3141 wrote:
| More generally, they struggle to get things right. They're
| great at grammatical confabulation, but when you need a correct
| answer, or a correct drug recommendation, ask an expert.
| pessimizer wrote:
| That's because they're not modelling anything. The shocking
| thing about current AI models is that just sort of repeating
| and copying from memory what you've heard and seen gets you 97%
| of the way to imitating a person.* They still need to generate
| actual models somewhere to create consistency; hence so many
| generated images with one eye completely different from the
| other, or three arms, or fingers that grow into their
| cellphones.
|
| If you solve this, you've probably solved almost everything in
| the simulation field. I have no confidence that the solution
| will even be complicated. Information consumed needs to be used
| to add to some sort of model, and that model always needs to be
| used as part of input. The complicated part would be to make
| that base model able to modify itself reasonably based on
| input, to tolerate constant inconsistency, and to constantly
| refine itself towards consistency, i.e. ruminate.
|
| I think a huge difference (which I think was approached through
| theories of embodied cognition) is that people start with a
| model (or the ability to create a model) of themselves. We can
| apply that model to other things and use it both to change how
| we ourselves behave, and to speculate about the invisible
| states of other things. It's not for nothing that we can (and
| must) anthropomorphize anything.
|
| -----
|
| * Which went a long way toward confirming my belief that this
| is all people do 97% of the time.
| mgraczyk wrote:
| This is factually wrong, both in terms of quantity and quality.
|
| Current AI models are not "just sort of repeating and copying
| from memory". This is just an incorrect characterization of how
| they work and how they perform.
|
| AI skeptics often say things like this, then backpedal with
| something like "Well, they aren't really repeating what they
| heard, but their generative model is just a slightly more
| sophisticated version of repeating what they've heard." But
| this weaker claim is also true of humans. It's certainly the
| case that >97% of what humans say is "just repeating and
| copying" in the same sense.
| pessimizer wrote:
| > Current AI models are not "just sort of repeating and
| copying from memory". This is just an incorrect
| characterization of how they work and how they perform.
|
| You say this, but don't explain how. Because this is exactly
| what they are doing.
|
| > AI skeptics often say things like this
|
| I'm not really an AI skeptic. I think that we're very close
| to AI being indistinguishable from people. There are clearly
| problems that need to be solved, but I think the hardest
| problem was _accepting the fact that humans are largely just
| copying_ and realizing that would be enough to get you 97% of
| the way there, especially if you gave a machine far more to
| copy than a human could consume.
|
| > then backpedal with something like "Well, they aren't
| really repeating what they heard, but their generative model
| is just a slightly more sophisticated version of repeating
| what they've heard." But this weaker claim is also true of
| humans. It's certainly the case that >97% of what humans say
| is "just repeating and copying" in the same sense.
|
| Maybe I'm not expressing myself clearly, but it seems that
| you're just repeating my comment with a sneer. Agreeing
| angrily?
| mgraczyk wrote:
| I'm disagreeing with the language you are using to
| characterize models. "Copying from memory" implies that
| there is something being copied, and a memory that you are
| copying it from. I am pointing out that LLMs do not do
| this. It's not how they work.
|
| If you polled 1M English speakers at random and asked them
| whether or not a system that was "just sort of repeating
| and copying from memory" could produce completely novel
| answers in response to completely novel questions, I
| suspect that the overwhelming majority would say no.
|
| Similarly, if you asked 1000 people working on LLMs whether
| they work by "copying from memory", I suspect nearly all
| would say no. It would be accurate to say they are
| "generating text via a probabilistic model of language,
| which is encoded in the weights of a neural network", but
| there really is just no sense in which the models are
| "copying" anything.
|
| That being said, these models do "copy" some text, in the
| sense that they can reconstruct some strings from their
| training input.
| For example, every LLM I have played with can recite the
| first few paragraphs of A Tale of Two Cities verbatim. But
| that's a capability they have _in spite of_ their actual
| design, not because of it.
| pessimizer wrote:
| > I'm disagreeing with the language you are using to
| characterize models. "Copying from memory" implies that
| there is something being copied, and a memory that you
| are copying it from. I am pointing out that LLMs do not
| do this. It's not how they work.
|
| Then we're arguing about the semantics of the word
| "copy". That is not an interesting argument when you know
| exactly what I mean and can express it clearly.
|
| edit: If it helps, either substitute your description in
| whenever I say "pretty much copy" or change the word
| "copy" to whatever word you want to use. But even though
| I can't reproduce the opening paragraph of A Tale of Two
| Cities verbatim, I can certainly write something that is
| "copying" it without doing that, and anyone who was
| familiar with the book and read my paragraph would agree
| with me.
| mgraczyk wrote:
| It is semantics, but that was your whole point, no?
|
| > That's because they're not modelling anything
|
| If we agree on "how LLMs work", then how can you claim
| that they aren't modeling anything? They are modeling
| language, and while it's unlikely current paradigms will
| be proving new mathematical truths, it's completely
| plausible to me that bigger models will be able to handle
| simple math word problems like those in the article,
| precisely because LLMs can model the "Alice", "Apple",
| and "Bob" entities.
| sebastialonso wrote:
| Can you actually share what "current AI models" are, then?
| Not trying to be rude, but you just said "na-ah" and then
| refused to argue any position.
| mgraczyk wrote:
| Current LLMs are "modeling" something according to pretty
| much any sense of the word "model".
|
| In the technical, computational linguistics sense, LLMs are
| language models that give a conditional posterior
| distribution over sentences. Given some (constrained)
| context, the model tells you the posterior distribution
| over sentences in or around that context.
|
| In the nontechnical, layman's sense of the word, they are a
| system that is used as an example of language. LLMs imitate
| language by generating new sentences. They are a "model" in
| the same way that an architectural model is a model, or in
| the same way that a statue is a model of a human.
|
| The other point I disagreed with is the characterization
| that LLMs "just sort of repeat and copy from memory". I
| went into more detail about that in other replies.
| [deleted]
| jxy wrote:
| > "When multiplying really large numbers together ... they'll
| forget to carry somewhere and be off by one," says Vineet
| Kosaraju, a machine learning expert at OpenAI. Other mistakes
| made by language models are less human, such as misinterpreting
| 10 as 1 and 0, not ten.
|
| So the expert has never seen a seven-year-old struggling to add
| two single-digit numbers together? Did the expert learn 1 and 0
| being 10 first and learn to speak second?
|
| > The MATH group found just how challenging quantitative
| reasoning is for top-of-the-line language models, which scored
| less than 7 percent. (A human grad student scored 40 percent,
| while a math olympiad champ scored 90 percent.)
|
| Is this that surprising? How would our IEEE editor score on the
| same problem set?
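
The carrying that the quoted expert refers to is an explicit,
column-by-column procedure. A minimal sketch of it (schoolbook
addition on digit strings), the sort of rule a next-token predictor
has to absorb implicitly rather than execute:

    def add_digits(a: str, b: str) -> str:
        """Schoolbook addition, rightmost column first, with an
        explicit carry -- the step the article says models drop."""
        a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
        carry, out = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            carry, digit = divmod(int(da) + int(db) + carry, 10)
            out.append(str(digit))
        if carry:
            out.append(str(carry))
        return ''.join(reversed(out))

    assert add_digits('478', '256') == '734'   # two carries
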
| Buttons840 wrote:
| Are there any general-purpose models that are good at learning
| math? I mainly know basic feed-forward neural nets, but I don't
| think they do well outside their training region. Math, of
| course, has an infinite training region.
| alan-crowe wrote:
| I attempted to create a general-purpose model for the exact
| version of the "what comes next" problem. It enumerated
| primitive recursive functions, trying them out as it went. The
| limitation to primitive recursive functions was convenient
| because they always terminate. I didn't have to filter out the
| functions that ran for too long. (Or do I?)
|
| The enumeration inherently includes functions of several
| variables, so I wasn't restricted to examples such as 1->1,
| 2->4, 3->9, 4->16, etc.
|
| I could try it out on examples such as (1,2)->3, (2,1)->3,
| (0,2)->2, etc. Perhaps with enough examples it would "learn to
| add" = find a primitive recursive function that did addition.
|
| I got as far as finding the first problem. The enumeration
| technique that I used was effectively doing a tree recursion,
| like that function for computing Fibonacci numbers that bogs
| down because Fib(10) is computing Fib(5) lots of times. I had a
| lot of numbers that coded for the identity function, lots of
| numbers that coded for the first few functions, making the
| whole thing bog down, trying the same few functions over and
| over under different numerical disguises.
|
| I thought that I could see my way to fixing this first problem:
| have some way of recognizing numbers that give forms that give
| the same function. I guessed that I could approximate this by
| saying that if two functions give the same value on a variety
| of arguments they are probably the same. Then I parameterise
| this criterion and tune. That opens the way to creating a
| consolidated enumeration, analogous to fixing the tree-
| recursive Fibonacci function by memoization, except trickier.
|
| But my health is poor and I ran out of energy.
|
| Also, I have a guess for the second problem. What happens if I
| fix the first problem and my enumeration reaches decently
| complicated primitive recursive functions? While they will all
| terminate, some might run for far too long, causing the process
| to bog down. Rejecting them on the basis of limiting the run
| time might work well. We are happy to only learn reasonably
| effective functions for doing maths.
|
| It is a fun idea and I encourage others to have a go.
| geoduck14 wrote:
| From my (limited) experience with the advanced ML models, they
| can "do basic math", but they make amateur mistakes with basic
| things - which indicates they _don't actually know addition_,
| but they are good at looking at patterns in existing language.
|
| I would assume that state-of-the-art ML models could "convert a
| word problem into an equation", then feed _that_ equation into
| a 30-year-old graphing calculator to "do the math".
|
| The fact that no one has done this is an indicator that "there
| are more important things to work on", and it is just a matter
| of time before someone connects the two together.
| MarkPNeyer wrote:
| This seems so much like humans that it makes me think lots of
| people are learning math with an ML-like approach instead
| of... whatever the heck people like engineers and
| mathematicians are doing.
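
A minimal sketch of the two-stage pipeline geoduck14 describes
above, with the model's role faked by a hard-coded stub
(llm_translate is a hypothetical stand-in, not a real API): the
language model only has to emit an expression; a conventional
evaluator does the arithmetic.

    import ast, operator

    def llm_translate(problem: str) -> str:
        """Hypothetical stand-in for a model that turns a word
        problem into an arithmetic expression."""
        return "3 * 12 + 4"  # "Three boxes of 12 eggs plus 4 loose eggs..."

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def evaluate(expr: str):
        """Evaluate +, -, *, / expressions without calling eval()."""
        def walk(node):
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant):
                return node.value
            raise ValueError("unsupported expression")
        return walk(ast.parse(expr, mode="eval").body)

    print(evaluate(llm_translate("How many eggs in total?")))  # 40
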
| vidarh wrote:
| I wonder how these language models would do if we tried to
| teach them maths the way schools do: feed them explanations
| first, then endless sequences of toy problems, see which
| they got wrong, and feed them corrected examples back in.
|
| I'm not at all surprised they don't do well at maths,
| because while there are maths texts online, I doubt there
| is _enough_ material to give these models the same
| experience of repetition and reinforcement to help
| sufficiently generalise an understanding of the underlying
| rules.
| lupire wrote:
| Generating solved math problems is trivial, like making
| AlphaZero play itself in chess. Sparse data is not the
| problem. Refusing to use it is.
| vidarh wrote:
| I don't think it's so much a refusal as that it's not
| been a sufficient priority for anyone before. As the
| article points out, there are now a few training sets
| which include math problems, and models which do well on
| them. But the remaining problems seem to be with basics
| which humans tend to learn to do consistently with a lot
| of repetition, and it'd be interesting to see those
| datasets extended to the very simple.
| idealmedtech wrote:
| Anyone can do higher-level math; the problem is that math
| education is generally done by people who see math as a
| tool for computation, rather than a study of deep
| connections bordering on philosophy, and beautiful insights
| resembling poetry. I've been in arguments before where
| someone didn't believe me that the underpinnings of modern
| philosophy are essentially the same as math!
|
| If the teachers don't love math, how can we expect students
| to?
| lupire wrote:
| What you describe is exactly what the state of the art has
| done. They even lied and said it was "solving math problems"
| by calling numpy methods.
| the_af wrote:
| > _"convert a word problem into an equation"_
|
| Isn't this a huge step? It's not a minor detail remaining to
| be solved, but possibly the largest step!
| neoneye2 wrote:
| There is "LODA", which uses genetic algorithms to continuously
| mutate existing math programs until discovering something new.
| It uses OEIS as training data, around 350k known integer
| sequences, such as primes/Fibonacci. Around 100k programs have
| been mined so far.
|
| https://loda-lang.org/
|
| I'm a contributor to LODA.
|
| LODA runs on CPU. It doesn't use GPU. If you have a spare
| computer, please consider contributing to the mining. Your
| contribution helps.
|
| https://boinc.loda-lang.org/loda/
| xiphias2 wrote:
| It is a great sign that we are building AI in the right
| direction. Before building artificial human intelligence, it
| makes sense to get to the intelligence level of a mosquito or
| fly, then go to more intelligent animals in later iterations.
|
| As most of human knowledge is encoded in videos, getting
| better at understanding / generating videos will clearly get
| us closer to making computers understand the world.
| yshrestha wrote:
| Language models can generate a Python function that does the
| math perfectly.
|
| I bet you would get better results if you tweaked the prompt to
| say "Generate a Python program that solves X math problem" and
| then just ran the resulting Python script.
|
| It does not need to be AGI to be useful.
| lupire wrote:
| You mean "generate a Python function that _calls a library_
| that does math perfectly", right?
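
The kind of generated function yshrestha and lupire are talking
about would presumably look something like this (a sketch, not
actual model output): the sympy library keeps the arithmetic exact
and symbolic, so the language model never has to carry a digit
itself.

    import sympy as sp

    def solve_quadratic(a, b, c):
        """Return the exact symbolic roots of a*x**2 + b*x + c = 0."""
        x = sp.symbols('x')
        return sp.solve(sp.Eq(a * x**2 + b * x + c, 0), x)

    print(solve_quadratic(1, 0, -2))  # [-sqrt(2), sqrt(2)] -- exact, no rounding
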
| hgomersall wrote:
| In the limit, it's going to design an AI to write some
| Python to call a library that does the math perfectly.
| thwayunion wrote:
| Unlike 99.99% of human programmers, who can and often do
| implement everything in sympy/numpy from scratch ;-)
| yshrestha wrote:
| Exactly! Hey, it gets the job done :)
|
| Software is just a tall wedding cake of abstractions built on
| top of abstractions.
| swyx wrote:
| You can also tell the model that it doesn't know how to do
| math, and _it respects that_:
|
| https://twitter.com/goodside/status/1568448128495534081
| Kim_Bruning wrote:
| That is also a very valid and interesting thing to do.
|
| But it's also quite interesting to see how the model would do
| "by itself". All kinds of interesting lessons to be learned!
| yshrestha wrote:
| Yeah! It is interesting to try and figure out "what" the
| model is actually learning. It is a valid thread of
| scientific inquiry.
| mlajtos wrote:
| Exactly, we need computer-equipped neural nets. Models need to
| use traditional UIs (including programming languages), and then
| we can talk about how to stop them. :)
| lairv wrote:
| That could only generate constructivist [0] proofs, and there
| are many things done in modern maths which are not
| constructivist. Maybe a better approach would be to use the
| Curry-Howard [1] correspondence to directly get proofs from
| generated programs.
|
| [0] https://en.wikipedia.org/wiki/Constructivism_(philosophy_of_...
|
| [1] https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...
___________________________________________________________________
(page generated 2022-10-12 23:01 UTC)