[HN Gopher] GPT-3 can run code
___________________________________________________________________
GPT-3 can run code
Author : maytc
Score  : 163 points
Date   : 2022-03-29 15:31 UTC (7 hours ago)
(HTM) web link (mayt.substack.com)
(TXT) w3m dump (mayt.substack.com)
| unixhero wrote:
| Great, so how do I run GPT-3 on my own hardware at home?
| mr_toad wrote:
| It's not available to the public or open source, so you can't.
| Only the smallest models might run on a single GPU; the largest
| would need a large grid.
| DC-3 wrote:
| Very far from an expert on ML, but isn't GPT-3 trivially not
| Turing complete, since it halts deterministically?
| [deleted]
| charcircuit wrote:
| >Is GPT-3 Turing complete? Maybe.
|
| It's obviously not. To handle infinite loops it would need to
| solve the halting problem, which is not possible.
| anyfoo wrote:
| I don't quite understand your answer. You don't need to solve
| the halting problem to be Turing complete, quite obviously. Why
| would GPT-3 need to in order to be?
| mountainriver wrote:
| This is such an interesting field, but I think there needs to be
| more focus on determinism and correctness. The stuff that's
| happening with retrieval transformers is likely where this is
| heading.
| a-dub wrote:
| is there a search engine for the training data so that one can
| verify that it is actually performing novel operations and not
| just quoting back stuff from its incredibly large training set?
| algon33 wrote:
| If I remember rightly, the AlphaCode paper includes a list of
| benchmarks, including the results of a finetuned GPT-3 for
| coding. I think they did it because Codex wasn't available to
| them when they were doing their tests, but I might be wrong
| there.
| imranq wrote:
| An interesting research direction would be to see how much
| GPT-3 deviates as we get more precise on various computational
| tasks. Possibly this would give some measure of the concepts
| the model has learned.
| sho_hn wrote:
| Do we today have any test suites/benchmarks for models along
| those lines?
| kaetemi wrote:
| It has a ton of programming books in its training data. It only
| "runs" code that is close enough to samples it has seen that
| included output. Anything complex, and it fails, because it
| does not reason about it logically. It's bad at the same things
| humans are bad at.
| mr_toad wrote:
| Human programmers rely on intuition and experience much more
| than some people give them credit for. An experienced
| programmer can find common errors quickly, simply because
| they've seen (and made) so many.
|
| Being able to intuit what a block of code does is actually a
| core skill; having to actually step through code in your head
| is slow and difficult.
| Avalaxy wrote:
| Just because you can, doesn't mean that you should. For some
| things it's just better to use a rules-based engine that is
| always correct, rather than a heuristics-based algorithm that
| gives answers that are merely close.
| tasty_freeze wrote:
| I don't think the author of the piece (or anyone, for that
| matter) thinks GPT-3 should be used for running programs or
| evaluating functions.
|
| It is being discussed because it is surprising that GPT-3 can
| do it at all. It is worth investigating what types of emergent
| knowledge and behavior are encoded in the trained network, as
| the boundaries of its capabilities may help illuminate future
| neural network architecture design.
| kevincox wrote:
| I find it quite interesting that in the JSON to YAML example it
| reordered the list.
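| For example, something like this (hypothetical data, not the
| article's; the "correct" conversion assumes the PyYAML package):
|
|     import json, yaml  # pip install pyyaml
|
|     doc = json.loads('{"users": ["alice", "bob", "carol"]}')
|     # A rules-based converter preserves list order:
|     print(yaml.safe_dump(doc))
|     # users:
|     # - alice
|     # - bob
|     # - carol
|
| GPT-3's output instead came back with the list items reordered.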
| If this was an access control list, that could be a serious
| security issue that could have easily been missed in review
| (especially if dozens of files like this were changed at once).
| Of course a malicious user could have done this as well and
| likely gotten it past code review, but the fact that it was
| accidental is scarier in a way.
| timdellinger wrote:
| I assume that GPT-3 is just exhibiting rote memory. For small
| numbers, it has accurate answers memorized from the training set,
| but for larger numbers, it just "remembers" whatever is close...
| hence the ability to estimate.
|
| My take is not that GPT-3 can run code, but rather that GPT-3 has
| memorized what code looks like and what the output looks like.
| lopatin wrote:
| Can someone explain for a dummy how this is possible? How does it
| know that range() is zero indexed? Was it specifically trained on
| Python input/function/output data? Or did it just "learn" it? Do
| the researchers know how it learned it?
|
| Does it actually "run" the code? Like, if it was looping over 1
| billion iterations, would it take 1B times longer than if it was
| just one iteration? I have so many questions.
| MauranKilom wrote:
| > How does it know that range() is zero indexed?
|
| If you read through all of the internet once, would _you_ know
| that range() is zero indexed?
|
| > Like, if it was looping over 1 billion iterations would it
| take 1B times longer than if it was just one iteration?
|
| It clearly cannot, because querying the network for a token
| executes the exact same sequence of operations every time.
|
| But it's very impressive that it can basically recognize the
| Collatz Conjecture in the code and mostly guess in the right
| ballpark for the results.
|
| The fact that it's just linking (in a loose sense) inputs to
| inputs it has seen is quite visible in the f(g(x)) vs g(f(x))
| behavior - the former is significantly more common, so it
| struggles to work with the latter.
| etskinner wrote:
| It seems more likely that it learned it. If you knew nothing
| about Python, but understood the word "for" a little, and
| understood code a little, you're likely to figure out that
| range() is zero-indexed after you see something like this a few
| times:
|
| >>> for i in range(3): print(i)
| 0
| 1
| 2
| lopatin wrote:
| My mind is just blown that it learned a language runtime
| based on examples. What would happen if you gave it an
| infinitely recursive function? It can't stack overflow,
| there's no stack! Wait, is there?
| stevenhuang wrote:
| My guess is it would respond with the standard stack
| overflow error, from examples of similar output posted in
| its training set.
| lucidrains wrote:
| https://alphacode.deepmind.com/ gives you a glimpse inside of
| what emerged from a similar attention net trained on code.
| however, whether the attention net is applied to pixels,
| language, or amino acid sequences, the resultant representations
| are a bit beyond human reasoning, even if we can examine what
| individual attention heads are 'looking' at
| [deleted]
| [deleted]
| [deleted]
| mhh__ wrote:
| How do you know a range is zero indexed? (As in, how is it
| stored in your brain?)
| lopatin wrote:
| I have no idea how it's stored in my brain. Is that the same
| way it's stored in GPT-3?
| ogogmad wrote:
| Theorem: There are infinitely many prime numbers.
|
| Proof:
|
| _[GPT-3's "proof"]_
|
| Let p be a prime number. Then by definition, p is a positive
| integer that has no positive divisors other than 1 and p.
|
| Now consider the integer q = p + 1. We will show that q is not a
| prime number, which will prove that there are infinitely many
| prime numbers.
|
| Since p is a positive divisor of q, q cannot be prime.
|
| _[...another attempt with the same prompt]_
|
| We know that there are infinitely many natural numbers. Let's
| suppose that there are only finitely many prime numbers. Then we
| can list them all:
|
| p_1, p_2, p_3, ..., p_n
|
| Now, consider the number
|
| N = p_1 p_2 p_3 ... p_n + 1
|
| This number is greater than any of the prime numbers in our list,
| so it cannot be divisible by any of them. But it is also a
| natural number, so it must be prime. This contradicts our
| assumption that there are only finitely many prime numbers, so we
| must conclude that there are infinitely many prime numbers.
|
| ~~ _Personal remark_ : I had to give it the same prompt like 5
| times to get a correct-ish proof. The first attempts came out
| like the first example.~~ The proof is still wrong.
| [deleted]
| nonameiguess wrote:
| This isn't really the "human level mathematician" equivalent
| task anyway. A human mathematician's main purpose isn't to
| memorize and reproduce proofs generated by other people. It's
| to prove original results no one else has proven before. To
| remember and reproduce existing proofs, I just typed "proof
| infinitely many primes" into DuckDuckGo and it gave me plenty
| of correct results.
| karpierz wrote:
| That's like saying "standing still" isn't a human-level
| sprinter's task. In principle, yes, nothing in the 100m
| sprint requires that you be able to stand still. In
| practice, I would be very skeptical of someone who can't
| stand claiming they can sprint.
| zardo wrote:
| It's a human-level mathematics student problem. If it can't
| determine that its proof is nonsense here, there's little
| hope it could produce any worthwhile original work.
| jameshart wrote:
| What does GPT-3 come up with if you ask it for a proof that
| there are a finite number of primes? Or that pi is rational?
|
| I guess it would stitch together some more seemingly sensible
| statements that also don't quite add up to a rigorous proof?
| [deleted]
| gnulinux wrote:
| Both proofs are wrong; the second one is closest. The second one
| should not claim that N is prime (it likely isn't). It should
| say that N is not divisible by any of the p_i, and since by the
| Fundamental Theorem of Arithmetic N = q_1^(c_1) * ... * q_k^(c_k)
| for some primes q_i, none of the q_i can be in {p_i}, which
| shows that no finite list of all the primes can be constructed.
| brian_cloutier wrote:
| Interestingly, these attempts are about the same as what pops
| up when I try to remember the proof:
|
| - It's a proof by contradiction
| - The key step is in taking the finite list of primes,
| multiplying them together, and adding 1
|
| I then try to flesh out the details: it might take a second to
| realize that this new number is also prime, and then a few
| moments more to remember the exact rationale why.
|
| Along the way the proof lives in a kind of superposition where
| I'm not clear on the exact details. The "proofs" you gave here
| seem to be serializations of a similar superposition! GPT-3
| seems to remember the proof about as well as I do, but it's
| missing the final sanity check which tweaks the proof until all
| the pieces correctly fit together.
|
| In this case, you seem to be performing a version of this
| sanity check by running the prompt multiple times until a
| correct answer comes out.
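| In toy code, that loop is something like this (`complete` is a
| made-up stand-in for one sampled model completion, not a real
| API, and the canned answers are placeholders):
|
|     import random
|     from collections import Counter
|
|     def complete(prompt):
|         # Stand-in for one sampled completion from the model.
|         return random.choice(["proof A", "proof A", "proof B"])
|
|     def sample_and_vote(prompt, n=10):
|         # Sample several completions and keep the most common
|         # answer - a crude version of the sanity check above.
|         answers = [complete(prompt) for _ in range(n)]
|         return Counter(answers).most_common(1)[0][0]
|
|     print(sample_and_vote("Prove there are infinitely many primes."))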
| I wonder if it's possible to prove something more obscure using
| a similar process: GPT-3 comes up with ideas and the human
| sanity-checks.
| ctoth wrote:
| I believe this recent paper demonstrates a method for
| allowing these large language models to perform this "sanity
| check" automatically[0].
|
| [0]: Self-Consistency Improves Chain of Thought Reasoning in
| Language Models https://arxiv.org/abs/2203.11171
| actually_a_dog wrote:
| The thing I find interesting about the proof attempts in the
| GP comment is that they very much resemble what you'd expect
| to see coming from a hypothetical somewhat confused
| undergrad. I think that ties into what you say about the
| proof living "in a kind of superposition where I'm not clear
| on the exact details," because that's where I imagine said
| hypothetical confused undergrad's understanding being.
| mr_toad wrote:
| It's imitation rather than true understanding. Still, even
| imitation is a remarkable ability for a computer.
| Banana699 wrote:
| >this new number is also prime
|
| Not necessarily; it might be composite, but in this case one
| of its prime factors will necessarily not lie in the
| supposed list of primes, therefore also a contradiction.
|
| The first counterexample to "If L := {P0,P1,..,Pn} is a list
| of primes, then prod(L)+1 is prime" is {2,3,5,7,11,13}: their
| product is 30030, and 30031 is a composite of 2 primes, none
| of which are in the list.
| falcor84 wrote:
| It's somewhat silly semantics, but I believe it is a valid
| deductive step on the way to the contradiction - if the
| number is not divisible by any other prime, then it must be
| a new prime. ∎
| ivegotnoaccount wrote:
| The issue is that it is not divisible by any other prime
| *from the list*. The two cases (prime or composite) must
| be handled separately, since they do not use the same
| logic to infer there is one more prime.
|
| For instance, 2 * 3 * 5 * 7 * 11 * 13 + 1 = 30031 = 59 *
| 509.
| ravi-delia wrote:
| But to get the contradiction, you assume a finite number
| of primes. As each of them does _not_ divide the new one,
| the new one is not divisible by a prime. It seems like
| your method is some kind of induction? Which probably
| gets a little closer to the "reason" for it, but isn't
| the standard proof I've seen.
| Tainnor wrote:
| You don't need two separate cases.
|
| Assume p1, ..., pn is a finite list of primes. The product
| p1 * ... * pn + 1 is divisible by a prime, because every
| natural number > 1 is. However, it's not divisible by
| p1, ..., pn, hence there must be an additional prime not in
| the list.
|
| (I think you're right, though, that GP's "contradiction"
| doesn't work.)
| ogogmad wrote:
| I keep asking GPT-3 to prove that the LR algorithm (for finding
| eigenvalues and eigenvectors) converges for PSD matrices. It
| keeps insisting that it's a form of gradient descent. Is that
| true?
| daenz wrote:
| Nit, but YAML is a superset of JSON, so no conversion required :)
| jefftk wrote:
| This sort of "do what I mean" situation, where doing the thing
| the user intended is different from doing something technically
| correct, is a place GPT-3 excels. Even though returning the
| input would be easiest, it has the pragmatic judgement to
| predict that's not what the user wants.
| mbowcut2 wrote:
| So, for people unfamiliar with deep language models like GPT,
| it's essentially a program that takes in a prompt and predicts
| the next set of words based on a training corpus -- which in
| GPT-3's case is a large portion of the internet.
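| Schematically, generation is just a loop like this (a toy
| bigram counter standing in here for the real 175-billion-
| parameter network; the real model is vastly more sophisticated,
| but the input/output shape is the same):
|
|     from collections import Counter, defaultdict
|
|     # "Train": count which token follows which in a tiny corpus.
|     corpus = "for i in range ( 3 ) : print ( i )".split()
|     counts = defaultdict(Counter)
|     for a, b in zip(corpus, corpus[1:]):
|         counts[a][b] += 1
|
|     def next_token(tok):
|         # Predict the most frequent continuation seen in training.
|         return counts[tok].most_common(1)[0][0]
|
|     out = ["for"]
|     for _ in range(5):
|         out.append(next_token(out[-1]))
|     print(" ".join(out))  # "for i in range ( 3" - looks like
|                           # code, but nothing was ever executed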
| In these examples GPT is not executing any Python code; it has
| just been trained on enough Python code/output to successfully
| predict what kinds of outputs these functions would produce.
| kcorbitt wrote:
| For folks wanting to play around with the GPT-3 code-editing
| capabilities referenced in the article within your own codebase,
| I wrote a simple open source VS Code plugin that lets you run
| commands against your currently-open file and get GPT-3's
| suggested edits back in a diff:
| https://marketplace.visualstudio.com/items?itemName=clippy-a...
| 58x14 wrote:
| I think I'm going to pair this with Copilot and see what
| happens. Hopefully I don't accidentally unlock something
| bizarre.
| zora_goron wrote:
| A quick question for anyone familiar with the architecture of
| these Transformer-based models -- I've heard that one reason why
| they don't work well with numbers is how the inputs are tokenized
| (i.e. as "chunks" rather than individual words/numbers). Is there
| anything architecturally preventing an exception to this form of
| tokenizing in the data preprocessing step, and passing numbers
| into the model in the format of 1 digit == 1 token? It seems like
| such a change could possibly result in a better semantic
| "understanding" of digits by the model. (A toy sketch of the
| difference appears at the end of this subthread.)
| [deleted]
| Veedrac wrote:
| Nothing prevents it, no. Transformers are certainly capable of
| learning mathematical tasks; consider [1] as an example, which
| uses big but regular token lengths.
|
| Alternatively you could just scale 'til the problem solves
| itself.
|
| [1] https://arxiv.org/abs/2201.04600
| learndeeply wrote:
| Anyone have any ideas on how they're doing text insertion using
| an auto-regressive model?
| lucidrains wrote:
| yes, they are most likely finetuning with this type of
| pretraining: https://arxiv.org/abs/2103.10360 - quite easy to
| build
| PaulHoule wrote:
| It would be remarkable if it got the right answers.
|
| But it can't, because it doesn't have the right structure (e.g.
| GPT-3 finishes in finite time; a program in a real programming
| language doesn't necessarily!)
|
| GPT-3's greatest accomplishment is that it has "neurotypical
| privilege", that is, if it gets an answer that is 25% or 95%
| correct, people give it credit for the whole thing. People see a
| spark of intelligence in it the way that people see faces in leaf
| axils or in Martian rock formations, or how G.W. Bush looked in
| Vladimir Putin's eyes and said he got a sense of Putin's soul.
| (That was about the only thing in his presidency that he later
| said he regretted!)
|
| As an awkward person I am envious, because sometimes it seems I
| get an answer 98% correct or 99.8% correct and get no credit at
| all.
| Micoloth wrote:
| GPT-3 does _not_ think like a human, but it definitely executes
| code in a way that is more similar to a human than to a
| computer.
|
| Proof is that humans indeed _do_ sometimes get the wrong answer
| in quizzes like these!
|
| So I cannot understand this point of view of diminishing it as
| a "spark of intelligence". It is exactly what's advertised: a
| very big step forward towards real AI, even if definitely not
| the last one.
| PaulHoule wrote:
| It is the Emperor's New Clothes incarnate.
|
| It has the special talent of hijacking your own intelligence
| to make you think it is intelligent.
|
| People understood this about the 1966 ELIZA program, but
| intellectual standards have dropped greatly since then.
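| On zora_goron's tokenization question above, here is a toy
| contrast between BPE-style chunking and one-digit-per-token.
| The vocabulary is invented for illustration; GPT-3's real BPE
| merges differ:
|
|     # Greedy longest-match tokenizer over a made-up vocabulary.
|     VOCAB = {"1234", "123", "56", "7",
|              "0", "1", "2", "3", "4", "5", "6", "8", "9"}
|
|     def chunk_tokenize(s):
|         out, i = [], 0
|         while i < len(s):
|             for j in range(len(s), i, -1):  # longest match first
|                 if s[i:j] in VOCAB:
|                     out.append(s[i:j])
|                     i = j
|                     break
|         return out
|
|     print(chunk_tokenize("123456"))  # ['1234', '56'] - 2 tokens
|     print(chunk_tokenize("123567"))  # ['123', '56', '7'] - 3 tokens
|     print(list("123456"))            # ['1', '2', '3', '4', '5', '6']
|
| Two equal-length numbers come out as different numbers of tokens
| with different boundaries, while the digit-level version keeps
| one digit per token, aligned by place value.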
| thrtythreeforty wrote:
| > GPT-3 struggles with large numbers, decimal numbers, and
| negative numbers. When used it returns answers that are close
| but often incorrect.
|
| Regarding GPT-3's "guesstimates," intuitively it feels like the
| network _has_ to guess, because it hasn't been given a way to do
| exact computation -- a neural network is built out of nonlinear
| functions -- even if it "understands" the prompt (for whatever
| value you want to give to "understand").
|
| Are there any techniques that involve giving the model access to
| an oracle and allowing it to control it? To continue the analogy,
| this would be the equivalent of giving GPT-3 a desk calculator.
|
| If this is a thing, I have other questions. How do you train
| against it? Would the oracle have to be differentiable? (There
| are multiple ways to operate a desk calculator to evaluate the
| same expression.) Also, what control interface would the model
| need so that it can learn to use the oracle? (Would GPT-3 emit a
| sequence of 1-hot vectors that represent functions to perform,
| and would the calculator have "registers" that can be fed
| directly from the input text? Some way of indirectly referring
| to operands so the model doesn't have to lossily handle them.)
| ravi-delia wrote:
| I believe the dominant thinking is that GPT-3 has trouble with
| math because it doesn't see individual digits. It obviously has
| no trouble working on words, which are much more discrete than
| numbers. I wouldn't be surprised if it had trouble carrying a
| long equation, though. When writing, it can reconsider the whole
| context with each new word, externalizing that memory, but with
| most computations it would have to carry out the whole thing in
| one go. That's a lot of dedicated parameters for a single
| subtask.
| thrtythreeforty wrote:
| > with most computations it would have to carry out the whole
| thing in one go
|
| Is there a way to allow models to say "let me think about
| this some more"? With language models like GPT-3 you emit one
| token per inference iteration, with its previous output fed
| back in as input/state. Can models opt out of providing a
| token, but still update state? That would allow it to break
| up the computation into discrete steps.
| durovo wrote:
| I believe GPT-3 has a transformer-based architecture, so it
| doesn't carry recurrent state between iterations the way an
| RNN does; each step re-reads the whole sequence. I believe
| attention-based transformer models have enough complexity to
| be able to learn what you are talking about on their own.
| thrtythreeforty wrote:
| Thank you for pointing out the difference. I went and
| reread about transformers; previously I thought they were
| a kind of RNN. (I am not an ML engineer.)
| ravi-delia wrote:
| I think it would work, but backprop would be computed in a
| different way every time. I'm not an expert, so there may
| be sneaky ways around it, but I'm pretty sure you'd lose
| out on a long history of little efficiency improvements
| when you could just make it more recurrent instead.
| daniel-cussen wrote:
| And that's where you see the man behind the curtain.
| AitchEmArsey wrote:
| Next year: GPT-NG offloads its answers to Amazon
| Mechanical Turk, and we've come full circle.
| daniel-cussen wrote:
| Yeah, for sure.
| With energy prices soaring, Moore's law being morally over
| since 2010, wages being so completely destroyed by the hatred
| Democrats have for them, and the sneaky little misconceptions
| and errors the golem's makers did not fight hard enough to keep
| out, AI will be supplanted by plain I.
| edflsafoiewq wrote:
| Can it do math on "prose numbers", e.g. "two thousand three
| hundred and four"?
| mirker wrote:
| Even the tokenization is wonky. Imagine you had no concept of
| math characters and instead had a lookup table of common
| n-grams (BPE encoding). For example, an instance of the binary
| addition "a+b" such as "3+b" may come out as a single token,
| effectively a unary operation, because "3+b" occurs commonly.
| That tokenization is vastly different from the one for
| "3.00000001+b". GPT has to invert this tokenization artifact
| with finite training data.
| visarga wrote:
| There are many papers trying to couple language models with
| external modules.
|
| In the Retrieval-Enhanced Transformer (RETRO) paper, a large
| language model was coupled with a similarity-based text index.
| It can populate the prompt with relevant information from the
| index, thus being more grounded and update-able.
|
| In another paper (AlphaCode), the language model was coupled
| with a compiler and could run programs and check if they match
| the expected outputs for a few test cases. The model was able
| to solve competition-style coding problems at above the average
| human score.
|
| In another paper (Language Models as Zero-Shot Planners), a
| language model generates commands to navigate a virtual home
| environment and performs tasks. The knowledge in the LM helps
| in quickly learning tasks.
|
| A recent one can learn new concepts by simple conversation,
| then apply them where necessary. You can talk-train your model.
| (Memory-assisted prompt editing to improve GPT-3 after
| deployment)
|
| So the trend is to add "toys" to language models - a simulator,
| a compiler, a search engine, a long-term memory module.
|
| I'd like to see a recursive language model, one that can
| sub-call itself to decompose problems.
| gwern wrote:
| You forgot all the inner monologue
| (https://www.gwern.net/docs/ai/gpt/inner-monologue/index) &
| scratchpad papers which give it additional steps or access to a
| Python REPL etc.; e.g. https://arxiv.org/abs/2112.15594
| https://arxiv.org/abs/2111.08267
| https://arxiv.org/abs/2111.08171
| visarga wrote:
| AI Chains really takes it to the next level.
| emmelaich wrote:
| An intriguing thought is that an AGI will behave very much like
| a well-read smart individual. With the faults, mystery and
| foibles that implies.
| spupe wrote:
| This is fascinating. I feel that we are still in the infancy of
| the field, however. These observations are analogous to
| naturalists of the past describing an animal's behavior, but we
| need to get to the point where more accurate estimates are made
| (i.e., how often does it do each thing, how accurate is it
| after 100+ tries, etc.). Every day we see a new observation
| showing what GPTs can do; we also need a good way to make these
| observations systematic.
| berryg wrote:
| I struggle to understand how GPT-3 executes code. Is it simply
| running a Python (or any other language) interpreter? Or is
| GPT-3 itself interpreting and executing Python code? If the
| latter is true, that would be amazing.
| [deleted]
| bidirectional wrote:
| It is the latter.
| Veedrac wrote:
| > GPT-3 seems to have issues with large numbers. Moyix's gist
| covers this in detail.
| GPT-3 tends to guesstimate an algebraic function instead of
| evaluating the numbers, so the answer is only correct to a
| certain approximation.
|
| There are two issues here. One is the lack of working memory,
| which means that there is very little scratch space for
| calculating things with a meaningful sequential depth. GPT-3 is
| very unlike traditional evaluation methods in this regard, in
| that it is easier for it to interpret the meaning of a program
| you give it and then intuit the result given the context than it
| is to mechanically execute its steps.
|
| The other issue is the text encoding, which makes it much harder
| for GPT-3 to do digit-by-digit operations. Many arbitrary
| numbers are just their own token. A fixed-length number to us
| looks like a fixed number of characters, but for GPT-3 it can be
| an almost arbitrary number of tokens divided into almost
| arbitrary chunks. Using thousands separators is very helpful for
| it.
|
| If you account for these and design a prompt that mitigates
| them, you can get much stronger results. Here is an example:
| https://news.ycombinator.com/item?id=30299360#30309302. I
| managed an accuracy of 42% for 3-digit-by-3-digit
| multiplication.
| bitwize wrote:
| GPT-3 is starting to remind me of SCP-914. Give it an input, and
| its millions of tiny wheels churn and it produces something like
| what you want, but otherwise quite unexpected.
|
| Let's hope it doesn't turn into something like SCP-079...
| csmeder wrote:
| What year will GPT be able to take an app written in
| Swift/SwiftUI and output a spectacular Android translation?
| 3 years? 5 years? 10 years?
|
| This is an interesting benchmark because it is a very difficult
| problem; however, GPT has everything it needs to do this without
| a fundamental improvement to its core (this process is more of a
| science than an art), and using automated UI testing it can
| check whether its solution worked.
|
| Thus this challenge is within the realm of what GPT already is;
| however, once it can do this it will have massive implications
| for how software is built.
| anyfoo wrote:
| A terrible prospect.
|
| It's hard enough for people to faithfully port an application -
| people who participate and live in the world that makes up our
| reality. Leaving this up to an AI will at best flood us with
| low-quality junk. At worst it's actively harmful.
| ivegotnoaccount wrote:
| > For example, it seems to understand how to find a sum, mean,
| median, and mode.
| > Input: 1, 4, 5, 6, 2, 1, 1
| > Output: 2.28571428571
|
| Well, even with those small numbers, it's wrong. The first "2"
| after the dot should not be there: the mean is 20/7 (about
| 2.857), but the result it gives is 16/7.
| loganmhb wrote:
| I wonder how much of this is an illusion of precision that
| comes from pattern matching on content from filler sites like
| https://www.free-hosting.biz/division/16-divided-7.html (I do
| not recommend clicking the link, but the result appears there).
| aplanas wrote:
| Seems that it can convert from Python to Perl:
|
| https://beta.openai.com/playground/p/o4qZWSXVz8JMmVaI9j9NMIK...
| 7373737373 wrote:
| Has anyone tried using it for SAT problems yet?
| timdellinger wrote:
| my recollection is that the original journal article announcing
| GPT-3 included some data on how it performed against SAT-style
| questions
___________________________________________________________________
(page generated 2022-03-29 23:00 UTC)