[HN Gopher] OpenAI Codex
       ___________________________________________________________________
        
       OpenAI Codex
        
       Author : e0m
       Score  : 237 points
       Date   : 2021-08-10 17:33 UTC (5 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | throwaway128327 wrote:
       | I don't understand what is going on, why are people even spending
       | time on this? I think this and copilot and etc are solving a non
       | problem of "we will remove the boring part of programming" by
       | generating a bunch of code, so now it's even more boring to read
       | it and check if it actually does what you want.
       | 
        | At the same time, zero of the developers I interviewed know how
        | a linked list is laid out in memory, or what the pros and cons
        | of contiguous memory layouts are, or even how a CPU actually
        | works.
        | 
        | Maybe those things are not needed anymore, but I see their
        | code... I think it would be better if they knew them.
        
         | parksy wrote:
         | This is just nascent technology leading toward something like
         | this:
         | 
         | "Computer, I want to play a game."
         | 
         | "Okay, what will the game be?"
         | 
         | "I want to be a starship captain, give me a cool space ship I
         | can explore the galaxy with"
         | 
         | "Okay... like this?"
         | 
         | "Not quite, make the galaxy more realistic, with real stars and
         | planets. Also make it 3d. I want to be the captain inside the
         | ship."
         | 
         | "How about now?"
         | 
         | "Cool, and there should be space stations I can visit near
         | planets, and I can fly my ship to stars with hyperspace. Make
         | it so I have to trade for fuel at the space stations, maybe I
         | need to mine asteroids or search derelict space ships for
         | treasure. I want to play with my friends too, they can have
         | their own ships or walk around my ship."
         | 
         | "Done, was there anything else?"
         | 
         | "Yes, add different alien races to some of the star systems,
         | and make some of them have alliances. I want to talk to the
         | aliens about their history and culture. Sometimes aliens are
         | unfriendly and we'll have space battles if talking doesn't
         | work. Make it so I can command a fleet and call for
         | reinforcements."
         | 
         | "Processing... Done. Anything else?"
         | 
         | "Actually this is boring, can we start over?"
         | 
         | "Game erased. Please provide new prompt."
        
           | vimy wrote:
            | Also known as the holodeck from Star Trek.
        
           | throwaway128327 wrote:
            | Oh! This will be so cool! Do you really think it could lead
            | in that direction? To me it seems more like a metaphysical
            | cargo cult. I think I am too pessimistic; I should shake it
            | off, since nothing good comes out of being pessimistic (by
            | definition).
           | 
           | Thanks for the inspiration!
        
             | parksy wrote:
             | > do you really think it could lead in that direction?
             | 
             | If you asked me 20 years ago, or even 10, I'd have said it
             | was total science fiction. I wouldn't have been able to
             | imagine how to do it. If you asked me 5 years ago, I'd have
             | vaguely said something about AI, half jokingly. At the time
             | I thought perhaps the models could be trained so we can do
             | test-only development and let AI trained on formal test
             | cases generate endless code until all tests pass, but I
             | didn't really imagine it would be possible to get a
             | computer to take freeform written English (even in a
             | tightly controlled manner) and produce functioning code.
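              | 
              | (A minimal sketch of that "generate until the tests pass"
              | loop, assuming a hypothetical generate() that stands in
              | for the code-generating model:)
              | 
              |   import types
              | 
              |   def passes_tests(source, tests):
              |       # Exec the candidate in a throwaway module, then run
              |       # each formal test case against it.
              |       mod = types.ModuleType("candidate")
              |       try:
              |           exec(source, mod.__dict__)
              |           return all(test(mod) for test in tests)
              |       except Exception:
              |           return False
              | 
              |   def synthesize(generate, tests, max_tries=1000):
              |       # Keep sampling code until every test passes.
              |       for _ in range(max_tries):
              |           source = generate()
              |           if passes_tests(source, tests):
              |               return source
              |       return None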
             | 
             | Over the past couple of years I have seen increasingly
             | fluent demonstrations and tried a few myself, and I have
             | fallen off the fence and I think that with the pace that
             | machine learning and AI assisted programming keeps
             | advancing, this outcome is all but inevitable, as far
             | fetched as it seems.
             | 
             | I was messing with the OpenAI sandbox over the weekend and
             | it helped me generate several game design concepts from
             | prompts similar to my post above that I could see myself
             | being interested in building and playing. It's not
             | difficult to imagine down the line with a few more
             | advancements in this tech that the generated design could
             | then instruct the code generator, fetch the assets, and
             | stage the environment for a player or user to enter without
             | ever touching a line of code.
             | 
              | I'm not close enough to the research itself to know which of
              | those problems are hard and which are easy, so I don't know
             | if we'll see the first totally AI-generated "proto-
             | holodeck" tech demo in the next 5 years, or the next 20
             | years, but I can't see it being more than 50 years away,
             | and something tells me with the pace of things it will be
             | much sooner than that, assuming we're all still around at
             | the time to enjoy it.
        
               | throwaway128327 wrote:
                | I wonder what it will make when you ask it to make a good
                | bot AI for a game.
               | 
               | "make a game with a formidable opponent that plays good
               | enough to win with 51% probability"
               | 
               | and of course the inevitable "make a better version of
               | yourself"
        
               | parksy wrote:
               | From what I've seen the technology can fuse together a
               | remarkable range of outputs, but all of them are
               | essentially fused together from within the training set.
               | If there were enough examples of AI opponents, it
               | conceivably could do it since most game AIs are some form
               | of state machine combined with a degree of statistical
               | analysis and pathfinding (for mobile AI actors). It would
               | "just" be replicating existing patterns.
               | 
               | As I understand it, it would take a dramatic leap from
               | this kind of interpolation to being able to extrapolate
               | and "self improve". So far I haven't seen anything that
               | convinces me we're close to this, but again I'm not close
               | to the wheel on the research side of things.
        
         | woah wrote:
         | You're interviewing programmers for a job in operating systems
         | programming?
        
           | throwaway128327 wrote:
            | Just full stack devs, React Native + Go. Is it so wrong to
            | think they are the same? Programming is programming; most
            | computers work in a similar way, no?
           | 
            | But they also don't know how garbage collection works in
            | their language, or how to work with 1 million things in an
            | efficient manner. Or why the app pauses for 100 ms because
            | someone parses dates inside the comparator while sorting.
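            | 
            | (That last one is easy to reproduce. A minimal sketch in
            | Python, with made-up rows, of parsing inside the comparator
            | versus parsing once per row:)
            | 
            |   from datetime import datetime
            |   from functools import cmp_to_key
            | 
            |   FMT = "%Y-%m-%d %H:%M:%S"
            |   rows = [{"created": f"2021-08-{d % 28 + 1:02d} 12:00:00"}
            |           for d in range(50000)]
            | 
            |   def by_date(a, b):
            |       # Slow: re-parses both strings on every comparison, so
            |       # each row gets parsed O(log n) times.
            |       da = datetime.strptime(a["created"], FMT)
            |       db = datetime.strptime(b["created"], FMT)
            |       return (da > db) - (da < db)
            | 
            |   rows.sort(key=cmp_to_key(by_date))
            | 
            |   # Better: a key function parses each date exactly once.
            |   rows.sort(key=lambda r: datetime.strptime(r["created"], FMT))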
           | 
            | For example, I have seen people who can't imagine what the cost
            | of a leaked database transaction is, even back-of-the-napkin
            | style: how many changes happened in between, how much we have
            | to unwind when the session disconnects, when it will even
            | disconnect because of the connection pool, etc. Because the SQL
            | server is this magic RDS thing, as if AWS will solve everything
            | with its pixie dust.
        
         | reducesuffering wrote:
         | Think bigger. Say I'm starting a startup:
         | 
         | 1. "Setup Django, Nginx, and Postgres deployed on a Digital
         | Ocean Ubuntu droplet." Done.
         | 
         | 2. "Make a shopping page like $URL." Done.
         | 
         | 3. "Fill it with data from X and connect with Stripe." Done.
         | 
         | 4. ???
         | 
         | 5. Profit
         | 
          | Seems like even a great dev will take 20x the time to do that
          | if the model is able to correctly generate this, even with an
          | error or customization or two.
        
           | throwaway128327 wrote:
            | But does it really matter, if 20x means 1 week instead of 2
            | hours?
            | 
            | Are startups really that shallow?
        
             | motoxpro wrote:
             | 1/20th of the time? That's kind of a big deal.
        
               | qayxc wrote:
               | That depends: https://xkcd.com/1205/
               | 
               | A one-time setup is perfectly OK to take a few days,
               | especially if afterwards you have a documented process
               | that allows you to modify and improve the result.
        
               | throwaway128327 wrote:
                | I think the 1/20th of the time mentioned applies only at
                | the start; I don't think you will gain a lot after that,
                | as the spaghetti AI will come to collect.
                | 
                | "You have a debt to pay." -- Davy Jones
        
           | dimal wrote:
           | If you don't have someone that understands the generated
           | code, you'll be kinda screwed. Most of my work isn't writing
           | a function to do X. It's reading and understanding all the
           | surrounding code and architecture and then knowing that I
           | need a function to do X. Writing the actual function isn't
           | usually much of a challenge. I get the feeling that this tool
           | will just encourage write-only code that ultimately no one
           | understands. Will all of the generated code follow a
           | consistent style? Will it know to use the framework you built
           | or will it just reinvent everything it needs for each problem
           | you give it? I already see tons of code that people copy and
           | paste without really understanding it, and a lot of the time
           | they're just adding complexity by solving non-problems. This
           | just automates that process. I can see it being useful in
           | certain narrow cases, but the potential for misuse is huge.
        
           | holler wrote:
           | at the point where 1/2/3 are possible, what value does the
           | startup have when anyone else can ask it to do the same
           | thing?
        
             | tux3 wrote:
             | Do your competitors have access to this tool that gets you
             | started 20x faster? If so, you want the tool.
             | 
             | Your copycat startup may not have incredible value, but
             | selling shovels always pays.
        
           | tome wrote:
           | Why would you mention "Django", "Nginx", "Postgres", "Digital
           | Ocean", "Ubuntu" or "Stripe"? Surely those are implementation
           | details that the user wouldn't care about.
        
           | [deleted]
        
         | nradov wrote:
         | It seems like they're going in totally the wrong direction. If
         | program content is predictable based on patterns (low entropy)
         | then that's a sign that our programming languages are too low
         | level. If we want to improve developer productivity then the
         | solution is the same as it always has been: create higher level
         | languages which abstract away all the repetitive patterns.
        
           | mxwsn wrote:
           | Tools are relatively low level compared to any single use
           | case or field because they should universally support all
            | use cases or fields. The narrower your field or use case
           | is, the fewer resources there are to create a higher level
           | language that abstracts away the details that aren't
           | important for your area, but are important to other areas. In
           | this manner, Codex has enormous potential.
        
       | temp8964 wrote:
        | Can this read existing code and fix one missing piece? That would
        | be cool.
        | 
        | Say I have a question I can't solve by searching through Stack
        | Overflow. If the AI can solve a problem like that, it will be
        | great.
        
         | priyanmuthu wrote:
         | Program Synthesis can do some rudimentary fixes. But I would
         | love to explore this problem of program correction using AI.
        
       | maxwells-daemon wrote:
       | The "language models don't really understand anything" corner is
       | getting smaller and smaller. In the last few months we've seen
       | pretty definitive evidence that transformers can recombine
       | concepts ([1], [2]) and do simple logical inference using
       | contextual information ([3], "make the score font color
       | visible"). I see no reason that this technology couldn't smoothly
       | scale into human-level intelligence, yet lots of people seem to
       | think it'll require a step change or is impossible.
       | 
       | That being said, robust systematic generalization is still a hard
       | problem. But "achieve symbol grounding through tons of multimodal
       | data" is looking more and more like the answer.
       | 
       | [1] https://openai.com/blog/dall-e/ [2]
       | https://distill.pub/2021/multimodal-neurons/ [3]
       | https://openai.com/blog/openai-codex/
        
         | fpgaminer wrote:
         | > "language models don't really understand anything"
         | 
          | I have a sneaking suspicion that, if blinded, the crowd of
          | people saying variations of that quote would identify the
          | vast majority of human speech as regurgitated ideas as well.
         | 
         | > I see no reason that this technology couldn't smoothly scale
         | into human-level intelligence
         | 
         | Yup, the OpenAI scaling paper makes this abundantly clear.
         | There is currently no end in sight for the size that we can
         | scale GPT to. We can literally just throw compute at the
         | problem and GPT will get smarter. That's never been seen before
         | in ML. Last time I ran the calculations I estimated that,
         | everything else being equal, we'd reach GPT-human in 20 years
          | (a GPT with parameter scale similar to a human brain). That's
         | everything else being equal. It is more than likely that in the
         | next twenty years innovation will make GPT and the platforms we
         | use to train and run models like it more efficient.
         | 
         | And the truly terrifying thing is that, to me, GPT-3 has about
          | the intelligence of a bug. Yet it's a bug whose whole existence
         | is human language. It doesn't have to dedicate brain power to
         | spatial awareness, navigation, its body, handling sensory
         | input, etc. GPT-human will be an intelligence with the size of
          | a human brain, but whose sole purpose is understanding human
         | language. And it's been to every library to read every book
         | ever written. In every language. Whatever failings GPT may have
         | at that point, it will be more than capable of compensating for
         | in sheer parameter count, and leaning on the ability to combine
         | ideas across the _entire_ human corpus.
         | 
         | All available through an API.
        
         | maxwells-daemon wrote:
         | As an add-on to this: I'd encourage anyone interested in this
         | debate to read Rich Sutton's "The Bitter Lesson"
         | (http://www.incompleteideas.net/IncIdeas/BitterLesson.html).
         | 
         | At every point in time, the best systems we can build today
         | will be ones leveraging lots of domain-specific information.
         | But the systems that will continue to be useful in five years
          | will always be the ones that scale freely with increased
         | parallel compute and data, which grow much faster than domain-
         | specific knowledge. Learning systems with the ability to use
         | context to develop domain-specific knowledge "on their own" are
         | the only way to ride the wave of this computational bounty.
        
           | pchiusano wrote:
           | https://rodneybrooks.com/a-better-lesson/ is an interesting
           | retort to the Sutton post.
        
         | Voloskaya wrote:
         | The definition of "understanding" behaves just like the
         | definition of "intelligence": The threshold to qualify gets
         | pushed by as much as the technology progresses, so that nothing
         | we create is ever intelligent and nothing ever understands.
        
         | karmasimida wrote:
         | > The "language models don't really understand anything"
         | 
          | This is still true. By all accounts, a human doesn't need to
          | read 159 GB of Python code to write Python; indeed, we simply
          | can't.
         | 
         | But it doesn't necessarily indicate language models aren't
         | useful.
        
           | hackinthebochs wrote:
           | Considering the sum total of data and computation that goes
           | in to creating an intelligent human mind, including the
           | forces of natural selection in creating our innate structure
           | and dispositions, it's not obvious that any conclusions can
           | be drawn from the fact that so much data and compute goes
           | into training these models.
        
             | nightski wrote:
             | Has this transfer of knowledge from one domain to another
             | really been demonstrated by these models/learning
             | processes? I know transfer learning is a thing (I have a
             | couple books on my shelf on it). But it seems far from what
             | you are describing.
        
               | talor_a wrote:
               | they mention in the demo video that the inspiration for
               | codex came from GPT-3 users training it to respond to
               | queries with code samples. I saw some pretty impressive
               | demos of the original model creating SQL queries from
               | plain questions. I'm not sure if that counts as switching
               | domains, but it's something?
        
               | visarga wrote:
               | DALL-E + CLIP models show a deep understanding of the
               | relation between images and text.
        
               | sbierwagen wrote:
               | The AlphaZero algorithm swapped between board games
               | pretty easily. OpenAI could also have been gesturing at
               | this when they named the GPT paper "Language Models are
               | Few-Shot Learners".
        
           | maxwells-daemon wrote:
           | I would argue humans ingest a lot more than 159GB before they
           | can write code. Most of it isn't Python, and humans currently
           | transfer knowledge a lot more efficiently than NNs, but I
           | suspect that'll change as incorporating more varied data
           | sources becomes feasible.
        
         | bufferoverflow wrote:
         | It probably can scale, but we're nowhere near the computational
         | power we need to even recreate the brain. And don't forget, our
         | brain took a billion years to evolve.
         | 
         | A typical brain has 80-90 billion neurons and 125 trillion
         | synapses. That's a big freaking network to train.
         | 
         | Hopefully we can figure out how to train parts of it and then
         | assemble something very smart.
        
           | jacquesm wrote:
           | Takes on average 2.5 decades to train it.
        
             | mattkrause wrote:
             | That's just from the most recent checkpoint :-)
             | 
             | If you were to build it "from scratch" you'd also need to
             | include the millions of years of (distributed) evolution
             | required to get that particular kid to that point.
             | 
             | Tony Zador has some interesting thoughts about that,
             | including"A critique of pure learning", here:
             | https://www.nature.com/articles/s41467-019-11786-6)
        
         | jdonaldson wrote:
         | I think intelligence as defined as "mapping inputs into goal
         | states" is pretty well handled by models, and the models may be
         | able to pick and choose states that are sufficient for
         | achieving the goals.
         | 
         | However, the intelligence that's created by language models is
         | very schizophrenic, and the human-level reflective intelligence
         | that it displays is at best a bit of Frankenstein's monster (an
         | agglomeration of utterances from other people that it uses to
         | form sentences that form opinions of itself or its world).
         | 
         | I think that modeling will help us learn more about human
         | intelligence, but we're going to have to do a lot better than
         | just training models blindly on huge amounts of text.
        
           | visarga wrote:
           | Maybe we're also >50% Frankenstein monsters, an agglomeration
           | of utterances from other people.
        
         | 6gvONxR4sf7o wrote:
         | > The "language models don't really understand anything" corner
         | is getting smaller and smaller.
         | 
         | In my mind, understanding a thing means you can justify an
         | answer. Like a student showing their work and being able to
         | defend it. An answer with a proof understands the answer with
         | respect to the proof it provides. E.g. to understand an answer
         | with regards to first order logic, it'll have to be able to
         | defend a logical deduction of that answer.
         | 
         | These models still can't justify their answers very well, so
         | I'd say they're accurate but only understand with respect to a
         | fairly dumb proof system (e.g. they can select relevant
         | passages or just appeal to overall accuracy statistics).
         | They're still far from being able to justify answers in the
         | various ways we do, which I'd say means that by definition that
         | they still don't understand with regards to the "proof systems"
         | that we understand things with regards to.
         | 
         | Maybe the next step will require increasingly interesting
         | justification systems.
        
           | beering wrote:
           | > In my mind, understanding a thing means you can justify an
           | answer.
           | 
           | What if the language model can generate a step-by-step
           | explanation in the form of text? [0]
           | 
           | There's no guarantee that the reasoning was used to come up
           | with the answer in the first place, and no proof that the
           | reasoning isn't just the product of "a really fancy markov
           | chain generator", but would you accept it?
           | 
           | We're really walking into Searle's Chinese Room at this
           | point.
           | 
           | [0] https://nitter.hu/kleptid/status/1284069270603866113#m
        
           | sbierwagen wrote:
           | >In my mind, understanding a thing means you can justify an
           | answer.
           | 
           | Sure, but how does that work with superhuman AI? Consider
           | some kind of math bot that proves theorems about formal
           | systems which are just flat out too large to fit into human
           | working memory. Even if it could explain its answers, there
           | would just be too many moving parts to keep in your head at
           | once.
           | 
            | We already see something like this in quant funds. The stock
           | trading robot finds a price signal, and trades on it. You can
           | look at it, but it's nonsensical: if rainfall in the Amazon
           | basin is above this amount, and cobalt price is below this
           | amount, then buy municipal bonds in Topeka. The price signal
            | is durable and causal. If you could hold the entire global
           | economy in your head, you could see the chain of actions that
           | produce the effect, but your brain isn't that big.
           | 
           | Or you just take it on faith. Why do bond prices in Topeka go
           | up, but not in Wichita? "It just does." Okay, then what was
           | the point of the explanation? A machine can't justify
           | something you physically don't have enough neurons to
           | comprehend.
        
             | gnramires wrote:
             | > Even if it could explain its answers, there would just be
             | too many moving parts to keep in your head at once.
             | 
             | While this is possible in practice, consider the
             | (universal) Turing machine principle: in principle, you can
              | simulate any system given enough memory; we may not have it
              | in our brains, but we have pen and paper, or simply a digital
              | text scratchpad (both of which we use extensively in our
             | lives).
        
             | gnramires wrote:
             | Also, you should note the memory and capabilities required
             | to reach a conclusion might be much greater than to show
             | it's true. Showing a needle may be easy, finding it in the
             | haystack very hard. In this sense the hope for
             | explainability is expanded. But still, I guess the real
              | world is really messy and "the full explanation" may be too
             | large -- like when you explain a human intuition, the "full
             | explanation" might have been your entire brain, your entire
             | set of experiences up to that point; yet we can give
              | partial explanations that should be satisfactory.
             | 
              | I have a hypothesis that, inevitably, reasoning needs to
             | 'funnel' through explicit, logical representations (like we
             | do with mathematics, language, etc.) to occur effectively.
             | Or at least (quasi-)formalization is an important element
             | of reasoning. This formal subset can be communicated.
        
             | 6gvONxR4sf7o wrote:
              | It's not about us being able to interpret the answer or
              | justification, but about the reasoner's ability to justify.
              | If a superhuman AI can justify its answers in terms of first
             | order logic, for example, it could be defined as
             | understanding the answers with respect to FOL. Whether we
             | as humans are able to check whether this specific bot in
             | fact meets that definition is a separate empirical
             | question.
             | 
             | If that quant algo you mentioned just says "it'll go up
             | tomorrow" that's different than "it'll go up tomorrow" with
             | an attached "it's positively correlated with Y, which is up
             | today" which is different from a full causal DAG model of
             | the world attached, which is again different from those
             | same things expressible in english. But again, those are
             | definitions, which are separate from our ability to check
             | whether they're met.
             | 
             | Luckily, we're not in the realm of bots spitting out
              | infeasible-to-check proofs, except for a few niche areas
             | like theorem proving (e.g. four color theorem). For
             | language models like in the article, the best I'm aware of
             | is finding relevant passages to an answer and classifying
             | entailments.
             | 
             | > A machine can't justify something you physically don't
             | have enough neurons to comprehend.
             | 
             | We can't always verify its justification, but it either can
             | or can't justify an answer with respect to a given
             | justification system.
        
             | cscurmudgeon wrote:
             | We build another system we fully understand that can
             | process the justification and see if it is correct/makes
             | sense.
        
           | joshjdr wrote:
           | I found it on Stack Overflow!
        
           | visarga wrote:
           | > Maybe the next step will require increasingly interesting
           | justification systems.
           | 
            | You can just ask it to comment on what it intends to do. It's
            | surprising, actually.
        
           | maxwells-daemon wrote:
           | Look at the "math test" video.
           | 
           | Given the question: "Jane has 9 balloons. 6 are green and the
           | rest are blue. How many balloons are blue?" The model
           | outputs: "jane_balloons = 9; green_balloons = 6;
           | blue_balloons = jane_balloons - green_balloons;
           | print(blue_balloons)"
           | 
           | That seems like a good justification of a (very simple) step-
           | by-step reasoning process!
        
             | wizzwizz4 wrote:
             | Except I could do that with a few regex substitutions,
             | which would not be reasoning. The "intelligence" is in the
             | templates provided by the training data. (Extracting that
              | is _impressive_, but not _that_ impressive.)
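              | 
              | (To make that concrete, a minimal sketch of such a
              | regex-substitution approach; it only handles this one
              | problem shape:)
              | 
              |   import re
              | 
              |   def word_problem_to_code(q):
              |       # Template: "X has N things. M are A and the rest
              |       # are B."
              |       pat = (r"(\w+) has (\d+) (\w+)\. (\d+) are (\w+) "
              |              r"and the rest are (\w+)\.")
              |       m = re.match(pat, q)
              |       if not m:
              |           return None
              |       who, total, noun, part, a, b = m.groups()
              |       who = who.lower()
              |       return "\n".join([
              |           f"{who}_{noun} = {total}",
              |           f"{a}_{noun} = {part}",
              |           f"{b}_{noun} = {who}_{noun} - {a}_{noun}",
              |           f"print({b}_{noun})",
              |       ])
              | 
              |   print(word_problem_to_code(
              |       "Jane has 9 balloons. 6 are green and the rest are "
              |       "blue. How many balloons are blue?"))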
        
         | lstmemery wrote:
         | I have to disagree with you here. In the Codex paper[1], they
         | have two datasets that Codex got correct about 3% of the time.
         | These are interview and code competition questions. From the
         | paper:
         | 
         | "Indeed, a strong student who completes an introductory
         | computer science course is expected to be able to solve a
         | larger fraction of problems than Codex-12B."
         | 
         | This suggests to me that Codex really doesn't understand
         | anything about the language beyond syntax. I have no doubt that
         | future systems will improve on this benchmark, but they will
         | likely take advantage of the AST and could use unit tests in a
         | RL-like reward function.
         | 
         | [1] https://arxiv.org/abs/2107.03374
        
           | nmca wrote:
           | 12B, though. What about 1.2T?
        
             | lstmemery wrote:
             | You need to scale the amount of data to take advantage of
             | the increase in parameters. I'm not sure where we would
             | find another 100 GitHubs worth of data.
        
           | ruuda wrote:
           | > but they will likely take advantage of the AST
           | 
           | In the end, a more general approach with more compute, always
           | wins over applying domain knowledge like taking advantage of
           | the AST. This is called "the bitter lesson".
           | http://www.incompleteideas.net/IncIdeas/BitterLesson.html
        
             | lstmemery wrote:
              | I don't think the bitter lesson applies to ASTs.
             | 
             | From the Bitter Lesson:
             | 
             | "Early methods conceived of vision as searching for edges,
             | or generalized cylinders, or in terms of SIFT features. But
             | today all this is discarded. Modern deep-learning neural
             | networks use only the notions of convolution and certain
             | kinds of invariances, and perform much better."
             | 
             | Those models are taking advantage of inductive biases.
             | Every model has them, including the massive language
             | models. They are not the same as engineered features (such
             | as SIFTs) or heuristics.
             | 
             | Using the AST is just another way of looking at the code
             | already in your dataset. For the model to understand what
              | it is writing, it needs to map the text sequences to
              | ASTs anyway. It can attempt to learn this, but the 12B
             | model still makes illegal Python code so it clearly hasn't.
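              | 
              | (For anyone curious, that text-to-AST mapping ships in
              | Python's standard library; a minimal sketch of the two views
              | of the same code, and of how "illegal Python" shows up:)
              | 
              |   import ast
              | 
              |   src = "blue_balloons = jane_balloons - green_balloons"
              |   # The AST is just another view of the same text: an Assign
              |   # node whose value is a BinOp(Sub) over two Names.
              |   print(ast.dump(ast.parse(src)))
              | 
              |   # A parse failure is what "illegal Python" looks like:
              |   try:
              |       ast.parse("blue_balloons = jane_balloons -")
              |   except SyntaxError as err:
              |       print("illegal:", err.msg)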
        
             | kevinqi wrote:
             | "the bitter lesson" is a very interesting, thank you!
             | However, I wonder if AST vs. text analysis is fully
             | comparable to the examples given in the post. Applying
             | human concepts for chess, go, image processing, etc. failed
             | over statistical methods, but I don't think AST vs. text is
             | exactly the same argument. IMO, using an AST is simply a
             | more accurate representation of a program and doesn't
             | necessarily imply an attempt to bring in human
             | intuition/concepts.
        
       | abeppu wrote:
       | I'm still surprised by the approach. I mean, great that it works
       | this well -- but program synthesis is one of those rare domains
       | where you can observe exactly what the outcome is after you
       | generate something. You can see execution traces, variable
       | values, what the JIT produced, etc. And all of this is relatively
       | cheap -- often executing a code snippet should be far cheaper
       | than an extra pass through a giant DNN right? So it's fascinating
       | to me that they train entirely from dealing with code as text.
       | 
       | Imagine learning to develop recipes, not by ever cooking or
       | eating or even seeing food, but only reading a giant library of
       | cookbooks. Or learning to compose music but never hearing or
       | playing anything -- only seeing scores.
        
         | wantsanagent wrote:
          | FWIW, execution-guided code synthesis is a thing. For example,
          | get a few possible outputs and ditch those that don't pass a
          | parser. At least in the SQL generation realm this is well
          | worth the time it takes to tack onto a large language model.
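          | 
          | (A minimal sketch of that filtering pass, using Python's ast
          | module as the parser; for SQL you'd swap in whatever parser
          | fits:)
          | 
          |   import ast
          | 
          |   def keep_parseable(candidates):
          |       # Sample several completions, ditch any the parser rejects.
          |       survivors = []
          |       for source in candidates:
          |           try:
          |               ast.parse(source)
          |               survivors.append(source)
          |           except SyntaxError:
          |               continue
          |       return survivors
          | 
          |   candidates = [
          |       "def add(a, b):\n    return a + b",
          |       "def add(a, b)\n    return a + b",   # missing colon
          |   ]
          |   print(keep_parseable(candidates))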
        
         | [deleted]
        
       | jmportilla wrote:
        | Very cool, it will be interesting to see if this is ever added to
        | Visual Studio as some sort of "super" auto-complete.
        
       | mensetmanusman wrote:
       | If this actually worked, wouldn't that be amazing? If you could
        | break down a software idea into a blueprint of concepts that
       | need to be accomplished, and then dictate what should be done...
       | 
       | I doubt it works, but I wonder how many decades from now we will
       | be able to walk through a finite number of simple requests and
       | wrap them together as working software. Then people will be able
       | to convert their blueprint into action!
        
       | GistNoesis wrote:
        | Can I use this to write Solidity contracts?
        
         | mxwsn wrote:
         | That has got to be one of the worst possible use cases one
          | could imagine. On page 33 of the appendix, the authors note
         | that nearly 40% of RSA encryption keys created by Codex are
         | clearly insecure.
        
           | GistNoesis wrote:
           | Only if tokens have value.
           | 
            | If Codex is able to handle a generic API from reading the
            | docs, it could maybe use a Python library for Solidity
            | contracts like
            | https://web3py.readthedocs.io/en/stable/contracts.html
           | 
           | As a contract user, I'd probably have more trust in a
           | contract written by an independent AI from a short natural
           | language specification which can't hide intent, than a
           | contract with hidden backdoor, or a subtle bug.
           | 
           | Also the AI will probably improve with usage.
           | 
            | You can probably generate multiple versions of your contract,
            | and maybe a high-level bug correction scheme like taking the
            | median action between those versions can increase bug
            | robustness and find those edge cases where the actions differ.
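            | 
            | (In spirit, something like this voting scheme; a minimal
            | sketch where the variants of one fee rule stand in for
            | hypothetical generated versions of the contract:)
            | 
            |   from collections import Counter
            | 
            |   def consensus_action(versions, state):
            |       # Run every generated version on the same input, take
            |       # the most common result, and flag any disagreement as
            |       # an edge case worth a human look.
            |       results = [v(state) for v in versions]
            |       winner, votes = Counter(results).most_common(1)[0]
            |       if votes < len(results):
            |           print("versions disagree:", results)
            |       return winner
            | 
            |   versions = [
            |       lambda amount: amount // 100,      # 1% fee, rounded down
            |       lambda amount: -(-amount // 100),  # 1% fee, rounded up
            |       lambda amount: amount // 100,
            |   ]
            |   print(consensus_action(versions, 12345))  # flags 123 vs 124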
        
           | woah wrote:
           | What does that have to do with anything?
        
       | northfoxz wrote:
       | A new way to talk to the computer I guess.
        
       | vincnetas wrote:
        | I will really be impressed when one could say: "here is this
        | codebase, modify this function so that it would produce [insert
        | desired effect]" and the other functionality of the project would
        | not come crashing down...
        | 
        | Because writing code from scratch is now, I think, much rarer
        | than improving existing codebases. Aka bugfixing.
        
         | vincnetas wrote:
          | Also curious what this AI would produce when provided with
          | contradictory requests. Because often there are multiple
          | requirements which on their own sound reasonable, but when you
          | try to fit all requirements in one system, things get nasty.
        
           | polyanos wrote:
           | It is only able to translate small instructions into code. I
           | think it will take a while to get to a situation where you
           | can just give it a list of requirements and it spits a
           | working program.
           | 
           | Hell it messed up when they gave it the instruction "make
           | every fifth line bold" in their Word api part of the demo,
           | where it made the first line of every paragraph (which is
           | only 4 lines long in total) bold instead of every fifth line.
        
       | 3wolf wrote:
       | I think integrations like the MS Word example they show off at
       | the end of the live demo have the potential to be even more
       | impactful than just generating code for programmers.
        
         | polyanos wrote:
         | That still needs work though, it messed up the "Make every
         | fifth line bold" pretty bad. Still, it showed it could adapt to
         | a new API pretty well.
        
           | 3wolf wrote:
           | Yeah, definitely. I guess my point was that converting
           | natural language to source code can be even more valuable for
           | people who don't know how to code, but want to perform
           | actions more complicated than a simple button press. For
           | example, I often find myself doing regex based find-and-
           | replace-alls in text files, and even that feels inefficient
           | while also being over the head of the vast majority of users.
           | I'd imagine there are a lot of people out there spending many
           | hours manually editing documents and spreadsheets.
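            | 
            | (For reference, the kind of one-off edit I mean, which is
            | exactly what a plain-English interface could generate; the
            | file name and pattern here are just illustrative:)
            | 
            |   import re
            |   from pathlib import Path
            | 
            |   # "In notes.txt, change every ISO date from YYYY-MM-DD to
            |   # DD/MM/YYYY."
            |   path = Path("notes.txt")
            |   text = path.read_text()
            |   text = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
            |   path.write_text(text)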
        
       | amalive wrote:
        | I would like to say "Fix that 'something of undefined' error" some
        | day.
        
       | dmurray wrote:
       | They should have released this first instead of GitHub Copilot.
       | The focus would then have been much more on "look at the cool
       | stuff they can do" rather than "Microsoft is releasing a product
       | that plagiarizes GPL code".
       | 
       | Once people had digested that and there had been a few other
       | proof-of-concept business ideas around turning Codex into a SaaS
       | (because some people will always queue to build their product on
       | your API), announce the evil version. Not that I really think
       | Copilot is evil, but the IP concerns are legitimate.
        
       | mark_l_watson wrote:
       | I watched their 30 minute demo on Twitch this morning, really
       | good!
       | 
        | I use their OpenAI beta APIs as a paying customer; I am still
        | waiting for access to Codex.
        
       | leesec wrote:
       | The Writing On The Wall
        
       | z77dj3kl wrote:
       | I thought OpenAI was originally supposed to be some kind of for-
       | the-good, non-profit institution studying AI and its safe use in
       | particular with an effort to make it more accessible and
       | available to all through more open collaboration. This is cool
       | research, sure; but what happened to making models available for
       | use by others instead of just through some opaque APIs?
       | 
       | Maybe I'm just remembering wrong or conflating OpenAI with some
       | other entity? Or maybe I bought too much of the marketing early
       | on.
        
         | mark_l_watson wrote:
         | They very transparently transitioned to a for profit company.
         | It doesn't seem like they are aggressively profit oriented
         | though: I am a paying customer of OpenAI beta APIs and the cost
         | to use the service is very low. It also solves several classes
         | of tough NLP problems. I used to sell my own commercial NLP
          | library - glad I gave up on that years ago.
        
         | keewee7 wrote:
         | OpenAI was founded in 2015. In 2015 Google was AI and AI was
         | Google. There was legitimate concern that one American
         | corporation was going to dominate AI. OpenAI was created to
         | challenge that dominance and let "AI benefit all of humanity".
         | 
          | In the meantime China and Chinese companies have caught up.
          | Turns out the fear of one company and one country dominating
          | AI was overblown.
         | 
         | Maybe the OpenAI founders feel that the original goal has been
         | fulfilled because AI is no longer dominated by the US and
         | Google.
        
         | Buttons840 wrote:
         | No, they did some good, they've done a few things to personally
         | help me. They created OpenAI Gym which is a great help when
         | doing reinforcement learning research and defined the standard
         | interface for reinforcement learning libraries for a
         | generation. But they not longer maintain OpenAI Gym.
         | 
         | They also created Spinning Up [0], one of the best resources
         | I've found for learning reinforcement learning. Their teaching
         | resources are detailed but relatively brief and are focused on
         | implementing the algorithms, even if some of the "proofs" are
         | neglected. But they no longer maintain Spinning Up.
         | 
         | So yes, originally they were for-the-good, but lately I've
         | noticed them moving away from that in more ways than one. It
         | seems they learned one cool trick with language sequence
         | modelling, and they have a lot of compute, and this is all they
         | do now.
         | 
         | [0]: https://spinningup.openai.com/en/latest/
        
         | blt wrote:
         | That was the marketing message. They became for-profit in 2019
         | and took investment from Microsoft. Many people were skeptical
         | before that because the main investors were mostly known for
         | for-profit ventures.
        
         | webmaven wrote:
         | You're remembering correctly. OpenAI transitioned from non-
         | profit to for-profit in 2019, took about $1 billion from
         | Microsoft (there has been speculation that this was mostly in
         | the form of Azure credits), and announced that Microsoft would
         | be their preferred partner for commercializing OpenAI
         | technologies: https://openai.com/blog/microsoft/
        
         | stingraycharles wrote:
         | I remember Sam Altman, when asked "How will you make money?",
        | replying that they would ask the AI. I thought it was a fairly
        | creative answer.
         | 
         | It turns out, however, that the way they plan on earning money
         | is much less creative, and more run-of-the-mill SaaS
         | monetization. In a way, I like to believe that a real AI would
         | also end up with such a mundane strategy, as it's the most
         | likely to actually make them profitable and return money to
         | investors.
        
       | amrrs wrote:
        | I feel that OpenAI Codex could become like Webflow for coding. It
        | might sound ironic, but what tools like Webflow do in the world
        | of web programming is give creators the power to build something
        | fast that can last (without the skills of a decent web
        | programmer).
        | 
        | If the same thing can happen in the world of programming, I guess
        | evaluations like LeetCode and whiteboarding can go away, and a
        | new kind of logical-thinking evaluation can come in, which could
        | ultimately be a more realistic way for a programmer to rise up
        | the chain.
        
       | Vermeulen wrote:
       | A warning to devs building on OpenAI APIs: We spent months
       | developing a chatbot using GPT3 for our game and released a video
       | showcasing it: https://www.youtube.com/watch?v=nnuSQvoroJo&t=264s
       | 
        | Afterwards OpenAI added GPT-3 chatbot guidelines disallowing
        | basically anything like this. We were in communication with them
        | beforehand, but they decided later that any sort of free-form
        | chatbot was dangerous.
       | 
       | What they allow changes on a weekly basis, and is different for
       | each customer. I don't understand how they expect companies to
       | rely on them
        
         | nradov wrote:
         | The notion of a toy like a chatbot being "dangerous" is just so
         | ludicrous. The OpenAI folks take themselves way too seriously.
         | Their technology is cool and scientifically interesting, but in
         | the end it's nothing more than a clever parlor trick.
        
           | mszcz wrote:
            | I think it's a different kind of dangerous, not the Skynet
            | stuff. The first idea that popped into my mind is below. I
            | know, it's dark but...
           | 
           | 8 year old to AI: "my parents won't let me watch TV, what do
           | I do?". AI: "stab them, they'll be too busy to forbid you".
           | 
            | Then again, the same thing could be said by a non-AI. My
            | thinking is that you'd be talking to an _actual average_
            | person. I'm not so sure that that is such a good thing.
        
           | EamonnMR wrote:
           | Definitely dangerous from a legal perspective if AI Dungeon
           | is any indication.
        
             | elefanten wrote:
             | The general public basically races to test the most
             | controversial content. As exhibited by several other high-
             | profile chatbot launches.
             | 
             | > Tay responded to a question on "Did the Holocaust
             | happen?" with "It was made up"
             | 
             | https://en.m.wikipedia.org/wiki/Tay_(bot)
        
           | aeternum wrote:
           | It's pretty easy to get GPT-3 to say things that are
           | incredibly sexist and racist. I think OpenAI is more
           | concerned about the bad press associated with that than AI-
           | safety.
        
             | Siira wrote:
             | Which is even less ethically defensible.
        
         | andreyk wrote:
          | Oh man, I was looking forward to this a ton! Are you thinking
          | of continuing to work on it with the open-source GPT-J or
          | something similar, by any chance?
        
           | Vermeulen wrote:
           | I am looking at GPTJ, and also hoping OpenAI comes to their
           | senses on how dangerous a video game chatbot can be
        
         | MasterScrat wrote:
         | > Afterwards OpenAI then added GPT3 chatbot guidelines
         | disallowing basically anything like this. We were in
         | communication with them beforehand, but they decided later that
         | any sort of free form chatbot was dangerous.
         | 
         | Was this announced anywhere? We applied to deploy an
         | application in this space, and they refused without providing
         | any context, so I'd be really interested if they published
         | details about restrictions in this space somewhere.
        
           | Vermeulen wrote:
           | https://beta.openai.com/docs/use-case-guidelines/use-case-
           | re... "reliably being able to limit the conversational topics
           | to strictly X, Y, and Z topics"
        
         | Miraste wrote:
         | OpenAI cloaks themselves in false "open" terminology to hide
         | how proprietary and incredibly restrictive they've made their
         | tech. That's a very cool demo; have you considered trying to
         | make it run on GPT-J instead? It's an open source alternative
          | you can run yourself or pay an independent API provider without
         | supporting OpenAI.
        
           | Vermeulen wrote:
           | Haven't been able to find a GPT-J service with good latency -
           | though we haven't tried hosting ourselves
        
             | spullara wrote:
              | I have gotten it running on AWS in a container; if you want
              | the Dockerfile/scripts I can send them to you. Email is in my
              | profile.
        
         | fpgaminer wrote:
         | It sucks that OpenAI has no competition right now. They have
         | every right to control their technology however they like. But
         | it's a shame that they're being so stifling with that right,
         | killing really fun stuff like you demonstrated.
         | 
         | But that monopoly won't last, and I think it's more than likely
         | that competition will crop up within the next year. There's
         | definitely a lot of secret sauce to getting a 175B parameter
         | model trained and working the way OpenAI has. The people
         | working there are geniuses. But it can still be reproduced, and
         | will. Once competition arrives I'm hoping we'll see these
         | shackles disappear and see the price drop as well. Meanwhile
         | the open source alternatives will get better. We already have
         | open source 6B models. A 60B model shouldn't be far off, and is
         | likely to give us 90% of GPT-3.
        
         | option_greek wrote:
         | That's a really interesting demo. What makes the responses so
         | laggy? Does the model take that long to generate text? You can
         | also experiment with things like repeating the user question or
         | adding pauses like "hmm let's see" to make it less noticeable
         | at least some of the time.
         | 
          | Too bad they asked you to pull it. What's the danger they are
          | worried about? The annoying thing about their press releases is
          | how seriously they take their GPT-3 bots' impact on humans.
          | Despite all the hype, it's difficult to see GPT-3 bots bringing
          | about the end of humanity any time soon. Honestly they need to
          | rename themselves - I can't see what's open about OpenAI.
        
           | maxwells-daemon wrote:
           | Autoregressive transformers take a while to generate text,
           | since you need to run the whole model once for every word in
           | the output.
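            | 
            | (Roughly this shape; model here is a hypothetical callable
            | that maps the tokens so far to the most likely next token:)
            | 
            |   def generate(model, prompt_tokens, n_new_tokens):
            |       # One full forward pass of the model per generated
            |       # token, which is where the latency comes from.
            |       tokens = list(prompt_tokens)
            |       for _ in range(n_new_tokens):
            |           tokens.append(model(tokens))
            |       return tokens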
        
           | Vermeulen wrote:
            | It's laggy since it needs to do speech-to-text, the GPT-3
            | text response, then text-to-speech. Not sure which adds the
            | most latency, actually.
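            | 
            | (A crude way to find out, with hypothetical stand-ins for the
            | three stages:)
            | 
            |   import time
            | 
            |   def timed(label, fn, *args):
            |       # Time one stage of the pipeline and report it.
            |       start = time.perf_counter()
            |       result = fn(*args)
            |       print(f"{label}: {time.perf_counter() - start:.2f}s")
            |       return result
            | 
            |   # Placeholders; each would really be a network call.
            |   def speech_to_text(audio): return "hello there"
            |   def gpt3_reply(text): return "hello back"
            |   def text_to_speech(text): return b"<audio bytes>"
            | 
            |   def respond(audio_in):
            |       text = timed("speech-to-text", speech_to_text, audio_in)
            |       reply = timed("gpt-3", gpt3_reply, text)
            |       return timed("text-to-speech", text_to_speech, reply)
            | 
            |   respond(b"<mic input>")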
           | 
            | They only allow GPT-3 chatbots if the chatbot is designed to
           | speak only about a specific subject, and literally never says
           | anything bad/negative (and we have to keep logs to make sure
           | this is the case). Which is insane. Their reasoning to me was
           | literally a 'what if' the chatbot "advised on who to vote for
           | in the election". As if a chatbot in the context of a video
           | game saying who to vote for was somehow dangerous
           | 
           | I understand the need to keep GPT3 private. There is a lot of
           | possibility for deception using it. But they are so scared of
           | their chatbot saying a bad thing and the PR around that
           | they've removed the possibility of doing anything useful with
           | it. They need to take context more into account - a clearly
           | labeled chatbot in a video game is different than a Twitter
           | bot
        
             | dfraser992 wrote:
             | But what if it wasn't clearly labeled? I did my MSc thesis
              | on fake reviews and discussed the phenomenon known as
             | "covert marketing" a bit. e.g. a guy you're talking to in a
             | bar at some point steers the conversation to the excellent
             | beer he is drinking and heavily recommends it to you. Good
             | enough actors will be very convincing. "Influencers" are a
             | somewhat more ethical alternative that takes advantage of
             | humans' lemming-like nature.
             | 
              | I mean, quite a lot of people truly believe Hillary Clinton
              | is the mastermind behind a DNC-run pedophile ring. Yes, she
              | is a problem, but that theory is completely schizophrenic.
              | An NPC masquerading as a real person who spouts positive
             | talking points about Tucker Carlson's respect for Hungary
             | is quite reasonable compared to that and it will suck some
             | people in.
             | 
             | So all it takes is some right wing developers for a not-
             | entirely-just-a-game like Second Life or Minecraft to
             | introduce a bug that allows certain instances of NPC to be
             | unlabeled... or a mod to a game that drives a NPC... and an
             | equivalent to GPT-3 funded by the Kochs or the Mercers...
             | 
             | Very hypothetical, very hand waving. But it is possible. So
             | I can see the PR and legal departments flat out stopping
             | this idea.
        
             | minimaxir wrote:
             | > But they are so scared of their chatbot saying a bad
             | thing and the PR around that they've removed the
             | possibility of doing anything useful with it.
             | 
             | It's not unreasonable to have checks-and-balances on AI
             | content, and there should be.
             | 
             | However, in my testing of GPT-3's content filter when it
             | was released (it could be improved now), it was _very_
             | sensitive to the point that it had tons of false positives.
             | Given that passing content filter checks is required for
              | productionizing a GPT-3 app, it makes the API too risky to
              | use, and is part of the reason I'm researching more with
              | train-your-own GPT models.
        
               | nradov wrote:
               | Why should there be checks and balances on AI content?
               | What most people label as "AI" today is literally just
               | fancy statistics. Should there be checks and balances on
               | the use of linear regression analysis and other
               | statistical techniques? Where do we draw the line?
        
               | minimaxir wrote:
               | > Should there be checks and balances on the use of
               | linear regression analysis and other statistical
               | techniques?
               | 
               | That rhetorical question actually argues against your
               | point: even in academic contexts, statistics can be used
               | (intentionally or otherwise) to argue
               | incorrect/misleading points, which is why reputable
               | institutions have peer reviews/boards as a level of
               | validation for papers.
               | 
               | The point I was making was more about general content
               | moderation of user-generated content, which is
               | _required_ for every service that hosts it, at minimum
               | for legal reasons, as they're the ones who will get
               | blamed if something goes wrong.
        
               | mola wrote:
               | Of course statistical techniques need checks and
               | balances, hence peer-reviewed academic papers, meta-
               | analyses, etc. Statistics is a major tool for science
               | these days, and science needs checks and balances;
               | otherwise it's a pretty idle effort. Without checks and
               | balances, you could just imagine any theory and believe
               | it's the truth because you want to.
        
             | ummonk wrote:
             | Eh, I could still see a clearly labeled chatbot in a video
             | game causing a major PR scandal if it says something
             | offensive. Not really worth the risk.
             | 
             | Pretty bad that they took so long to decide on this,
             | though, pulling the rug out from under developers' feet.
        
         | qwertox wrote:
         | This is stunning. Imagine being able to practice your foreign
         | language lessons this way.
        
           | TchoBeer wrote:
           | How many languages does GPT-3 support at the moment?
        
         | make3 wrote:
         | I work in this domain, and you can make these things say
         | anything with a little probing, even stuff like "Hitler was
         | right to kill all the Jews, I wish he was still alive today."
         | 
         | They likely don't want "OpenAI GPT-3" and that kind of output
         | associated with each other in such demos; it would be really
         | bad for their image.
        
       | refulgentis wrote:
       | I'm trying to extract some signal from this link... lots of
       | upvotes, no comments, 30 min old, top 3 on HN. I'm worried this
       | will be read as negative, but it's not - I'm just learning, and
       | enough time has passed that I'm itching to jump in and ask:
       | 
       | - Is the significance here exactly what it says on the tin: the
       | model behind GitHub's AI code completion will be shared with
       | people on an invite basis? Or am I missing something?
       | 
       | - What is the practical import of the quote at the end of this
       | comment?
       | 
       | "can now" makes me think its a new feature over Github's
       | implementation, which would then indicate the "simple commands"
       | could be general UI, or at least IDE UI, navigation.
       | 
       | If "can now" means "it is currently capable of, but will be
       | capable of more", then I'd expect it to be the same as the
       | current implementation on GitHub.
       | 
       | Quote: "Codex can now interpret simple commands in natural
       | language and execute them on the user's behalf--making it
       | possible to build a natural language interface to existing
       | applications."
        
         | sbierwagen wrote:
         | Take a look at the video demo. It takes natural text in a box
         | and generates code. Copilot was super-autocomplete, so the
         | interface was writing code in an IDE that it filled out for
         | you. A natural language interface will be a little easier for
         | non-programmers. (Though, how would you read the code to make
         | sure it does what you meant...)
        
           | polyanos wrote:
           | >Take a look at the video demo. It takes natural text in a
           | box and generates code. Copilot was super-autocomplete, so
           | the interface was writing code in an IDE that it filled out
           | for you.
           | 
           | No it wasn't: you can literally describe, in natural text,
           | what you want in a comment and Copilot will do its best to
           | generate a complete method based on that comment. It seemed
           | so autocomplete-like because the demo focused on the
           | "helping the developer" part.
           | 
           | I'm fairly sure Copilot could have shown something similar
           | if they had a demo where you could easily make something
           | visual, like HTML + JavaScript/TypeScript/whatever scripting
           | language. They're using exactly the same model (Codex) after
           | all.
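           | 
           | To make the comment-prompt point concrete (this is my own
           | illustrative example, not one from the demo), a comment like
           | the one below is all you type, and the model tries to fill
           | in the body:
           | 
           |     # illustrative prompt, not from the demo:
           |     # given a list of "YYYY-MM-DD" strings, return the most
           |     # recent one
           |     def most_recent(dates):
           |         from datetime import date
           |         return max(dates, key=date.fromisoformat)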
        
       | am17an wrote:
       | I really want to just play with this tech - it's frightening but
       | also the future - but I'm still waiting to get off the GitHub
       | Copilot waitlist. I wonder how long this will take for people
       | who don't know someone who knows someone...
        
         | [deleted]
        
         | febrilian wrote:
         | Uhh... I'm literally no one but have had access for a week or
         | so. I have 134 repos and 12,060 contributions in the last
         | year. Idk if that mattered.
        
         | andyxor wrote:
         | That's not the future. These large language models have no
         | understanding of language; they repeat the most frequently
         | occurring patterns like parrots. They miss this whole thing
         | called semantics.
        
       | f0e4c2f7 wrote:
       | They just finished a demo on Twitch. Pretty crazy!
       | 
       | https://www.twitch.tv/videos/1114111652
       | 
       | Starts at 15:45.
        
         | j0ej0ej0e wrote:
         | aaaand they've blocked audio until 18:17ish, timestamp url:
         | https://www.twitch.tv/videos/1114111652?t=00h18m17s
        
         | raidicy wrote:
         | lmao; copyright muted so you can't even hear them speaking.
        
           | [deleted]
        
         | karmasimida wrote:
         | It is simultaneously impressive and underwhelming for me.
         | 
         | I mean, yes, this is a super impressive demo, but it didn't
         | go beyond my expectations. I really want to see whether this
         | model can write a correct binary search method without having
         | seen one before.
         | 
         | Or, even when using binary search correctly, does it
         | understand concepts like index boundaries?
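         | 
         | For reference, this is the kind of boundary handling I mean
         | (a standard hand-written textbook version, not model output):
         | 
         |     # standard iterative binary search over a sorted list
         |     def binary_search(xs, target):
         |         lo, hi = 0, len(xs) - 1
         |         while lo <= hi:
         |             mid = (lo + hi) // 2
         |             if xs[mid] == target:
         |                 return mid          # found: return its index
         |             elif xs[mid] < target:
         |                 lo = mid + 1        # search the upper half
         |             else:
         |                 hi = mid - 1        # search the lower half
         |         return -1                   # target not present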
        
           | stavros wrote:
           | > I really want to see whether this model can write a correct
           | binary search method without seeing one before.
           | 
           | I don't believe the model was trained on Google interview
           | answers, sadly.
        
           | polyanos wrote:
           | I found the whole UI/sandbox they created the most
           | interesting part. Now don't get me wrong, the tech is
           | certainly great and all, but I really didn't have the
           | feeling I saw or learned more than I already knew from what
           | was shown with GitHub Copilot, although I was kinda
           | impressed - if it really is as simple as they stated - at
           | how it is able to adapt to new APIs.
           | 
           | It's a shame they limited the demo to relatively simple
           | instructions.
        
       ___________________________________________________________________
       (page generated 2021-08-10 23:00 UTC)