[HN Gopher] Competitive Programming with AlphaCode
       ___________________________________________________________________
        
       Competitive Programming with AlphaCode
        
       Author : yigitdemirag
       Score  : 456 points
       Date   : 2022-02-02 16:13 UTC (6 hours ago)
        
 (HTM) web link (deepmind.com)
 (TXT) w3m dump (deepmind.com)
        
       | pretendscholar wrote:
       | I am a little bitter that it is trained on stuff that I gave away
       | for free and will be used by a billion dollar company to make
       | more money. I contributed the majority of that code before it was
       | even owned by Microsoft.
        
         | visarga wrote:
         | Paying it forward, it will help others in turn.
        
           | pretendscholar wrote:
           | Yes it will help the already powerful players
           | disproportionately.
        
             | alphabetting wrote:
              | They open-sourced AlphaFold for anyone to use commercially,
              | despite a big financial incentive to keep it private and use
              | it in their new drug discovery lab. No idea how this works
              | or differs from AlphaFold, but I imagine they'll do the same
              | here if possible.
        
               | pretendscholar wrote:
               | Only after another lab made their own open source one
               | that was comparable.
        
             | kzrdude wrote:
             | The problem is not really that microsoft owns github, or
             | that licenses allow corporations free use, but that the
             | tech giants are so big and have so much power.
        
         | Permit wrote:
         | Can you elaborate and give some history? What code did you
         | contribute, and how did it end up being used by Microsoft and
         | then DeepMind?
        
           | arendtio wrote:
           | > We pre-train our model on selected public GitHub code and
           | fine-tune it on our relatively small competitive programming
           | dataset.
           | 
           | But since the code was 'selected' you don't know if your code
           | was used. However, they seem to have used Python and C++, so
           | my code is probably not part of it.
        
           | [deleted]
        
       | hmate9 wrote:
        | Between this and GitHub Copilot (built on OpenAI's models),
        | "programming" will probably slowly start dying. What I mean by
        | that is: sure, you still have to learn how to program, but our
        | time will be spent much more on just the design part and on
        | writing detailed documentation/specs, and then we just have one
        | of these AIs generate the code.
        | 
        | It's the next step. Binary code < assembly < C < Python <
        | AlphaCode
        | 
        | Historically it's always been about abstracting and writing less
        | code to do more.
        
         | mhzsh wrote:
         | Creating a higher level abstraction is something people have
         | been trying to do for decades with so-called 4th-generation
         | languages. At some point, abstracting away too much makes a
         | tool too cookie-cutter, and suddenly deviating from it causes
         | more difficulty.
        
           | visarga wrote:
           | Maybe it's not more abstraction we need, just automating the
           | drudgery. Abstractions are limited - by definition they
           | abstract things away, they are brittle.
        
           | vvilliamperez wrote:
           | Read: Ruby on Rails
        
         | streetcat1 wrote:
          | First, if this is correct and AlphaCode succeeds, it will
          | bring about its own demise.
          | 
          | I.e. as soon as it starts replacing humans, it will not have
          | enough human-generated training data, since all programming
          | will be done by models like itself.
          | 
          | Second, AlphaCode was specifically trained for competitive
          | programming:
          | 
          | 1. Short programs. 2. Each problem has hundreds of human-
          | generated solutions.
          | 
          | However, commercial programs are:
          | 
          | 1. Long. 2. Have no predefined answer, or even a correct
          | answer. 3. Need to use/reuse a lot of legacy code.
        
           | chroem- wrote:
            | Reinforcement learning and adversarial training can render
            | both of those concerns non-issues in practice.
        
             | ialyos wrote:
             | The phrase "in practice" doesn't really work when you're
             | referring to highly finicky strategies like RL and
             | adversarial training
        
           | AnIdiotOnTheNet wrote:
           | > as soon as it starts replacing humans, it will not have
           | enough human generated training data, since all of
           | programming will be done by models like himself.
           | 
           | As a natural born pessimist, I can't help but feel that by
           | the time we get to that point we'll just keep blundering
           | forward and adapting our world around the wild nonsense
           | garbage code the model ends up producing in this scenario.
           | 
           | After all, that's basically what we've done with the entire
           | web stack.
        
         | pjmorris wrote:
         | I'd note that assembly, C, and Python didn't replace
         | 'programming' but were expected to do so. I'd wager that what
         | you now call 'detailed documentation/specs' will still be
         | called programming in 10 or even 20 years.
        
           | falcor84 wrote:
           | If you could change a sentence in the documentation and then
           | run a ~1min compilation to see the resulting software, it
           | would be a very different kind of programming. I suppose
           | it'll give a new meaning to Readme-Driven-Development.
        
         | wittycardio wrote:
         | Solving competitive programming problems is essentially solving
         | hard combinatorial optimization problems. Throwing a massive
         | amount of compute and gradient descent at the problem has
         | always been possible. If I'm not mistaken what this does is
         | reduce the representation of the problem to a state where it
         | can run gradient descent and then tune parameters. The real
         | magic is in finding structurally new approaches. If anything
         | I'd say algorithms and math continue to be the core of
          | programming. The particular syntax or level of abstraction
          | doesn't matter so much.
        
           | jdlshore wrote:
           | > If anything I'd say algorithms and math continue to be the
           | core of programming.
           | 
           | I disagree; I think the core of programming is analyzing
           | things people want and expressing solutions to those wants
           | clearly, unambiguously, and in a way that is easy to change
           | in the future. I'd say algorithms and math are a very small
           | part of this work.
        
             | wittycardio wrote:
             | That's not programming, that's called being a good
             | employee. Any person in any role should be doing that.
             | Programming is about algorithms and math. Now a good
             | employee who's in a technical role should have both.
        
               | jdlshore wrote:
               | > Programming is about algorithms and math.
               | 
               | You've simply restated your opinion without providing any
               | supporting arguments, and as I already said, I disagree.
               | The vast majority of programming I see (and as a
               | consultant, I see a fairly wide variety) is not about
               | algorithms and math, but instead gluing together systems
               | and expressing domain logic.
               | 
               | Now, I suppose you could argue that domain logic is
               | "algorithms and math," but in my experience, it's less
               | about the specific algorithms and more about precisely
               | describing fuzzy human behavior.
               | 
                | It's the "precisely describing" and "easy to change in
                | the future" parts that make what programmers do
                | different from what any good employee does.
                | 
                | (I do agree that there is some programming that is
                | focused on algorithms and math, but it's in the minority,
                | in my experience. Perhaps the type of work you do _is_
                | focused on algorithms and math, but I believe that's a
                | relatively small part of the software development
                | ecosystem.)
        
           | chroem- wrote:
           | > Solving competitive programming problems is essentially
           | solving hard combinatorial optimization problems.
           | 
           | True, but if you relax your hard requirements of optimality
           | to admit "good enough" solutions, you can use heuristic
           | approaches that are much more tractable. High quality
           | heuristic solutions to NP-hard problems, enabled by ML, are
           | going to be a big topic over the next decade, I think.
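            | 
            | A minimal example of that trade: a nearest-neighbour tour for
            | the TSP gives up optimality for an O(n^2) heuristic (a toy
            | sketch of the classic greedy approach, not an ML method):
            | 
            |     def nearest_neighbour_tour(dist):
            |         # dist[i][j] is the distance between city i and city j.
            |         n = len(dist)
            |         tour, unvisited = [0], set(range(1, n))
            |         while unvisited:
            |             last = tour[-1]
            |             # Greedy step: jump to the closest unvisited city.
            |             nxt = min(unvisited, key=lambda j: dist[last][j])
            |             tour.append(nxt)
            |             unvisited.remove(nxt)
            |         return tour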
        
             | wittycardio wrote:
             | I should correct myself, this isn't even that. This is just
             | text analysis on codeforces solutions, which makes it even
              | worse than I thought. Very pessimistic about its
              | generalizability.
        
         | Inufu wrote:
         | I agree, I expect programmers will just move up the levels of
         | abstraction. I enjoyed this recent blog post on the topic:
         | https://eli.thegreenplace.net/2022/asimov-programming-and-th...
        
           | hackinthebochs wrote:
           | The "problem" is that as you move up the levels of
           | abstraction, you need fewer people to do the same amount of
           | work. Unless the complexity of the work scales as well. I've
           | always felt that programmers would be the first class of
           | knowledge workers to be put out of work by automation. This
           | may be the beginning of the end for the programming gravy
           | train.
        
             | NicoJuicy wrote:
             | There aren't enough developers either way.
        
             | bmh100 wrote:
             | On the other hand, as the value of an hour of programming
             | increases, the quantity demanded may also increase.
        
             | paxys wrote:
             | > as you move up the levels of abstraction, you need fewer
             | people to do the same amount of work
             | 
             | Yes, but the total amount of work (and surrounding
             | complexity) also increases with it. Just look at the
             | evolution of the software industry over the last few
             | decades.
        
               | hackinthebochs wrote:
               | History isn't a great guide here. Historically the
               | abstractions that increased efficiency begat further
               | complexity. Coding in Python elides over low-level issues
               | but the complexity of how to arrange the primitives of
               | python remains for the programmer to engage with. AI
               | coding has the potential to elide over all the complexity
               | that we identify as programming. I strongly suspect this
               | time is different.
        
             | visarga wrote:
             | > The "problem" is that as you move up the levels of
             | abstraction, you need fewer people to do the same amount of
             | work.
             | 
             | This will lower the entry barrier to developing software so
             | more people will go into the field. Before you needed to
             | know a programming language, now you will just have a
             | dialogue with a language model.
             | 
             | > I've always felt that programmers would be the first
             | class of knowledge workers to be put out of work by
             | automation.
             | 
             | We've been automating our work for 70 years, and look how
             | many programmers are employed now. The more we automate,
             | the more capable our field becomes and more applications
             | pop up.
        
               | hackinthebochs wrote:
               | >This will lower the entry barrier to developing software
               | so more people will go into the field.
               | 
               | Indeed. The ideal future of programming is something out
               | of star trek. I often noticed how everyone on the ship is
               | a programmer of a sort, they whip up a simulation as the
               | problem warrants regardless of their field. But in this
               | future, the job of programmer basically doesn't exist. As
               | a programmer, I should be allowed to have mixed feelings
               | about that.
        
               | visarga wrote:
                | Let your imagination fly. We always want more than is
                | possible; our wishes fill up any volume like an expanding
                | gas. Humans are going to be crucial to orchestrate AI and
               | extract the most utility out of it.
        
             | hmate9 wrote:
             | Or you can do things at a faster pace and increase your
             | productivity.
        
             | Inufu wrote:
             | Yes, this is how you increase prosperity (see: agricultural
             | revolution, industrial revolution, etc). You can now create
             | more with the same number of people.
        
         | elwell wrote:
         | > writing detailed documentation/specs
         | 
         | That's what code is.
        
         | bmc7505 wrote:
         | I disagree that programming is dying -- tools like Copilot will
         | lead to a Renaissance in the art of computer programming by
         | enabling a larger population to design programs and explore the
         | implications of their design choices. I wrote a short essay [1]
          | on the history of automated programming and where I think it is
         | heading in the future.
         | 
         | [1]:
         | https://breandan.net/public/programming_with_intelligent_mac...
        
         | 62951413 wrote:
         | Model-driven development and code generation from UML were once
         | supposed to be the future. It will be interesting to see how
         | much further this approach takes us.
         | 
          | Assuming ANNs resemble the way the human brain functions,
          | you'd also expect them to introduce bugs. And so actual human
          | beings would partake in debugging too.
        
         | diehunde wrote:
         | My bet would be that it will never happen in a reasonable time
         | frame. And also by that logic, writing that
         | "documentation/spec" would just mean learning a new programming
         | language the AI engine can parse making it as useful as a
         | compiler. Anyone who has been writing and designing software
         | for a while knows the cycle is way more complex than take some
         | input and write code.
         | 
          | Let me know when the AI engine is able to do complex
          | refactoring, add features that keep backwards compatibility,
          | find a bug in a giant codebase by debugging a test case, or
          | write code that's performant but also maintainable.
        
           | ctoth wrote:
           | You ever notice how the "let me know when" part of this keeps
           | changing? Let me know when computers can ... play
           | Go/understand a sentence/compose music/write a program/ ...
           | 
           | But surely they'll never be able to do this new reference
           | class you have just now come up with, right?
        
             | diehunde wrote:
             | Not really? I mean I would never say "let me know when
             | computer can do X" when X is something that doesn't require
             | too much creativity and imagination. Like, a computer
             | composing music, doesn't impress me too much because music
             | itself has structure. A computer creating music that would
             | wow a professional composer? That would be impressive. Same
             | with this topic. A computer that solves some (because it
             | failed several) short programming challenges and OP says it
             | will kill programming entirely? Not even close. Pretty cool
             | though.
        
             | Jensson wrote:
              | It keeps changing since our imagination of which tasks
              | require intelligence is weak. We think that when a
             | computer can do X it can also do Y. But then someone builds
             | a computer that can do X but can't do Y, and we say "oh, so
             | that doesn't require intelligence, let me know when it can
             | do Z and we can talk again.". That doesn't mean that Z
             | means the computer is intelligent, just that Z is a point
             | where we can look at it and discuss again if we made any
             | progress. What we really want is a computer that can do Y,
             | but we make small mini tasks that are easier to test
             | against.
             | 
             | The Turing test is a great example of this. Turing thought
             | that a computer needs to be intelligent to solve this task.
             | But it was solved by hard coding a lot of values and better
             | understanding of human psychology and what kind of
             | conversation would seem plausible when most things are
             | hardcoded. That solution obviously isn't AI, I bet you
             | don't think so either, but it still passed the Turing test.
        
               | ctoth wrote:
               | At what point do we give up and realize that there is no
               | one thing called intelligence, just a bunch of hacks that
               | work pretty well for different things sometimes? I think
               | that's probably where people keep failing here. The
               | reason that we keep failing to find the special thing in
               | every new field that AI conquers is because there's
               | nothing special to actually find? I mean, we could keep
               | moving the goalposts, a sort of intelligence of the gaps
               | argument? But this doesn't seem productive.
        
           | Enginerrrd wrote:
           | I agree, from a totally different angle. Let's take something
           | I know better as an example: Structural engineering.
           | Structural engineering should be a "solved problem". It
           | seems, ostensibly, relatively simple compared to a more open
           | ended activity like "programming".(For "technical reasons",
           | it ends up being more similar than you might think.) Still,
           | you are ultimately dealing with the same materials, the same
           | physics, and very similar configurations.
           | 
           | And yet, despite the fact that we have programs to help
           | calculate all the things, test code-required load-
           | combinations, even run simulations and size individual
           | components... it turns out that, it doesn't actually save
           | that much work, and you still need an engineer to do most of
           | it. And not just because of regulatory requirements. It's
           | just, that's not the hard part. The hard part is assembling
           | the components and specifications, specifying the correct
           | loads based on location-specific circumstances, coming up
           | with coherent and sensible design ideas, chasing down every
            | possible creative nook and cranny of the code to make
            | something that was originally a mistake actually work, and
            | knowing when the model is just wrong for some reason and the
            | computer isn't simulating load paths accurately.
           | 
           | Specifying the inputs and interpreting results is still about
           | as much work as it was before you started with all the fancy
           | tools. Those tools still have advantages mind you, and they
           | do make one slightly more efficient. Substantially so in some
           | cases, but most of the time it still comes out as a slight
           | assist rather than a major automation.
        
           | fvold wrote:
           | I hear that.
           | 
           | Machine Learning also has a long way to go before it can take
           | a long, rambling mess of a meeting and somehow generate a
           | halfway usable spec from it. I mean, the customer says they
           | want X, but X is silly in this context, so we'll give them Y
           | and tell them it's "X-like, but faster". For example, SQL is
           | "Blockchain-like, but faster" for a lot of buzzword use-cases
           | of blockchain.
        
       | mirrorlake wrote:
       | I've been wondering this for a while:
       | 
       | In the future, code-writing AI could be tasked with generating
       | the most reliable and/or optimized code to pass your unit tests.
       | Human programmers will decide what we want the software to do,
       | make sure that we find all the edge cases and define as many unit
       | tests as possible, and let the AI write significant portions of
       | the product. Not only that, but you could include benchmarks that
       | pit AI against itself to improve runtime or memory performance.
       | Programmers can spend more time thinking about what they want the
       | final product to do, rather than getting mired in mundane
       | details, and be guaranteed that portions of software will perform
       | extremely well.
       | 
       | Is this a naive fantasy on my part, or actually possible?
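        | 
        | A rough sketch of the loop I'm imagining (the candidate
        | generator is a stand-in and all of the names are made up):
        | 
        |     import time
        | 
        |     def pick_best(generate_candidates, unit_tests, bench_input):
        |         # Hypothetical interface: ask the code-writing AI for
        |         # many candidate implementations.
        |         candidates = generate_candidates(n=100)
        | 
        |         # Keep only candidates that pass every unit test.
        |         passing = [
        |             f for f in candidates
        |             if all(f(x) == want for x, want in unit_tests)
        |         ]
        | 
        |         # Among the correct ones, pick the fastest on a benchmark.
        |         def runtime(f):
        |             start = time.perf_counter()
        |             f(bench_input)
        |             return time.perf_counter() - start
        | 
        |         return min(passing, key=runtime) if passing else None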
        
         | phreeza wrote:
         | And a second AI to generate additional test cases similar to
         | yours (which you accept as also in scope) to avoid the first AI
         | gaming the test.
        
         | machiaweliczny wrote:
          | First you need really good infra to make it easy to test
          | multiple working solutions for the AI, but I think this will
          | be bleeding edge in 2030.
          | 
          | EDIT: with in-memory DBs I can imagine an AI-assisted
          | mainframe that can solve 90% of business problems.
        
         | EVa5I7bHFq9mnYK wrote:
          | It seems to me that writing an exhaustive set of unit tests is
          | harder than writing the actual code.
        
       | mrsuprawsm wrote:
       | Does this mean that we can all stop grinding leetcode now?
        
       | BoardsOfCanada wrote:
       | Do I understand it correctly that it generated (in the end) ten
       | solutions that then were examined by humans and one picked? Still
       | absolutely amazing though.
        
         | thomasahle wrote:
         | No human examination was done.
         | 
         | But it generated 10 solutions which it ran against the example
         | inputs, and picked the one that passed.
         | 
         | Actually I'm not sure if it ran the solutions against the
         | example inputs or the real inputs.
        
           | [deleted]
        
           | aliceryhl wrote:
           | They used the real inputs. The example inputs were used to
           | filter out which candidates to submit for the 10 tries.
        
         | aliceryhl wrote:
         | No, they gave the algorithm 10 tries and tested all of them,
         | and said that it was solved if any one of them worked.
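          | 
          | Roughly, the selection step is "sample a huge number of
          | programs, filter on the public example tests, submit up to
          | ten". A sketch of that filtering (not DeepMind's actual code;
          | IIRC the paper also clusters the survivors by behaviour so the
          | ten submissions are diverse):
          | 
          |     def select_submissions(samples, example_tests, k=10):
          |         # samples: runnable programs sampled from the model
          |         # (hypothetical interface). Keep only the ones that
          |         # pass the example I/O given in the problem statement.
          |         survivors = [p for p in samples
          |                      if all(p(i) == o for i, o in example_tests)]
          |         # Submit at most k survivors; the problem counts as
          |         # solved if any of them passes the hidden tests.
          |         return survivors[:k]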
        
       | mcast wrote:
       | The year is 2025, Google et al. are now conducting technical on-
       | site interviews purely with AI tools and no human bias behind the
       | camera (aside from GPT-3's quirky emotions). The interview starts
       | with a LC hard, you're given 20 minutes -- good luck!
        
         | jakey_bakey wrote:
         | I think Amazon already tried this and it had surprisingly
         | racist results
        
       | qualudeheart wrote:
       | Calling it now: If current language models can solve competitive
       | programming at an average human level, we're only a decade or
       | less off from competitive programming being as solved as Go or
       | Chess.
       | 
       | Deepmind or openAI will do it. If not them, it will be a Chinese
       | research group on par with them.
       | 
       | I'll be considering a new career. It will still be in computer
       | science but it won't be writing a lot of code. There'll be
       | several new career paths made possible by this technology as
       | greater worker productivity makes possible greater
       | specialization.
        
         | keewee7 wrote:
         | AI is being aggressively applied to areas where AI
         | practitioners are domain experts. Think programming, data
         | analysis etc.
         | 
         | Programmers and data scientists might find ourselves among the
         | first half of knowledge workers to be replaced and not among
         | the last as we previously thought.
        
         | muds wrote:
         | It can be really tempting to think about research progression
         | on a "linear" timescale but more often than not it eventually
         | ends up following an "exponential" curve because of technical
          | debt. And there appear to be a _lot_ of techniques used here
          | which we don't fully understand.
         | 
         | I wouldn't be surprised if a specifically engineered system ten
         | years from now wins an ICPC gold medal but I'm pretty sure that
         | a general purpose specification -> code synthesizer that would
         | actually threaten software engineering would require us to
         | settle a lot of technical debts first -- especially in the area
         | of verifying code/text generation using large language models.
        
         | EVa5I7bHFq9mnYK wrote:
         | Don't worry, there are a lot of much simpler jobs, like drivers
         | or cashiers that will surrender to AI before coder's job does.
         | So UBI will be implemented long before that happens.
        
           | solididiot wrote:
           | I wouldn't be so sure. Programmers (and drivers and cashiers)
           | can "survive" in poverty like millions others already do.
           | This transformation is coming in waves that keep the
           | proverbial frog in the pan.
        
         | simpleguitar wrote:
         | It doesn't even have to be average human.
         | 
         | Let's say AI only gets to 10% (or 20% or 30% or whatever, it
         | doesn't really matter), that's a huge number of jobs being
         | lost.
         | 
         | Imagine having a machine write all the "simple/boring" code for
         | you. Your productivity will go through the roof. The smartest
         | programmer who can most effectively leverage the machine could
         | replace many hundreds of programmers.
         | 
         | I should brush up on my plumbing and apply for a plumbing
          | license soon. (I think plumbing is safer than being an
          | electrician, because many CS people have good EE foundations.)
        
         | phendrenad2 wrote:
         | Calling it now: Your prediction is off by an order of magnitude
         | or two (10 years -> 100 years, or 1000 years)
        
         | abecedarius wrote:
         | Three months ago in the Copilot thread I was saying
         | 
         | > in 5 years will there be an AI that's better than 90% of
         | unassisted working programmers at solving new leetcode-type
         | coding interview questions posed in natural language?
         | 
         | and getting pooh-poohed.
         | https://news.ycombinator.com/item?id=29020401 (And writing
         | that, I felt nervous that it might not be aggressive enough.)
         | 
          | There's a general bias in discussions of AI these days: people
          | forget that the advance they're now pooh-poohing was itself
          | dismissed in the same way, surprisingly recently, as probably
          | way off in the indefinite future.
        
           | hackinthebochs wrote:
           | The issue is these techniques are growing in capabilities
           | exponentially, while we have a habit of extrapolating
            | linearly. Some saw the glaring deficits in Copilot and then
            | reasoned that linear improvement still means glaring
            | deficits.
           | I don't know that this bias can ever be corrected. A large
           | number of intelligent people simply will never be convinced
           | general AI is coming soon no matter what evidence is
           | presented.
        
             | Jensson wrote:
             | > techniques are growing in capabilities exponentially,
             | while we have a habit of extrapolating linearly
             | 
             | What does this even mean? How do you put a number on AI
             | capability? You can say it is growing faster than people
             | expect, but what is even exponential or linear growth in AI
             | capability?
        
               | hackinthebochs wrote:
               | I take your point that the linear/exponential terminology
               | is a bit dubious. But the simple way to make sense of it
               | is just going by various benchmarks. E.g. the power-law
               | relationship between the model accuracy and the model
               | size: https://eliaszwang.com/paper-reviews/scaling-laws-
               | neural-lm/
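                | 
                | As a toy illustration of what a power law means here
                | (the constants below are made up for illustration, not
                | the paper's fitted values):
                | 
                |     def loss(n_params, n_c=1e14, alpha=0.08):
                |         # L(N) = (N_c / N) ** alpha: loss keeps falling
                |         # as model size N grows, but ever more slowly.
                |         # (illustrative constants, not fitted values)
                |         return (n_c / n_params) ** alpha
                | 
                |     for n in (1e6, 1e8, 1e10, 1e12):
                |         print(f"{n:.0e} params -> loss {loss(n):.2f}")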
        
         | redsummer wrote:
        
         | pkaye wrote:
         | How long before it can write the code without plagiarizing code
         | from online?
        
           | stnmtn wrote:
           | Humans study CS for 5 years, reading code from online to be
           | able to solve these problems.
        
           | falcor84 wrote:
           | How long before the typical human coder can do so?
        
             | pkaye wrote:
             | Are you saying you cannot write code from scratch?
        
               | sheikheddy wrote:
               | Not the parent comment, but I cannot code from scratch
               | (outside of very simple and small applications).
               | Competitive Programming is at about the limit of what I
               | can do without looking things up, and only because I've
               | had practice specifically for that kind of artificial
               | environment.
        
               | falcor84 wrote:
               | I can write some code from scratch, but my ability to
               | write code is improved by an order of magnitude when I
               | can refer to online resources, including example code.
        
         | Jensson wrote:
          | This is in line with what other code generation AIs have
          | accomplished.
         | 
         | To reach average level at codeforces you need to be able to
         | apply a standard operation like a sort, or apply a standard
         | math formula, as the first 1-2 problems in the easy contests
         | are just that. It is impressive that they managed to get this
         | result in real contests with real unaltered questions and see
          | that it works. But generalizing this to harder problems isn't
          | as easy: there you need to start devising original algorithms
          | instead of just applying standard ones, and for such problems
          | the model needs to understand computer science instead of just
          | mapping language to algorithms.
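          | 
          | I.e. a typical first problem in an easy contest boils down to
          | something like this (a made-up example, not one of the
          | problems AlphaCode was actually tested on):
          | 
          |     # Made-up statement: "Given n numbers, print them in
          |     # non-decreasing order."
          |     n = int(input())
          |     print(*sorted(map(int, input().split())))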
        
         | zerr wrote:
         | The thing is, Competitive Programming (CP) is a completely
         | different discipline/subject with its own trivia knowledge and
         | tricks. CP uses Computer Science the same way as e.g. Biology
          | uses Mathematics. It has very little in common with real-world
          | software development.
        
           | qualudeheart wrote:
           | I said as much in another comment.
           | 
           | Automating the software development profession proper is
           | going to be much harder and will require autonomous agents
           | with coherent world models, because that's what you need to
           | act in a business context.
        
         | f38zf5vdt wrote:
         | A programming genie that grants programming wishes to the
         | general public. Since most of what I do on a daily basis is
         | engineering solutions based on tradeoffs, I can only imagine
         | the number of programmers needed to debug solutions given by
         | the programming genie in response to poorly described feature
         | requests.
         | 
         | If we become mechanics of the software AI vehicles of the
         | future, so be it.
        
         | csee wrote:
         | You're extrapolating across very different types of problems.
         | Go and Chess have unlimited training data. Competitive
         | programming does not.
        
           | raphlinus wrote:
           | To me, that's actually one of the more interesting questions.
           | It's possible to grade the output of the AI against objective
           | criteria, like does it run, and resources consumed (RAM, CPU
           | time, and, particularly of interest to me, parallel scaling,
           | as GPU algorithms are too hard for most programmers). To what
           | extent can you keep training by having the AI generate better
           | and better solutions to a relatively smaller input pool of
           | problems? I skimmed the paper to see how much they relied on
           | this but didn't get a clear read.
        
         | solididiot wrote:
         | >> There'll be several new career paths made possible by this
         | technology as greater worker productivity makes possible
         | greater specialization.
         | 
         | Can you list a few?
        
         | Der_Einzige wrote:
         | I'm already anticipating having the job title of "Query
         | Engineer" sometime in the next 30 years, and I do NLP including
         | large scale language model training. :(
        
           | qualudeheart wrote:
           | One of the big venture capitalists predicted "prompt
           | engineering" as a future high paid and high status position.
           | 
           | Essentially handling large language models.
           | 
           | Early prompt engineers will probably be drawn from "data
            | science" communities and will be similarly high status, well
            | paid (though not as well paid), and require less mathematical
            | knowledge.
           | 
           | I'm personally expecting an "Alignment Engineer" role
           | monitoring AI systems for unwanted behavior.
           | 
           | This will be structurally similar to current cyber security
           | roles but mostly recruited from Machine Learning communities,
           | and embedded in a broader ML ecosystem.
        
             | jonas_kgomo wrote:
              | I like this description better, considering that companies
              | like Anthropic are working specifically on alignment and AI
              | safety. Being that the team actually spun out of DeepMind,
              | it is interesting.
        
               | qualudeheart wrote:
                | Alignment is going to be a giant industry and will also
                | include many people not originally in STEM. The
                | humanities and "civil society" will both have their
                | contributions to make.
                | 
                | It's likely that alignment jobs won't themselves be
                | automated, because no one will trust AI systems to align
                | themselves.
        
             | sjg007 wrote:
             | >"Alignment Engineer" role monitoring AI systems for
             | unwanted behavior.
             | 
             | ha, I know people already doing this..
        
         | lugu wrote:
         | Depending on what you want to do, you can either choose an
         | industry with very fuzzy requirements (to stay near the
          | programming side) or one with very complex but strict
         | requirements (to benefit from those coding robots). I guess we
         | will need simulators for most of what we do in order to train
         | those robots.
        
         | buscoquadnary wrote:
          | The problem is that this view continues to treat software
          | engineers as people who write code. That's not what my job is:
          | it is figuring out how to solve a business problem using
          | technology, getting people on board with that solution, and
          | updating and refining it.
         | 
          | This viewpoint seems to me very similar to the idea of 3rd-
          | generation languages replacing developers because programming
          | would be so easy. It isn't about how easy it is to write code:
          | I function as a limited mentat, taking all the possible
          | requirements, tradeoffs and constraints, analyzing them, and
          | then building the model; only then do I write out the code.
          | The code artifact is not the value I add. The artifact is how
          | I communicate the value to the world.
         | 
          | This doesn't make programmers redundant any more than Ruby,
          | PHP, or Java made developers redundant by freeing them from
          | having to manually track memory usage and pointers. It is at
          | most a tool to reduce the friction of getting what is in my
          | head into the world.
         | 
          | I control the code, and whoever controls the code controls the
          | business. I possess the ability to make out the strands of
          | flow control and see the future state of the application. For
          | I am the Sr. Software Engineer, and I have seen where no
          | Project Manager can see.
          | 
          | Apologies to Frank Herbert; I just finished listening to Dune.
         | 
         | EDIT:
         | 
          | I got off track at the end, but my point is that no matter how
          | good the tools for developing the code are, they will never
          | replace a software engineer any more than electric drills and
          | power saws replaced home builders. They merely elevate our
          | work.
        
           | qualudeheart wrote:
           | I actually agree with you on that. I had another comment
           | further down the thread where I said that software
           | engineering can't be fully automated by anything short of
           | artificial general intelligence.
           | 
           | As humans we have a coherent world model that current AI
           | systems are nowhere near close to having.
           | 
           | That coherent world model is a necessary precondition for
           | both understanding a business goal and implementing a program
           | to solve it. AlphaCode can do the second part but not the
           | first.
           | 
           | AlphaCode doesn't have that world model and even if it did it
           | still wouldn't autonomously act on it, just follow orders
           | from humans.
           | 
            | Competitive programming is going to be solved much earlier
            | than programming in a business context, because it's
            | completely independent of business requirements. It's at
            | most half as hard a problem.
        
         | udev wrote:
         | Yes, for very precise, comprehensive text descriptions of
         | problems.
         | 
          | It will take a far, far more advanced AI to write such
          | descriptions for real-world problems.
         | 
         | Writing requirements for a project is difficult work, and not
         | for technical reasons, but for human reasons (people don't know
         | what they want exactly, people have trouble imagining things
         | they haven't seen yet, people are irrational, people might want
         | something that is different from what they need, etc.)
         | 
         | In this regard, we are safe for a few more decades at least.
        
           | andy_ppp wrote:
            | I would actually argue the programmer's job has never been
            | 100% writing the code; it's always been interpreting, fixing
            | and decoding the ideas of others.
        
             | bcrosby95 wrote:
             | I would argue that we figured this out over 50 years ago
             | but oddly enough some people still hold onto the idea.
        
             | tluyben2 wrote:
             | The older I get the more I see it has not been about
             | programming for most tasks for quite a long time. In the
             | early 80s it was a bit more (but not even much more); at
             | that time as well I spent most of my time debugging and
             | changing behaviour slightly (but in a lot of pages) instead
             | of just cranking out huge bags of code.
        
           | tluyben2 wrote:
           | Yes, they have been trying to create 'sufficiently formal
           | human readable text' to spec out projects; not detailed
           | enough to execute by a computer but formal and precise enough
           | so humans know exactly what they are getting. That still
           | doesn't work at all and that is between humans. If the specs
           | are clear enough, the act of programming is already mostly
            | not the issue; however, they never are. I am looking forward
            | to ML helping me write boring code (which CoPilot already
            | does, but again, that's not really where time/energy is spent
            | anyway) and protect against security issues, scalability
            | issues and all kinds of bugs (it could rewrite algos it
            | knows; it could recommend libraries that I should use instead
            | of the crap I rolled myself, etc.).
        
           | qualudeheart wrote:
           | Fully automating software engineering won't happen until AGI.
           | As a good Yuddite I expect us to have bigger problems when
           | that happens.
           | 
           | You need an agent with a large and coherent world model, in
           | order to understand how your programs relate to the real
           | world, in order to solve business tasks.
           | 
           | This isn't something any program synthesis tech currently
           | available can do, because none of it has a coherent world
           | model.
           | 
            | GPT-3 comes closest to this, but isn't able to engage in any
            | kind of planning or abstract modeling beyond semi-coherent
            | extrapolations from training data.
           | 
           | Maybe scaling up GPT by a few more orders of magnitude would
           | work, by generating an emergent world model along the way.
        
             | CobrastanJorji wrote:
             | What is a "Yuddite?" I tried Googling for it and got the
             | impression it was LessWrong forum terminology for people
             | who believed too strongly in LessWrong, but I couldn't find
             | many references.
        
               | nikkwong wrote:
               | I believe he's referring to "luddites" -- a group of
               | people who resisted technological innovation during the
               | industrial revolution.
        
               | indiv0 wrote:
               | Luddite but mixed with "Eliezer Yudkowsky" who is a
               | researcher working on the problem of friendly AI (or
               | whatever they're calling it these days). Basically trying
               | to prevent skynet.
               | 
               | The GP is saying that once we have AGI, then "AGI is
               | going to make the human race irrelevant" outweighs "AGI
               | makes software devs irrelevant".
        
               | qualudeheart wrote:
               | That's the idea.
        
               | qualudeheart wrote:
                | I am a follower of Eliezer Yudkowsky.
        
         | steve76 wrote:
        
       | NicoJuicy wrote:
       | I would stop programming if all we needed to write was unit tests
       | :p
        
         | FartyMcFarter wrote:
         | To compensate, lots of people would _start_ programming if that
         | happened though. Many scientists would be interested in solving
          | their field's problems so easily - certainly maths would
         | benefit from it.
        
           | rmujica wrote:
            | wasn't this the motivation for Prolog?
        
       | [deleted]
        
       | 37ef_ced3 wrote:
       | The example problem (essentially, is T a subsequence of S with
       | deletions of size N) is a classic problem with no doubt dozens of
       | implementations in AlphaCode's training set.
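        | 
        | For reference, the classic pattern is a linear two-pointer scan
        | (a generic subsequence check along the lines of the parenthetical
        | description above, ignoring the size-N detail; not the exact
        | contest statement or AlphaCode's output):
        | 
        |     def is_subsequence(t, s):
        |         # Greedily match characters of t against s, left to right.
        |         i = 0
        |         for ch in s:
        |             if i < len(t) and ch == t[i]:
        |                 i += 1
        |         return i == len(t)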
       | 
       | And yet, what a garbage solution it produces.
       | 
       | To illustrate the difference between intelligence and
       | regurgitation, someone tell me what CoPilot generates for this:
        | 
        |     // A Go function to swap the sixth bit and seventeenth bit
        |     // of a 32-bit signed integer.
       | 
        | Here is a human solution:
        | 
        |     func swap(x int32) int32 {
        |         const mask = 1 << 5
        |         var (
        |             xor1 = (x>>11 ^ x) & mask
        |             xor2 = xor1 << 11
        |         )
        |         return x ^ xor1 ^ xor2
        |     }
       | 
       | CoPilot cannot reason numerically like this (understand
       | "seventeenth bit" and "sixth bit" and generate the right code for
       | that combination). It needs to understand the size of the gap
       | between the bits, i.e., 11, and that's too hard.
        
         | [deleted]
        
         | deanmen wrote:
          | You can do it without a subtraction:
          | 
          |     unsigned int swapbits(unsigned int a) {
          |         bool bit6 = a & (1 << 5);
          |         bool bit17 = a & (1 << 16);
          |         if (bit6 == bit17) return a; // bits are the same, do nothing
          |         // flip both 6th and 17th bits
          |         return (a ^ (1 << 5) ^ (1 << 16));
          |     }
        
           | 37ef_ced3 wrote:
           | And, to be clear, this is a human solution.
           | 
           | Not as efficient as mine, but kudos.
        
         | dskloet wrote:
         | There's really no need for an 11 in the code. I'd say that
         | makes the code worse, not better.
        
           | 37ef_ced3 wrote:
           | This is a toy problem to illustrate that CoPilot cannot write
           | code that requires mathematical reasoning. It regurgitates
            | solutions from the training set, via a mixed internal
            | representation.
        
             | deanmen wrote:
              |     unsigned int swapbits(unsigned int a) {
              |         bool bit6 = a & (1 << 5);
              |         bool bit17 = a & (1 << 16);
              |         if (bit6 == bit17) return a; // bits are the same, do nothing
              |         // flip both 6th and 17th bits
              |         return (a ^ (1 << 5) ^ (1 << 16));
              |     }
        
             | dskloet wrote:
             | What requires mathematical reasoning? Getting or setting
             | the nth bit? Or swapping two variables? What am I missing?
        
         | altcognito wrote:
         | Would we be able to generate unit tests? Strikes me that this
         | would be important to verify given that we didn't even "write"
         | the code. At some point we might not even be looking at the
         | generated code? I almost guarantee that's what is going to
         | happen eventually.
        
           | 37ef_ced3 wrote:
           | You can see it happening already.
           | 
           | Solutions are posted, and they're wrong.
           | 
           | But the CoPilot user can't see the code is wrong.
        
         | [deleted]
        
         | quenix wrote:
          | Here is what CoPilot wrote in Python:
          | 
          |     # A function to swap the sixth bit and seventeenth bit of
          |     # a 32-bit signed integer.
          |     def swap_bits(x):
          |         # Get the value of the sixth bit.
          |         bit6 = x & (1 << 5)
          |         # Get the value of the seventeenth bit.
          |         bit17 = x & (1 << 16)
          | 
          |         # Swap the values of the sixth and seventeenth bit.
          |         bit6 = bit6 >> 5
          |         bit17 = bit17 >> 16
          | 
          |         # Combine the swapped values of the sixth and
          |         # seventeenth bit.
          |         x = x ^ (bit6 << 16)
          |         x = x ^ (bit17 << 5)
          | 
          |         return x
         | 
          | It actually synthesised about 10 viable solutions. Some
          | interesting ones:
          | 
          |     def swap_six_seventeen(x):
          |         # Get the binary representation of the integer.
          |         binary = bin(x)[2:]
          |         # Add zeros to the beginning of the binary representation.
          |         binary = '0' * (32 - len(binary)) + binary
          |         # Swap the sixth and seventeenth bit.
          |         binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
          |         # Convert the binary back to an integer.
          |         return int(binary, 2)
        
           | omnicognate wrote:
           | The first one (swap_bits) sets both bits to the same value,
            | which is the original two bits XORed together. Eg.
            | 
            |     >>> bin(swap_bits(0b_1_0000000000_0_00000))
            |     '0b10000000000100000'
            |     >>> bin(swap_bits(0b_0_0000000000_1_00000))
            |     '0b10000000000100000'
            |     >>> bin(swap_bits(0b_1_0000000000_1_00000))
            |     '0b0'
            |     >>> bin(swap_bits(0b_0_0000000000_0_00000))
            |     '0b0'
           | 
           | The second one converts the value to a string and uses string
           | operations, which is wildly inefficient and a very common
           | mistake made by inexperienced programmers unaware of bitwise
           | operations (so presumably common in the training set). It
           | also attempts to swap the 6th and 17th _most_ significant
           | bits rather than the 6th and 17th _least_ significant bits,
           | i.e. counts in the opposite direction to the first one (the
           | comment doesn 't specify but typically you count from the
           | least significant bit in these situations).
           | 
           | Worse, though, it gets the string manipulation completely
           | wrong. I think it's trying for `binary[:5] + binary[16] +
           | binary[6:16] + binary[5] + binary[17:]`, i.e. characters 1-5,
           | then character 17, then characters 7-16, then character 6,
           | then characters 18-32. The manipulation it does just
           | completely mangles the string.
           | 
           | I'm very keen to try Github Copilot if they ever admit me to
           | the beta (I've been waiting forever) and will adopt it
           | enthusiastically if it's useful. However, this is exactly
           | what I've pessimistically expected. Analysing these truly
           | awful implementations to identify the subtle and bizarre
           | misbehaviours has taken me far, far longer than it would have
           | taken me to just write and test a working implementation
           | myself. And I'm supposed to evaluate 10 of these to see if
           | one of them might possibly do the right thing?!?!
        
             | Veedrac wrote:
             | The first example is almost correct, conditioned off a
             | sentence description. The second example is the right idea,
             | it just bit off more than it could chew when slicing it all
             | together. Using string ops for binary manipulation in
             | Python isn't even stupid; it can be faster in a lot of
             | cases.
             | 
             | This feels a lot like screaming at a child for imperfect
             | grammar.
        
               | 37ef_ced3 wrote:
               | It illustrates that CoPilot is generating maximum
               | likelihood token strings and has no real understanding of
               | the code.
               | 
               | That's what is happening here. There is no intelligence,
               | just regurgitation. Randomization and maximum likelihood
               | completion.
               | 
               | Just like with the competitive programming example, we're
               | asking it to produce solutions that it has seen in its
               | training set. If you ask for a nontrivial twist on one of
               | those solutions, it fails.
        
               | hackinthebochs wrote:
               | >It illustrates that CoPilot is generating maximum
               | likelihood token strings and has no real understanding of
               | the code.
               | 
               | Funny, today I was just thinking of people's tendencies
               | to dismiss AI advances with this very pattern of
               | reasoning: take a reductive description of the system and
               | then dismiss it as obviously insufficient for
               | understanding or whatever the target is. The assumption
               | is that understanding is fundamentally non-reductive, or
               | that there is insufficient complexity contained within
               | the reductive description. But this is a mistake.
               | 
               | The fallacy is that the reductive description is glossing
               | over the source of the complexity, and hence where the
               | capabilities of the model reside. "Generating maximum
               | likelihood token strings" doesn't capture the complexity
               | of the process that generates the token strings, and so
               | an argument that is premised on this reductive
               | description cannot prove the model deficient. For
               | example, the best way to generate maximum likelihood
               | human text is just to simulate a human mind. Genuine
               | understanding is within the solution-space of the problem
               | definition in terms of maximum likelihood strings, thus
               | you cannot dismiss the model based on this reductive
               | description.
        
               | 37ef_ced3 wrote:
               | The difference between me and you is that I implement
               | neural nets professionally. Here is one of my (non-
               | professional) open source projects: https://NN-512.com
               | 
               | I'm sure if you understood what the transformer was
               | doing, you would be less impressed.
        
               | hackinthebochs wrote:
               | This is the wrong context to go with an appeal to
               | authority. I know what the transformer is doing, I've
               | also developed neural networks before (though not
               | professionally). Your experience is working against you
               | in developing your intuition. There's another common
               | fallacy that because we're somehow "inside" the system,
               | that we understand exactly what is going on, or in this
               | case what isn't going on. Language models are composed of
               | variations of matrix multiplications, but that isn't a
               | complete description of their behavior. It's like saying
               | because we've looked inside the brain and there's just
               | electrical and chemical signals, the mind must reside
               | somewhere else. It's just a specious argument.
        
               | Veedrac wrote:
               | It got the value of the sixth and seventeenth bits, moved
               | them into the right positions, and inserted them into the
               | original value. Off a one-line description _written in
               | English_! I really cannot empathize with the idea that
               | this is not a meaningful capability. If intelligence only
               | means to you "equal in all capabilities to an experienced
               | human", you are never going to be able to see anything
               | coming ever.
        
               | 37ef_ced3 wrote:
               | If you ask CoPilot to solve something it hasn't seen, it
               | won't be able to solve it.
               | 
               | It's a transformer. Do you understand what that means?
               | It's just matrix multiplication.
               | 
               | It generates maximum likelihood token strings, based on
               | its training data.
               | 
               | It doesn't "understand" what those token strings mean.
               | 
               | You are amazed because you're testing the transformer by
               | asking the transformer to generate human-written code
               | THAT IT WAS TRAINED ON. To make CoPilot fail, all you
               | have to do is ask it to generate something unlikely,
               | something it hasn't seen in training.
               | 
               | Maximum likelihood token strings. Period.
        
               | omnicognate wrote:
               | You're misunderstanding my point. Nobody's screaming at
               | anything. Whether this thing is impressive isn't at
               | issue. It's utterly astonishing.
               | 
               | I'm trying to figure out whether copilot in its current
               | form is a tool that will be useful to me in my job. (I'd
               | be able to do this evaluation properly if they'd just let
               | me on the damned beta.)
               | 
               | Nearly right isn't good enough for this afaics. In fact,
               | I expect there to be a slightly paradoxical effect where
               | nearly-right is worse than obviously-wrong. An analysis
               | of a piece of code like I did above is time consuming and
               | cognitively taxing. An obviously wrong solution I can
               | just reject immediately. An almost-right (or at least
               | vaguely plausible) one like these takes _thought_ to
               | reject. Much more thought, in this case (for me, at
               | least) than just writing the thing myself in the first
               | place.
               | 
               | Edit: BTW, I don't get what you're saying with
               | 
               | "The first example is almost correct, conditioned off a
               | sentence description. The second example is the right
               | idea, it just bit off more than it could chew when
               | slicing it all together."
               | 
               | The first one is completely (if subtly) wrong. It's
               | supposed to swap two bits but it sets them to the same
               | value. There's no interpretation of the description in
               | which that's correct.
               | 
               | The second one is definitely not "the right idea". It
               | tries to do it with string manipulations, which
               | (regardless of the fact that it does so incorrectly) is
               | completely the wrong approach. This one is actually
               | "better" than the other in the paradoxical sense I
               | mentioned above, because I could reject it the moment I
               | saw it convert the number to a string.
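               | 
               | (For the record, a minimal illustrative sketch of the swap the
               | description asks for, assuming 1-based bit counting from the
               | least significant bit, i.e. positions 5 and 16 when 0-indexed:)
               | 
               |     def swap_bits(x, i=5, j=16):
               |         # if the two bits differ, flipping both swaps them;
               |         # if they are equal, the swap is a no-op
               |         if ((x >> i) & 1) != ((x >> j) & 1):
               |             x ^= (1 << i) | (1 << j)
               |         return x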
        
               | Veedrac wrote:
               | > The second one is definitely not "the right idea". It
               | tries to do it with string manipulations, which
               | (regardless of the fact that it does so incorrectly) is
               | completely the wrong approach. This one is actually
               | "better" than the other in the paradoxical sense I
               | mentioned above, because I could reject it the moment I
               | saw it convert the number to a string.
               | 
               | In this case string ops are a worse idea, but as I said
               | before, this is not generally true of Python, at least
               | when using CPython. Eg. the string method is
               | significantly the faster in this example:
               | # https://stackoverflow.com/a/20918545/1763356
               | def reverse_mask(x):
               |     x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
               |     x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
               |     x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
               |     x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
               |     x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
               |     return x
               | 
               | # My ver
               | def reverse_format(x):
               |     return int(f"{x:032b}"[::-1], 2)
               | 
               | 
               | Python's dynamic object overhead (and to a lesser extent,
               | interpreter overhead) makes a lot of seemingly-expensive
               | operations not matter very much.
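               | 
               | (If you want to check claims like this yourself, a quick and
               | admittedly rough sketch - it assumes the two functions above
               | are defined in the same module, and actual numbers depend on
               | the interpreter version and input:)
               | 
               |     import timeit
               | 
               |     for fn in ("reverse_mask", "reverse_format"):
               |         t = timeit.timeit(f"{fn}(0x12345678)",
               |                           globals=globals(), number=100_000)
               |         print(fn, round(t, 3), "seconds")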
        
               | Veedrac wrote:
               | Well, that also seems like the wrong question to ask.
               | Whether it's currently useful to you for writing short
               | algorithms, rather than as the non-programmer's API
               | interface it's primarily marketed as, seems about the
               | least interesting take-away for it. We'll get to
               | smoothing over the cracks later, once it's not a
               | capability we literally just discovered exists. Heck,
               | Codex is already not SOTA for that, AlphaCode is.
        
               | omnicognate wrote:
               | It may not be the question that interests you but who are
               | you to say it's the "wrong question" for me to ask? I
               | want to know if I'm on the verge of having access to a
               | tool that is going to transform the way I do my job, as
               | people keep claiming.
        
           | 37ef_ced3 wrote:
           | All wrong. These "solutions" are not correct.
        
           | UncleMeat wrote:
           | The xors are wrong. Stick in the integer 1<<16.
        
         | c4wrd wrote:
         | Ask and you shall receive, here is the output it gave me:
         | 
         | // Write a Go function to swap the sixth bit and seventeenth bit
         | // of a 32-bit signed integer.
         | package main
         | 
         | import (
         |     "fmt"
         |     "math"
         | )
         | 
         | func main() {
         |     var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010101010
         |     fmt.Printf("%b\n", a)
         |     fmt.Printf("%b\n", swapBits(a, 6, 17))
         | }
         | 
         | func swapBits(a int32, i int, j int) int32 {
         |     // convert to binary
         |     bin := fmt.Sprintf("%b", a)
         |     // get the bits
         |     bit1 := bin[i-1 : i]
         |     bit2 := bin[j-1 : j]
         |     // swap the bits
         |     bin = bin[:i-1] + bit2 + bin[i:]
         |     bin = bin[:j-1] + bit1 + bin[j:]
         |     // convert back to int
         |     return int32(bin2int(bin))
         | }
         | 
         | func bin2int(bin string) int64 {
         |     var sum int64
         |     for i, v := range bin {
         |         if v == '1' {
         |             sum += int64(math.Pow(2, float64(len(bin)-i-1)))
         |         }
         |     }
         |     return sum
         | }
        
           | 37ef_ced3 wrote:
           | Ridiculous. It's a joke, right? Perhaps the most inefficient
           | and naive solution ever?
           | 
           | CoPilot is regurgitating some solution from its training set,
           | the solution of an inept programmer who would manipulate bits
           | via conversion to string... yikes.
        
             | skulk wrote:
             | The next iteration of code assistant needs to be able to
             | parse responses like your comment and update the code
             | accordingly. Once a human+computer pair can converge on a
             | correct and admissible solution to _any_ tractable
             | programming task through natural language dialogue, we
             | should start worrying about our jobs going away. Until
             | then, for each line of code generated by AI, there will be
             | two jobs created to maintain that code.
        
               | electroly wrote:
               | Copilot can do that, sorta. You undo the completion and
               | add something like "... but don't convert it to a string"
               | to the comment, then have it try completing again.
        
               | hackinthebochs wrote:
               | Which direction in feature space do you move in response
               | to "you inept POS"?
        
       | jdrc wrote:
       | "And so in 2022 the species programmus programmicus went extinct"
        
       | udev wrote:
       | I am thinking whether this result can create a type of loop that
       | can self-optimize.
       | 
       | We have AI to generate reasonable code from text problem
       | description.
       | 
       | Now what if the problem description text is to generate such a
       | system in the first place?
       | 
       | Would it be possible to close the loop, so to speak, so that over
       | many iterations:
       | 
       | - text description is improved
       | 
       | - output code is improved
       | 
       | Would it be possible to create something that converges to
       | something better?
        
         | machiaweliczny wrote:
         | I am actually trying this. Basically by asking questions to AI
         | and teaching it to generate code / google when it doesn't know
         | something. The other process checks if code is valid and either
         | ask it to get more context or executes code and feeds back to
         | file :)
        
           | machiaweliczny wrote:
            | I think one can make the problem "differentiable" via some
            | heuristics: if you have a NN trained to rate code quality,
            | plus some understanding of what should be used for each type
            | of problem (memory and speed), then it can classify the
            | problem into a group, rate the solution, and guide the
            | process (in competitive programming).
        
           | indiv0 wrote:
           | Do you have a blog or a github or something? This sounds
           | really neat.
        
       | wilde wrote:
       | Oh sweet! When can we skip the bullshit puzzle phone screens?
        
       | doctor_eval wrote:
       | I sometimes read these and wonder if I need to retrain. At my
       | age, I'll struggle to get a job at a similar level in a new
       | industry.
       | 
       | And then I remember that the thing I bring to the table is the
       | ability to turn domain knowledge into code.
       | 
       | Being able to do competitive coding challenges is impressive, but
       | a very large segment of software engineering is about eliciting
       | what the squishy humans in management actually want, putting it
       | into code, and discovering as quickly as possible that it's not
       | what they really wanted after all.
       | 
       | It's going to take a sufficiently long time for AI to take over
       | management that I don't think oldies like me need to worry too
       | much.
        
       | prideout wrote:
       | It is obvious to me that computer programming is an interesting
       | AI goal, but at the same time I wonder if I'm biased, because I'm
       | a programmer. The authors of AlphaCode might be biased in this
       | same way.
       | 
       | I guess this makes sense though, from a practical point of view.
       | Verifying correctness would be difficult in other intellectual
       | disciplines like physics and higher mathematics.
        
         | thomasahle wrote:
         | Just make it output a proof together with the program.
        
       | EGreg wrote:
       | To me, coding in imperative languages is one of the hardest
       | things to produce an AI for with current approaches (CNNs, MCTS
       | and various backpropagation). Something like Cyc would seem to be
       | a lot more promising...
       | 
       | And yet, I am starting to see (with GitHub's Copilot, and now
       | this) a sort of "GPT-4 for code". I do see many problems with
       | this, including:
       | 
       | 1. It doesn't actually "invent" solutions on its own like
       | AlphaZero, it just uses and remixes from a huge body of work that
       | humans put together,
       | 
       | 2. It isn't really ever sure if it solved the problem, unless it
       | can run against a well-defined test suite, because it could have
       | subtle problems in both the test suite and the solution if it
       | generated both
       | 
       | This is a bit like readyplayer.me trying to find the closest
       | combination of noses and lips to match a photo (do you know any
       | open source alternatives to that site btw?)
       | 
       | But this isn't really "solving" anything in an imperative
       | language.
       | 
       | Then again, perhaps human logic is just an approximation with
       | operations using low-dimensional vectors, able to capture simple
       | "explainable" models, while the AI classifiers and adversarial
       | training produce far bigger vectors that help model the
       | "messiness" of the real world and also find simpler patterns as a
       | side effect.
       | 
       | In this case, maybe our goal shouldn't be to get solutions in the
       | form of imperative language or logic, but rather unleash the
       | computer on "fuzzy" inputs and outputs where things are "mostly
       | correct 99.999% of the time". The only areas where this could
       | fail are when some intelligent adversarial network exploits
       | weaknesses in that 0.001% and makes it more common. But for
       | natural phenomena it should be good enough !
        
         | qualudeheart wrote:
         | Can you write more about how Cyc would help? The idea behind
         | Cyc is cool but I don't think I've seen anyone discuss using it
         | for program synthesis.
        
       | gfd wrote:
       | Relevant blogpost on codeforces.com (the competitive programming
       | site used): https://codeforces.com/blog/entry/99566
       | 
       | Apparently the bot would have a rating of 1300. Although the elo
       | rating between sites is not comparable, for some perspective,
       | mark zuckerberg had a rating of ~1k when he was in college on
       | topcoder: https://www.topcoder.com/members/mzuckerberg
        
         | baobabKoodaa wrote:
         | The median rating is not descriptive of median ability, because
         | a large number of Codeforces competitors only do one or a few
         | competitions. A very small number of competitors hone their
         | skills over multiple competitions. If we were to restrict our
         | sample to competitors with more than 20 competitions, the
         | median rating would be much higher than 1300. It's amazing that
         | Alphacode achieved a 1300 rating, but compared to humans who
         | actually practice competitive coding, this is a low rating.
         | 
         | To clarify, this is a HUGE leap in AI and computing in general.
         | I don't mean to play it down.
        
           | YeGoblynQueenne wrote:
           | >> To clarify, this is a HUGE leap in AI and computing in
           | general. I don't mean to play it down.
           | 
           | Sorry, but it's nothing of the sort. The approach is
           | primitive, obsolete, and its results are very poor.
           | 
           | I've posted this three times already but the arxiv preprint
           | includes an evaluation against a formal benchmark dataset,
           | APPS. On that more objective measure of performance, the best
           | performing variant of AlphaCode tested, solved 25% of the
           | easiest tasks ("introductory") and less than 10% of the
           | intermediary ("interview") and advanced ("competition")
           | tasks.
           | 
           | What's more, the approach that AlphaCode takes to program
           | generation is primitive. It generates _millions_ of candidate
           | programs and then it  "filters" them by running them against
           | input-output examples of the target programs taken from the
           | problem descriptions. The filtering still leaves thousands of
           | candidate programs (because there are very few I/O examples
           | and the almost random generation can generate too many
           | programs that pass the tests, but still don't solve the
           | problem) so there's an additional step of clustering applied
           | to pare this down to 10 programs that are finally submitted.
           | Overall, that's a brute-force, almost random approach that is
           | ignoring entire decades of program synthesis work.
           | 
           | To make an analogy, it's as if DeepMind had just published an
           | article boasting of its invention of a new sorting
           | algorithm... bubblesort.
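            | 
            | (To make the filtering step concrete, a rough Python sketch -
            | the names are hypothetical and the generation and clustering
            | stages are omitted:)
            | 
            |     import subprocess, sys
            | 
            |     def run_candidate(src, stdin_text, timeout=2):
            |         # run one candidate Python program, capture its stdout
            |         proc = subprocess.run([sys.executable, "-c", src],
            |                               input=stdin_text, text=True,
            |                               capture_output=True,
            |                               timeout=timeout)
            |         return proc.stdout
            | 
            |     def passes_examples(src, examples):
            |         # examples: (stdin, expected stdout) pairs taken from
            |         # the problem statement
            |         try:
            |             return all(run_candidate(src, s).strip() == e.strip()
            |                        for s, e in examples)
            |         except Exception:
            |             return False
            | 
            |     # keep only candidates that agree with the public examples:
            |     # survivors = [p for p in candidates
            |     #              if passes_examples(p, examples)]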
        
           | gfd wrote:
           | You can find the rating distribution filtered for >5 contests
           | here: https://codeforces.com/blog/entry/71260
           | 
           | I am rated at 2100+ so I do agree that 1300 rating is low.
           | But at the same time it solved
           | https://codeforces.com/contest/1553/problem/D which is rated
           | at 1500 which was actually non-trivial for me already. I had
           | one wrong submit before getting that problem correct and I do
           | estimate that 50% of the regular competitors (and probably
           | the vast majority of the programmers commenting in this
           | thread right now) should not be able to solve it within 2hrs.
        
             | rfoo wrote:
             | 1553D is a quite confusing case though.
             | 
             | On the AlphaCode Attention Visualization website [1], the
             | _Accepted_ code shown for 1553D is a O(n^2) Python one,
             | which is supposed to be TLE. It correctly implements a two-
             | pointer solution, but failed to  "realize" that list.pop(0)
             | is O(n) in Python. I'm not sure how it passed.
             | 
             | [1] https://alphacode.deepmind.com/#layer=30,problem=34,hea
             | ds=11...
        
               | Jensson wrote:
               | Likely the python runtime has a strange string
               | implementation for cases like this, just like javascript
               | strings.
        
             | the-smug-one wrote:
             | I'm trying to solve this for fun, but I'm stuck! I've got a
             | recursive definition that solves the problem by building a
             | result string. I think it's a dynamic programming problem,
             | but right now I can't see the shared sub-problems so :).
             | Some real sour cherries being experienced from not getting
             | this one!
        
             | johndough wrote:
              | The proposed O(N^2) solution contains many unnecessary
              | operations, e.g. the creation of list c or reversal of the
              | input strings. Maybe it has been copied from a related
              | problem? You can easily solve the task with half as many
              | lines in O(N).
              | 
              |     for _ in range(int(input())):
              |         a = list(input())
              |         b = list(input())
              |         while a and b:
              |             if a[-1] == b[-1]:
              |                 a.pop()
              |                 b.pop()
              |             else:
              |                 a.pop()
              |                 if a: a.pop()
              |         print("NO" if b else "YES")
        
             | pedrosorio wrote:
             | > But at the same time it solved
             | https://codeforces.com/problemset/problem/1553/D
             | 
             | To be fair, it generated a set of (10) possible solutions,
             | and at least one of them solved the problem.
        
         | captain_price7 wrote:
         | For comparison, I used to be a very average, but pretty regular
         | user about 5 years ago. I could reliably solve easiest 2 out of
         | 5 problems, 3 in my lucky days.
         | 
         | My rating is 1562.
        
       | jakey_bakey wrote:
       | At the risk of sounding relentlessly skeptical - surely by
       | training the code on GitHub data you're not actually creating an
       | AI to solve problems, but creating an extremely obfuscated
       | database of coding puzzle solutions?
        
         | ogogmad wrote:
         | _We validated our performance using competitions hosted on
         | Codeforces, a popular platform which hosts regular competitions
         | that attract tens of thousands of participants from around the
         | world who come to test their coding skills. We selected for
         | evaluation 10 recent contests, each newer than our training
         | data. AlphaCode placed at about the level of the median
         | competitor, marking the first time an AI code generation system
         | has reached a competitive level of performance in programming
         | competitions._
         | 
         | [edit] Is "10 recent contests" a large enough sample size to
         | prove whatever point is being made?
        
           | [deleted]
        
           | YeGoblynQueenne wrote:
           | The test against human contestants doesn't tell us anything
           | because we have no objective measure of the ability of those
           | human coders (they're just the median in some unknown
           | distribution of skill).
           | 
           | There's more objective measures of performance, like a good,
           | old-fashioned, benchmark dataset. For such an evaluation, see
           | table 10 in the arxiv preprint (page 21 of the pdf), listing
           | the results against the APPS dataset of programming tasks.
           | The best performing variant of AlphaCode solves 25% of the
           | simplest ("introductory") APPS tasks and less than 10% of the
           | intermediary ("interview") and more advanced ones
           | ("competition").
           | 
           | So it's not very good.
           | 
           | Note also that the article above doesn't report the results
            | on APPS. _Because_ they're not that good.
        
         | solididiot wrote:
         | Does it need to solve original problems? Most of the code we
         | write is dealing with the same problems in a slightly different
         | context each time.
         | 
          | As others say in the comments, it might be the case that we
          | meet in the middle: us writing some form of tests for
          | AI-produced code to pass.
        
         | qualudeheart wrote:
         | That's been a common objection to Copilot and other recent
         | program synthesis papers.
         | 
         | The models regurgitate solutions to problems already
         | encountered in the training set. This is very common with
          | Leetcode problems and seems to still happen with harder
         | competitive programming problems.
         | 
          | I think someone else in this thread even pointed out an example
         | of AlphaCode doing the same thing.
        
       | FiberBundle wrote:
       | It never ceases to amaze me what you can do with these
       | transformer models. They created millions of potential solutions
       | for each problem, used the provided examples for the problems to
       | filter out 99% of incorrect solutions and then applied some more
       | heuristics and the 10 available submissions to try to find a
       | solution.
       | 
       | All these approaches just seem like brute-force approaches: Let's
       | just throw our transformer on this problem and see if we can get
       | anything useful out of this.
       | 
       | Whatever it is, you can't deny that these unsupervised models
       | learn some semantic representations, but we have no clue at all
        | what that actually is and how these models learn that. But I'm
       | also very sceptical that you can actually get anywhere close to
       | human (expert) capability in any sufficiently complex domain by
       | using this approach.
        
         | bricemo wrote:
         | What do you think then is the difference between going from
         | 50th to 99.9th percentile in their other domains? Is there
          | something materially different between Go, protein folding, or
         | coding? (I don't know the answer, just curious if anyone else
         | does)
        
           | jahewson wrote:
           | That's a big question but I'm tempted to answer it with a
           | yes. A protein sequence contains a complete description of
           | the structure of a protein but a coding question contains
           | unknowns and the answers contain subjective variability.
        
           | FiberBundle wrote:
           | Well with respect to Go the fundamental difference afaict is
           | that you can apply self-supervised learning, which is an
           | incredibly powerful approach (But note e.g. that even this
           | approach wasn't successful in "solving" Starcraft).
           | Unfortunately it's extremely difficult to frame real-world
           | problems in that setting. I don't know anything about
           | protein-folding and don't know what Deepmind uses to try to
           | solve that problem, so I cannot comment on that.
        
             | cjbprime wrote:
             | > this approach wasn't successful in "solving" Starcraft)
             | 
             | Why do you say that? As I understand it, AlphaStar beat
             | pros consistently, including a not widely reported
             | showmatch against Serral when he was BlizzCon champ.
        
               | zwaps wrote:
               | Not once humans adapted to it afaik. AlphaStar got to top
               | grandmaster level and then that was it, as people found
               | ways to beat it. Now, it may be that the team considered
               | the project complete and stopped training it. But
               | technically - as it stands - Starcraft is still the one
               | game where humans beat AI.
        
               | gavagai691 wrote:
               | Two possible reasons.
               | 
               | 1. First, though I am not sure of this (i.e. this should
               | be verified), I heard that the team working on AlphaStar
               | initially tried to create a Starcraft AI entirely through
               | "self-play," but this was not successful. (Intuitively,
               | in a real-time game, there are too many bad options too
               | early on that even with a LOT of time to learn, if your
               | approach is too "random" you will quickly enter an
               | unwinnable position and not learn anything useful.) As a
               | result, they replaced this approach with an approach
               | which incorporated learning from human games.
               | 
               | 2. "including a not widely reported showmatch against
               | Serral when he was BlizzCon champ." is a
               | mischaracterization. It was not a "showmatch," rather
               | there was a setup at Blizzcon where anyone could sit down
               | and play against AlphaStar, and Serral at some point sat
               | down to play AlphaStar there. He went 0-4 vs AlphaStar's
               | protoss and zerg, and 1-0 vs its Terran. However, not
               | only was he not using his own keyboard and mouse, but he
               | could not use any custom hotkeys. If you do not play
               | Starcraft it may not be obvious just how large of a
               | difference this could make. BTW, when Serral played
               | (perhaps an earlier iteration of) AlphaStar's terran on
               | the SC2 ladder, he demolished it.
               | 
               | I remember when seeing the final report, I was a bit
               | disappointed. It seemed like they cut the project off at
               | a strange point, before AlphaStar was clearly better than
               | humans. I feel that if they had continued they could have
               | gotten to that point, but now we will never know.
        
         | briga wrote:
         | Another way to frame it is that these models still perform very
          | poorly at the task they're designed to do. Imagine if a real
          | programmer needed to write a solution a hundred times before
         | they were able to achieve (average) performance. You'd probably
         | wonder if it was just blind luck that got them to the solution.
         | You'd also fire them. What these models are very good at doing
         | is plagiarizing content, so part of me wonders if they aren't
         | just copying previous solutions with slight adjustments.
        
       | zmmmmm wrote:
       | Has nobody yet asked it to write itself?
        
       | ensan wrote:
       | Wake me up when an AI creates an operating system on the same
       | level of functionality as early-years Linux.
        
       | timetotea wrote:
       | If you want some video explanation https://youtu.be/Qr_PCqxznB0
        
       | pedrobtz wrote:
       | What about finding bugs, zero-day exploits?
        
       | erwincoumans wrote:
       | It would be interesting if a future 'AlphaZeroCode' with access
       | to a compiler and debugger can learn to code, generating data
       | using self-play. Haven't read the paper yet; seems like an
       | impressive milestone.
        
       | [deleted]
        
       | throwaway5752 wrote:
       | Most people here are programmers (or otherwise involved in the
       | production of software). We shouldn't look at RPA and other job
       | automation trends dispassionately. SaaS valuations aren't where
       | they are (and accounting doesn't treat engineering salary as cost
       | of goods sold) because investors believe that they will require
       | armies of very well paid developers in perpetuity.
        
         | countvonbalzac wrote:
         | what?
        
       | londons_explore wrote:
       | > AlphaCode placed at about the level of the median competitor,
       | 
       | In many programming contests, a large number of people can't
       | solve the problem at all, and drop out without submitting
       | anything. Frequently that means the median scoring solution is a
       | blank file.
       | 
       | Therefore, without further information, this statement shouldn't
       | be taken to be as impressive as it sounds.
        
       | [deleted]
        
       | d0mine wrote:
       | It reminds me that median reputation on StackOverflow is 1. All
       | AlphaSO would have to do is to register to receive median
       | reputation on SO ;) (kidding aside AlphaCode sounds like magic)
       | 
       | Inventing relational DBs hasn't replaced programmers, we just
       | write custom DB engines less often. Inventing electronic
       | spreadsheets hasn't deprecated programmers, it just means that we
       | don't need programmers for corresponding tasks (where
       | spreadsheets work well).
       | 
       | AI won't replace programmers until it grows to replace humanity
       | as a whole.
        
         | falcor84 wrote:
          | >AI won't replace programmers until it grows to replace
          | humanity as a whole.
         | 
          | Yes, but after seeing this progress in the former, my estimate
          | of the time remaining until the latter has just shortened
          | significantly.
        
         | qualudeheart wrote:
         | I don't even think the "will AI replace human programmers"
         | question is that interesting anymore. My prediction is that a
         | full replacement won't happen until we achieve general
         | artificial intelligence, and have it treat programming as it
         | would any other problem.
         | 
         | Elsewhere ITT I've claimed that to fully automate programming
          | you also need a model of the external world that's on par with
          | a human's.
         | 
         | Otherwise you can't work a job because you don't know how to do
         | the many other tasks that aren't coding.
         | 
         | You need to understand what the business goals are and how your
         | program solves them.
        
       | a-dub wrote:
       | > In our preprint, we detail AlphaCode, which uses transformer-
       | based language models to generate code at an unprecedented scale,
       | and then smartly filters to a small set of promising programs
       | 
       | if you're using a large corpus of code chunks from working
       | programs as symbols in your alphabet, i wonder how much entropy
       | there actually is in the space of syntactically correct solution
       | candidates.
        
       | softwaredoug wrote:
       | I think CoPilot, etc will be revolutionary tools AND I think
       | human coders are needed. Specifically I love CoPilot for the task
       | of "well specified algorithm to solve problem with well-defined
       | inputs and outputs". The kind of problem you could describe as a
       | coding challenge.
       | 
       | BUT, our jobs have a lot more complexity
       | 
       | - Local constraints - We almost always work in a large, complex
       | existing code base with specific constraints
       | 
       | - Correctness is hard - writing lots of code is usually not the
       | hard part, it's proving it correct against amorphous
       | requirements, communicated in a variety of human social contexts,
       | and bookmarked.
       | 
       | - Precision is extremely important - Even if 99% of the time,
       | CoPilot can spit out a correct solution, the 1% of the time it
       | doesn't creates a bevy of problems
       | 
       | Are those insurmountable problems? We'll see I suppose, but we
       | begin to verge on general AI if we can gather and understand half
       | a dozen modalities of social context to build a correct solution.
       | 
       | Not to mention much of the skill needed in our jobs has much more
       | to do with soft skills, and the bridge between the technical and
       | the non technical, and less to do with hardcore heads-down
       | coding.
       | 
       | Exciting times!
        
       | tasubotadas wrote:
       | I just hope that this shows how useless competitive programming
       | is, given that it can be replaced by a Transformer model.
       | 
       | Additionally, people should REALLY rethink their coding
       | interviews if they can be solved by a program.
        
       | msoad wrote:
       | This seems to have a narrower scope than GitHub Copilot. It
       | generates more lines of code to a more holistic problem vs.
       | GitHub Copilot that works as a "more advanced autocomplete" in
       | code editors. Sure Copilot can synthesize full functions and
       | classes but for me, it's the most useful when it suggests another
       | test case's title or writes repetitive code like this.foo = foo;
       | this.bar = bar etc...
       | 
       | Having used Copilot I can assure you that this technology won't
       | replace you as a programmer but it will make your job easier by
       | doing things that programmers don't like to do as much like
       | writing tests and comments.
        
         | ipnon wrote:
         | The big question seems to be whether par with professional
         | programmers is a matter of increasing training set and flop
         | size, or whether different model or multi-model architectures
         | are required.
         | 
         | It does look like we've entered an era where programmers who
         | don't use AI assistants will be disadvantaged, and that this
         | era has an expiration date.
        
         | stupidcar wrote:
         | Having used Copilot for a while, I am quite certain it _will_
         | replace me as a programmer.
         | 
          | It appears to me that when it comes to language models,
          | intelligence = experience * context, where experience is the
          | amount that's encoded in the model, and context is the prompt.
          | And the biggest limitation on Copilot currently is context. It
          | behaves as an "advanced autocomplete" because all it has to go
          | on is what regular autocomplete sees, e.g. the last few
          | characters and lines of code.
         | 
         | So, you can write a function name called createUserInDB() and
         | it will attempt to complete it for you. But how does it know
         | what DB technology you're using? Or what your user record looks
         | like? It doesn't, and so you typically end up with a "generic"
         | looking function using the most common DB tech and naming
         | conventions for your language of choice.
         | 
         | But now imagine a future version of Copilot that is
         | automatically provided with a lot _more_ context. It also gets
         | fed a list of your dependencies, from which it can derive which
          | DB library you're using. It gets any locatable SQL schema
         | file, so it can determine the columns in the user table. It
         | gets the text of the Jira ticket, so it can determine the
         | requirements.
         | 
         | As a programmer a great deal of time is spent checking these
         | different sources and synthesising them in your head into an
         | approach, which you then code. But they are all just text, of
         | one form or another, and language models can work with them
         | just as easily, and much faster, than you can.
         | 
          | And once the ML coding train gets rolling, it'll only get
          | faster. Sooner or later Github will have a "Copilot bot" that
         | can automatically make a stab at fixing issues, which you then
         | approve, reject, or fix. And as thousands of these issues pile
         | up, the training set will get bigger, and the model will get
         | better. Sooner or later it'll be possible to create a repo,
         | start filing issues, and rely on the bot to implement
         | everything.
        
           | solarmist wrote:
           | I'm skeptical it'll replace programmers, as in no more human
           | programmers, but agree in the sense 100% human programmers ->
           | 50%, 25%, 10% human programmers + computers doing most of the
           | writing of actual code.
           | 
           | I see it continuing to evolve and becoming a far superior
           | auto-complete with full context, but, short of actual general
           | AI, there will always be a step that takes a high-level
           | description of a problem and turns it into something a
           | computer can implement.
           | 
           | So while it will make the remaining programmers MUCH more
           | productive, thereby reducing the needed number of
           | programmers, I can't see it driving that number to zero.
        
             | mabub24 wrote:
             | It will probably change the types of things a programmer
             | does, and what it looks like to be a programmer. The nitty
             | gritty of code _writing_ will probably get more and more
             | automated. But the architecture of the code, and
              | establishing and selecting its purpose in the larger
             | scheme of a business, will probably be more what
             | programmers do. Essentially, they might just become
             | managers for automated code writers, similar to the
             | military's idea of future fighter pilots relating to
             | autonomous fighters/drones as described in this article:
             | 
             | https://www.newyorker.com/magazine/2022/01/24/the-rise-of-
             | ai...
             | 
             | Maybe. It might never get to that level though.
        
               | solarmist wrote:
               | Yup, I think that's it exactly. I just described this in
               | another comment as a reverse of the evolution that
               | graphic design has undergone in bringing them into
               | programming front-ends.
               | 
               | I can't wait to see how far we're able to go down that
               | path.
        
           | TSiege wrote:
           | I have a feeling this is the correct read in terms of
           | progression. But I'm skeptical if it'll ever be able to
           | synthesize a program entirely. I imagine that in the future
           | we'll have some sort of computer language more like written
           | language that will be used by some sort of AI to generate
           | software to meet certain demands, but might need some manual
           | connections when requirements are hazy or needs a more human
           | touch in the UI/UX
        
             | Veedrac wrote:
             | > But I'm skeptical if it'll ever be able to synthesize a
             | program entirely.
             | 
             | Emotional skepticism carries a lot more weight in worlds
             | where AI isn't constantly doing things that are meant to be
             | infeasible, like coming 54th percentile in a competitive
             | programming competition.
             | 
             | People need to remember that AlexNet is 10 years old. At no
             | point in this span have neural networks stopped solving
             | things they weren't meant to be able to solve.
        
               | solarmist wrote:
               | I feel like you're taking that sentence a bit too
               | literally. I read it as "I'm skeptical that AI will ever
               | be able to take a vague human description from a product
               | manager/etc. and solve it without an engineer-type person
               | in the loop." The issue is humans don't know what they
               | want and realistically programs require a lot of
               | iteration to get right, no amount of AI can solve that.
               | 
               | I agree with you; it seems obvious to me that once you
               | get to a well-specified solution a computer will be able
               | to create entire programs that solve user requirements.
               | And that they'll start small, but expand to larger and
               | more complex solutions over time in the same way that no-
               | code tools have done.
        
           | Hgsb wrote:
           | Google Ambiguity.
        
         | sharemywin wrote:
          | To me it's not about its current capabilities. It's the
         | trajectory. This tech wasn't even a thing 2 years ago. There's
         | billions being poured into it and every time someone uses these
         | tools there's more free training data.
        
         | chongli wrote:
         | _repetitive code like this.foo = foo; this.bar = bar etc..._
         | 
         | This sort of boilerplate code is best solved by the programming
         | language. Either via better built-in syntax or macros. Using an
         | advanced machine learning model to generate this code is both
         | error-prone and a big source of noise and code bloat. This is
         | not an issue that will go away with better tooling; it will
         | only get worse.
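          | 
          | (As one small illustration, in Python the language-level fix
          | already exists - a dataclass generates the `self.foo = foo`
          | style assignments from the field declarations, so the
          | boilerplate never has to be written or generated:)
          | 
          |     from dataclasses import dataclass
          | 
          |     @dataclass
          |     class Point:
          |         x: float
          |         y: float
          | 
          |     # __init__, __repr__ and __eq__ are generated for you:
          |     assert Point(1.0, 2.0) == Point(1.0, 2.0)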
        
           | xmprt wrote:
           | I don't think I agree. Most people spend more time reading
           | than writing code so programming languages should be
           | optimized to be easier to read whereas tooling should be made
           | to simplify writing code. New syntax or macros sounds like it
           | would make the language harder to read. I agree that an
           | advanced machine learning model for generating boilerplate
           | code isn't the right approach but I also don't think we
           | should extend languages for this. Tooling like code
           | generators and linters are a good middle ground.
        
             | RangerScience wrote:
             | FYI+IMO: Both Ruby and Scala have excellent ways to reduce
             | these issues that occur at the language level, and make it
             | easier to both read and write. I don't know either way if
             | that means you should extend languages to handle it, but at
             | least it's definitively possible to write the language that
             | way from the beginning.
             | 
             | Otherwise yup, agree with you; ML for problematic
             | boilerplate isn't the right approach, but other code
             | generators and linters are really good and get you most of
             | the way there.
        
             | orangecat wrote:
             | _New syntax or macros sounds like it would make the
             | language harder to read._
             | 
             | Often the opposite is true. For example Java records are
             | far easier to read and understand than the pages of
             | boilerplate that they replace.
        
           | valyagolev wrote:
            | It is a very similar argument to the one for powerful IDEs
            | and underwhelming languages. To be fair, it's not necessarily
            | fruitless - e.g. with Smalltalk. I fail to see the analogous
            | Smalltalk-style empowerment of language using AI, but perhaps
            | something is there.
            | 
            | Anyway: programming is automation; automation of programming
            | is abstraction. Using AI to write your code is just a bad
            | abstraction - we are used to them.
        
         | jxcole wrote:
         | I feel like you are very defensive here and I want to be sure
         | we take time to recognize this as a real accomplishment.
         | 
          | Seriously though, while I doubt I can be fully replaced by a
          | robot any time soon, it may be the case that soon enough I can
          | write high-level descriptions of programs and hand them off
          | to an AI to do most of the work. This wouldn't completely
          | replace me, but it could make developers 50x as productive. The
          | question is how elastic the market is... can it grow in step
          | with our increase in productivity?
         | 
         | Also, please remember that as with anything, within 5 years we
         | should see vast improvements to this AI. I think it will be an
         | important thing to watch.
        
           | nsxwolf wrote:
           | Yesterday, I spent several hours figuring out if the business
           | requirement for "within the next 3 days" meant 3 calendar
           | days or 72 hours from now. Then about 10 minutes actually
           | writing the code. Everyone thought my efforts were very
           | valuable.
        
             | RangerScience wrote:
             | 100%. What makes us what we are is the mindset (in this
                | case, this kind of "attention to detail"); that didn't
             | change with (first) compilers, (then) scripting languages,
             | or (future?) AI-assisted programming.
             | 
             | PS - Lawyers aren't even as detail-oriented as we are, it's
             | surprising.
        
               | solarmist wrote:
               | Really?
               | 
                | Maybe that's true in general, because making a living as
                | a lawyer depends far less on attention to detail as a core
                | skill than making a living as a programmer does. Still, I
                | wonder if that also
               | holds at the high levels of the profession. I get the
               | impression that at the FAANG-level, lawyers would compare
               | pretty favorably to programmers in detail orientation. In
               | particular, patent and contract law.
               | 
               | That said, it's just my general impression of what
               | lawyers get up to.
               | 
               | ...Hmm, thinking about the contract law thing a bit more.
               | Yeah, I do believe you are right. Lawyers aren't writing
               | nearly as many extremely detail-oriented texts as
               | programmers are on a day-to-day basis. Their jobs are
               | much more around finding, reading, and understanding
               | those things and building stories around them.
        
           | visarga wrote:
           | The GPT family has already shown more than 50x productivity
           | increase by being able to solve not one, but hundreds and
           | perhaps thousands of tasks on the same model. We used to need
           | much more data, and the model would be more fragile, and
           | finding the right architecture would be a problem. Now we
           | plug a transformer with a handful of samples and it works.
           | 
           | I just hope LMs will prove to be just as useful in software
           | development as they are in their own field.
        
           | thomasahle wrote:
           | If you make developers 50x more efficient, won't you need 50x
           | fewer developers?
        
             | bmh100 wrote:
             | Not necessarily. Demand may be much higher than available
             | supply right now. Tech companies will continue to compete,
             | requiring spending on developers to remain competitive.
             | Software is unlike manufacturing, in that the output is a
             | service, not a widget. Worker productivity in general has
             | not decreased the demand for full work weeks, despite
             | projections in the early 20th century to the contrary. Of
             | course, it is possible that fewer developers would be
             | needed, but I don't think it's likely, yet.
        
             | alasdair_ wrote:
             | >If you make developers 50x more efficient, won't you need
             | 50x fewer developers?
             | 
             | Developers today are 50X more efficient than when they had
             | to input machine code on punched tape, yet the number of
             | developers needed today is far larger than it was in those
             | times.
        
               | throw10920 wrote:
               | There's no reason to believe that we'll need _another_
               | 50x more developers, though.
        
               | solarmist wrote:
               | There isn't? I feel like there's still a ton of places
               | software hasn't even touched and not because it doesn't
               | make sense, but because no one's gotten to it. It's not
               | the most profitable thing people could write software
               | for.
        
               | alasdair_ wrote:
               | Even if not, the original claim was that we may see a 50X
                | _decrease_ and I personally don't think that is likely,
               | pre-Singularity anyway :)
        
               | qualudeheart wrote:
               | But think how large of a job program that would have
               | been.
               | 
               | Hundreds of people manually writing assembly and paid
               | middle class wages. Not a compiler in sight.
               | 
               | In the years leading up to the singularity I'd expect to
               | see a lot of Graeberian "Bullshit Jobs".
               | 
               | Everyone knows they're BS but as a society we allow them
               | because we aren't willing to implement socialism or UBI.
        
               | woadwarrior01 wrote:
               | https://en.m.wikipedia.org/wiki/Jevons_paradox
        
             | kevlened wrote:
             | Greater efficiency leads to greater consumption unless
             | demand is saturated. Given software's ability to uncover
             | more problems that are solvable by software, we're more
             | likely to build 50x more software.
        
             | RangerScience wrote:
             | This happened with the introduction of power tools to set
             | building in Hollywood back in the day - literally this same
             | question.
             | 
             | People just built bigger sets, and smaller productions
             | became financially feasible. Ended up creating demand, not
             | reducing it.
        
           | 0xdeadbeefbabe wrote:
           | > but it could make developers 50x productive
           | 
           | More likely it will translate the abstraction level by some
           | vector of 50 elements.
        
       | blt wrote:
       | I am always surprised by the amount of skepticism towards deep
       | learning on HN. When I joined the field around 10 years ago,
       | image classification was considered a grand challenge problem
       | (e.g. https://xkcd.com/1425/). 5 years ago, only singularity
       | enthusiast types were envisioning things like GPT-3 and Copilot
       | in the short term.
       | 
       | I think many people are uncomfortable with the idea that their
       | own "intelligent" behavior is not that different from pattern
       | recognition.
       | 
       | I do not enjoy running deep learning experiments. Doing resource-
       | hungry empirical work is not why I got into CS. But I still
       | believe it is very powerful.
        
       | jonas_kgomo wrote:
       | Genuine question: what are the reasons to be a software engineer
       | without much ML knowledge in 2022? Seems like a wake-up call for
       | developers.
        
         | eulers_secret wrote:
         | > what are the reasons to be a software engineer without much
         | ML knowledge in 2022.
         | 
         | I'm not quite sure what you're asking, but my reason is that I
         | _do not enjoy_ working on /with ML. I'd personally rather quit
         | the industry.
         | 
         | But I work in embedded/driver development. I do not worry about
         | ML models replacing me yet, but if I were just gluing together
         | API calls I would be a bit worried and try to specialize.
        
         | qualudeheart wrote:
         | Find something that's hard and interesting. Someone will
         | probably have a business trying to solve it and will hire you.
        
         | jonas_kgomo wrote:
          | 7 months ago, I asked natfriedman the same question, to which
         | he responded: "We think that software development is entering
         | its third wave of productivity change. The first was the
         | creation of tools like compilers, debuggers, garbage
         | collectors, and languages that made developers more productive.
         | The second was open source where a global community of
         | developers came together to build on each other's work. The
         | third revolution will be the use of AI in coding. The problems
         | we spend our days solving may change. But there will always be
         | problems for humans to solve."
         | 
         | https://news.ycombinator.com/item?id=27676266&p=2
        
         | slingnow wrote:
         | Genuine question: what are the reasons to be a carpenter
         | without much robotics / automation knowledge in 2022. Seems
         | like a wakeup call for carpenters.
        
         | 0xdeadbeefbabe wrote:
         | I hope you are right, but just to answer the question: all
         | those other AI winters.
        
           | jonas_kgomo wrote:
           | That's a good meditation. I think the winters were driven more
           | by research dichotomy; for example, Marvin Minsky's critique
           | of the perceptron really slowed the research by 10 years.
           | Advances made thus far have so much commercial relevance
           | that the companies invested don't look like they are going to
           | stop soon. But it's a valid point. It looks like there is more
           | upside being in subsets of computing like quantum computing,
           | web3, and the metaverse than being a regular front-end
           | engineer.
        
       | agentultra wrote:
       | This is kind of neat. I wonder if it will one day be possible for
       | it to find programs that maintain invariant properties we state
       | in proofs. This would allow us to feel confident that even though
       | it's generating huge programs that do weird things a human might
       | not think of... well that it's still _correct_ for the stated
       | properties we care about, i.e. that it's not doing anything
       | underhanded.
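       | 
       | Even short of real proofs, you could imagine a property-based
       | filter bolted onto the generator. A toy sketch of what I mean
       | (plain random testing standing in for actual verification, and a
       | sort routine standing in for a generated program):
       | 
       |   import random
       | 
       |   def respects_invariants(candidate_sort, trials=1000):
       |       """Accept a generated sort only if it keeps the stated
       |       invariants on many random inputs: output is ordered and
       |       is a permutation of the input."""
       |       for _ in range(trials):
       |           xs = [random.randint(-50, 50)
       |                 for _ in range(random.randint(0, 20))]
       |           ys = candidate_sort(list(xs))
       |           ordered = all(a <= b for a, b in zip(ys, ys[1:]))
       |           same_items = sorted(xs) == sorted(ys)
       |           if not (ordered and same_items):
       |               return False
       |       return True
       | 
       |   print(respects_invariants(sorted))          # True
       |   print(respects_invariants(lambda xs: xs))   # almost surely False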
        
       | jdrc wrote:
       | I think it would be interesting to train a system end-to-end
       | with assembly code instead of various programming languages. This
       | would make it a much more generic compiler.
        
       | ahgamut wrote:
       | I find almost every new advance in deep learning is accompanied
       | by contrasting comments: it's either "AI will soon automate
       | programming/<insert task here>", or "let me know when AI can
       | actually do <some-difficult-task>". There are many views on this
       | spectrum, but these two are sure to be present in every comment
       | section.
       | 
       | IIUC, AlphaCode was trained on Github code to solve competitive
       | programming challenges on Codeforces, some of which are
       | "difficult for a human to do". Suppose AlphaCode was trained on
       | Github code that contains the entire set of solutions on
       | Codeforces, is it actually doing anything "difficult"? I don't
       | believe it would be difficult for a human to solve problems on
       | Codeforces when given access to the entirety of Github (indexed
       | and efficiently searchable).
       | 
       | The general question I have been trying to understand is this: is
       | the ML model doing something that we can _quantify_ as
       | "difficult to do (given this particular training set)"? I would
       | like to compute a number that measures how difficult it is for a
       | model to do task X given a large training set Y. If X is part
       | of the training set, the difficulty should be _zero_. If X is
       | obtained only by combining elements of the training set, maybe it
       | is harder to do. My efforts to answer this question:
       | https://arxiv.org/abs/2109.12075
       | 
       | In recent literature, the RETRO Transformer
       | (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying
       | dataset leakage", which is related to what I mentioned in the
       | above paragraph. If many training samples are also in the test
       | set, what is the model actually learning?
       | 
       | Until deep learning methods provide a measurement of
       | "difficulty", it will be difficult to gauge the prowess of any
       | new model that appears on the scene.
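       | 
       | To make the "dataset leakage" idea concrete, here is a toy sketch
       | (my own illustration, not RETRO's or AlphaCode's actual method):
       | score each test sample by the fraction of its token n-grams that
       | already occur somewhere in the training corpus. A score near 1.0
       | means the task was essentially in the training set, i.e. its
       | "difficulty" should count as roughly zero.
       | 
       |   def ngrams(tokens, n=4):
       |       """All length-n windows of a token sequence."""
       |       return {tuple(tokens[i:i + n])
       |               for i in range(len(tokens) - n + 1)}
       | 
       |   def overlap_score(test_tokens, train_ngrams, n=4):
       |       """Fraction of the test sample's n-grams seen in training:
       |       ~1.0 = leaked/trivial, ~0.0 = genuinely novel."""
       |       test_ngrams = ngrams(test_tokens, n)
       |       if not test_ngrams:
       |           return 0.0
       |       return len(test_ngrams & train_ngrams) / len(test_ngrams)
       | 
       |   # Build the training index once, then score every test problem.
       |   train_corpus = ["read n numbers and print their sum",
       |                   "sort the list and print the median"]
       |   train_ngrams = set()
       |   for doc in train_corpus:
       |       train_ngrams |= ngrams(doc.split())
       | 
       |   print(overlap_score("read n numbers and print their product".split(),
       |                       train_ngrams))  # 0.75: high overlap, low "difficulty"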
        
         | pedrosorio wrote:
         | > Suppose AlphaCode was trained on Github code that contains
         | the entire set of solutions on Codeforces, is it actually doing
         | anything "difficult"?
         | 
         | They tested it on problems from recent contests. The
         | implication being: the statements and solutions to these
         | problems were not available when the Github training set was
         | collected.
         | 
         | From the paper [0]: "Our pre-training dataset is based on a
         | snapshot of selected public GitHub repositories taken on
         | 2021/07/14" and "Following our GitHub pre-training dataset
         | snapshot date, all training data in CodeContests was publicly
         | released on or before 2021/07/14. Validation problems appeared
         | between 2021/07/15 and 2021/09/20, and the test set contains
         | problems published after 2021/09/21. This temporal split means
         | that only information humans could have seen is available for
         | training the model."
         | 
         | At the very least, even if some of these problems had been
         | solved exactly before, you still need to go from "all of the
         | code in Github" + "natural language description of the problem"
         | to "picking the correct code snippet that solves the problem".
         | Doesn't seem trivial to me.
         | 
         | > I don't believe it would be difficult for a human to solve
         | problems on Codeforces when given access to the entirety of
         | Github (indexed and efficiently searchable).
         | 
         | And yet, many humans who participate in these contests are
         | unable to do so (although I guess the issue here is that Github
         | is not properly indexed and searchable for humans?).
         | 
         | [0] https://storage.googleapis.com/deepmind-
         | media/AlphaCode/comp...
        
           | ahgamut wrote:
           | > They tested it on problems from recent contests. The
           | implication being: the statements and solutions to these
           | problems were not available when the Github training set was
           | collected.
           | 
           | Yes, and I would like to know how similar the dataset(s)
           | were. Suppose the models were trained only on greedy
           | algorithms and then I provided a dynamic programming problem
           | in the test set, (how) would the model solve it?
           | 
           | > And yet, many humans who participate in these contests are
           | unable to do so (although I guess the issue here is that
           | Github is not properly indexed and searchable for humans?).
           | 
           | Indeed, so we don't know what "difficult" means for
           | <human+indexed Github>, and hence we cannot compare it to
           | <model trained on Github>.
           | 
           | My point is, whenever I see a new achievement of deep
           | learning, I have no frame of reference (apart from my
           | personal biases) of how "trivial" or "awesome" it is. I would
           | like to have a quantity that measures this - I call it
           | generalization difficulty.
           | 
           | Otherwise the datasets and models just keep getting larger,
           | and we have no idea of the full capability of these models.
        
             | pedrosorio wrote:
             | > Suppose the models were trained only on greedy algorithms
             | and then I provided a dynamic programming problem in the
             | test set, (how) would the model solve it?
             | 
             | How many human beings do you personally know who were able
             | to solve a dynamic programming problem at first sight
             | without ever having seen anything but greedy algorithms?
             | 
             | Deepmind is not claiming they have a machine capable of
             | performing original research here.
             | 
             | Many human programmers are unable to solve DP problems even
             | after having them explained several times. If you could get
             | a machine that takes in all of Github and can solve "any"
             | DP problem you describe in natural language with a couple
             | of examples, that is AI above and beyond what many humans
             | can do, which is "awesome" no matter how you put it.
        
               | sibeshk96 wrote:
               | > that is AI above and beyond what many humans can do,
               | which is "awesome" no matter how you put it.
               | 
               | That's not the point being made. The point OP is making
               | is that it is not possible to understand how impressive
               | a model is at "generalizing" to uncertainty if you don't
               | know how different the training set is from the test set.
               | If they are extremely similar to each other, then the
               | model generalizes weakly (this is also why the world's
               | smartest chess bot needs to play a million games to beat
               | the average grandmaster, who has played less than 10,000
               | games in her lifetime). Weak generalization vs strong
               | generalization.
               | 
               | Perhaps all such published results should contain info
               | about this "difference" so it becomes easier to judge the
               | model's true learning capabilities.
        
               | ahgamut wrote:
               | > How many human beings do you personally know who were
               | able to solve a dynamic programming problem at first
               | sight without ever having seen anything but greedy
               | algorithms?
               | 
               | Zero, which is why if a trained network could do it, that
               | would be "impressive" to me, given my personal biases.
               | 
               | > If you could get a machine that takes in all of Github
               | and can solve "any" DP problem you describe in natural
               | language with a couple of examples, that is AI above and
               | beyond what many humans can do, which is "awesome" no
               | matter how you put it.
               | 
               | I agree with you that such a machine would be awesome,
               | and AlphaCode is certainly a great step closer towards
               | that ideal. However, I would like to have a number that
               | measures the "awesomeness" of the machine (not elo rating
               | because that depends on a human reference), so I will
               | have something as a benchmark to refer to when the next
               | improvement arrives.
        
               | pedrosorio wrote:
               | I understand wanting to look at different metrics to
               | gauge progress, but what is the issue with this?
               | 
               | > not elo rating because that depends on a human
               | reference
        
               | sibeshk96 wrote:
               | Using my previous chess analogy, the world's smartest
               | chess bot has played a million games to beat the average
               | grandmaster, who has played less than 10,000 games in her
               | lifetime. So while they both will have the same elo
               | rating, which is a measure of how well they are at the
               | narrow domain of chess, there is clearly something
               | superior about the how the human grandmaster learns from
               | just a few data points i.e. strong generalization vs the
               | AI's weak generalization. Hence the task-specific elo
               | rating does not give enough context to understand how
               | well a model adapts to uncertainty. For instance - a
               | Roomba would beat a human hands down if there was an elo
               | rating for vacuuming floors.
        
               | ahgamut wrote:
               | The Turing Test
               | (https://en.wikipedia.org/wiki/Turing_test) for
               | artificial intelligence required the machine to convince
               | a human questioner that it was a human. Since then, most
               | AI methods rely on a human reference of performance to
               | showcase their prowess. I don't find this appealing
               | because:
               | 
               | 1) It's an imprecise target: believers can always hype
               | and skeptics can always downplay improvements. Humans can
               | do lots of different things somewhat well at the same
               | time, so a machine beating human-level performance in one
               | field (like identifying digits) says little about other
               | fields (like identifying code vulnerabilities).
               | 
               | 2) ELO ratings, or similar metrics, are measurements of
               | _skill_, and can be brute-forced to some extent,
               | equivalent to grinding up levels in a video game. Brute-
               | forcing a solution is "bad", but how do we know a new
               | method is "better/more elegant/more efficient"? For
               | algorithms we have Big-O notation, so we know (brute
               | force < bubble sort < quick sort), perhaps there is an
               | analogue for machine learning.
               | 
               | I would like performance comparisons that focus on
               | quantities unique to machines. I don't compare the
               | addition of computer processors with reference to human
               | addition, so why not treat machine intelligence
               | similarly?
               | 
               | There are many interesting quantities with which we can
               | compare ML models. Energy usage is a popular metric, but
               | we can also compare the structure of the network, the
               | code used, the hardware, the amount of training data, the
               | amount of training time, and the similarity between
               | training and test data. I think a combination of these
               | would be useful to look at every time a new model
               | arrives.
        
       | mwattsun wrote:
       | Seems to me that this accelerates the trend towards a more
       | declarative style of programming where you tell the computer what
       | you want to do, not how to do it
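       | 
       | A toy illustration of the distinction (my own example, nothing
       | from the paper): the imperative version spells out the "how", the
       | declarative version only states the "what".
       | 
       |   # Imperative: how to compute it, step by step.
       |   squares_of_evens = []
       |   for n in range(20):
       |       if n % 2 == 0:
       |           squares_of_evens.append(n * n)
       | 
       |   # Declarative-ish: just describe the result we want.
       |   squares_of_evens = [n * n for n in range(20) if n % 2 == 0]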
        
       | aidenn0 wrote:
       | > Creating solutions to unforeseen problems is second nature in
       | human intelligence
       | 
       | If this is true then a lot of the people I know lack human
       | intelligence...
        
       | algon33 wrote:
       | How surprising did you guys find this? I'd have said there was a
       | 20% chance of this performing at the median+ level if I was asked
       | to predict things beforehand.
        
         | Isinlor wrote:
         | There is a prediction market called Metaculus.
         | 
         | On Dec 31, 2016 in partnership with Center for the Study of
         | Existential Risk, Machine Intelligence Research Institute, and
         | The Future of Life Institute they asked:
         | 
         | How long until a machine-learning system can take a simple text
         | description and turn it into a program coded in C/Python?
         | 
         | https://www.metaculus.com/questions/405/when-will-programs-w...
         | 
         | First 19 forecasters in March 2017 were predicting mid-2021,
         | the best forecasters were predicting late 2024. When the
         | question closed in 2020 the community was predicting January
         | 2027 and the best forecasters were predicting March 2030.
         | 
         | The question resolved in July 2021 when Codex was published.
         | 
         | The community and the best forecasters were assigning ~15%
         | that it would happen by July 2021.
         | 
         | I'm currently the 14th best forecaster there and I was
         | predicting 33% before July 2021. It was my last prediction, and
         | it was made in October 2018.
         | 
         | I'm also predicting 75% that we will have AGI by 2040 as
         | defined in this question:
         | 
         | https://www.metaculus.com/questions/3479/when-will-the-first...
         | 
         | 20% that it will happen before 2030.
         | 
         | There is also a stronger operationalization:
         | 
         | https://www.metaculus.com/questions/5121/when-will-the-first...
         | 
         | My prediction here is 60% before 2040 and 5% before 2030.
         | 
         | I have also "canary in the coal mine" questions:
         | 
         | When will AI achieve competency on multi-choice questions
         | across diverse fields of expertise? Community predicts 50%
         | before 2030, I agree.
         | 
         | https://www.metaculus.com/questions/5276/ai-competence-in-di...
         | 
         | When will AI be able to learn to play Montezuma's Revenge in
         | less than 30 min? Community predicts 50% before 2025, I think
         | 50% before 2027.
         | 
         | https://www.metaculus.com/questions/5460/ai-rapidly-learning...
        
         | baobabKoodaa wrote:
         | I would have said there is a ~0% chance of this happening
         | within our lifetimes.
        
         | hackinthebochs wrote:
         | I didn't find it very surprising, but then I tend to be more
         | optimistic than average about the capabilities of transformer
         | models and the prospect of general AI in the relatively near
         | term.
        
         | machiaweliczny wrote:
         | I am surprised, as recently OpenAI solved ~25% of easy problems
         | and ~2% of competitive problems. Seems like DeepMind is ahead
         | on this topic as well.
         | 
         | Actually, I think Meta AI made an interesting discovery
         | recently that could possibly improve NNs in general, so it
         | will probably help here as well.
         | 
         | I am not in the field, but I wonder if some other approaches
         | like Tsetlin machines would be more useful for programming.
        
         | marcusbuffett wrote:
         | I would have guessed around the same chance, this was
         | surprising to me after playing around with copilot and not
         | being impressed at all.
        
       | knowmad wrote:
       | I agree with most of the comments I've read in this thread.
       | Writing code to solve a well-defined, narrowly scoped problem
       | isn't that hard or valuable. It's determining what the problem
       | actually is and how software could be used to solve it that is
       | challenging and valuable.
       | 
       | I would really like to see more effort in the AI/ML code
       | generation space being put into things like code review, and
       | system observation. It seems significantly more useful to use
       | these tools to augment human software engineers rather than
       | trying to tackle the daunting and improbable task of completely
       | replacing them.
       | 
       | *Note: as a human software engineer I am biased
        
       | [deleted]
        
       | FemmeAndroid wrote:
       | This is extremely impressive, but I do think it's worth noting
       | that these two things were provided:
       | 
       | - a very well defined problem. (One of the things I like about
       | competitive programming and the like is just getting to implement
       | a clearly articulated problem, not something I experience on most
       | days.)
       | 
       | - existing test data.
       | 
       | This is definitely a great accomplishment, but I think those two
       | features of competitive programming are notably different than my
       | experience of daily programming. I don't mean to suggest these
       | will always be limitations of this kind of technology, though.
        
         | baobabKoodaa wrote:
         | > One of the things I like about competitive programming and
         | the like is just getting to implement a clearly articulated
         | problem
         | 
         | English versions of Codeforces problems may be well-defined but
         | they are often very badly articulated and easy to misunderstand
         | as a human reader. I still can't understand how they got AI to
         | be able to generate plausible solutions from these problem
         | statements.
        
         | jakub_g wrote:
         | 100% agree. Someone (who?) had to take time and write the
         | detailed requirements. In real jobs you rarely get good tickets
         | with well defined expectations; it's one of most important
         | developer's jobs to transform fuzzy requirement into a good
         | ticket.
         | 
         | (Side note: I find that many people skip this step, and go
         | straight from fuzzy-requirement-only-discussed-on-zoom-with-Bob
         | to code; open a pull request without much context or comments;
         | and then a code reviewer is supposed to review it properly
         | without really knowing what problem is actually being solved,
         | and whether the code is solving a proper problem at all).
        
           | jensensbutton wrote:
           | Maybe the problem transformation will be both the beginning
           | _and_ end of the developer's role.
        
           | ctoth wrote:
           | So what happens when OpenAI releases TicketFixer 0.8 which
           | synthesizes everything from transcripts of your meetings to
           | the comments to the JIRA ticket to the existing codebase and
           | spits out better tickets to feed into the programming side?
        
             | solarmist wrote:
             | Yup, I hope that'll happen. Then engineering would just end
             | up being done at a higher level of abstraction, closer to
             | what designers do with wireframes and mockups.
             | 
             | Kind of the opposite of the way graphic design has evolved.
             | Instead of getting more involved in the process and, in
             | many cases, becoming front-end developers, it'll become
             | more abstract where humans make the decisions and reason
             | about what to include/exclude, how it'll flow, etc.
             | 
             | Even TicketFixer wouldn't be able to do more than offer a
             | handful of possible solutions to design-type issues.
        
               | bmhin wrote:
               | Yeah, we need our TicketFixer to also include the No_Bob
               | 0.2 plugin that figures out that a decent percentage of
               | the time whatever "Bob" is asking for in that meeting is
               | not what "Bob" thinks he is asking for or should be
               | asking for and can squash those tickets. Without that
               | we're gonna somehow end up with spreadsheets in
               | everything.
        
               | solarmist wrote:
               | Haha, yeah, there's that, but there are also things like
               | "adding a dark mode." There are a dozen ways to
               | accomplish that kind of thing, and every company's
               | solution will diverge when you get down to the details.
        
             | jakub_g wrote:
             | Take my money.
        
           | machiaweliczny wrote:
           | But it's easy to create an AI conversation that will refine
           | the problem.
        
           | ohwellhere wrote:
           | Is the next step in the evolution of programming having the
           | programmer become the specifier?
           | 
           | Fuzzy business requirements -> programmer specifies and
           | writes tests -> AI codes
        
             | buscoquadnary wrote:
             | That's all we've ever been since we invented software.
             | 
             | First we specified the exact flow of the bits with punch
             | cards.
             | 
             | Then we got assembly and we specified the machine
             | instructions.
             | 
             | Then we got higher level languages and we specified how the
             | memory was to be managed and what data to store where.
             | 
             | Now we have object oriented languages that allow us to work
             | with domain models, and functional languages that allow us
             | to work with data structures and algorithms.
             | 
             | The next level may be writing business rules and
             | specifying how services talk to each other, who knows, but
             | it will be no different than it is now, just at a higher
             | level.
        
             | chinabot wrote:
             | If it's anything like my job:
             | 
             | while(1) { Fuzzy business requirements -> programmer
             | specifies and writes tests -> AI codes }
        
         | e4e78a06 wrote:
         | I don't think it's quite as impressive as you make it out to
         | be. Median performance in a Codeforces programming competition
         | is solving the easiest 1-2 problems out of 5-6 problems. Like
         | all things programming, the top 1% is much, much better than
         | the median.
         | 
         | There's also the open problem of verifying correctness in
         | solutions and providing some sort of flag when the model is not
         | confident in its correctness. I give it another 5 years in the
         | optimistic case before AlphaCode can reliably compete at the
         | top 1% level.
        
           | ctoth wrote:
           | This is technology that simply didn't exist in any form 2
           | years ago. For no amount of money could you buy a program
           | that did what this one does. Having been watching the growth
           | of Transformer-based models for a couple years now really has
           | hammered home that just as soon as we figure out how an AI
           | can do X, X is no longer AI, or at least no longer
           | impressive. How this happens is with comments like yours, and
           | I'd really like to push back against it for once. Also, 5
           | years? Assuming that we have all of the future ahead of us,
           | the thought that we only have 5 years left of being the best
           | at programming competitions seems important, and it shouldn't
           | be dismissed with "I don't think it's quite as impressive as
           | you make it out to be."
        
             | BobbyJo wrote:
             | I don't think that's what's happening. Let's talk about this
             | case: programming. It's not that people are saying "an AI
             | programming" isn't impressive or isn't AI, it's that when
             | people say "an AI programming" they aren't talking about
             | ridiculously controlled environments like in this case.
             | 
             | It's like self-driving cars. A car driving itself for the
             | first time in a controlled environment, I'm sure, was an
             | impressive feat, and it wouldn't be inaccurate to call it a
             | self-driving car. However, that's not what we're all
             | waiting for when we talk about the arrival of self-driving
             | cars.
        
               | ctoth wrote:
               | And if AI programming were limited to completely
               | artificial contexts you would have a point, though I'd
               | still be concerned. We live in a world, however, where
               | programmers routinely call on the powers of an AI to
               | complete their real code and get real value out of it.
               | This is based on the same technology that brought us this
               | particular win, so clearly this technology is useful
               | outside "ridiculously controlled environments."
        
               | Retric wrote:
               | Programmers do set up completely artificial contexts so AI
               | can work.
               | 
               | None of the self-driving systems were set up by giving
               | the AI access to sensors, a car, and the driver's handbook
               | and saying "well, you figure it out from there." The
               | general trend is: solve this greatly simplified problem,
               | then this more complex one, working up to dealing with the
               | real world.
        
               | ctoth wrote:
               | By AI programming I mean the AI doing programming, not
               | programming the AI. Though soon enough the first will be
               | doing the second and that's where the loop really
               | closes...
        
               | BobbyJo wrote:
               | That's not significantly different than how programming
               | has worked for the last 40 years though. We slowly push
               | certain types of decisions and tasks down into the tools
               | we use, and what's left over is what we call
               | 'programming'. It's cool, no doubt, but as long as
               | companies need to hire 'programmers', then it's not the
               | huge thing we're all looking out over the horizon waiting
               | for.
        
             | YeGoblynQueenne wrote:
             | >> This is technology that simply didn't exist in any form
             | 2 years ago.
             | 
             | A few examples of neural program synthesis from at least 2
             | years ago:
             | 
             | https://sunblaze-ucb.github.io/program-synthesis/index.html
             | 
             | Another example from June 2020:
             | 
             |  _DreamCoder: Growing generalizable, interpretable
             | knowledge with wake-sleep Bayesian program learning_
             | 
             | https://arxiv.org/abs/2006.08381
             | 
             | RobustFill, from 2017:
             | 
             |  _RobustFill: Neural Program Learning under Noisy I/O_
             | 
             | https://www.microsoft.com/en-us/research/wp-
             | content/uploads/...
             | 
             | I could go on.
             | 
             | And those are only examples from neural program synthesis.
             | Program synthesis, in general, is a field that goes way
             | back. I'd suggest as usual not making big proclamations
             | about its state of the art without being acquainted with
             | the literature. Because if you don't know what others have
             | done, every announcement by DeepMind, OpenAI et al seems
             | like a huge advance... when it really isn't.
        
               | qualudeheart wrote:
               | Has someone tried classical program synthesis techniques
               | on competitive programming problems? I wonder what would
               | have been possible with tech from more than 2 years ago.
        
               | YeGoblynQueenne wrote:
               | I don't know if anyone has tried it, but it's not a very
               | objective evaluation. We have no good measure of the
               | coding ability of the "median level competitor" so doing
               | better or worse than that, doesn't really tell us
               | anything useful about the coding capability of an
               | automated system.
               | 
               | So my hunch is that it probably hasn't been done, or
               | hasn't been done often, because the program synthesis
               | community would recognise it's pointless.
               | 
               | What you really want to look at is formal program
               | synthesis benchmarks and how systems like AlphaCode do on
               | them (hint: not so good).
        
               | ctoth wrote:
               | Of course program synthesis has been a thing for years, I
               | remember some excellent papers out of MSR 10 years ago.
               | But which of those could read a prompt and build the
               | program from the prompt? Setting up a whole bunch of
               | constraints and having your optimizer spit out a program
               | that fulfills them is program synthesis and is super
               | interesting, but not at all what I think of when I'm told
               | we can make the computer program for us. For instance,
               | RobustFill takes its optimization criteria from a bundle
               | of pre-completed inputs and outputs of how people want
               | the program to behave instead of having the problem
               | described in natural language and creating the solution
               | program.
        
               | YeGoblynQueenne wrote:
               | Program synthesis from natural language specifications
               | has existed for many years, also. It's not my specialty
               | (neither am I particularly interested in it), but here's
               | a paper I found from 2017, with a quick search:
               | 
               | https://www.semanticscholar.org/paper/Program-Synthesis-
               | from...
               | 
               | AlphaCode is not particularly good at it, either. In the
               | arxiv preprint, besides the subjective and pretty
               | meaningless "evaluation" against human coders, it's also
               | tested on a formal program synthesis benchmark, the APPS
               | dataset. The best performing AlphaCode variant reported
               | in the arxiv preprint solves 25% of the "introductory"
               | APPS tasks (the least challenging ones). All AlphaCode
               | variants tested solve less than 10% of the "interview"
               | and "competition" (intermediary and advanced) tasks.
               | These more objective results are not reported in the
               | article above, I think for obvious reasons (because they
               | are extremely poor).
               | 
               | So it's not doing anything radically new, and it's not
               | doing it particularly well either. Please be better
               | informed before propagating hype.
               | 
               | Edit: really, from a technical point of view, AlphaCode
               | is a brute-force, generate-and-test approach to program
               | synthesis that was state-of-the-art 40 years ago. It's
               | just a big generator that spams programs hoping it will
               | hit a good one. I have no idea who came up with this.
               | Oriol Vinyals is the last author and I've seen enough of
               | that guy's work to know he knows better than bet on such
               | a primitive, even backwards approach. I'm really shocked
               | that this is DeepMind work.
        
           | Jensson wrote:
           | Top 1% competitive programming level means that it can start
           | solving research problems; the difficulty and creativity
           | needed goes up exponentially for harder problems, and
           | programming contests have led to research papers before.
           | It would be cool if we got there in 5 years but I doubt it.
           | But if we got there it would revolutionize so many things in
           | society.
        
           | xorcist wrote:
           | You don't think it's impressive, yet you surmise that a
           | computer program could compete at a level of the top 1% of
           | all humans _in_ _five_ _years_?
           | 
           | That's wildly overstating the promise of this technology, and
           | I'd be very surprised if the authors of this wouldn't agree.
        
             | bricemo wrote:
             | Agree. If an AI could code within the top 1%, every single
             | person whose career touches code would have their lives
             | completely upended. If that's only 5 years out...ooof.
        
           | Groxx wrote:
           | I do kinda wonder if it'd lead to as good results if you just
           | did a standard "matches the most terms the most times" search
           | against all of github.
           | 
           | I have a suspicion it would - kinda like Stack Overflow,
           | problems/solutions are not that different "in the small".
           | It'd have almost certainly given us the fast inverse square
           | root trick verbatim, like Github's AI is doing routinely.
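           | 
           | Something like this naive scorer is all I mean (purely
           | illustrative; "corpus" stands in for an indexed GitHub dump):
           | 
           |   def score(query, document):
           |       """Most terms, most times: count occurrences of each
           |       distinct query term in the document."""
           |       doc_terms = document.lower().split()
           |       return sum(doc_terms.count(t)
           |                  for t in set(query.lower().split()))
           | 
           |   corpus = {
           |       "fast_inv_sqrt.c": "float q_rsqrt(float number) { ... }",
           |       "hello.py": "print('hello world')",
           |   }
           |   query = "fast inverse square root float"
           |   best = max(corpus, key=lambda name: score(query, corpus[name]))
           |   print(best)  # retrieves the Quake-style snippet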
        
       | YeGoblynQueenne wrote:
       | >> AlphaCode ranked within the top 54% in real-world programming
       | competitions, an advancement that demonstrates the potential of
       | deep learning models for tasks that require critical thinking.
       | 
       | Critical thinking? Oh, wow. That sounds amazing!
       | 
       | Let's read further on...
       | 
       | >> At evaluation time, we create a massive amount of C++ and
       | Python programs for each problem, orders of magnitude larger than
       | previous work. Then we filter, cluster, and rerank those
       | solutions to a small set of 10 candidate programs that we submit
       | for external assessment.
       | 
       | Ah. That doesn't sound like "critical thinking", or any thinking.
       | It sounds like massive brute-force guessing.
       | 
       | A quick look at the arxiv preprint linked from the article
       | reveals that the "massive" amount of programs generated is in the
       | millions (see Section 4.4). These are "filtered" by testing them
       | against program input-output (I/O) examples given in the problem
       | descriptions. This "filtering" still leaves a few thousands of
       | candidate programs that are further reduced by clustering to
       | "only" 10 (which are finally submitted).
       | 
       | So it's a generate-and-test approach rather than anything to do
       | with reasoning (as claimed elsewhere in the article) let alone
       | "thinking". But why do such massive numbers of programs need to
       | be generated? And why are there still thousands of candidate
       | programs left after "filtering" on I/O examples?
       | 
       | The reason is that the generation step is constrained by the
       | natural-language problem descriptions, but those are not enough
       | to generate appropriate solutions because the generating language
       | model doesn't understand what the problem descriptions mean; so
       | the system must generate millions of solutions hoping to "get
       | lucky". Most of those don't pass the I/O tests so they must be
       | discarded. But there are only very few I/O tests for each problem
       | so there are many programs that can pass them, and still not
       | satisfy the problem spec. In the end, clustering is needed to
       | reduce the overwhelming number of pretty much randomly generated
       | programs to a small number. This is a method of generating
       | programs that's not much more precise than drawing numbers at
       | random from a hat.
       | 
       | Inevitably, the results don't seem to be particularly accurate,
       | hence the evaluation against programs written by participants in
       | coding competitions, which is not any objective measure of
       | program correctness. Table 10 in the arxiv preprint lists results
       | on a more formal benchmark, the APPS dataset, where it's clear
       | that the results are extremely poor (the best performing
       | AlphaCode variant solves 20% of the "introductory" level
       | problems, though outperforming earlier approaches).
       | 
       | Overall, pretty underwhelming and a bit surprising to see such
       | lackluster results from DeepMind.
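       | 
       | For what it's worth, here is roughly the shape of that
       | generate-filter-cluster pipeline as I read the preprint, with the
       | language model stubbed out by a trivial random sampler (so a
       | sketch of the idea, not DeepMind's code):
       | 
       |   import random
       | 
       |   # Stand-ins for "programs" sampled from the language model.
       |   CANDIDATE_POOL = [lambda x: x + 1, lambda x: x * 2, lambda x: x]
       | 
       |   def sample_program(problem_text):
       |       # In AlphaCode this is a transformer conditioned on the
       |       # problem statement; here it is just a random pick.
       |       return random.choice(CANDIDATE_POOL)
       | 
       |   def passes_examples(program, io_examples):
       |       # "Filtering": keep only candidates that reproduce the
       |       # input/output examples from the problem statement.
       |       return all(program(x) == y for x, y in io_examples)
       | 
       |   def behaviour_signature(program, probe_inputs):
       |       # "Clustering": group survivors by their outputs on extra
       |       # probe inputs; behavioural duplicates share a signature.
       |       return tuple(program(x) for x in probe_inputs)
       | 
       |   def alphacode_style(problem_text, io_examples, probe_inputs,
       |                       n_samples=100_000, n_submit=10):
       |       survivors = [p for p in (sample_program(problem_text)
       |                                for _ in range(n_samples))
       |                    if passes_examples(p, io_examples)]
       |       clusters = {}
       |       for p in survivors:
       |           clusters.setdefault(
       |               behaviour_signature(p, probe_inputs), p)
       |       return list(clusters.values())[:n_submit]
       | 
       |   picks = alphacode_style("add one to x", [(1, 2), (5, 6)], [0, 3],
       |                           n_samples=1000)
       |   print(len(picks))  # one behavioural cluster survives the filter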
        
       | thomasahle wrote:
       | Next they can train it on kaggle, and we'll start getting closer
       | to the singularity
        
       ___________________________________________________________________
       (page generated 2022-02-02 23:00 UTC)