[HN Gopher] Competitive Programming with AlphaCode ___________________________________________________________________ Competitive Programming with AlphaCode Author : yigitdemirag Score : 456 points Date : 2022-02-02 16:13 UTC (6 hours ago) (HTM) web link (deepmind.com) (TXT) w3m dump (deepmind.com) | pretendscholar wrote: | I am a little bitter that it is trained on stuff that I gave away for free and will be used by a billion-dollar company to make more money. I contributed the majority of that code before it was even owned by Microsoft. | visarga wrote: | Paying it forward, it will help others in turn. | pretendscholar wrote: | Yes, it will help the already powerful players disproportionately. | alphabetting wrote: | They open-sourced AlphaFold for anyone to use commercially despite a big financial incentive to keep it private and use it in their new drug discovery lab. No idea how this works or differs from AlphaFold, but I imagine they'll do the same here if possible. | pretendscholar wrote: | Only after another lab made their own open source one that was comparable. | kzrdude wrote: | The problem is not really that Microsoft owns GitHub, or that licenses allow corporations free use, but that the tech giants are so big and have so much power. | Permit wrote: | Can you elaborate and give some history? What code did you contribute, and how did it end up being used by Microsoft and then DeepMind? | arendtio wrote: | > We pre-train our model on selected public GitHub code and fine-tune it on our relatively small competitive programming dataset. | | But since the code was 'selected' you don't know if your code was used. However, they seem to have used Python and C++, so my code is probably not part of it. | [deleted] | hmate9 wrote: | Between this and OpenAI's GitHub Copilot, "programming" will probably slowly start dying. What I mean by that is that sure, you still have to learn how to program, but our time will be spent much more on just the design part and writing detailed documentation/specs, and then we just have one of these AIs generate the code. | | It's the next step. Binary code < assembly < C < Python < AlphaCode | | Historically it's always been about abstracting and writing less code to do more. | mhzsh wrote: | Creating a higher-level abstraction is something people have been trying to do for decades with so-called 4th-generation languages. At some point, abstracting away too much makes a tool too cookie-cutter, and suddenly deviating from it causes more difficulty. | visarga wrote: | Maybe it's not more abstraction we need, just automating the drudgery. Abstractions are limited; by definition they abstract things away, and they are brittle. | vvilliamperez wrote: | Read: Ruby on Rails | streetcat1 wrote: | First, if this is correct and AlphaCode succeeds, it will bring about its own demise. | | I.e. as soon as it starts replacing humans, it will not have enough human-generated training data, since all programming will be done by models like itself. | | Second, AlphaCode was specifically trained for competitive programming: | | 1. Short programs. 2. Each problem has hundreds of human-generated solutions. | | However, commercial programs: | | 1. Are long. 2. Have no predefined answer, or even a correct answer. 3. Need to use/reuse a lot of legacy code. | chroem- wrote: | Reinforcement learning and adversarial training can render both of those concerns non-issues in practice.
| ialyos wrote: | The phrase "in practice" doesn't really work when you're referring to highly finicky strategies like RL and adversarial training. | AnIdiotOnTheNet wrote: | > as soon as it starts replacing humans, it will not have enough human-generated training data, since all programming will be done by models like itself. | | As a natural-born pessimist, I can't help but feel that by the time we get to that point we'll just keep blundering forward and adapting our world around the wild nonsense garbage code the model ends up producing in this scenario. | | After all, that's basically what we've done with the entire web stack. | pjmorris wrote: | I'd note that assembly, C, and Python didn't replace 'programming' but were expected to do so. I'd wager that what you now call 'detailed documentation/specs' will still be called programming in 10 or even 20 years. | falcor84 wrote: | If you could change a sentence in the documentation and then run a ~1min compilation to see the resulting software, it would be a very different kind of programming. I suppose it'll give a new meaning to Readme-Driven-Development. | wittycardio wrote: | Solving competitive programming problems is essentially solving hard combinatorial optimization problems. Throwing a massive amount of compute and gradient descent at the problem has always been possible. If I'm not mistaken, what this does is reduce the representation of the problem to a state where it can run gradient descent and then tune parameters. The real magic is in finding structurally new approaches. If anything I'd say algorithms and math continue to be the core of programming. The particular syntax or level of abstraction don't matter so much. | jdlshore wrote: | > If anything I'd say algorithms and math continue to be the core of programming. | | I disagree; I think the core of programming is analyzing things people want and expressing solutions to those wants clearly, unambiguously, and in a way that is easy to change in the future. I'd say algorithms and math are a very small part of this work. | wittycardio wrote: | That's not programming, that's called being a good employee. Any person in any role should be doing that. Programming is about algorithms and math. Now, a good employee who's in a technical role should have both. | jdlshore wrote: | > Programming is about algorithms and math. | | You've simply restated your opinion without providing any supporting arguments, and as I already said, I disagree. The vast majority of programming I see (and as a consultant, I see a fairly wide variety) is not about algorithms and math, but instead gluing together systems and expressing domain logic. | | Now, I suppose you could argue that domain logic is "algorithms and math," but in my experience, it's less about the specific algorithms and more about precisely describing fuzzy human behavior. | | It's the "precisely describing" and "easy to change in the future" parts that make what programmers do different from what any good employee does. | | (I do agree that there is some programming that is focused on algorithms and math, but it's in the minority, in my experience. Perhaps the type of work you do _is_ focused on algorithms and math, but I believe that's a relatively small part of the software development ecosystem.) | chroem- wrote: | > Solving competitive programming problems is essentially solving hard combinatorial optimization problems.
| | True, but if you relax your hard requirements of optimality to admit "good enough" solutions, you can use heuristic approaches that are much more tractable. High-quality heuristic solutions to NP-hard problems, enabled by ML, are going to be a big topic over the next decade, I think. | wittycardio wrote: | I should correct myself, this isn't even that. This is just text analysis on Codeforces solutions, which makes it even worse than I thought. Very pessimistic about its generalizability. | Inufu wrote: | I agree, I expect programmers will just move up the levels of abstraction. I enjoyed this recent blog post on the topic: https://eli.thegreenplace.net/2022/asimov-programming-and-th... | hackinthebochs wrote: | The "problem" is that as you move up the levels of abstraction, you need fewer people to do the same amount of work. Unless the complexity of the work scales as well. I've always felt that programmers would be the first class of knowledge workers to be put out of work by automation. This may be the beginning of the end for the programming gravy train. | NicoJuicy wrote: | There aren't enough developers either way. | bmh100 wrote: | On the other hand, as the value of an hour of programming increases, the quantity demanded may also increase. | paxys wrote: | > as you move up the levels of abstraction, you need fewer people to do the same amount of work | | Yes, but the total amount of work (and surrounding complexity) also increases with it. Just look at the evolution of the software industry over the last few decades. | hackinthebochs wrote: | History isn't a great guide here. Historically the abstractions that increased efficiency begat further complexity. Coding in Python elides over low-level issues, but the complexity of how to arrange the primitives of Python remains for the programmer to engage with. AI coding has the potential to elide over all the complexity that we identify as programming. I strongly suspect this time is different. | visarga wrote: | > The "problem" is that as you move up the levels of abstraction, you need fewer people to do the same amount of work. | | This will lower the entry barrier to developing software, so more people will go into the field. Before, you needed to know a programming language; now you will just have a dialogue with a language model. | | > I've always felt that programmers would be the first class of knowledge workers to be put out of work by automation. | | We've been automating our work for 70 years, and look how many programmers are employed now. The more we automate, the more capable our field becomes and more applications pop up. | hackinthebochs wrote: | >This will lower the entry barrier to developing software so more people will go into the field. | | Indeed. The ideal future of programming is something out of Star Trek. I often noticed how everyone on the ship is a programmer of a sort; they whip up a simulation as the problem warrants, regardless of their field. But in this future, the job of programmer basically doesn't exist. As a programmer, I should be allowed to have mixed feelings about that. | visarga wrote: | Let your imagination fly. We always want more than is possible; our wishes fill up any volume like an expanding gas. Humans are going to be crucial to orchestrate AI and extract the most utility out of it. | hmate9 wrote: | Or you can do things at a faster pace and increase your productivity.
| Inufu wrote: | Yes, this is how you increase prosperity (see: agricultural revolution, industrial revolution, etc). You can now create more with the same number of people. | elwell wrote: | > writing detailed documentation/specs | | That's what code is. | bmc7505 wrote: | I disagree that programming is dying -- tools like Copilot will lead to a Renaissance in the art of computer programming by enabling a larger population to design programs and explore the implications of their design choices. I wrote a short essay [1] on the history of automated programming and where I think it is heading in the future. | | [1]: https://breandan.net/public/programming_with_intelligent_mac... | 62951413 wrote: | Model-driven development and code generation from UML were once supposed to be the future. It will be interesting to see how much further this approach takes us. | | Assuming ANNs resemble the way the human brain functions, you'd also expect them to introduce bugs. And so actual human beings would partake in debugging too. | diehunde wrote: | My bet would be that it will never happen in a reasonable time frame. And also, by that logic, writing that "documentation/spec" would just mean learning a new programming language the AI engine can parse, making it as useful as a compiler. Anyone who has been writing and designing software for a while knows the cycle is way more complex than "take some input and write code." | | Let me know when the AI engine is able to do complex refactoring, add features that keep backwards compatibility, find a bug in a giant codebase by debugging a test case, or write code that's performant but also maintainable. | ctoth wrote: | You ever notice how the "let me know when" part of this keeps changing? Let me know when computers can ... play Go/understand a sentence/compose music/write a program/ ... | | But surely they'll never be able to do this new reference class you have just now come up with, right? | diehunde wrote: | Not really? I mean, I would never say "let me know when a computer can do X" when X is something that doesn't require too much creativity and imagination. Like, a computer composing music doesn't impress me too much, because music itself has structure. A computer creating music that would wow a professional composer? That would be impressive. Same with this topic. A computer that solves some (because it failed several) short programming challenges, and OP says it will kill programming entirely? Not even close. Pretty cool though. | Jensson wrote: | It keeps changing because our intuitions about what tasks require intelligence are weak. We think that when a computer can do X it can also do Y. But then someone builds a computer that can do X but can't do Y, and we say "oh, so that doesn't require intelligence; let me know when it can do Z and we can talk again." That doesn't mean that Z means the computer is intelligent, just that Z is a point where we can look at it and discuss again whether we made any progress. What we really want is a computer that can do Y, but we make small mini-tasks that are easier to test against. | | The Turing test is a great example of this. Turing thought that a computer would need to be intelligent to solve this task. But it was solved by hardcoding a lot of values and a better understanding of human psychology and of what kind of conversation would seem plausible when most things are hardcoded.
That solution obviously isn't AI (I bet you don't think so either), but it still passed the Turing test. | ctoth wrote: | At what point do we give up and realize that there is no one thing called intelligence, just a bunch of hacks that work pretty well for different things sometimes? I think that's probably where people keep failing here. The reason that we keep failing to find the special thing in every new field that AI conquers is that there's nothing special to actually find. I mean, we could keep moving the goalposts, a sort of intelligence-of-the-gaps argument? But this doesn't seem productive. | Enginerrrd wrote: | I agree, from a totally different angle. Let's take something I know better as an example: structural engineering. Structural engineering should be a "solved problem". It seems, ostensibly, relatively simple compared to a more open-ended activity like "programming". (For "technical reasons", it ends up being more similar than you might think.) Still, you are ultimately dealing with the same materials, the same physics, and very similar configurations. | | And yet, despite the fact that we have programs to help calculate all the things, test code-required load combinations, even run simulations and size individual components... it turns out that it doesn't actually save that much work, and you still need an engineer to do most of it. And not just because of regulatory requirements. It's just, that's not the hard part. The hard part is assembling the components and specifications, specifying the correct loads based on location-specific circumstances, coming up with coherent and sensible design ideas, chasing down every possible creative nook and cranny of code to make something that was originally a mistake actually work, and knowing when the model is just wrong for some reason and the computer isn't simulating load paths accurately. | | Specifying the inputs and interpreting results is still about as much work as it was before you started with all the fancy tools. Those tools still have advantages, mind you, and they do make one slightly more efficient. Substantially so in some cases, but most of the time it still comes out as a slight assist rather than a major automation. | fvold wrote: | I hear that. | | Machine Learning also has a long way to go before it can take a long, rambling mess of a meeting and somehow generate a halfway usable spec from it. I mean, the customer says they want X, but X is silly in this context, so we'll give them Y and tell them it's "X-like, but faster". For example, SQL is "Blockchain-like, but faster" for a lot of buzzword use-cases of blockchain. | mirrorlake wrote: | I've been wondering this for a while: | | In the future, code-writing AI could be tasked with generating the most reliable and/or optimized code to pass your unit tests. Human programmers will decide what we want the software to do, make sure that we find all the edge cases and define as many unit tests as possible, and let the AI write significant portions of the product. Not only that, but you could include benchmarks that pit AI against itself to improve runtime or memory performance. Programmers can spend more time thinking about what they want the final product to do, rather than getting mired in mundane details, and be guaranteed that portions of the software will perform extremely well. | | Is this a naive fantasy on my part, or actually possible?
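In outline, that loop is already mechanical. A toy sketch in Python: the CANDIDATES list is a hardcoded stand-in for programs sampled from a code model (a real system would generate them), while the tests and the benchmark are the human-written part:

    import timeit

    # Stand-ins for model samples: three candidate implementations of
    # "return the smallest element". A real system would sample these
    # from a code model rather than hardcode them.
    CANDIDATES = [
        "def solve(xs): return sorted(xs)[0]",  # correct, but O(n log n)
        "def solve(xs): return min(xs)",        # correct, O(n)
        "def solve(xs): return xs[0]",          # wrong
    ]

    # Human-written unit tests: (input, expected output).
    TESTS = [([3, 1, 2], 1), ([5], 5), ([2, 2, 0], 0)]

    def passes(src):
        ns = {}
        try:
            exec(src, ns)  # load the candidate into a fresh namespace
            return all(ns["solve"](list(i)) == o for i, o in TESTS)
        except Exception:
            return False

    # Filter: keep only candidates that pass every test.
    survivors = [src for src in CANDIDATES if passes(src)]

    # Rank: benchmark the survivors against each other, as proposed above.
    def bench(src):
        ns = {}
        exec(src, ns)
        return timeit.timeit(lambda: ns["solve"](list(range(1000, 0, -1))), number=100)

    print(min(survivors, key=bench))  # the min() version wins on speed

The hard part, as the replies below note, is not this harness but getting a test suite exhaustive enough that passing it means what you want it to mean.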
| phreeza wrote: | And a second AI to generate additional test cases similar to yours (which you accept as also in scope) to avoid the first AI gaming the test. | machiaweliczny wrote: | First you need really good infra that makes it easy to test multiple working solutions from an AI, but I think this will be the bleeding edge in 2030. | | EDIT: with in-memory DBs I can imagine an AI-assisted mainframe that can solve 90% of business problems. | EVa5I7bHFq9mnYK wrote: | It seems to me that writing an exhaustive set of unit tests is harder than writing the actual code. | mrsuprawsm wrote: | Does this mean that we can all stop grinding leetcode now? | BoardsOfCanada wrote: | Do I understand it correctly that it generated (in the end) ten solutions that were then examined by humans and one picked? Still absolutely amazing though. | thomasahle wrote: | No human examination was done. | | But it generated 10 solutions which it ran against the example inputs, and picked the one that passed. | | Actually I'm not sure if it ran the solutions against the example inputs or the real inputs. | [deleted] | aliceryhl wrote: | They used the real inputs. The example inputs were used to filter out which candidates to submit for the 10 tries. | aliceryhl wrote: | No, they gave the algorithm 10 tries and tested all of them, and said that it was solved if any one of them worked. | mcast wrote: | The year is 2025, Google et al. are now conducting technical on-site interviews purely with AI tools and no human bias behind the camera (aside from GPT-3's quirky emotions). The interview starts with an LC hard, you're given 20 minutes -- good luck! | jakey_bakey wrote: | I think Amazon already tried this and it had surprisingly racist results | qualudeheart wrote: | Calling it now: if current language models can solve competitive programming at an average human level, we're only a decade or less off from competitive programming being as solved as Go or Chess. | | DeepMind or OpenAI will do it. If not them, it will be a Chinese research group on par with them. | | I'll be considering a new career. It will still be in computer science, but it won't be writing a lot of code. There'll be several new career paths made possible by this technology, as greater worker productivity makes possible greater specialization. | keewee7 wrote: | AI is being aggressively applied to areas where AI practitioners are domain experts. Think programming, data analysis, etc. | | Programmers and data scientists might find ourselves among the first half of knowledge workers to be replaced, and not among the last as we previously thought. | muds wrote: | It can be really tempting to think about research progression on a "linear" timescale, but more often than not it eventually ends up following an "exponential" curve because of technical debt. And there appear to be a _lot_ of techniques used here which we don't fully understand. | | I wouldn't be surprised if a specifically engineered system ten years from now wins an ICPC gold medal, but I'm pretty sure that a general-purpose specification -> code synthesizer that would actually threaten software engineering would require us to settle a lot of technical debts first -- especially in the area of verifying code/text generation using large language models. | EVa5I7bHFq9mnYK wrote: | Don't worry, there are a lot of much simpler jobs, like drivers or cashiers, that will surrender to AI before the coder's job does.
| So UBI will be implemented long before that happens. | solididiot wrote: | I wouldn't be so sure. Programmers (and drivers and cashiers) | can "survive" in poverty like millions others already do. | This transformation is coming in waves that keep the | proverbial frog in the pan. | simpleguitar wrote: | It doesn't even have to be average human. | | Let's say AI only gets to 10% (or 20% or 30% or whatever, it | doesn't really matter), that's a huge number of jobs being | lost. | | Imagine having a machine write all the "simple/boring" code for | you. Your productivity will go through the roof. The smartest | programmer who can most effectively leverage the machine could | replace many hundreds of programmers. | | I should brush up on my plumbing and apply for a plumbing | license soon. (I think plumbing is safer than electricians, | because many CS people have good EE foundations). | phendrenad2 wrote: | Calling it now: Your prediction is off by an order of magnitude | or two (10 years -> 100 years, or 1000 years) | abecedarius wrote: | Three months ago in the Copilot thread I was saying | | > in 5 years will there be an AI that's better than 90% of | unassisted working programmers at solving new leetcode-type | coding interview questions posed in natural language? | | and getting pooh-poohed. | https://news.ycombinator.com/item?id=29020401 (And writing | that, I felt nervous that it might not be aggressive enough.) | | There's this general bias in discussions of AI these days, that | people forget that the advance they're pooh-poohing was | dismissed in the same way as probably way off in the indefinite | future, surprisingly recently. | hackinthebochs wrote: | The issue is these techniques are growing in capabilities | exponentially, while we have a habit of extrapolating | linearly. Some saw the glaring deficits in copilot then | reasoned that linear improvements is still glaring deficits. | I don't know that this bias can ever be corrected. A large | number of intelligent people simply will never be convinced | general AI is coming soon no matter what evidence is | presented. | Jensson wrote: | > techniques are growing in capabilities exponentially, | while we have a habit of extrapolating linearly | | What does this even mean? How do you put a number on AI | capability? You can say it is growing faster than people | expect, but what is even exponential or linear growth in AI | capability? | hackinthebochs wrote: | I take your point that the linear/exponential terminology | is a bit dubious. But the simple way to make sense of it | is just going by various benchmarks. E.g. the power-law | relationship between the model accuracy and the model | size: https://eliaszwang.com/paper-reviews/scaling-laws- | neural-lm/ | redsummer wrote: | pkaye wrote: | How long before it can write the code without plagiarizing code | from online? | stnmtn wrote: | Humans study CS for 5 years, reading code from online to be | able to solve these problems. | falcor84 wrote: | How long before the typical human coder can do so? | pkaye wrote: | Are you saying you cannot write code from scratch? | sheikheddy wrote: | Not the parent comment, but I cannot code from scratch | (outside of very simple and small applications). | Competitive Programming is at about the limit of what I | can do without looking things up, and only because I've | had practice specifically for that kind of artificial | environment. 
| falcor84 wrote: | I can write some code from scratch, but my ability to write code is improved by an order of magnitude when I can refer to online resources, including example code. | Jensson wrote: | This is in line with what other code generation AIs have accomplished. | | To reach average level at Codeforces you need to be able to apply a standard operation like a sort, or apply a standard math formula, as the first 1-2 problems in the easy contests are just that. It is impressive that they managed to get this result in real contests with real unaltered questions and see that it works. But generalizing this to harder problems isn't as easy, as there you need to start to devise original algorithms instead of just applying standard ones; for such problems the model needs to understand computer science instead of just mapping language to algorithms. | zerr wrote: | The thing is, Competitive Programming (CP) is a completely different discipline/subject with its own trivia knowledge and tricks. CP uses Computer Science the same way as e.g. Biology uses Mathematics. It has very little in common with real-world software development. | qualudeheart wrote: | I said as much in another comment. | | Automating the software development profession proper is going to be much harder and will require autonomous agents with coherent world models, because that's what you need to act in a business context. | f38zf5vdt wrote: | A programming genie that grants programming wishes to the general public. Since most of what I do on a daily basis is engineering solutions based on tradeoffs, I can only imagine the number of programmers needed to debug solutions given by the programming genie in response to poorly described feature requests. | | If we become mechanics of the software AI vehicles of the future, so be it. | csee wrote: | You're extrapolating across very different types of problems. Go and Chess have unlimited training data. Competitive programming does not. | raphlinus wrote: | To me, that's actually one of the more interesting questions. It's possible to grade the output of the AI against objective criteria, like whether it runs, and resources consumed (RAM, CPU time, and, particularly of interest to me, parallel scaling, as GPU algorithms are too hard for most programmers). To what extent can you keep training by having the AI generate better and better solutions to a relatively smaller input pool of problems? I skimmed the paper to see how much they relied on this but didn't get a clear read. | solididiot wrote: | >> There'll be several new career paths made possible by this technology as greater worker productivity makes possible greater specialization. | | Can you list a few? | Der_Einzige wrote: | I'm already anticipating having the job title of "Query Engineer" sometime in the next 30 years, and I do NLP including large-scale language model training. :( | qualudeheart wrote: | One of the big venture capitalists predicted "prompt engineering" as a future high-paid and high-status position. | | Essentially handling large language models. | | Early prompt engineers will probably be drawn from "data science" communities and will be similarly high status, well paid but not quite as well, and require less mathematical knowledge. | | I'm personally expecting an "Alignment Engineer" role monitoring AI systems for unwanted behavior.
| | This will be structurally similar to current cybersecurity roles, but mostly recruited from machine learning communities and embedded in a broader ML ecosystem. | jonas_kgomo wrote: | I like this description better, considering that companies like Anthropic are working specifically on alignment and AI safety. Being that the team actually spun out of DeepMind, it is interesting. | qualudeheart wrote: | Alignment is going to be a giant industry and will also include many people not originally in STEM. The humanities and "civil society" will both have their contributions to make. | | It's likely that alignment jobs won't themselves be automated, because no one will trust AI systems to align themselves. | sjg007 wrote: | >"Alignment Engineer" role monitoring AI systems for unwanted behavior. | | Ha, I know people already doing this... | lugu wrote: | Depending on what you want to do, you can either choose an industry with very fuzzy requirements (to stay near the programming side) or one with very complex but strict requirements (to benefit from those coding robots). I guess we will need simulators for most of what we do in order to train those robots. | buscoquadnary wrote: | The problem is that this view continues to treat software engineers as people that write code. That's not what my job is; it is figuring out how to solve a business problem using technology, getting people on board with that solution, and updating and refining it. | | This viewpoint seems to me very similar to the idea of 3rd-generation languages replacing developers because programming would be so easy. It isn't about how easy it is to write code. I function as a limited mentat: taking all the possible requirements, tradeoffs, and constraints, analyzing them, and building the model; then I write out the code. The code artifact is not the value I add; the artifact is how I communicate the value to the world. | | This doesn't make programmers redundant any more than Ruby, PHP, or Java made developers redundant by freeing them from having to manually remember and track memory usage and pointers. It is at most a tool to reduce the friction of getting what is in my head into the world. | | I control the code, and whoever controls the code controls the business. I possess the ability to make out the strands of flow control and see the future state of the application. For I am the Sr. Software Engineer, and I have seen where no Project Manager can see. | | Apologies to Frank Herbert; I just finished listening to Dune. | | EDIT: | | I got off track at the end, but my point is that no matter how good the tools for developing the code are, they will never replace a software engineer, any more than electric drills and power saws replaced home builders. They merely elevate our work. | qualudeheart wrote: | I actually agree with you on that. I had another comment further down the thread where I said that software engineering can't be fully automated by anything short of artificial general intelligence. | | As humans we have a coherent world model that current AI systems are nowhere near having. | | That coherent world model is a necessary precondition for both understanding a business goal and implementing a program to solve it. AlphaCode can do the second part but not the first. | | AlphaCode doesn't have that world model, and even if it did, it still wouldn't autonomously act on it, just follow orders from humans.
| | Competitive programming is going to be solved much earlier than programming in a business context will, because it's completely independent of business requirements. It's at most half as hard a problem. | udev wrote: | Yes, for very precise, comprehensive text descriptions of problems. | | It will take a far, far more advanced AI to write such descriptions for real-world problems. | | Writing requirements for a project is difficult work, and not for technical reasons, but for human reasons (people don't know what they want exactly, people have trouble imagining things they haven't seen yet, people are irrational, people might want something that is different from what they need, etc.) | | In this regard, we are safe for a few more decades at least. | andy_ppp wrote: | I would actually argue the programmer's job has never been 100% writing the code; it's always been interpreting, fixing and decoding the ideas of others. | bcrosby95 wrote: | I would argue that we figured this out over 50 years ago, but oddly enough some people still hold onto the idea. | tluyben2 wrote: | The older I get, the more I see it has not been about programming for most tasks for quite a long time. In the early 80s it was a bit more (but not even much more); at that time as well I spent most of my time debugging and changing behaviour slightly (but in a lot of pages) instead of just cranking out huge bags of code. | tluyben2 wrote: | Yes, they have been trying to create 'sufficiently formal human-readable text' to spec out projects; not detailed enough to execute by a computer, but formal and precise enough so humans know exactly what they are getting. That still doesn't work at all, and that is between humans. If the specs are clear enough, the act of programming is already mostly not the issue; however, they never are. I am looking forward to ML helping me write boring code (which CoPilot already does, but again, that's not really where time/energy is spent anyway) and protect against security issues, scalability issues and all kinds of bugs (it could rewrite algos it knows; it could recommend libraries that I should use instead of the crap I rolled myself, etc). | qualudeheart wrote: | Fully automating software engineering won't happen until AGI. As a good Yuddite I expect us to have bigger problems when that happens. | | You need an agent with a large and coherent world model in order to understand how your programs relate to the real world, in order to solve business tasks. | | This isn't something any program synthesis tech currently available can do, because none of it has a coherent world model. | | GPT-3 comes closest to this, but isn't able to engage in any kind of planning or abstract modeling beyond semi-coherent extrapolations from training data. | | Maybe scaling up GPT by a few more orders of magnitude would work, by generating an emergent world model along the way. | CobrastanJorji wrote: | What is a "Yuddite?" I tried Googling for it and got the impression it was LessWrong forum terminology for people who believed too strongly in LessWrong, but I couldn't find many references. | nikkwong wrote: | I believe he's referring to "luddites" -- a group of people who resisted technological innovation during the industrial revolution. | indiv0 wrote: | Luddite but mixed with "Eliezer Yudkowsky", who is a researcher working on the problem of friendly AI (or whatever they're calling it these days).
Basically trying to prevent Skynet. | | The GP is saying that once we have AGI, then "AGI is going to make the human race irrelevant" outweighs "AGI makes software devs irrelevant". | qualudeheart wrote: | That's the idea. | qualudeheart wrote: | I am a follower of Eliezer Yudkowsky. | steve76 wrote: | NicoJuicy wrote: | I would stop programming if all we needed to write was unit tests :p | FartyMcFarter wrote: | To compensate, lots of people would _start_ programming if that happened though. Many scientists would be interested in solving their field's problems so easily - certainly maths would benefit from it. | rmujica wrote: | wasn't this the motivation for Prolog? | [deleted] | 37ef_ced3 wrote: | The example problem (essentially, is T a subsequence of S with deletions of size N) is a classic problem with no doubt dozens of implementations in AlphaCode's training set. | | And yet, what a garbage solution it produces. | | To illustrate the difference between intelligence and regurgitation, someone tell me what CoPilot generates for this:

        // A Go function to swap the sixth bit and seventeenth bit of a
        // 32-bit signed integer.

| | Here is a human solution:

        func swap(x int32) int32 {
                const mask = 1 << 5
                var (
                        xor1 = (x>>11 ^ x) & mask
                        xor2 = xor1 << 11
                )
                return x ^ xor1 ^ xor2
        }

| | CoPilot cannot reason numerically like this (understand "seventeenth bit" and "sixth bit" and generate the right code for that combination). It needs to understand the size of the gap between the bits, i.e., 11, and that's too hard.
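The human solution leans on one observation: the XOR of the two bits, deposited at both positions, flips both bits exactly when they differ and leaves the value alone when they match. A quick exhaustive check of a Python port (the port and the naive reference below are illustrative, not from the thread):

    # Python port of the Go solution above. `mask` picks out bit 5 (the
    # "sixth bit"); the gap between bit 5 and bit 16 is 11.
    def swap(x):
        mask = 1 << 5
        xor1 = ((x >> 11) ^ x) & mask  # (bit16 XOR bit5), placed at position 5
        xor2 = xor1 << 11              # the same difference, placed at position 16
        return x ^ xor1 ^ xor2         # flips both bits iff they differ

    # Naive reference: read both bits, clear them, write them back swapped.
    def swap_naive(x):
        b5, b16 = (x >> 5) & 1, (x >> 16) & 1
        x &= ~((1 << 5) | (1 << 16))
        return x | (b16 << 5) | (b5 << 16)

    assert all(swap(x) == swap_naive(x) for x in range(1 << 18))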
| [deleted] | deanmen wrote: | You can do it without a subtraction

        unsigned int swapbits(unsigned int a) {
            bool bit6 = a & (1 << 5);
            bool bit17 = a & (1 << 16);
            if (bit6 == bit17)
                return a;  // bits are the same, do nothing
            return (a ^ (1 << 5) ^ (1 << 16));  // flip both 6th and 17th bits
        }

| 37ef_ced3 wrote: | And, to be clear, this is a human solution. | | Not as efficient as mine, but kudos. | dskloet wrote: | There's really no need for an 11 in the code. I'd say that makes the code worse, not better. | 37ef_ced3 wrote: | This is a toy problem to illustrate that CoPilot cannot write code that requires mathematical reasoning. It regurgitates solutions from the training set, via a mixed internal representation. | deanmen wrote:

        unsigned int swapbits(unsigned int a) {
            bool bit6 = a & (1 << 5);
            bool bit17 = a & (1 << 16);
            if (bit6 == bit17)
                return a;  // bits are the same, do nothing
            return (a ^ (1 << 5) ^ (1 << 16));  // flip both 6th and 17th bits
        }

| dskloet wrote: | What requires mathematical reasoning? Getting or setting the nth bit? Or swapping two variables? What am I missing? | altcognito wrote: | Would we be able to generate unit tests? Strikes me that this would be important to verify given that we didn't even "write" the code. At some point we might not even be looking at the generated code? I almost guarantee that's what is going to happen eventually. | 37ef_ced3 wrote: | You can see it happening already. | | Solutions are posted, and they're wrong. | | But the CoPilot user can't see the code is wrong. | [deleted] | quenix wrote: | Here is what CoPilot wrote in Python:

        # A function to swap the sixth bit and seventeenth bit of a 32-bit
        # signed integer.
        def swap_bits(x):
            # Get the value of the sixth bit.
            bit6 = x & (1 << 5)
            # Get the value of the seventeenth bit.
            bit17 = x & (1 << 16)
            # Swap the values of the sixth and seventeenth bit.
            bit6 = bit6 >> 5
            bit17 = bit17 >> 16
            # Combine the swapped values of the sixth and seventeenth bit.
            x = x ^ (bit6 << 16)
            x = x ^ (bit17 << 5)
            return x

| | It actually synthesised about 10 viable solutions. Some interesting ones:

        def swap_six_seventeen(x):
            # Get the binary representation of the integer.
            binary = bin(x)[2:]
            # Add zeros to the beginning of the binary representation.
            binary = '0' * (32 - len(binary)) + binary
            # Swap the sixth and seventeenth bit.
            binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
            # Convert the binary back to an integer.
            return int(binary, 2)

| omnicognate wrote: | The first one (swap_bits) sets both bits to the same value, which is the original two bits XORed together. Eg.

        bin(swap_bits(0b_1_0000000000_0_00000))
        '0b10000000000100000'
        bin(swap_bits(0b_0_0000000000_1_00000))
        '0b10000000000100000'
        bin(swap_bits(0b_1_0000000000_1_00000))
        '0b0'
        bin(swap_bits(0b_0_0000000000_0_00000))
        '0b0'

| | The second one converts the value to a string and uses string operations, which is wildly inefficient and a very common mistake made by inexperienced programmers unaware of bitwise operations (so presumably common in the training set). It also attempts to swap the 6th and 17th _most_ significant bits rather than the 6th and 17th _least_ significant bits, i.e. counts in the opposite direction to the first one (the comment doesn't specify, but typically you count from the least significant bit in these situations). | | Worse, though, it gets the string manipulation completely wrong. I think it's trying for `binary[:5] + binary[16] + binary[6:16] + binary[5] + binary[17:]`, i.e. characters 1-5, then character 17, then characters 7-16, then character 6, then characters 18-32. The manipulation it does just completely mangles the string. | | I'm very keen to try GitHub Copilot if they ever admit me to the beta (I've been waiting forever) and will adopt it enthusiastically if it's useful. However, this is exactly what I've pessimistically expected. Analysing these truly awful implementations to identify the subtle and bizarre misbehaviours has taken me far, far longer than it would have taken me to just write and test a working implementation myself. And I'm supposed to evaluate 10 of these to see if one of them might possibly do the right thing?!?! | Veedrac wrote: | The first example is almost correct, conditioned off a sentence description. The second example is the right idea; it just bit off more than it could chew when slicing it all together. Using string ops for binary manipulation in Python isn't even stupid; it can be faster in a lot of cases. | | This feels a lot like screaming at a child for imperfect grammar. | 37ef_ced3 wrote: | It illustrates that CoPilot is generating maximum likelihood token strings and has no real understanding of the code. | | That's what is happening here. There is no intelligence, just regurgitation. Randomization and maximum likelihood completion. | | Just like with the competitive programming example, we're asking it to produce solutions that it has seen in its training set. If you ask for a nontrivial twist on one of those solutions, it fails. | hackinthebochs wrote: | >It illustrates that CoPilot is generating maximum likelihood token strings and has no real understanding of the code.
| | Funny, today I was just thinking of people's tendencies | to dismiss AI advances with this very pattern of | reasoning: take a reductive description of the system and | then dismiss it as obviously insufficient for | understanding or whatever the target is. The assumption | is that understanding is fundamentally non-reductive, or | that there is insufficient complexity contained within | the reductive description. But this is a mistake. | | The fallacy is that the reductive description is glossing | over the source of the complexity, and hence where the | capabilities of the model reside. "Generating maximum | likelihood token strings" doesn't capture the complexity | of the process that generates the token strings, and so | an argument that is premised on this reductive | description cannot prove the model deficient. For | example, the best way to generate maximum likelihood | human text is just to simulate a human mind. Genuine | understanding is within the solution-space of the problem | definition in terms of maximum likelihood strings, thus | you cannot dismiss the model based on this reductive | description. | 37ef_ced3 wrote: | The difference between me and you is that I implement | neural nets professionally. Here is one of my (non- | professional) open source projects: https://NN-512.com | | I'm sure if you understood what the transformer was | doing, you would be less impressed. | hackinthebochs wrote: | This is the wrong context to go with an appeal to | authority. I know what the transformer is doing, I've | also developed neural networks before (though not | professionally). Your experience is working against you | in developing your intuition. There's another common | fallacy that because we're somehow "inside" the system, | that we understand exactly what is going on, or in this | case what isn't going on. Language models are composed of | variations of matrix multiplications, but that isn't a | complete description of their behavior. It's like saying | because we've looked inside the brain and there's just | electrical and chemical signals, the mind must reside | somewhere else. It's just a specious argument. | Veedrac wrote: | It got the value of the sixth and seventeenth bits, moved | them into the right positions, and inserted them into the | original value. Off a one-line description _written in | English_! I really cannot empathize with the idea that | this is not a meaningful capability. If intelligence only | means to you "equal in all capabilities to an experienced | human", you are never going to be able to see anything | coming ever. | 37ef_ced3 wrote: | If you ask CoPilot to solve something it hasn't seen, it | won't be able to solve it. | | It's a transformer. Do you understand what that means? | It's just matrix multiplication. | | It generates maximum likelihood token strings, based on | its training data. | | It doesn't "understand" what those token string mean. | | You are amazed because you're testing the transformer by | asking the transformer to generate human-written code | THAT IT WAS TRAINED ON. To make CoPilot fail, all you | have to do is ask it to generate something unlikely, | something it hasn't seen in training. | | Maximum likelihood token strings. Period. | omnicognate wrote: | You're misunderstanding my point. Nobody's screaming at | anything. Whether this thing is impressive isn't at | issue. It's utterly astonishing. | | I'm trying to figure out whether copilot in its current | form is a tool that will be useful to me in my job. 
(I'd be able to do this evaluation properly if they'd just let me on the damned beta.) | | Nearly right isn't good enough for this, afaics. In fact, I expect there to be a slightly paradoxical effect where nearly-right is worse than obviously-wrong. An analysis of a piece of code like I did above is time-consuming and cognitively taxing. An obviously wrong solution I can just reject immediately. An almost-right (or at least vaguely plausible) one like these takes _thought_ to reject. Much more thought, in this case (for me, at least), than just writing the thing myself in the first place. | | Edit: BTW, I don't get what you're saying with | | "The first example is almost correct, conditioned off a sentence description. The second example is the right idea; it just bit off more than it could chew when slicing it all together." | | The first one is completely (if subtly) wrong. It's supposed to swap two bits but it sets them to the same value. There's no interpretation of the description in which that's correct. | | The second one is definitely not "the right idea". It tries to do it with string manipulations, which (regardless of the fact that it does so incorrectly) is completely the wrong approach. This one is actually "better" than the other in the paradoxical sense I mentioned above, because I could reject it the moment I saw it convert the number to a string. | Veedrac wrote: | > The second one is definitely not "the right idea". It tries to do it with string manipulations, which (regardless of the fact that it does so incorrectly) is completely the wrong approach. This one is actually "better" than the other in the paradoxical sense I mentioned above, because I could reject it the moment I saw it convert the number to a string. | | In this case string ops are a worse idea, but as I said before, this is not generally true of Python, at least when using CPython. Eg. the string method is significantly faster in this example:

        # https://stackoverflow.com/a/20918545/1763356
        def reverse_mask(x):
            x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
            x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
            x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
            x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
            x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
            return x

        # My ver
        def reverse_format(x):
            return int(f"{x:032b}"[::-1], 2)

| | Python's dynamic object overhead (and to a lesser extent, interpreter overhead) makes a lot of seemingly-expensive operations not matter very much. | Veedrac wrote: | Well, that also seems like the wrong question to ask. Whether it's currently useful to you for writing short algorithms, rather than as the non-programmer's API interface it's primarily marketed as, seems about the least interesting take-away for it. We'll get to smoothing over the cracks later, once it's not a capability we literally just discovered exists. Heck, Codex is already not SOTA for that, AlphaCode is. | omnicognate wrote: | It may not be the question that interests you, but who are you to say it's the "wrong question" for me to ask? I want to know if I'm on the verge of having access to a tool that is going to transform the way I do my job, as people keep claiming. | 37ef_ced3 wrote: | All wrong. These "solutions" are not correct. | UncleMeat wrote: | The xors are wrong. Stick in the integer 1<<16.
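Running quenix's swap_bits (reproduced from above) against a straightforward reference bears this out; the reference implementation here is illustrative, not from the thread:

    # quenix's CoPilot-generated function, copied verbatim from above.
    def swap_bits(x):
        bit6 = x & (1 << 5)
        bit17 = x & (1 << 16)
        bit6 = bit6 >> 5
        bit17 = bit17 >> 16
        x = x ^ (bit6 << 16)
        x = x ^ (bit17 << 5)
        return x

    # What a correct swap does: read both bits, clear them, write swapped.
    def swap_bits_ref(x):
        b5, b16 = (x >> 5) & 1, (x >> 16) & 1
        x &= ~((1 << 5) | (1 << 16))
        return x | (b16 << 5) | (b5 << 16)

    for x in (1 << 16, 1 << 5, (1 << 16) | (1 << 5)):
        print(f"x={x:#b}: got {swap_bits(x):#b}, want {swap_bits_ref(x):#b}")

    # x=0b10000000000000000: got 0b10000000000100000, want 0b100000
    # x=0b100000: got 0b10000000000100000, want 0b10000000000000000
    # x=0b10000000000100000: got 0b0, want 0b10000000000100000

Because the function XORs the bits back in instead of overwriting them, it sets both positions to (bit5 XOR bit16), exactly as omnicognate described.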
| c4wrd wrote: | Ask and you shall receive, here is the output it gave me:

        // Write a Go function to swap the sixth bit and seventeenth bit
        // of a 32-bit signed integer.
        package main

        import (
                "fmt"
                "math"
        )

        func main() {
                var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010
                fmt.Printf("%b\n", a)
                fmt.Printf("%b\n", swapBits(a, 6, 17))
        }

        func swapBits(a int32, i int, j int) int32 {
                // convert to binary
                bin := fmt.Sprintf("%b", a)
                // get the bits
                bit1 := bin[i-1 : i]
                bit2 := bin[j-1 : j]
                // swap the bits
                bin = bin[:i-1] + bit2 + bin[i:]
                bin = bin[:j-1] + bit1 + bin[j:]
                // convert back to int
                return int32(bin2int(bin))
        }

        func bin2int(bin string) int64 {
                var sum int64
                for i, v := range bin {
                        if v == '1' {
                                sum += int64(math.Pow(2, float64(len(bin)-i-1)))
                        }
                }
                return sum
        }

| 37ef_ced3 wrote: | Ridiculous. It's a joke, right? Perhaps the most inefficient and naive solution ever? | | CoPilot is regurgitating some solution from its training set, the solution of an inept programmer who would manipulate bits via conversion to string... yikes. | skulk wrote: | The next iteration of code assistants needs to be able to parse responses like your comment and update the code accordingly. Once a human+computer pair can converge on a correct and admissible solution to _any_ tractable programming task through natural language dialogue, we should start worrying about our jobs going away. Until then, for each line of code generated by AI, there will be two jobs created to maintain that code. | electroly wrote: | Copilot can do that, sorta. You undo the completion and add something like "... but don't convert it to a string" to the comment, then have it try completing again. | hackinthebochs wrote: | Which direction in feature space do you move in response to "you inept POS"? | jdrc wrote: | "And so in 2022 the species programmus programmicus went extinct" | udev wrote: | I am wondering whether this result can create a type of loop that can self-optimize. | | We have AI to generate reasonable code from a text problem description. | | Now what if the problem description is to generate such a system in the first place? | | Would it be possible to close the loop, so to speak, so that over many iterations: | | - the text description is improved | | - the output code is improved | | Would it be possible to create something that converges to something better? | machiaweliczny wrote: | I am actually trying this. Basically by asking questions to an AI and teaching it to generate code / google when it doesn't know something. The other process checks whether the code is valid, and either asks it to get more context or executes the code and feeds the result back to a file :) | machiaweliczny wrote: | I think one can make the problem "differentiable" via some heuristics: if you have a NN trained to rate code quality, with some understanding of what should be used for each type of problem (memory and speed), and it can classify the problem into a group and then rate the solution, it should be able to guide the process (in competitive programming). | indiv0 wrote: | Do you have a blog or a github or something? This sounds really neat.
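A skeleton of that closed loop is easy to write down; model_generate and model_revise below are hypothetical stand-ins for calls to a code model, not any real API:

    # Hypothetical loop: generate code from a description, run it against
    # checks, and feed the failures back into the next generation round.
    def refine(description, checks, model_generate, model_revise, max_rounds=5):
        src = model_generate(description)
        for _ in range(max_rounds):
            failures = []
            for args, expected in checks:
                ns = {}
                try:
                    exec(src, ns)
                    got = ns["solve"](*args)
                    if got != expected:
                        failures.append(f"solve{args} returned {got!r}, expected {expected!r}")
                except Exception as e:
                    failures.append(f"solve{args} raised {e!r}")
            if not failures:
                return src  # converged: every check passes
            # Close the loop: the error report becomes part of the next prompt.
            src = model_revise(description, src, failures)
        return None  # did not converge within the budget

Whether such a loop converges to something better, as udev asks, depends entirely on whether the reviser can actually use the error report, which is exactly the open question.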
| wilde wrote: | Oh sweet! When can we skip the bullshit puzzle phone screens? | doctor_eval wrote: | I sometimes read these and wonder if I need to retrain. At my age, I'll struggle to get a job at a similar level in a new industry. | | And then I remember that the thing I bring to the table is the ability to turn domain knowledge into code. | | Being able to do competitive coding challenges is impressive, but a very large segment of software engineering is about eliciting what the squishy humans in management actually want, putting it into code, and discovering as quickly as possible that it's not what they really wanted after all. | | It's going to take a sufficiently long time for AI to take over management that I don't think oldies like me need to worry too much. | prideout wrote: | It is obvious to me that computer programming is an interesting AI goal, but at the same time I wonder if I'm biased, because I'm a programmer. The authors of AlphaCode might be biased in this same way. | | I guess this makes sense though, from a practical point of view. Verifying correctness would be difficult in other intellectual disciplines like physics and higher mathematics. | thomasahle wrote: | Just make it output a proof together with the program. | EGreg wrote: | To me, coding in imperative languages is one of the hardest things to produce an AI for with current approaches (CNNs, MCTS and various backpropagation). Something like Cyc would seem to be a lot more promising... | | And yet, I am starting to see (with GitHub's Copilot, and now this) a sort of "GPT-4 for code". I do see many problems with this, including: | | 1. It doesn't actually "invent" solutions on its own like AlphaZero; it just uses and remixes from a huge body of work that humans put together. | | 2. It isn't really ever sure if it solved the problem, unless it can run against a well-defined test suite, because it could have subtle problems in both the test suite and the solution if it generated both. | | This is a bit like readyplayer.me trying to find the closest combination of noses and lips to match a photo (do you know any open source alternatives to that site btw?) | | But this isn't really "solving" anything in an imperative language. | | Then again, perhaps human logic is just approximating with operations using low-dimensional vectors, able to capture simple "explainable" models, while AI classifiers and adversarial training produce far bigger vectors that help model the "messiness" of the real world and also find simpler patterns as a side effect. | | In this case, maybe our goal shouldn't be to get solutions in the form of imperative language or logic, but rather to unleash the computer on "fuzzy" inputs and outputs where things are "mostly correct 99.999% of the time". The only areas where this could fail are when some intelligent adversarial network exploits weaknesses in that 0.001% and makes it more common. But for natural phenomena it should be good enough! | qualudeheart wrote: | Can you write more about how Cyc would help? The idea behind Cyc is cool, but I don't think I've seen anyone discuss using it for program synthesis. | gfd wrote: | Relevant blog post on codeforces.com (the competitive programming site used): https://codeforces.com/blog/entry/99566 | | Apparently the bot would have a rating of 1300. Although Elo ratings between sites are not comparable, for some perspective, Mark Zuckerberg had a rating of ~1k on TopCoder when he was in college: https://www.topcoder.com/members/mzuckerberg | baobabKoodaa wrote: | The median rating is not descriptive of median ability, because a large number of Codeforces competitors only do one or a few competitions. A very small number of competitors hone their skills over multiple competitions.
If we were to restrict our | sample to competitors with more than 20 competitions, the | median rating would be much higher than 1300. It's amazing that | Alphacode achieved a 1300 rating, but compared to humans who | actually practice competitive coding, this is a low rating. | | To clarify, this is a HUGE leap in AI and computing in general. | I don't mean to play it down. | YeGoblynQueenne wrote: | >> To clarify, this is a HUGE leap in AI and computing in | general. I don't mean to play it down. | | Sorry, but it's nothing of the sort. The approach is | primitive, obsolete, and its results are very poor. | | I've posted this three times already but the arxiv preprint | includes an evaluation against a formal benchmark dataset, | APPS. On that more objective measure of performance, the best | performing variant of AlphaCode tested, solved 25% of the | easiest tasks ("introductory") and less than 10% of the | intermediary ("interview") and advanced ("competition") | tasks. | | What's more, the approach that AlphaCode takes to program | generation is primitive. It generates _millions_ of candidate | programs and then it "filters" them by running them against | input-output examples of the target programs taken from the | problem descriptions. The filtering still leaves thousands of | candidate programs (because there are very few I/O examples | and the almost random generation can generate too many | programs that pass the tests, but still don't solve the | problem) so there's an additional step of clustering applied | to pare this down to 10 programs that are finally submitted. | Overall, that's a brute-force, almost random approach that is | ignoring entire decades of program synthesis work. | | To make an analogy, it's as if DeepMind had just published an | article boasting of its invention of a new sorting | algorithm... bubblesort. | gfd wrote: | You can find the rating distribution filtered for >5 contests | here: https://codeforces.com/blog/entry/71260 | | I am rated at 2100+ so I do agree that 1300 rating is low. | But at the same time it solved | https://codeforces.com/contest/1553/problem/D which is rated | at 1500 which was actually non-trivial for me already. I had | one wrong submit before getting that problem correct and I do | estimate that 50% of the regular competitors (and probably | the vast majority of the programmers commenting in this | thread right now) should not be able to solve it within 2hrs. | rfoo wrote: | 1553D is a quite confusing case though. | | On the AlphaCode Attention Visualization website [1], the | _Accepted_ code shown for 1553D is a O(n^2) Python one, | which is supposed to be TLE. It correctly implements a two- | pointer solution, but failed to "realize" that list.pop(0) | is O(n) in Python. I'm not sure how it passed. | | [1] https://alphacode.deepmind.com/#layer=30,problem=34,hea | ds=11... | Jensson wrote: | Likely the python runtime has a strange string | implementation for cases like this, just like javascript | strings. | the-smug-one wrote: | I'm trying to solve this for fun, but I'm stuck! I've got a | recursive definition that solves the problem by building a | result string. I think it's a dynamic programming problem, | but right now I can't see the shared sub-problems so :). | Some real sour cherries being experienced from not getting | this one! | johndough wrote: | The proposed O(N2) solution contains many unnecessary | operations, e.g. the creation of list c or reversal of the | input strings. 
| Maybe it has been copied from a related problem? You can easily
| solve the task with half as many lines in O(N):
|
|     for _ in range(int(input())):
|         a = list(input())
|         b = list(input())
|         # Work backwards from the ends of both strings: matching
|         # last characters are consumed from both, otherwise a
|         # backspace eats two characters of a. Popping from the
|         # end of a list is O(1), unlike list.pop(0).
|         while a and b:
|             if a[-1] == b[-1]:
|                 a.pop()
|                 b.pop()
|             else:
|                 a.pop()
|                 if a:
|                     a.pop()
|         print("NO" if b else "YES")
| pedrosorio wrote:
| > But at the same time it solved
| https://codeforces.com/problemset/problem/1553/D
|
| To be fair, it generated a set of (10) possible solutions, and
| at least one of them solved the problem.
| captain_price7 wrote:
| For comparison, I used to be a very average, but pretty regular
| user about 5 years ago. I could reliably solve the easiest 2 out
| of 5 problems, 3 on my lucky days.
|
| My rating is 1562.
| jakey_bakey wrote:
| At the risk of sounding relentlessly skeptical - surely by
| training the model on GitHub data you're not actually creating an
| AI to solve problems, but creating an extremely obfuscated
| database of coding puzzle solutions?
| ogogmad wrote:
| _We validated our performance using competitions hosted on
| Codeforces, a popular platform which hosts regular competitions
| that attract tens of thousands of participants from around the
| world who come to test their coding skills. We selected for
| evaluation 10 recent contests, each newer than our training
| data. AlphaCode placed at about the level of the median
| competitor, marking the first time an AI code generation system
| has reached a competitive level of performance in programming
| competitions._
|
| [edit] Is "10 recent contests" a large enough sample size to
| prove whatever point is being made?
| [deleted]
| YeGoblynQueenne wrote:
| The test against human contestants doesn't tell us anything
| because we have no objective measure of the ability of those
| human coders (they're just the median in some unknown
| distribution of skill).
|
| There are more objective measures of performance, like a good,
| old-fashioned benchmark dataset. For such an evaluation, see
| Table 10 in the arxiv preprint (page 21 of the PDF), listing
| the results against the APPS dataset of programming tasks.
| The best performing variant of AlphaCode solves 25% of the
| simplest ("introductory") APPS tasks and less than 10% of the
| intermediary ("interview") and more advanced ones
| ("competition").
|
| So it's not very good.
|
| Note also that the article above doesn't report the results
| on APPS. _Because_ they're not that good.
| solididiot wrote:
| Does it need to solve original problems? Most of the code we
| write is dealing with the same problems in a slightly different
| context each time.
|
| As others say in comments, it might be a case of meeting in the
| middle: us writing some form of tests for AI-produced code to
| pass.
| qualudeheart wrote:
| That's been a common objection to Copilot and other recent
| program synthesis papers.
|
| The models regurgitate solutions to problems already
| encountered in the training set. This is very common with
| Leetcode problems and seems to still happen with harder
| competitive programming problems.
|
| I think someone else in this thread even pointed out an example
| of AlphaCode doing the same thing.
| FiberBundle wrote:
| It never ceases to amaze me what you can do with these
| transformer models. They created millions of potential solutions
| for each problem, used the provided examples for the problems to
| filter out 99% of incorrect solutions and then applied some more
| heuristics and the 10 available submissions to try to find a
| solution.
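|
| Schematically, the procedure they describe boils down to
| something like this (my own sketch, not DeepMind's code;
| sample_from_model, run and probe_inputs are placeholders for the
| language model, a sandboxed interpreter, and extra model-
| generated test inputs):
|
|     def alphacode_style_submit(sample_from_model, run, example_ios,
|                                probe_inputs, n_samples=1_000_000,
|                                k=10):
|         # 1. Generate: sample a huge number of candidate programs.
|         candidates = [sample_from_model() for _ in range(n_samples)]
|
|         # 2. Filter: keep only candidates that pass the few public
|         #    I/O examples from the problem statement.
|         passing = [c for c in candidates
|                    if all(run(c, x) == y for x, y in example_ios)]
|
|         # 3. Cluster: group survivors by their behaviour on extra
|         #    generated inputs, so semantically equivalent programs
|         #    land in the same cluster.
|         clusters = {}
|         for c in passing:
|             behaviour = tuple(run(c, x) for x in probe_inputs)
|             clusters.setdefault(behaviour, []).append(c)
|
|         # 4. Submit one representative of each of the k largest
|         #    clusters.
|         largest = sorted(clusters.values(), key=len, reverse=True)[:k]
|         return [cluster[0] for cluster in largest]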
|
| All these approaches just seem like brute-force approaches:
| let's just throw our transformer at this problem and see if we
| can get anything useful out of this.
|
| Whatever it is, you can't deny that these unsupervised models
| learn some semantic representations, but we have no clue at all
| what those actually are and how these models learn them. But I'm
| also very sceptical that you can actually get anywhere close to
| human (expert) capability in any sufficiently complex domain by
| using this approach.
| bricemo wrote:
| What do you think then is the difference between going from
| 50th to 99.9th percentile in their other domains? Is there
| something materially different between Go, protein folding, or
| coding? (I don't know the answer, just curious if anyone else
| does)
| jahewson wrote:
| That's a big question but I'm tempted to answer it with a
| yes. A protein sequence contains a complete description of
| the structure of a protein but a coding question contains
| unknowns and the answers contain subjective variability.
| FiberBundle wrote:
| Well with respect to Go the fundamental difference afaict is
| that you can apply self-play reinforcement learning, which is
| an incredibly powerful approach (but note, e.g., that even this
| approach wasn't successful in "solving" StarCraft).
| Unfortunately it's extremely difficult to frame real-world
| problems in that setting. I don't know anything about
| protein folding and don't know what DeepMind uses to try to
| solve that problem, so I cannot comment on that.
| cjbprime wrote:
| > this approach wasn't successful in "solving" StarCraft)
|
| Why do you say that? As I understand it, AlphaStar beat
| pros consistently, including a not widely reported
| showmatch against Serral when he was BlizzCon champ.
| zwaps wrote:
| Not once humans adapted to it, afaik. AlphaStar got to top
| grandmaster level and then that was it, as people found
| ways to beat it. Now, it may be that the team considered
| the project complete and stopped training it. But
| technically - as it stands - StarCraft is still the one
| game where humans beat AI.
| gavagai691 wrote:
| Two possible reasons.
|
| 1. First, though I am not sure of this (i.e. this should
| be verified), I heard that the team working on AlphaStar
| initially tried to create a StarCraft AI entirely through
| "self-play," but this was not successful. (Intuitively,
| in a real-time game, there are so many bad options early
| on that even with a LOT of time to learn, if your
| approach is too "random" you will quickly enter an
| unwinnable position and not learn anything useful.) As a
| result, they replaced this approach with an approach
| which incorporated learning from human games.
|
| 2. "including a not widely reported showmatch against
| Serral when he was BlizzCon champ." is a
| mischaracterization. It was not a "showmatch," rather
| there was a setup at BlizzCon where anyone could sit down
| and play against AlphaStar, and Serral at some point sat
| down to play AlphaStar there. He went 0-4 vs AlphaStar's
| Protoss and Zerg, and 1-0 vs its Terran. However, not
| only was he not using his own keyboard and mouse, but he
| could not use any custom hotkeys. If you do not play
| StarCraft it may not be obvious just how large of a
| difference this could make. BTW, when Serral played
| (perhaps an earlier iteration of) AlphaStar's Terran on
| the SC2 ladder, he demolished it.
|
| I remember being a bit disappointed when I saw the final
| report.
| It seemed like they cut the project off at
| a strange point, before AlphaStar was clearly better than
| humans. I feel that if they had continued they could have
| gotten to that point, but now we will never know.
| briga wrote:
| Another way to frame it is that these models still perform very
| poorly at the task they're designed to do. Imagine if a real
| programmer needed to write a solution a hundred times before
| they were able to achieve (average) performance. You'd probably
| wonder if it was just blind luck that got them to the solution.
| You'd also fire them. What these models are very good at doing
| is plagiarizing content, so part of me wonders if they aren't
| just copying previous solutions with slight adjustments.
| zmmmmm wrote:
| Has nobody yet asked it to write itself?
| ensan wrote:
| Wake me up when an AI creates an operating system on the same
| level of functionality as early-years Linux.
| timetotea wrote:
| If you want some video explanation: https://youtu.be/Qr_PCqxznB0
| pedrobtz wrote:
| What about finding bugs, zero-day exploits?
| erwincoumans wrote:
| It would be interesting if a future 'AlphaZeroCode' with access
| to a compiler and debugger could learn to code, generating data
| using self-play. Haven't read the paper yet; seems like an
| impressive milestone.
| [deleted]
| throwaway5752 wrote:
| Most people here are programmers (or otherwise involved in the
| production of software). We shouldn't look at RPA and other job
| automation trends dispassionately. SaaS valuations aren't where
| they are (and accounting doesn't treat engineering salary as cost
| of goods sold) because investors believe that they will require
| armies of very well paid developers in perpetuity.
| countvonbalzac wrote:
| what?
| londons_explore wrote:
| > AlphaCode placed at about the level of the median competitor,
|
| In many programming contests, a large number of people can't
| solve the problem at all, and drop out without submitting
| anything. Frequently that means the median scoring solution is a
| blank file.
|
| Therefore, without further information, this statement shouldn't
| be taken to be as impressive as it sounds.
| [deleted]
| d0mine wrote:
| It reminds me that the median reputation on StackOverflow is 1.
| All AlphaSO would have to do is register to receive a median
| reputation on SO ;) (kidding aside, AlphaCode sounds like magic)
|
| Inventing relational DBs hasn't replaced programmers, we just
| write custom DB engines less often. Inventing electronic
| spreadsheets hasn't deprecated programmers, it just means that we
| don't need programmers for corresponding tasks (where
| spreadsheets work well).
|
| AI won't replace programmers until it grows to replace humanity
| as a whole.
| falcor84 wrote:
| > AI won't replace programmers until it grows to replace
| humanity as a whole.
|
| Yes, but after seeing this progress in the former, my estimate
| of the time remaining until the latter has just significantly
| shortened.
| qualudeheart wrote:
| I don't even think the "will AI replace human programmers"
| question is that interesting anymore. My prediction is that a
| full replacement won't happen until we achieve general
| artificial intelligence, and have it treat programming as it
| would any other problem.
|
| Elsewhere ITT I've claimed that to fully automate programming
| you also need a model of the external world that's on par with
| a human's.
|
| Otherwise you can't work a job, because you don't know how to do
| the many other tasks that aren't coding.
|
| You need to understand what the business goals are and how your
| program solves them.
| a-dub wrote:
| > In our preprint, we detail AlphaCode, which uses transformer-
| based language models to generate code at an unprecedented scale,
| and then smartly filters to a small set of promising programs
|
| if you're using a large corpus of code chunks from working
| programs as symbols in your alphabet, i wonder how much entropy
| there actually is in the space of syntactically correct solution
| candidates.
| softwaredoug wrote:
| I think Copilot, etc. will be revolutionary tools AND I think
| human coders are needed. Specifically I love Copilot for the task
| of "well specified algorithm to solve a problem with well-defined
| inputs and outputs". The kind of problem you could describe as a
| coding challenge.
|
| BUT, our jobs have a lot more complexity
|
| - Local constraints - We almost always work in a large, complex
| existing code base with specific constraints
|
| - Correctness is hard - writing lots of code is usually not the
| hard part, it's proving it correct against amorphous
| requirements, communicated in a variety of human social contexts,
| and bookmarked.
|
| - Precision is extremely important - Even if 99% of the time
| Copilot can spit out a correct solution, the 1% of the time it
| doesn't creates a bevy of problems
|
| Are those insurmountable problems? We'll see I suppose, but we
| begin to verge on general AI if we can gather and understand half
| a dozen modalities of social context to build a correct solution.
|
| Not to mention much of the skill needed in our jobs has much more
| to do with soft skills, and the bridge between the technical and
| the non technical, and less to do with hardcore heads-down
| coding.
|
| Exciting times!
| tasubotadas wrote:
| I just hope that this shows how useless competitive programming
| is, if it can be replaced by a Transformer model.
|
| Additionally, people should REALLY rethink their coding
| interviews if they can be solved by a program.
| msoad wrote:
| This seems to have a narrower scope than GitHub Copilot. It
| generates more lines of code for a more holistic problem vs.
| GitHub Copilot, which works as a "more advanced autocomplete" in
| code editors. Sure, Copilot can synthesize full functions and
| classes but for me it's the most useful when it suggests another
| test case's title or writes repetitive code like this.foo = foo;
| this.bar = bar etc...
|
| Having used Copilot I can assure you that this technology won't
| replace you as a programmer but it will make your job easier by
| doing things that programmers don't like to do as much, like
| writing tests and comments.
| ipnon wrote:
| The big question seems to be whether par with professional
| programmers is a matter of increasing training set and flop
| size, or whether different model or multi-model architectures
| are required.
|
| It does look like we've entered an era where programmers who
| don't use AI assistants will be disadvantaged, and that this
| era has an expiration date.
| stupidcar wrote:
| Having used Copilot for a while, I am quite certain it _will_
| replace me as a programmer.
|
| It appears to me that when it comes to language models,
| intelligence = experience * context. Where experience is the
| amount of what's encoded in the model, and context is the
| prompt. And the biggest limitation on Copilot currently is
| context. It behaves as an "advanced autocomplete" because all
| it has to go on is what regular autocomplete sees, e.g.
| the last few characters and lines of code.
|
| So, you can write a function name called createUserInDB() and
| it will attempt to complete it for you. But how does it know
| what DB technology you're using? Or what your user record looks
| like? It doesn't, and so you typically end up with a "generic"
| looking function using the most common DB tech and naming
| conventions for your language of choice.
|
| But now imagine a future version of Copilot that is
| automatically provided with a lot _more_ context. It also gets
| fed a list of your dependencies, from which it can derive which
| DB library you're using. It gets any locatable SQL schema
| file, so it can determine the columns in the user table. It
| gets the text of the Jira ticket, so it can determine the
| requirements.
|
| As a programmer, a great deal of time is spent checking these
| different sources and synthesising them in your head into an
| approach, which you then code. But they are all just text, of
| one form or another, and language models can work with them
| just as easily, and much faster, than you can.
|
| And once the ML coding train gets running, it'll only get
| faster. Sooner or later GitHub will have a "Copilot bot" that
| can automatically make a stab at fixing issues, which you then
| approve, reject, or fix. And as thousands of these issues pile
| up, the training set will get bigger, and the model will get
| better. Sooner or later it'll be possible to create a repo,
| start filing issues, and rely on the bot to implement
| everything.
| solarmist wrote:
| I'm skeptical it'll replace programmers, as in no more human
| programmers, but I agree in the sense of 100% human programmers
| -> 50%, 25%, 10% human programmers + computers doing most of
| the writing of actual code.
|
| I see it continuing to evolve and becoming a far superior
| auto-complete with full context, but, short of actual general
| AI, there will always be a step that takes a high-level
| description of a problem and turns it into something a
| computer can implement.
|
| So while it will make the remaining programmers MUCH more
| productive, thereby reducing the needed number of
| programmers, I can't see it driving that number to zero.
| mabub24 wrote:
| It will probably change the types of things a programmer
| does, and what it looks like to be a programmer. The nitty
| gritty of code _writing_ will probably get more and more
| automated. But the architecture of the code, and
| establishing and selecting its purpose in the larger
| scheme of a business, will probably be more what
| programmers do. Essentially, they might just become
| managers for automated code writers, similar to the
| military's idea of future fighter pilots relating to
| autonomous fighters/drones as described in this article:
|
| https://www.newyorker.com/magazine/2022/01/24/the-rise-of-ai...
|
| Maybe. It might never get to that level though.
| solarmist wrote:
| Yup, I think that's it exactly. I just described this in
| another comment as a reverse of the evolution that
| graphic design has undergone in bringing designers into
| programming front-ends.
|
| I can't wait to see how far we're able to go down that
| path.
| TSiege wrote:
| I have a feeling this is the correct read in terms of
| progression. But I'm skeptical that it'll ever be able to
| synthesize a program entirely.
| I imagine that in the future
| we'll have some sort of computer language, more like written
| language, that will be used by some sort of AI to generate
| software to meet certain demands, but might need some manual
| connections when requirements are hazy or need a more human
| touch in the UI/UX.
| Veedrac wrote:
| > But I'm skeptical that it'll ever be able to synthesize a
| program entirely.
|
| Emotional skepticism carries a lot more weight in worlds
| where AI isn't constantly doing things that are meant to be
| infeasible, like placing in the 54th percentile in a
| competitive programming competition.
|
| People need to remember that AlexNet is 10 years old. At no
| point in this span have neural networks stopped solving
| things they weren't meant to be able to solve.
| solarmist wrote:
| I feel like you're taking that sentence a bit too
| literally. I read it as "I'm skeptical that AI will ever
| be able to take a vague human description from a product
| manager/etc. and solve it without an engineer-type person
| in the loop." The issue is humans don't know what they
| want, and realistically programs require a lot of
| iteration to get right; no amount of AI can solve that.
|
| I agree with you; it seems obvious to me that once you
| get to a well-specified solution a computer will be able
| to create entire programs that solve user requirements.
| And that they'll start small, but expand to larger and
| more complex solutions over time in the same way that no-
| code tools have done.
| Hgsb wrote:
| Google Ambiguity.
| sharemywin wrote:
| To me it's not about its current capabilities. It's the
| trajectory. This tech wasn't even a thing 2 years ago. There's
| billions being poured into it and every time someone uses these
| tools there's more free training data.
| chongli wrote:
| _repetitive code like this.foo = foo; this.bar = bar etc..._
|
| This sort of boilerplate code is best solved by the programming
| language. Either via better built-in syntax or macros. Using an
| advanced machine learning model to generate this code is both
| error-prone and a big source of noise and code bloat. This is
| not an issue that will go away with better tooling; it will
| only get worse.
| xmprt wrote:
| I don't think I agree. Most people spend more time reading
| than writing code, so programming languages should be
| optimized to be easier to read, whereas tooling should be made
| to simplify writing code. New syntax or macros sounds like it
| would make the language harder to read. I agree that an
| advanced machine learning model for generating boilerplate
| code isn't the right approach but I also don't think we
| should extend languages for this. Tooling like code
| generators and linters is a good middle ground.
| RangerScience wrote:
| FYI+IMO: Both Ruby and Scala have excellent ways to reduce
| these issues that occur at the language level, and make it
| easier to both read and write. I don't know either way if
| that means you should extend languages to handle it, but at
| least it's definitively possible to write the language that
| way from the beginning.
|
| Otherwise yup, I agree with you; ML for problematic
| boilerplate isn't the right approach, but other code
| generators and linters are really good and get you most of
| the way there.
| orangecat wrote:
| _New syntax or macros sounds like it would make the
| language harder to read._
|
| Often the opposite is true. For example, Java records are
| far easier to read and understand than the pages of
| boilerplate that they replace.
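|
| Python has a similar language-level fix; roughly (a sketch
| using the standard dataclasses module; the User fields here are
| made up):
|
|     from dataclasses import dataclass
|
|     # The boilerplate version discussed upthread: every field
|     # assigned by hand in __init__.
|     class UserBoilerplate:
|         def __init__(self, foo, bar):
|             self.foo = foo
|             self.bar = bar
|
|     # The declarative version: the decorator generates __init__
|     # (plus __repr__ and __eq__) from the field declarations.
|     @dataclass
|     class User:
|         foo: int
|         bar: str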
|
| valyagolev wrote:
| it is a very similar argument to the one for powerful IDEs
| and underwhelming languages. to be fair, it's not necessarily
| fruitless - e.g. with smalltalk. i fail to see the analogous
| smalltalk-style empowerment of language using AI but perhaps
| something is there.
|
| anyway. programming is automation; automation of programming
| is abstraction. using AI to write your code is just a bad
| abstraction - we are used to them
| jxcole wrote:
| I feel like you are very defensive here and I want to be sure
| we take time to recognize this as a real accomplishment.
|
| Seriously though, though I doubt I can be fully replaced by a
| robot any time soon, it may be the case that soon enough I can
| make high-level written descriptions of programs and hand them
| off to an AI to do most of the work. This wouldn't completely
| replace me, but it could make developers 50x more productive.
| The question is how elastic the market is... can the market
| grow in step with our increase in productivity?
|
| Also, please remember that as with anything, within 5 years we
| should see vast improvements to this AI. I think it will be an
| important thing to watch.
| nsxwolf wrote:
| Yesterday, I spent several hours figuring out if the business
| requirement for "within the next 3 days" meant 3 calendar
| days or 72 hours from now. Then about 10 minutes actually
| writing the code. Everyone thought my efforts were very
| valuable.
| RangerScience wrote:
| 100%. What makes us what we are is the mindset (in this
| case, this kind of "attention to detail"); that didn't
| change with (first) compilers, (then) scripting languages,
| or (future?) AI-assisted programming.
|
| PS - Lawyers aren't even as detail-oriented as we are, it's
| surprising.
| solarmist wrote:
| Really?
|
| Maybe that's true in general, because making a living as a
| lawyer depends far less on attention to detail as a core
| skill than making a living as a programmer does. Still, I
| wonder if that also holds at the high levels of the
| profession. I get the impression that at the FAANG level,
| lawyers would compare pretty favorably to programmers in
| detail orientation. In particular, in patent and contract
| law.
|
| That said, it's just my general impression of what
| lawyers get up to.
|
| ...Hmm, thinking about the contract law thing a bit more.
| Yeah, I do believe you are right. Lawyers aren't writing
| nearly as many extremely detail-oriented texts as
| programmers are on a day-to-day basis. Their jobs are
| much more about finding, reading, and understanding
| those things and building stories around them.
| visarga wrote:
| The GPT family has already shown more than a 50x productivity
| increase by being able to solve not one, but hundreds and
| perhaps thousands of tasks with the same model. We used to need
| much more data, and the model would be more fragile, and
| finding the right architecture would be a problem. Now we
| plug in a transformer with a handful of samples and it works.
|
| I just hope LMs will prove to be just as useful in software
| development as they are in their own field.
| thomasahle wrote:
| If you make developers 50x more efficient, won't you need 50x
| fewer developers?
| bmh100 wrote:
| Not necessarily. Demand may be much higher than available
| supply right now. Tech companies will continue to compete,
| requiring spending on developers to remain competitive.
| Software is unlike manufacturing, in that the output is a
| service, not a widget.
| Worker productivity in general has
| not decreased the demand for full work weeks, despite
| projections in the early 20th century to the contrary. Of
| course, it is possible that fewer developers would be
| needed, but I don't think it's likely, yet.
| alasdair_ wrote:
| > If you make developers 50x more efficient, won't you need
| 50x fewer developers?
|
| Developers today are 50X more efficient than when they had
| to input machine code on punched tape, yet the number of
| developers needed today is far larger than it was in those
| times.
| throw10920 wrote:
| There's no reason to believe that we'll need _another_
| 50x more developers, though.
| solarmist wrote:
| There isn't? I feel like there's still a ton of places
| software hasn't even touched, and not because it doesn't
| make sense, but because no one's gotten to it. It's not
| the most profitable thing people could write software
| for.
| alasdair_ wrote:
| Even if not, the original claim was that we may see a 50X
| _decrease_ and I personally don't think that is likely,
| pre-Singularity anyway :)
| qualudeheart wrote:
| But think how large of a jobs program that would have
| been.
|
| Hundreds of people manually writing assembly and paid
| middle-class wages. Not a compiler in sight.
|
| In the years leading up to the singularity I'd expect to
| see a lot of Graeberian "Bullshit Jobs".
|
| Everyone knows they're BS but as a society we allow them
| because we aren't willing to implement socialism or UBI.
| woadwarrior01 wrote:
| https://en.m.wikipedia.org/wiki/Jevons_paradox
| kevlened wrote:
| Greater efficiency leads to greater consumption unless
| demand is saturated. Given software's ability to uncover
| more problems that are solvable by software, we're more
| likely to build 50x more software.
| RangerScience wrote:
| This happened with the introduction of power tools to set
| building in Hollywood back in the day - literally this same
| question.
|
| People just built bigger sets, and smaller productions
| became financially feasible. Ended up creating demand, not
| reducing it.
| 0xdeadbeefbabe wrote:
| > but it could make developers 50x more productive
|
| More likely it will translate the abstraction level by some
| vector of 50 elements.
| blt wrote:
| I am always surprised by the amount of skepticism towards deep
| learning on HN. When I joined the field around 10 years ago,
| image classification was considered a grand challenge problem
| (e.g. https://xkcd.com/1425/). 5 years ago, only singularity
| enthusiast types were envisioning things like GPT-3 and Copilot
| in the short term.
|
| I think many people are uncomfortable with the idea that their
| own "intelligent" behavior is not that different from pattern
| recognition.
|
| I do not enjoy running deep learning experiments. Doing resource-
| hungry empirical work is not why I got into CS. But I still
| believe it is very powerful.
| jonas_kgomo wrote:
| Genuine question: what are the reasons to be a software engineer
| without much ML knowledge in 2022? Seems like a wake-up call for
| developers.
| eulers_secret wrote:
| > what are the reasons to be a software engineer without much
| ML knowledge in 2022?
|
| I'm not quite sure what you're asking, but my reason is that I
| _do not enjoy_ working on/with ML. I'd personally rather quit
| the industry.
|
| But I work in embedded/driver development. I do not worry about
| ML models replacing me yet, but if I were just gluing together
| API calls I would be a bit worried and try to specialize.
| qualudeheart wrote:
| Find something that's hard and interesting. Someone will
| probably have a business trying to solve it and will hire you.
| jonas_kgomo wrote:
| 7 months ago, I asked natfriedman the same question, to which
| he responded: "We think that software development is entering
| its third wave of productivity change. The first was the
| creation of tools like compilers, debuggers, garbage
| collectors, and languages that made developers more productive.
| The second was open source where a global community of
| developers came together to build on each other's work. The
| third revolution will be the use of AI in coding. The problems
| we spend our days solving may change. But there will always be
| problems for humans to solve."
|
| https://news.ycombinator.com/item?id=27676266&p=2
| slingnow wrote:
| Genuine question: what are the reasons to be a carpenter
| without much robotics / automation knowledge in 2022? Seems
| like a wakeup call for carpenters.
| 0xdeadbeefbabe wrote:
| I hope you are right, but just to answer the question: all
| those other AI winters.
| jonas_kgomo wrote:
| That's a good meditation. I think the winters were more driven
| by research dichotomy; for example, Marvin Minsky's critique
| of the perceptron really slowed the research by 10 years.
| Advances made thus far have so much commercial relevance
| that the companies invested don't look like they are going to
| stop soon. But it's a valid point. Looks like there is more
| upside being in subsets of computing like quantum computing,
| web3, the metaverse etc. than being a regular front-end
| engineer.
| agentultra wrote:
| This is kind of neat. I wonder if it will one day be possible for
| it to find programs that maintain invariant properties we state
| in proofs. This would allow us to feel confident that even though
| it's generating huge programs that do weird things a human might
| not think of... well, that it's still _correct_ for the stated
| properties we care about, i.e. that it's not doing anything
| underhanded.
| jdrc wrote:
| I think it would be interesting to train a system end-to-end
| with assembly code instead of various programming languages. This
| would make it a much more generic compiler.
| ahgamut wrote:
| I find almost every new advance in deep learning is accompanied
| by contrasting comments: it's either "AI will soon automate
| programming/<insert task here>", or "let me know when AI can
| actually do <some-difficult-task>". There are many views on this
| spectrum, but these two are sure to be present in every comment
| section.
|
| IIUC, AlphaCode was trained on Github code to solve competitive
| programming challenges on Codeforces, some of which are
| "difficult for a human to do". Suppose AlphaCode was trained on
| Github code that contains the entire set of solutions on
| Codeforces: is it actually doing anything "difficult"? I don't
| believe it would be difficult for a human to solve problems on
| Codeforces when given access to the entirety of Github (indexed
| and efficiently searchable).
|
| The general question I have been trying to understand is this: is
| the ML model doing something that we can _quantify_ as
| "difficult to do (given this particular training set)"? I would
| like to compute a number that measures how difficult it is for a
| model to do task X given a large training set Y. If X is part of
| the training set, the difficulty should be _zero_. If X is
| obtained only by combining elements in the training set, maybe it
| is harder to do.
| My efforts to answer this question:
| https://arxiv.org/abs/2109.12075
|
| In recent literature, the RETRO Transformer
| (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying
| dataset leakage", which is related to what I mentioned in the
| above paragraph. If many training samples are also in the test
| set, what is the model actually learning?
|
| Until deep learning methods provide a measurement of
| "difficulty", it will be difficult to gauge the prowess of any
| new model that appears on the scene.
| pedrosorio wrote:
| > Suppose AlphaCode was trained on Github code that contains
| the entire set of solutions on Codeforces, is it actually doing
| anything "difficult"?
|
| They tested it on problems from recent contests. The
| implication being: the statements and solutions to these
| problems were not available when the Github training set was
| collected.
|
| From the paper [0]: "Our pre-training dataset is based on a
| snapshot of selected public GitHub repositories taken on
| 2021/07/14" and "Following our GitHub pre-training dataset
| snapshot date, all training data in CodeContests was publicly
| released on or before 2021/07/14. Validation problems appeared
| between 2021/07/15 and 2021/09/20, and the test set contains
| problems published after 2021/09/21. This temporal split means
| that only information humans could have seen is available for
| training the model."
|
| At the very least, even if some of these problems had been
| solved exactly before, you still need to go from "all of the
| code in Github" + "natural language description of the problem"
| to "picking the correct code snippet that solves the problem".
| Doesn't seem trivial to me.
|
| > I don't believe it would be difficult for a human to solve
| problems on Codeforces when given access to the entirety of
| Github (indexed and efficiently searchable).
|
| And yet, many humans who participate in these contests are
| unable to do so (although I guess the issue here is that Github
| is not properly indexed and searchable for humans?).
|
| [0] https://storage.googleapis.com/deepmind-media/AlphaCode/comp...
| ahgamut wrote:
| > They tested it on problems from recent contests. The
| implication being: the statements and solutions to these
| problems were not available when the Github training set was
| collected.
|
| Yes, and I would like to know how similar the dataset(s)
| were. Suppose the models were trained only on greedy
| algorithms and then I provided a dynamic programming problem
| in the test set, (how) would the model solve it?
|
| > And yet, many humans who participate in these contests are
| unable to do so (although I guess the issue here is that Github
| is not properly indexed and searchable for humans?).
|
| Indeed, so we don't know what "difficult" means for
| <human+indexed Github>, and hence we cannot compare it to
| <model trained on Github>.
|
| My point is, whenever I see a new achievement of deep
| learning, I have no frame of reference (apart from my
| personal biases) of how "trivial" or "awesome" it is. I would
| like to have a quantity that measures this - I call it
| generalization difficulty.
|
| Otherwise the datasets and models just keep getting larger,
| and we have no idea of the full capability of these models.
| pedrosorio wrote:
| > Suppose the models were trained only on greedy algorithms
| and then I provided a dynamic programming problem in the
| test set, (how) would the model solve it?
|
| How many human beings do you personally know who were able
| to solve a dynamic programming problem at first sight,
| without ever having seen anything but greedy algorithms?
|
| DeepMind is not claiming they have a machine capable of
| performing original research here.
|
| Many human programmers are unable to solve DP problems even
| after having them explained several times. If you could get
| a machine that takes in all of Github and can solve "any"
| DP problem you describe in natural language with a couple
| of examples, that is AI above and beyond what many humans
| can do, which is "awesome" no matter how you put it.
| sibeshk96 wrote:
| > that is AI above and beyond what many humans can do,
| which is "awesome" no matter how you put it.
|
| That's not the point being made. The point OP is making
| is that it is not possible to understand how impressive
| at "generalizing" to uncertainty a model is if you don't
| know how different the training set is from the test set.
| If they are extremely similar to each other, then the
| model generalizes weakly (this is also why the world's
| smartest chess bot needs to play a million games to beat
| the average grandmaster, who has played less than 10,000
| games in her lifetime). Weak generalization vs strong
| generalization.
|
| Perhaps all such published results should contain info
| about this "difference" so it becomes easier to judge the
| model's true learning capabilities.
| ahgamut wrote:
| > How many human beings do you personally know who were
| able to solve a dynamic programming problem at first
| sight without ever having seen anything but greedy
| algorithms?
|
| Zero, which is why if a trained network could do it, that
| would be "impressive" to me, given my personal biases.
|
| > If you could get a machine that takes in all of Github
| and can solve "any" DP problem you describe in natural
| language with a couple of examples, that is AI above and
| beyond what many humans can do, which is "awesome" no
| matter how you put it.
|
| I agree with you that such a machine would be awesome,
| and AlphaCode is certainly a great step closer towards
| that ideal. However, I would like to have a number that
| measures the "awesomeness" of the machine (not an Elo
| rating, because that depends on a human reference), so I
| will have something as a benchmark to refer to when the
| next improvement arrives.
| pedrosorio wrote:
| I understand wanting to look at different metrics to
| gauge progress, but what is the issue with this?
|
| > not an Elo rating, because that depends on a human
| reference
| sibeshk96 wrote:
| Using my previous chess analogy, the world's smartest
| chess bot has played a million games to beat the average
| grandmaster, who has played less than 10,000 games in her
| lifetime. So while they both will have the same Elo
| rating, which is a measure of how good they are at the
| narrow domain of chess, there is clearly something
| superior about how the human grandmaster learns from
| just a few data points, i.e. strong generalization vs the
| AI's weak generalization. Hence the task-specific Elo
| rating does not give enough context to understand how
| well a model adapts to uncertainty. For instance - a
| Roomba would beat a human hands down if there was an Elo
| rating for vacuuming floors.
| ahgamut wrote:
| The Turing Test
| (https://en.wikipedia.org/wiki/Turing_test) for
| artificial intelligence required the machine to convince
| a human questioner that it was a human.
| Since then, most
| AI methods rely on a human reference of performance to
| showcase their prowess. I don't find this appealing
| because:
|
| 1) It's an imprecise target: believers can always hype
| and skeptics can always downplay improvements. Humans can
| do lots of different things somewhat well at the same
| time, so a machine beating human-level performance in one
| field (like identifying digits) says little about other
| fields (like identifying code vulnerabilities).
|
| 2) Elo ratings, or similar metrics, are measurements of
| _skill_, and can be brute-forced to some extent,
| equivalent to grinding up levels in a video game. Brute-
| forcing a solution is "bad", but how do we know a new
| method is "better/more elegant/more efficient"? For
| algorithms we have Big-O notation, so we know (brute
| force < bubble sort < quick sort); perhaps there is an
| analogue for machine learning.
|
| I would like performance comparisons that focus on
| quantities unique to machines. I don't compare the
| addition of computer processors with reference to human
| addition, so why not treat machine intelligence
| similarly?
|
| There are many interesting quantities with which we can
| compare ML models. Energy usage is a popular metric, but
| we can also compare the structure of the network, the
| code used, the hardware, the amount of training data, the
| amount of training time, and the similarity between
| training and test data. I think a combination of these
| would be useful to look at every time a new model
| arrives.
| mwattsun wrote:
| Seems to me that this accelerates the trend towards a more
| declarative style of programming, where you tell the computer
| what you want to do, not how to do it.
| aidenn0 wrote:
| > Creating solutions to unforeseen problems is second nature in
| human intelligence
|
| If this is true then a lot of the people I know lack human
| intelligence...
| algon33 wrote:
| How surprising did you guys find this? I'd have said there was a
| 20% chance of this performing at the median+ level if I was asked
| to predict things beforehand.
| Isinlor wrote:
| There is a prediction market called Metaculus.
|
| On Dec 31, 2016, in partnership with the Center for the Study of
| Existential Risk, the Machine Intelligence Research Institute, and
| the Future of Life Institute, they asked:
|
| How long until a machine-learning system can take a simple text
| description and turn it into a program coded in C/Python?
|
| https://www.metaculus.com/questions/405/when-will-programs-w...
|
| The first 19 forecasters in March 2017 were predicting mid-2021;
| the best forecasters were predicting late 2024. When the
| question closed in 2020 the community was predicting January
| 2027 and the best forecasters were predicting March 2030.
|
| The question resolved in July 2021 when Codex was published.
|
| The community and the best forecasters were assigning ~15% that
| it would happen by July 2021.
|
| I'm currently the 14th-best forecaster there and I was predicting
| 33% before July 2021. It was my last prediction, and it was
| made in October 2018.
|
| I'm also predicting 75% that we will have AGI by 2040 as
| defined in this question:
|
| https://www.metaculus.com/questions/3479/when-will-the-first...
|
| 20% that it will happen before 2030.
|
| There is also a stronger operationalization:
|
| https://www.metaculus.com/questions/5121/when-will-the-first...
|
| My prediction here is 60% before 2040 and 5% before 2030.
| | I have also "canary in the coal mine" questions: | | When will AI achieve competency on multi-choice questions | across diverse fields of expertise? Community predicts 50% | before 2030, I agree. | | https://www.metaculus.com/questions/5276/ai-competence-in-di... | | When will AI be able to learn to play Montezuma's Revenge in | less than 30 min? Community predicts 50% before 2025, I think | 50% before 2027. | | https://www.metaculus.com/questions/5460/ai-rapidly-learning... | baobabKoodaa wrote: | I would have said there is a ~0% chance of this happening | within our lifetimes. | hackinthebochs wrote: | I didn't find it very surprising, but then I tend to be more | optimistic than average about the capabilities of transformer | models and the prospect of general AI in the relatively near | term. | machiaweliczny wrote: | I am surprised, as recently OpenAI had ~25% of easy problems | and ~2% in competitive problems. Seems like DeepMind is ahead | in this topic as well. | | Actually I think Meta AI had some interesting discovery | recently that could possibly improve NNs in genral, so probably | this as well. | | I am not in field but wonder if some other approaches like | Tsetlin machines would be more useful for programming. | marcusbuffett wrote: | I would have guessed around the same chance, this was | surprising to me after playing around with copilot and not | being impressed at all. | knowmad wrote: | I agree with most of the comments I've read in this thread. | Writing code to solve a well defined narrowly scoped problem | isn't that hard or valuable. It's determining what the problem | actually is and how software could be used to solve it that is | challenging and valuable. | | I would really like to see more effort in the AI/ML code | generation space being put into things like code review, and | system observation. It seems significantly more useful to use | these tools to augment human software engineers rather than | trying to tackle the daunting and improbable task of completely | replacing them. | | *Note: as a human software engineer I am biased | [deleted] | FemmeAndroid wrote: | This is extremely impressive, but I do think it's worth noting | that these two things were provided: | | - a very well defined problem. (One of the things I like about | competitive programming and the like is just getting to implement | a clearly articulated problem, not something I experience on most | days.) - existing test data. | | This is definitely a great accomplishment, but I think those two | features of competitive programming are notably different than my | experience of daily programming. I don't mean to suggest these | will always be limitations of this kind of technology, though. | baobabKoodaa wrote: | > One of the things I like about competitive programming and | the like is just getting to implement a clearly articulated | problem | | English versions of Codeforces problems may be well-defined but | they are often very badly articulated and easy to misunderstand | as a human reader. I still can't understand how they got AI to | be able to generate plausible solutions from these problem | statements. | jakub_g wrote: | 100% agree. Someone (who?) had to take time and write the | detailed requirements. In real jobs you rarely get good tickets | with well defined expectations; it's one of most important | developer's jobs to transform fuzzy requirement into a good | ticket. 
|
| (Side note: I find that many people skip this step, and go
| straight from fuzzy-requirement-only-discussed-on-zoom-with-Bob
| to code; open a pull request without much context or comments;
| and then a code reviewer is supposed to review it properly
| without really knowing what problem is actually being solved,
| and whether the code is solving the proper problem at all.)
| jensensbutton wrote:
| Maybe the problem transformation will be both the beginning
| _and_ end of the developer's role.
| ctoth wrote:
| So what happens when OpenAI releases TicketFixer 0.8, which
| synthesizes everything from transcripts of your meetings to
| the comments on the JIRA ticket to the existing codebase, and
| spits out better tickets to feed into the programming side?
| solarmist wrote:
| Yup, I hope that'll happen. Then engineering would just end
| up being done at a higher level of abstraction, closer to
| what designers do with wireframes and mockups.
|
| Kind of the opposite of the way graphic design has evolved.
| Instead of getting more involved in the process and, in
| many cases, becoming front-end developers, it'll become
| more abstract, where humans make the decisions and reason
| about what to include/exclude, how it'll flow, etc.
|
| Even TicketFixer wouldn't be able to do more than offer a
| handful of possible solutions to design-type issues.
| bmhin wrote:
| Yeah, we need our TicketFixer to also include the No_Bob
| 0.2 plugin that figures out that a decent percentage of
| the time whatever "Bob" is asking for in that meeting is
| not what "Bob" thinks he is asking for or should be
| asking for, and can squash those tickets. Without that
| we're gonna somehow end up with spreadsheets in
| everything.
| solarmist wrote:
| Haha, yeah, there's that, but there are also things like
| "adding a dark mode." There are a dozen ways to
| accomplish that kind of thing, and every company's
| solution will diverge when you get down to the details.
| jakub_g wrote:
| Take my money.
| machiaweliczny wrote:
| But it's easy to create an AI conversation that will refine
| the problem.
| ohwellhere wrote:
| Is the next step in the evolution of programming having the
| programmer become the specifier?
|
| Fuzzy business requirements -> programmer specifies and
| writes tests -> AI codes
| buscoquadnary wrote:
| That's all we've ever been since we invented software.
|
| First we specified the exact flow of the bits with punch
| cards.
|
| Then we got assembly and we specified the machine
| instructions.
|
| Then we got higher level languages and we specified how the
| memory was to be managed and what data to store where.
|
| Now we have object oriented languages that allow us to work
| with domain models, and functional languages that allow us
| to work with data structures and algorithms.
|
| The next level may be writing business rules and
| specifying how services talk to each other, who knows, but
| it will be no different than it is now, just at a higher
| level.
| chinabot wrote:
| If it's anything like my job:
|
| while(1) { Fuzzy business requirements -> programmer
| specifies and writes tests -> AI codes }
| e4e78a06 wrote:
| I don't think it's quite as impressive as you make it out to
| be. Median performance in a Codeforces programming competition
| is solving the easiest 1-2 problems out of 5-6. Like all
| things programming, the top 1% is much, much better than the
| median.
|
| There's also the open problem of verifying correctness in
| solutions and providing some sort of flag when the model is not
| confident in its correctness. I give it another 5 years in the
| optimistic case before AlphaCode can reliably compete at the
| top 1% level.
| ctoth wrote:
| This is technology that simply didn't exist in any form 2
| years ago. For no amount of money could you buy a program
| that did what this one does. Having been watching the growth
| of Transformer-based models for a couple of years now really
| has hammered home that as soon as we figure out how an AI
| can do X, X is no longer AI, or at least no longer
| impressive. How this happens is with comments like yours, and
| I'd really like to push back against it for once. Also, 5
| years? Assuming that we have all of the future ahead of us,
| to think that we only have 5 years left of being the top in
| programming competitions seems like it's somehow important
| and shouldn't be dismissed with "I don't think it's quite as
| impressive as you make it out to be."
| BobbyJo wrote:
| I don't think that's what's happening. Let's talk about this
| case: programming. It's not that people are saying "an AI
| programming" isn't impressive or isn't AI, it's that when
| people say "an AI programming" they aren't talking about
| ridiculously controlled environments like in this case.
|
| It's like self-driving cars. A car driving itself for the
| first time in a controlled environment, I'm sure, was an
| impressive feat, and it wouldn't be inaccurate to call it a
| self-driving car. However, that's not what we're all
| waiting for when we talk about the arrival of self-driving
| cars.
| ctoth wrote:
| And if AI programming were limited to completely
| artificial contexts you would have a point, though I'd
| still be concerned. We live in a world, however, where
| programmers routinely call on the powers of an AI to
| complete their real code and get real value out of it.
| This is based on the same technology that brought us this
| particular win, so clearly this technology is useful
| outside "ridiculously controlled environments."
| Retric wrote:
| Programmers do set up completely artificial contexts so AI
| can work.
|
| None of the self-driving systems were set up by giving
| the AI access to sensors, a car, and the driver's handbook
| and saying: well, you figure it out from there. The general
| trend is: solve this greatly simplified problem, then this
| more complex one, up to dealing with the real world.
| ctoth wrote:
| By AI programming I mean the AI doing programming, not
| programming the AI. Though soon enough the first will be
| doing the second, and that's where the loop really
| closes...
| BobbyJo wrote:
| That's not significantly different than how programming
| has worked for the last 40 years though. We slowly push
| certain types of decisions and tasks down into the tools
| we use, and what's left over is what we call
| 'programming'. It's cool, no doubt, but as long as
| companies need to hire 'programmers', then it's not the
| huge thing we're all looking out over the horizon waiting
| for.
| YeGoblynQueenne wrote:
| >> This is technology that simply didn't exist in any form
| 2 years ago.
|
| A few examples of neural program synthesis from at least 2
| years ago:
|
| https://sunblaze-ucb.github.io/program-synthesis/index.html
|
| Another example from June 2020:
|
| _DreamCoder: Growing generalizable, interpretable
| knowledge with wake-sleep Bayesian program learning_
|
| https://arxiv.org/abs/2006.08381
|
| RobustFill, from 2017:
|
| _RobustFill: Neural Program Learning under Noisy I/O_
|
| https://www.microsoft.com/en-us/research/wp-content/uploads/...
|
| I could go on.
|
| And those are only examples from neural program synthesis.
| Program synthesis, in general, is a field that goes way
| back. I'd suggest, as usual, not making big proclamations
| about its state of the art without being acquainted with
| the literature. Because if you don't know what others have
| done, every announcement by DeepMind, OpenAI et al. seems
| like a huge advance... when it really isn't.
| qualudeheart wrote:
| Has someone tried classical program synthesis techniques
| on competitive programming problems? I wonder what would
| have been possible with tech from more than 2 years ago.
| YeGoblynQueenne wrote:
| I don't know if anyone has tried it, but it's not a very
| objective evaluation. We have no good measure of the
| coding ability of the "median level competitor", so doing
| better or worse than that doesn't really tell us anything
| useful about the coding capability of an automated system.
|
| So my hunch is that it probably hasn't been done, or
| hasn't been done often, because the program synthesis
| community would recognise it's pointless.
|
| What you really want to look at is formal program
| synthesis benchmarks and how systems like AlphaCode do on
| them (hint: not so good).
| ctoth wrote:
| Of course program synthesis has been a thing for years; I
| remember some excellent papers out of MSR 10 years ago.
| But which of those could read a prompt and build the
| program from the prompt? Setting up a whole bunch of
| constraints and having your optimizer spit out a program
| that fulfills them is program synthesis and is super
| interesting, but not at all what I think of when I'm told
| we can make the computer program for us. For instance,
| RobustFill takes its optimization criteria from a bundle
| of pre-completed inputs and outputs of how people want
| the program to behave, instead of having the problem
| described in natural language and creating the solution
| program.
| YeGoblynQueenne wrote:
| Program synthesis from natural language specifications
| has existed for many years, also. It's not my specialty
| (neither am I particularly interested in it), but here's
| a paper I found from 2017, with a quick search:
|
| https://www.semanticscholar.org/paper/Program-Synthesis-from...
|
| AlphaCode is not particularly good at it, either. In the
| arxiv preprint, besides the subjective and pretty
| meaningless "evaluation" against human coders, it's also
| tested on a formal program synthesis benchmark, the APPS
| dataset. The best performing AlphaCode variant reported
| in the arxiv preprint solves 25% of the "introductory"
| APPS tasks (the least challenging ones). All AlphaCode
| variants tested solve less than 10% of the "interview"
| and "competition" (intermediary and advanced) tasks.
| These more objective results are not reported in the
| article above, I think for obvious reasons (because they
| are extremely poor).
|
| So it's not doing anything radically new and it's not
| doing it particularly well either. Please be better
| informed before propagating hype.
|
| Edit: really, from a technical point of view, AlphaCode
| is a brute-force, generate-and-test approach to program
| synthesis that was state-of-the-art 40 years ago. It's
| just a big generator that spams programs hoping it will
| hit a good one. I have no idea who came up with this.
| Oriol Vinyals is the last author and I've seen enough of
| that guy's work to know he knows better than to bet on such
| a primitive, even backwards, approach. I'm really shocked
| that this is DeepMind work.
| Jensson wrote:
| Top 1% competitive programming level means that it can start
| solving research problems; the difficulty of and creativity
| needed for problems goes up exponentially, and programming
| contests have led to research papers before. It would be cool
| if we got there in 5 years, but I doubt it. But if we got
| there it would revolutionize so many things in society.
| xorcist wrote:
| You don't think it's impressive, yet you surmise that a
| computer program could compete at the level of the top 1% of
| all humans _in_ _five_ _years_?
|
| That's wildly overstating the promise of this technology, and
| I'd be very surprised if the authors of this wouldn't agree.
| bricemo wrote:
| Agree. If an AI could code within the top 1%, every single
| person whose career touches code would have their lives
| completely upended. If that's only 5 years out... ooof.
| Groxx wrote:
| I do kinda wonder if it'd lead to as good results if you just
| did a standard "matches the most terms the most times" search
| against all of GitHub.
|
| I have a suspicion it would - kinda like Stack Overflow,
| problems/solutions are not that different "in the small". It'd
| have almost certainly given us the fast inverse square root
| trick verbatim, like GitHub's AI is doing routinely.
| YeGoblynQueenne wrote:
| >> AlphaCode ranked within the top 54% in real-world programming
| competitions, an advancement that demonstrates the potential of
| deep learning models for tasks that require critical thinking.
|
| Critical thinking? Oh, wow. That sounds amazing!
|
| Let's read further on...
|
| >> At evaluation time, we create a massive amount of C++ and
| Python programs for each problem, orders of magnitude larger than
| previous work. Then we filter, cluster, and rerank those
| solutions to a small set of 10 candidate programs that we submit
| for external assessment.
|
| Ah. That doesn't sound like "critical thinking", or any thinking.
| It sounds like massive brute-force guessing.
|
| A quick look at the arxiv preprint linked from the article
| reveals that the "massive" amount of programs generated is in the
| millions (see Section 4.4). These are "filtered" by testing them
| against program input-output (I/O) examples given in the problem
| descriptions. This "filtering" still leaves a few thousand
| candidate programs that are further reduced by clustering to
| "only" 10 (which are finally submitted).
|
| So it's a generate-and-test approach rather than anything to do
| with reasoning (as claimed elsewhere in the article), let alone
| "thinking". But why do such massive numbers of programs need to
| be generated? And why are there still thousands of candidate
| programs left after "filtering" on I/O examples?
|
| The reason is that the generation step is constrained by the
| natural-language problem descriptions, but those are not enough
| to generate appropriate solutions because the generating language
| model doesn't understand what the problem descriptions mean; so
| the system must generate millions of solutions hoping to "get
| lucky". Most of those don't pass the I/O tests, so they must be
| discarded. But there are only very few I/O tests for each problem,
| so there are many programs that can pass them and still not
| satisfy the problem spec. In the end, clustering is needed to
| reduce the overwhelming number of pretty much randomly generated
| programs to a small number. This is a method of generating
| programs that's not much more precise than drawing numbers at
| random from a hat.
|
| Inevitably, the results don't seem to be particularly accurate,
| hence the evaluation against programs written by participants in
| coding competitions, which is not an objective measure of
| program correctness. Table 10 in the arxiv preprint lists results
| on a more formal benchmark, the APPS dataset, where it's clear
| that the results are extremely poor (the best performing
| AlphaCode variant solves 20% of the "introductory" level
| problems, though outperforming earlier approaches).
|
| Overall, pretty underwhelming and a bit surprising to see such
| lackluster results from DeepMind.
| thomasahle wrote:
| Next they can train it on Kaggle, and we'll start getting closer
| to the singularity.
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)