[HN Gopher] GitHub Copilot: your AI pair programmer ___________________________________________________________________ GitHub Copilot: your AI pair programmer Author : todsacerdoti Score : 1795 points Date : 2021-06-29 14:29 UTC (8 hours ago) (HTM) web link (copilot.github.com) (TXT) w3m dump (copilot.github.com) | dec0dedab0de wrote: | I absolutely hate whenever I see patterns in my code. The first | thing I think is "There has to be a way to automate this" This is | not what I had in mind, but if it's as good as people seem to | say, it might be a good step. I can't believe I am considering a | Microsoft product after 15 years of avoiding them as much as I | could. | dgdosen wrote: | want | mtnGoat wrote: | What a cool project, I'm impressed. Looking forward to checking | it out. | wozer wrote: | A great way to use this would be to create very good tests | (manually) and then let the AI write the code. Maybe even with a | feedback loop: when a test fails, the AI automatically tries a | different approach. | cbsks wrote: | I don't think we need to start looking for new career paths yet. | This example has a few bugs and it took me longer to track them | down than it would have to write it myself: | #!/bin/bash # List all python source files which are more | than 1KB and contain the word "copilot". find . \ | -name "*.py" \ -size +1000 \ -exec grep -n | copilot {}\; | | "-exec grep -n copilot {}\;" needs to have a space before the | semicolon otherwise find fails with "find: missing argument to | '-exec'". | | The "1000" in "-size +1000" has a unit of 512 byte blocks so it | is looking for files that are greater than 512000 bytes, not 1KB. | This would be very easy to miss in a code review and is one of | those terrible bugs that causes the code to mostly work, but not | quite. | | https://linux.die.net/man/1/find | mitjak wrote: | the main argument against Copilot for me. it takes longer to | grok existing code than just write it from ground up. 
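For comparison with the shell snippet cbsks picks apart above, here is a minimal sketch of what its comment asks for, written in Python so that the size threshold is stated in bytes rather than in find's 512-byte blocks. The function name `python_files_mentioning` and the reading of "1KB" as 1024 bytes are assumptions made for illustration; this is not what Copilot produced.

```python
from pathlib import Path


def python_files_mentioning(root: str, word: str = "copilot",
                            min_bytes: int = 1024) -> list[Path]:
    """Return .py files under root that are larger than min_bytes and contain word."""
    matches = []
    for path in Path(root).rglob("*.py"):
        # Compare the size in bytes directly, so there is no block-unit ambiguity.
        if not path.is_file() or path.stat().st_size <= min_bytes:
            continue
        # errors="ignore" drops undecodable bytes instead of raising.
        if word in path.read_text(encoding="utf-8", errors="ignore"):
            matches.append(path)
    return sorted(matches)
```

Calling `python_files_mentioning(".")` returns the matching file paths themselves rather than the matching lines inside them.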
| toxik wrote: | This is actually a pretty important thing to understand. It | can be better to rewrite than fix an existing mess. It's | similar to construction work in that sense: if you rebuild, | you know what's inside the walls. | ok2938 wrote: | That's why this copilot won't fly. The junior programmer will | not be able to spot subtle errors but will kind-of feel | "productive" by some random pastes from a giant brain, which | cannot be interrogated. | | If anything, I see copilot generating more work for existing, | senior programmers - so there you have it. | nemetroid wrote: | It also doesn't list the files. It prints all matching lines | (and their line numbers), _without_ the corresponding | filenames. | awestroke wrote: | I read this as a criticism of bash | hashingroll wrote: | I don't see why. It is on Copilot to produce syntactically | correct code that at least doesn't fail to run, even if we | ignore the correctness. | CityOfThrowaway wrote: | This is very impressive! | | OpenAI's tech opens an ethical Pandora's box: | | 1. It's clear that the raw inputs to all of OpenAI's outputs | originated with real, human creativity. | | 2. So, in a sense, OpenAI is laundering creativity. It reads in | creative works, does complicated (and, yes, groundbreaking) | transformations, and produces an output that is hard to trace to | any particular source. | | 3. Yet, isn't that effectively what human brains do too? Perhaps | OpenAI lacks the capacity for true invention, but I'd argue that | most people live their whole lives without a meaningful creative | contribution as well. | | All told, I don't have a good framework for thinking about the | ethics here. So instead, I'll simply say: | | Wow. | 6gvONxR4sf7o wrote: | > Yet, isn't that effectively what human brains do too? | | If I want to watch a bunch of movies, I have to pay the theater | for each movie, or pay netflix, or whatever. 
The screenplay I | write afterwards belongs to me, but the learning process | involved me paying for access to others' work. That's what's | often missing here. But at the same time, if you train on | legally public data, there's no 'theater' to be paid. | | (Often, people train on illegally public data though, like the | eleuther folks. That's a whole extra can of worms I've ranted | about plenty). | | Maybe we'll start seeing licenses with a section saying "not | for use as training data for commercial models." | visarga wrote: | > Maybe we'll start seeing licenses with a section saying | "not for use as training data for commercial models." | | Considering that the impact of a single example is extremely | small in training a model, and that it is trained on an | ungodly amount of examples, then I wonder if the effort of | forbidding its use has any real benefits. | koolhaas wrote: | I would change your question from "does it have any real | benefits" to "does it have a practical effect on the model" | | Benefits to me are clear: giving a developer choice over | how their source code is used with for-profit, opaque, next | generation ML models. | | But yes, drop in the ocean in terms of the full data set. | But that shouldn't be an excuse to remove user choice. | fulafel wrote: | What are some ethical problems that could emerge from the box? | Maybe unfair competition from having very good tools compared | to other programmers, or having an irresponsibly shallow | understanding of what the produced code does? | Ensorceled wrote: | > What are some ethical problems that could emerge from the | box? | | Being put out of a job by an AI trained on your own code? | | It's really the same ethical problem of all automation ... | and will be as long as we need a job to fulfill basic needs | like food, housing and medical care. | visarga wrote: | Programmers change jobs like socks and are open to learning | new things all the time. 
Software has been automating | itself for 70 years and look how many people have jobs in | this field. | | Also, human desire is a bottomless pit, where automation | saves we spend even more. | shadowgovt wrote: | Everyone gets into programming for their own reasons. | | But to my personal philosophy: if I'm not coding to put | myself out of a job, I'm thinking about the problem wrong. | | When there are no more lines of code to be written, I shall | do something else, content that I have done my part to free | humanity from the burden of human-machine interfacing. I | hear dairy farming is a demanding and rewarding challenge. | Ensorceled wrote: | Dairy farmers aren't really looking for workers ... | ironically, it is a job that has been almost eliminated | by automatic milking machines and robotic harvesters. | | https://www.thebullvine.com/wp- | content/uploads/2014/03/Figur... | fulafel wrote: | I wonder if HLL compiler authors had fears about this back | when writing assembly and machine code was the norm. | | But good point about the ambivalent result of eliminating | busywork. Food, housing and medical care is available in | most western countries for people who choose to not get a | job... I think the social status problem and guilt of | freeriding are also big factors preventing people from | living more leisurely lives in these countries. | milofeynman wrote: | There are some really weird licensing problems. Like does | your code license say they can use your code to train AIs | that then reproduce very similar code to you but with no | attribution etc in someone else's codebase. | jareklupinski wrote: | s/OpenAI/Photoshop | | Reads similarly :) | ohnoesjmr wrote: | Great. | | Now, how about you use that to implement a non-horrible search | experience? | natfriedman wrote: | Hi HN, we've been building GitHub Copilot together with the | incredibly talented team at OpenAI for the last year, and we're | so excited to be able to show it off today. 
| | Hundreds of developers are using it every day internally, and the | most common reaction has been the head exploding emoji. If the | technical preview goes well, we'll plan to scale this up as a | paid product at some point in the future. | ggsp wrote: | One question: how long is the waitlist? Very excited to try | this! | [deleted] | r3trohack3r wrote: | Is there a public API? Will it be documented? Are you open to | folks porting the VSCode plugin to other editors (I.e. | kakoune's autocomplete)? | 6gvONxR4sf7o wrote: | If I put a section in my LICENSE.txt prohibiting use as | training data in commercial models, would that be sufficient to | keep my code out of models like this? | mdaniel wrote: | Only if they trained a model to be able to read and | understand LICENSE.txt files -- wowzers what a monster | improvement that would be for the world | | Or, I guess a sentinel phrase that the scraper could | explicitly check: `github-copilot-optout: true` | 6gvONxR4sf7o wrote: | Or it could explicitly check for known standard licenses | that permit it, if it were opt in instead of opt out, the | way most everything else in software licensing is opt-in | for letting others use. | dragonwriter wrote: | > If I put a section in my LICENSE.txt prohibiting use as | training data in commercial models, would that be sufficient | to keep my code out of models like this? | | Neither in practice (because it doesn't look for it) nor | legally in the US, if Microsoft's contention that such use is | "fair use" under US copyright law. | | That "fair use" is an Americanism and not a general feature | of copyright law might create some interesting international | wrinkles, though. | 6gvONxR4sf7o wrote: | Their contention is | | > Why was GitHub Copilot trained on data from publicly | available sources? | | > Training machine learning models on publicly available | data is now common practice across the machine learning | community. 
The models gain insight and accuracy from the | public collective intelligence. But this is a new space, | and we are keen to engage in a discussion with developers | on these topics and lead the industry in setting | appropriate standards for training AI models. | | Personally, I'd prefer this to be like any other software | license. If you want to use my IP for training, you need a | license. If I use MIT license or something that lets you | use my code however you want, then have at it. If I don't, | then you can't just use it because it's public. | | Then you'd see a lot more open models. Like a GPL model | whose code and weights must be shared because the bulk of | the easily accessible training data says it has to be open, | or something like that. | | I realize, however, that I'm in the minority of the ML | community feeling this way, and that it certainly is | standard practice to just use data wherever you can get it. | yencabulator wrote: | > however you want | | I don't see any attribution here. | | MIT may say "substantial portions" but BSD just says | "must retain". | dragonwriter wrote: | When I referenced their contention on Fair Use, that's | not what I was referencing, but instead Github CEO Nat | Friedman's comment _in this thread_ that "In general: (1) | training ML systems on public data is fair use". | | https://news.ycombinator.com/item?id=27678354 | blibble wrote: | would be interesting if someone uploaded a leaked copy of | the NT kernel, then coerced the system to regurgitate it | piece by piece | | would MS position then be different? | UnFleshedOne wrote: | In the end this would slightly increase likelihood of such | sections appearing in licenses generated by AIs. | foobarbazetc wrote: | Are those developers worried about having their jobs replaced | by a code-writing AI? :) | | I mean... why would 95% of developer jobs exist with this tech | available? 
| | You just need that 5% of devs who actually write novel code for | this thing to learn from. | mkr-hn wrote: | https://en.wikipedia.org/wiki/Profession_(novella) | | An Isaac Asimov story about someone who didn't take to the | program and, as a result, got picked to create new things | because _someone_ has to make them. | mooreds wrote: | I love this story. | | If you want to read the whole thing, it's here: | | https://www.abelard.org/asimov.php | kif wrote: | I visited https://copilot.github.com/, and I don't know how to | feel. Obviously it's a nice achievement, not gonna lie. | | But I have a feeling it will end up causing more work. e.g. the | `averageRuntimeInSeconds` example, I had to spend a bit of time | to see if it was actually correct. It has to be, since it's on | the front page, but then I realized I'd need to spend time | reviewing the AI's code. | | It's cool as a toy, but I'd like to see where it is one year | from now when the wow factor has cooled down a bit. | taneq wrote: | > I had to spend a bit of time to see if it was actually | correct. | | Interesting point - it reminds me of the idea that it's | harder to debug code than to write it. Is it also harder to | interpret code you didn't write than to write it? | ec109685 wrote: | It has the ability to generate unit tests as well, which will | help cut down some on the verification side if you feed it | enough cases. | IncRnd wrote: | If you question the veracity of the code that is produced, | you have to question the usefulness of the unit test that | is produced. | CloselyChunky wrote: | Well then you have to check the generated tests. That's | just one more layer, isn't it? | freedomben wrote: | I think I'd love to use this to generate tests and then | write the functions myself. Test generation seems like a | killer feature. | taftster wrote: | Yes!! Totally agree. Imagine writing a method and then | telling an AI to write your unit tests for it. 
The AI | would likely be able to come up with the edge cases and | such that you would not normally take the time to write. | | While I think the AI generating your mainline code is | interesting, I must certainly agree that generating test | code would be the killer feature. I would like to see | this showcased a little more on the copilot page. | fay59 wrote: | Have there yet been reports of the AI writing code that has | security bugs? Is that something folks are on the lookout for? | natfriedman wrote: | I haven't seen any reports of this, but it's certainly | something we want to guard against: | https://copilot.github.com/#faq-can-github-copilot- | introduce... | nightski wrote: | I'm glad they find it head exploding but my concern is that it | would be most head exploding to newbies who don't have the | skill to discern if AI code is how it should be written. | | For a seasoned veteran writing the code was never really the | hard part in the first place. | amelius wrote: | > For a seasoned veteran writing the code was never really | the hard part in the first place. | | Yes, to most coders this Copilot software is just a fancy | keyboard. | williamdclt wrote: | Sounds great. I'm a bad typist, anything that makes me type | less (vim, voice assistant, completion) is a big win to me | amelius wrote: | Vim is great because it doesn't try to be smart. | enriquto wrote: | > Hundreds of developers are using it every day internally, and | the most common reaction has been the head exploding emoji | | Apart from developing this "head exploding" stuff, couldn't | some of these incredibly talented hundreds of developers fix | the github code search? | mdellavo wrote: | What do you think about this being overall detrimental to code | quality as it allows people to just blindly accept completions | without really understanding the generated code. Similar to | copy-and-paste coding. 
| | The first example parse_expenses.py uses a float for currency - | that seems to be a pretty big error that's being overlooked | along with other minor issues around no error handling. | | I would say the quality of the generated code in | parse_expenses.py is not very high, certainly not for the | banner example. | | EDIT - I just noticed Github reordered the examples on | copilot.github.com in order to bury the issues with | parse_expenses.py for now. I guess I got my answer. | ehsankia wrote: | How is it different from the status quo of people just doing | the wrong thing or copy pasting bad code? Yes there's the | whole discussion below about float currency values, but I | could very well see the opposite happening too, where this | thing recommends better code than the person would've written | otherwise. | IncRnd wrote: | > How is it different from the status quo of people just | doing the wrong thing or copy pasting bad code? | | Well, yes, the wrong code would be used. However - the | wrong code would then become more prevalent as an answer | from gh, causing more people to blindly use it. It's a | self-perpetuating cycle of finding and using bad and wrong | code. | ehsankia wrote: | Hmm, not quite. My point was that if they aren't a good | enough programmer to understand why the code is wrong, | then chances are they would've written bad code or copy | pasted bad code anyways. It just makes the cycle faster. | | And again, I could argue that the opposite could happen | too, people who would otherwise have written bad code | could be given suggestions of better code than they | would've written. | vincnetas wrote: | People make mistakes. With computers people make mistakes | much faster :) | shadowgovt wrote: | It seems that copilot lets one cycle through options, which | is an opportunity for it to facilitate programmers moving | from a naive solution to one they hadn't thought of that is | more correct. 
| | (Unclear to me yet whether the design takes advantage of | this opportunity) | sumanthvepa wrote: | I use a similar feature in IntelliJ idea, and I've often | found that first time I learn about new feature in the | language is when I get a suggestion. I usually explore topic | much more deeply at that time. So far from helping me copy- | paste, I find code suggestions help me explore new features | of the language and framework, that I might not have known | about. | as300 wrote: | Why would you say it's an error to use a float for currency? | I would imagine it's better to use a float for calculations | then round when you need to report a value rather than | accumulate a bunch of rounding errors while doing | computations. | mdellavo wrote: | https://stackoverflow.com/questions/3730019/why-not-use- | doub... | ConceptJunkie wrote: | Because it's an error to use floats in almost every | situation. And currency is something where you don't want | rounding errors, period. The more I've learned about | floating point numbers over the years, the less I want to | use them. Floats solve a specific problem, and they're a | reasonable trade-off for that kind of problem, but the | problem they solve is fairly narrow. | RobLach wrote: | Standard practice is to use a signed decimal number with an | appropriate precision that you scale around. | Tainnor wrote: | It is widely accepted that using floats for money[1] is | wrong because floating point numbers cannot guarantee | precision. | | The fact that you ask is a very good case in point though: | Many programmers are not aware of this issue and would | maybe not question the "wisdom" of the AI code generator. | In that sense, it could have a similar effect to blindly | copy-pasted answers from SO, just with even less friction. | | [1] Exceptions may apply to e.g. finance mathematics where | you need to work with statistics and you're not going to | expect exact results anyway. 
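The objections to float currency raised above can be seen in a few lines of Python. This is a minimal sketch: `round_to_cents` is a hypothetical helper named here for illustration, and the choice of `ROUND_HALF_UP` is an assumption, since the right rounding rule depends on the business or legal context.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Decimals built from strings keep the decimal digits exactly as written.
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))


def round_to_cents(amount: Decimal) -> Decimal:
    """Round to two decimal places using an explicitly chosen rounding mode."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(float_sum_is_exact)
print(decimal_sum_is_exact)
print(round_to_cents(Decimal("2.675")))
```

The point of the Decimal version is less that rounding disappears and more that the precision and rounding rule are chosen explicitly instead of being inherited from the binary representation.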
| joquarky wrote: | You don't want to kick the can down to the floating point | standard. Design for deterministic behavior. Find the edge | cases, go over it with others and explicitly address the | edge case issues so that they always behave as expected. | IncRnd wrote: | By definition, currency uses _fixed point_ arithmetic not | _floating point_ arithmetic. | shishy wrote: | micro-dollars are a better way of representing it (multiply | by 10e6); store as bigint. | | See: https://stackoverflow.com/a/51238749 | IncRnd wrote: | No, they aren't. Micro-dollars do not exist, so this | method is guaranteed to cause errors. | mdellavo wrote: | this is a common approach when you are dealing in rates | less than .01 -- you just need to be sure you are | rounding correctly | IncRnd wrote: | When you are approximating fixed-point using floating- | point there is a lot more you need to do correctly other | than rounding. Your representation must have enough | precision and range for the beginning inputs, | intermediate results, and final results. You must be able | to represent all expected numbers. And on. There is a lot | more involved than what you mentioned. | | Of course, if you are willing to get incorrect results, | such as in play money, this may be okay. | voxic11 wrote: | Standard floats cannot represent very common numbers such | as 0.1 exactly so they are generally disfavored for | financial calculations where an approximated result is | often unacceptable. | | > For example, the non-representability of 0.1 and 0.01 (in | binary) means that the result of attempting to square 0.1 | is neither 0.01 nor the representable number closest to it. | | https://en.wikipedia.org/wiki/Floating- | point_arithmetic#Accu... | wiz21c wrote: | Using float is perfectly OK since using fixed point decimal | (or whatever "exact" math operations) will lead to rounding | error anyway (what about multiplying a monthly salary by | 16/31 (half a month) ?) 
| | The problem with float is that many people don't understand | how they work to handle rounding errors correctly. | | Now there are some cases where float don't cut it. And big | ones. For example, summing a set of numbers (with decimal | parts) will usually be screwed if you don't round it. And | not many people expect to round the results of additions | because they are "simple" operations. So you get errors in | the end. | | (I have written applications that handle billions of euros | with floats and have found just as many rounding errors | there as in any COBOL application) | mdellavo wrote: | It seems incorrect to determine a half a month as 16/31 | but ok , for your proposed example: >>> | from decimal import Decimal >>> Decimal(1000) * | Decimal(16) / Decimal(31) | Decimal('516.1290322580645161290322581') >>> 1000 | * 16 / 31 516.1290322580645 | | The point is using Decimal allows control over precision | and rounding rather than accepting ad-hoc approximations | of a float. | | https://docs.python.org/3/library/decimal.html | | If it were me, I wouldn't go around bragging about how | much money my software manages while being willfully | ignorant of the fundamentals. | wiz21c wrote: | OK, the salary example was a bit simplified; in my case | it was about giving financial help to someone. That help | is based on a monthly allowance and then split in the | number of allocated days in the month, that's for the | 16/31. | | Now for your example, I see that float and decimal just | give the same result. Provided I'm doing financial | computations of a final number, I'm ok with 2 decimals. | And both your computations work fine. | | The decimal module in python gives you number of | significant digits, not number of decimals. You'll end up | using .quantize() to get to two decimals which is | rounding (so, no advantage over floats). | | As I said, as soon as you have division/multiplication | you'll have to take care of rounding manually. 
But for | addition/subtraction, then decimal doesn't need rounding | (which is better). | | The fact is that everybody say "floats are bad" because | rounding is tricky. But rounding is always possible. And | my point is that rounding is tricky even with the decimal | module. | | And about bragging, I can tell you one more thing : | rounding errors were absolutely not the worst of our | problems. The worst problem is to be able to explain to | the accountant that your computation is right. That's the | hard part 'cos some computations imply hundreds of | business decisions. When you end up on a rounding error, | you're actually happy 'cos it's easy to understand, | explain and fix. And don't start me on how laws (yes, the | texts) sometimes explain how rounding rules should work. | chongli wrote: | Using float for currency calculations is how you accumulate | a bunch of rounding errors. Standard practice when dealing | with money is to use an arbitrary-precision numerical type. | verst wrote: | I have been using this - for example working in Go on Dapr | (dapr.io) or in Python on one of its SDKs. | | I love it. So often the code suggestions accurately anticipate | what I planned to do next. | | It's especially fun to write a comment or doc string and then | see Copilot create a block of code perfectly matching your | comment. | stephen82 wrote: | Lots of questions: - the generated code by AI | belongs to me or GitHub? - under what license the | generated code falls under? - if generated code becomes | the reason for infringement, who gets the blame or legal action? | - how can anyone prove the code was actually generated by | Copilot and not the project owner? - if a project member | does not agree with the usage of Copilot, what should we do as | a team? - can Copilot copy code from other projects and | use that excerpt code? - if yes, *WHY* ?! - who | is going to deal with legalese for something he or she was not | responsible in the first place? 
- what about conflicts | of interest? - can GitHub guarantee that Copilot won't | use proprietary code excerpts in FOSS-ed projects that could | lead to new "Google vs Oracle" API cases? | gpm wrote: | > - under what license the generated code falls under? | | Is it even copyrighted? Generally my understanding is that to be | copyrightable it has to be the output of a _human_ creative | process, this doesn't seem to qualify (I am not a lawyer). | | See also, monkeys can't hold copyright: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput... | croes wrote: | It is output of humans creative processes, just not yours. | Like an automated stackoverflow snippet engine. | agilob wrote: | >Generally my understand is that to be copyrightable it has | to be the output of a human creative process | | https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispu | t... | lawtalkinghuman wrote: | In the US, yes. Elsewhere, not necessarily. | tlamponi wrote: | > Is it even copyrighted? | | Isn't it subject to the licenses the model was created | from, as the learning is basically just an automated | transformation of the code, which would be still the | original license - as else I could just run some minifier, | or some other, more elaborate, code transformation, on some | FOSS project, for example the Linux kernel, and relicense | it under whatever? | | Does not sound right to me, but IANAL and I also did not | really look at how this specific model/s is/are generated. | | If I did some AI on existing code I'd be quite cautious and | group by compatible licences classes, asking the user what | their projects licence is and then only use the compatible | parts of the models. 
Anything else seems not really ethical | and rather uncharted territory in law to me, which may not | mean much as IANAL and just some random voice on the | internet, but FWIW at least I tried to understand quite a | few FOSS licences to decide what I can use in projects and | what not. 
| | Anybody knows of some relevant cases of AI and their input | data the model was from, ideally in jurisdictions being the | US or any European Country ones? | buu700 wrote: | This is a great point. If I recall correctly, prior to | Microsoft's acquisition of Xamarin, Mono had to go out of | its way to avoid accepting contributions from anyone | who'd looked at the (public but non-FOSS) source code of | .NET, for fear that they might reproduce some of what | they'd seen rather than genuinely reverse engineering. | | Is this not subject to the same concern, but at a much | greater scale? What happens when a large entity with a | legal department discovers an instance of Copilot- | generated copyright infringement? Is the project owner | liable, is GitHub/Microsoft liable, or would a court | ultimately tell the infringee to deal with it and eat | whatever losses occur as a result? | | In any case, I hope that GitHub is at least limiting any | training data to a sensible whitelist of licenses (MIT, | BSD, Apache, and similar). Otherwise, I think it would | probably be too much risk to use this for anything | important/revenue-generating. | heavyset_go wrote: | > _In any case, I hope that GitHub is at least limiting | any training data to a sensible whitelist of licenses | (MIT, BSD, Apache, and similar). Otherwise, I think it | would probably be too much risk to use this for anything | important /revenue-generating._ | | I'm going to assume that there is no sensible whitelist | of licenses until someone at GitHub is willing to go on | the record that this is the case. | gpm wrote: | (Not a lawyer, and only at all familiar with US law, | definitely uncharted territory) | | No, I don't believe it is, at least to the extent that | the model isn't just copy and pasting code directly. | | Creating the model implicates copyright law, that's | creating a derivative work. 
It's probably fair use | (transformative, not competing in the market place, etc), | but whether or not it is fair use is github's problem and | liability, and only if they didn't have a valid license | (which they should have for any open source inputs, since | they're not distributing the model). | | I think the output of the model is just straight up not | copyrighted though. A license is a grant of rights, you | don't need to be granted rights to use code that is not | copyrighted. Remember you don't sue for a license | violation (that's not illegal), you sue for copyright | infringement. You can't violate a copyright that doesn't | exist in the first place. | | Sometimes a "license" is interpreted as a contract rather | than a license, in which you agreed to terms and | conditions to use the code. But that didn't happen here, | you didn't agree to terms and conditions, you weren't | even told them, there was no meeting of minds, so that | can't be held against you. The "worst case" here (which I | doubt is the case - since I doubt this AI implicates any | contract-like licenses), is that github violated a | contract they agreed to, but I don't think that | implicates you, you aren't a party to the contract, there | was no meeting of minds, you have a code snippet free of | copyright received from github... | birdyrooster wrote: | So if I make AI that takes copyrighted material in one | side, jumbles it about, and spits out the same | copyrighted material on the other side, I have | successfully laundered someone else's work as my own? | | Wouldn't GitHub potentially be responsible for the | infringement by distributing the copyrighted material | knowing that it would be published? | gpm wrote: | I exempted copied segments at the start of my previous | post for a reason, that reason is I don't really know, I | doubt it works because judges tend to frown on absurd | outcomes. | chuinard wrote: | Some of your questions aren't easy to answer. 
Maybe the first | two were OK to ask. Others would probably require lawyers and | maybe even courts to decide. This is a pretty cool new | product just being shared on an online discussion forum. If | you are serious about using it for a company, talk to your | lawyers, get in touch with Github's people, and maybe hash | out these very specific details on the side. Your comment | came off as super negative to me. | ericbarrett wrote: | Regardless of tone, I thought it was chock full of great | questions that raised all kinds of important issues, and | I'm really curious to hear the answers. | peddling-brink wrote: | I think these are very important questions. | | The commenter isn't interrogating some indy programmer. | This is a product of a subsidiary of Microsoft, who I | guarantee has already had a lawyer, or several, consider | these questions. | king_magic wrote: | No, they are all entirely reasonable questions. Yeah, they | might require lawyers to answer - tough shit. Understanding | the legal landscape that one's product lives in is part of | a company's responsibility. | Tainnor wrote: | > This is a pretty cool new product just being shared on an | online discussion forum. | | This is not one lone developer with a passion promoting | their cool side-project. It's GitHub, which is an | established brand and therefore already has a leg up, | promoting their new project for active use. | | I think in this case, it's very relevant to post these | kinds of questions here, since other people will very | probably have similar questions. | natfriedman wrote: | In general: (1) training ML systems on public data is fair | use (2) the output belongs to the operator, just like with a | compiler. | | On the training question specifically, you can find OpenAI's | position, as submitted to the USPTO here: https://www.uspto.gov/sites/default/files/documents/OpenAI_R... 
| | We expect that IP and AI will be an interesting policy | discussion around the world in the coming years, and we're | eager to participate! | croes wrote: | Fair use doesn't exist in every country, so it's US only? | jay_kyburz wrote: | Yes, my partner likes to remind me we don't have it here | in Australia. You could never write a search engine here. | You can't write code that scrapes websites. | krzyk wrote: | It exists in the EU also (and it's much more powerful here). | croes wrote: | The EU doesn't have a copyright related fair use. Quite | the opposite, that's why we are getting upload filters. | stephen82 wrote: | > We expect that IP and AI will be an interesting policy | discussion around the world in the coming years, and we're | eager to participate! | | Another question is this: let's hypothesize I work solo on | a project; I have decided to enable Copilot and have | reached a 50%-50% development with it after a period of | time. One day the "hit by a bus" factor takes place; who | owns the project after this incident? | lovich wrote: | Your estate? The compiler comparison upthread seems to be | perfectly valid. If you work on a solo project in C# and | die, Microsoft doesn't automatically own your project | because you used Visual Studio to produce it | king_magic wrote: | @Nat, these questions (all of them, not just the 2 you | answered) are critical for anyone who is considering using | this system. Please answer them? | | I for one wouldn't touch this with a 10000' pole until I | know the answers to these (very reasonable) questions. | breck wrote: | You should look into: | | https://breckyunits.com/the-intellectual-freedom- | amendment.h... | | Great achievements like this only hammer home the point | more about how illogical copyright and patent laws are. | | Ideas are _always_ shared creations, by definition. If you | have an "original idea", all you really have is noise! 
If | your idea means anything to anyone, then by definition it | is built on other ideas, it is a shared creation. | | We need to ditch the term "IP", it's a lie. | | Hopefully we can do that before it's too late. | delano wrote: | In practical terms, IP could be referred to as unique | advantages. What is the purpose of an organization that | has no unique qualities? | | In general, what is IP and how it's enforced are two | separate things. Just because we've used copyright and | patents to "protect" an organization's unique advantages, | doesn't mean we need to keep using them in the same way. | Or maybe it's the best we can do for now. That's why BSD | style licences are so great. | anmk wrote: | I'm sure _natfriedman_ will be thrilled to abolish IP and | also apply this to the Windows source code. We can expect | it on GitHub any minute! | stwrong wrote: | What about privacy. Does the AI send code to GitHub? This | reminds me of Kite | avery42 wrote: | Yes, under "How does GitHub Copilot work?": | | > [...] The GitHub Copilot editor extension sends your | comments and code to the GitHub Copilot service, which | then uses OpenAI Codex to synthesize and suggest | individual lines and whole functions. | patrickthebold wrote: | What does "public" mean? Do you mean "public domain", or | something else? | 6gvONxR4sf7o wrote: | Unfortunately, in ML "public data" typically means | available to the public. Even if it's pirated, like much | of the data available in the Books3 dataset, which is a | big part of some other very prominent datasets. | kzrdude wrote: | So basically youtube all over again? I.e bootstrap and | become popular by using widely available whatever media | (pirated by crowdsourced piracy) and then many years | later, when it gets popular, dominant, it has to turn | around and "do things right" and guard copyrights. | qihqi wrote: | (1) training ML systems on public data is fair use | | This one is tricky considering that kNN is also a ML | system. 
| visarga wrote: | kNN needs to hold on to a complete copy of the dataset | itself unlike a neural net where it's all mangled. | joepie91_ wrote: | > training ML systems on public data is fair use | | Uh, I very much doubt that. Is there any actual precedent | on this? | | > We expect that IP and AI will be an interesting policy | discussion around the world in the coming years, and we're | eager to participate! | | But apparently not eager enough to have this discussion | with the community _before_ deciding to train your | proprietary for-profit system on billions of lines of code | that undoubtedly are not all under CC0 or similar no- | attribution-required licenses. | | I don't see attribution anywhere. To me, this just looks | like _yet another_ case of appropriating the public | commons. | stefano wrote: | How do you guarantee it doesn't copy a GPL-ed function | line-by-line? | abraae wrote: | Surprised not to see more mention of this. It would make | sense for an AI to "copy" existing solutions. In the real | world, we use clean room to avoid this. | | In the AI world, unless all GPL (etc.) code is excluded | from the training data, it's inevitable that some will be | "copied" into other code. | | Where lawyers decide what "copy" means. | dkarras wrote: | How do you know that when you write a simplish function | for example, it is not identical to some GPL code | somewhere? "Line by line" code does not exist anywhere in | the neural network. It doesn't store or reference data in | that way. Every character of code is in some sense | "synthesized". If anything, this exposes the fragility of | our concept of "copyright" in the realm of computer | programs and source code. It has always been ridiculous. | GPL is just another license that leverages the copyright | framework (the enforcement of GPL cannot exist outside | such a copyright framework after all) so in such weird | "edge cases" GPL is bound to look stupid just like any | other scheme. 
Remember that GPL also forbids "derivative" | works to be relicensed (with a less "permissive" one). It | is safe to say that you are writing code that is close | enough to be considered "derivative" to some GPL code | somewhere pretty much every day, and you can't possibly | prove that you didn't cheat. So the whole framework | collapses in the end anyways. | king_magic wrote: | I truly don't think they can guarantee that. Which is a | massive concern. | ipsum2 wrote: | Yup, this isn't a theoretical concern, but a major | practical one. GPT models are known for memorizing their | training data: https://towardsdatascience.com/openai-gpt- | leaking-your-data-... | | Edit: Github mentions the issue here: | https://docs.github.com/en/github/copilot/research- | recitatio... and here: https://copilot.github.com/#faq- | does-github-copilot-recite-c... though they neatly ignore | the issue of licensing :) | visarga wrote: | > GPT models are known for memorizing their training data | | Hash each function, store the hashes as a blacklist. Then | you can ask the model to regenerate the function until it | is copyright safe. | ipsum2 wrote: | What if it copies only a few lines, but not an entire | function? Or the function name is different, but the code | inside is the same? | proteal wrote: | If we could answer those questions definitively, we could | also put lawyers out of a job. There's always going to be | a legal gray area around situations like this. | amelius wrote: | Does Copilot phone home? | heavyset_go wrote: | Yes, and with the code you're writing/generating. | amelius wrote: | This obviously sucks. | | Can't companies write code that runs on customer's | premises these days? Are they too afraid somebody will | extract their deep learning model? I have no other | explanation. | | And the irony is that these companies are effectively | transferring their own fears to their customers. | jmmcd wrote: | It's a large and gpu-hungry model. 
| gpm wrote: | When you sign up for the waitlist it asks permission for | additional telemetry, so yes. Also the "how it works" image | seems to show the actual model is on github's servers. | natfriedman wrote: | You should read the FAQ at the bottom of the page; I think it | answers all of your questions: | https://copilot.github.com/#faqs | viccuad wrote: | > You should read the FAQ at the bottom of the page; I | think it answers all of your questions: | https://copilot.github.com/#faqs | | Read it all, and the questions still stand. Could you, or | anyone on your team, point me to where the questions are | answered? | | In particular, the FAQ doesn't assure that the "training | set from publicly available data" doesn't contain license | or patent violations, nor if that code is considered | tainted for a particular use. | res0nat0r wrote: | From the faq: | | > GitHub Copilot is a code synthesizer, not a search | engine: the vast majority of the code that it suggests is | uniquely generated and has never been seen before. We | found that about 0.1% of the time, the suggestion may | contain some snippets that are verbatim from the training | set. | | I'm guessing this covers it. I'm not sure if someone | posting their code online, but explicitly saying you're | not allowed to look at it, getting ingested into this | system with billions of other inputs could somehow make | you liable in court for some kind of infringement. | IncRnd wrote: | That doesn't cover it, since that is a technical answer | for a non-technical question. The same questions remain. | notsureaboutpg wrote: | Sounds like using Copilot can introduce GPL'd code into | your project and make your project bound by GPL as a | result... | | 0.1% is a lot when you use 100 suggestions a day. | viccuad wrote: | that doesn't include patent violations nor license | violations or compatibility between licenses. Which would | be the most numerous and non-trivial cases. 
| res0nat0r wrote: | How is it possible to determine if you've violated a | random patent from somewhere on the internet via a small | snippet of customized auto-generated code? | | Does everyone in this thread contact their lawyers after | cutting and pasting a mergesort example from | Stackoverflow that they've modified to fit their needs? | Seems folks are reaching a bit. | IncRnd wrote: | For that very reason, many companies have policies that | forbid copying code from online (especially from | StackOverflow). | dlubarov wrote: | That mitigates copyright concerns, but patent | infringement can occur even if the idea was independently | rediscovered. | woah wrote: | I think a patent violation with CoPilot is exactly the | same scenario as if you violated a patent yourself | without knowing it. | dvaun wrote: | None of the questions and answers in this section hold | information about how the generated code affects licensing. | None of the links in this section contain information about | licensing, either. | rozab wrote: | This page has a looping back button hijack for me | samtheprogram wrote: | The most important question, whether you own the code, is | sort of maybe vaguely answered under "How will GitHub | Copilot get better over time?" | | > You can use the code anywhere, but you do so at your own | risk. | | Something more explicit than this would be nice. Is there a | specific license? | | EDIT: also, there's multiple sections to a FAQ, notice the | drop down... under "Do I need to credit GitHub Copilot for | helping me write code?", the answer is also no. | | Until a specific license (or explicit lack thereof) is | provided, I can't use this except to mess around. | netcraft wrote: | I don't see the answer to a single one of their questions on | that page - did you link to where you intended? | | Edit: you have to click the things on the left, I didn't | realize they were tabs. 
| Asmod4n wrote: | Might this end up putting GPL code into projects with an | incompatible license? | natfriedman wrote: | It shouldn't do that, and we are taking steps to avoid | reciting training data in the output: | https://copilot.github.com/#faq-does-github-copilot-recite-c... https://docs.github.com/en/early-access/github/copilot/resea... | | In terms of the permissibility of training on public code, | the jurisprudence here - broadly relied upon by the machine | learning community - is that training ML models is fair use. | We are certain this will be an area of discussion in the US | and around the world and we're eager to participate. | eqtn wrote: | Would I be able to use something like this in the near | future to produce a proprietary linux kernel? | Hamuko wrote: | > _training ML models is fair use_ | | How does that apply to countries where Fair Use is not a | thing? As in, if you train a model on a fair use basis in | the US and I start using the model somewhere else? | KMnO4 wrote: | I don't think it's fair to ask a US company to comment on | legalities outside of the US. | [deleted] | detaro wrote: | It's fair to expect an international company pushing its | products all over the world to be prepared to comment on | non-US jurisdictions. (I have some sympathy for "we have | a local market, and that's what we are solely targeting | and preparing for" in companies where that is actually | the case, but that's really not what we are dealing with | in the case of Microsoft/GitHub) | toomuchtodo wrote: | One would expect GitHub (owned by Microsoft) to have | engaged corporate counsel for an opinion (backed by | statute and case law), and to be prepared to disable the | functionality in jurisdictions where it's incompatible | with local IP law. | Asmod4n wrote: | Fair use doesn't exist in Germany. 
| [deleted] | jazzyjackson wrote: | > It shouldn't do that, and we are taking steps to avoid | reciting training data in the output | | This just gives me a flashback to copying homework in | school, "make sure you change some of the words around so | it's not obvious" | | I'm sure you're right Re: jurisprudence, but it never sat | right with me that AI engineers get to produce these big, | impressive models but the people who created the training | data will never be compensated, let alone asked. So I | posted my face on Flickr, how should I know I'm consenting | to benefit someone's killer robot facial recognition? | ramraj07 wrote: | Wait I thought y'all argued Google didn't copy Java for | Android, now that big tech is copying your code you're | crying wolf? | InspiredIdiot wrote: | The whole point of that case begins with the admission | "yes of course Google copied." They copied the API. The | argument was that copying an API to enable | interoperability was fair use. It went to the Supreme | Court because no law explicitly said that was fair use | and no previous case had settled the point definitively. | And the reason Google could be confident they copied only | the API is because they made sure the humans who did it | understood both the difference and the importance of the | difference between API and implementation. I don't think | there is a credible argument that any AI existing today | can make such a distinction. | npteljes wrote: | > ...the jurisprudence here - broadly relied upon by the | machine learning community - is that training ML models is | fair use. | | If you train an ML model on GPL code, and then make it | output some code, would that not make the result a | derivative of the GPL licensed inputs? | | But I guess this could be similar to musical composition. | If the output doesn't resemble any of the inputs, or | contains significant continuous portions of them, then it's | not a derivative. 
| IncRnd wrote: | > If the output doesn't resemble any of the inputs, or | contains significant continuous portions of them, then | it's not a derivative. | | In this particular case, the output resembles the inputs, | or there is no reason to use Github Copilot. | sicromoft wrote: | You just shared a URL that says "Please do not share this | URL publicly". | jamie_ca wrote: | Well, he's also GitHub's CEO so it's probably just fine. | SCLeo wrote: | > ...the jurisprudence here - broadly relied upon by the | machine learning community - is that training ML models is | fair use. | | To be honest, I doubt that. Maybe I am special, but if I am | releasing some code under GPL, I _really_ don't want it to | be used in training a closed source model, which will be | used in a closed source software generating code for closed | source projects. | yjftsjthsd-h wrote: | Is it any different than training a human? What if a | person learned programming by hacking on GPL public code | and then went to build proprietary software? | IncRnd wrote: | Would you hire a person who only knew how to program by | taking small snippets of code from GPL and rearranging | them? That's like hiring monkeys to type Shakespeare. | | The clear difference is that a human's training regimen | is to understand how and why code interacts. That is | different from an engine that replicates other people's | source code. | woodruffw wrote: | A human being who has learned from reading GPL'd code can | make the informed, intelligent decision to not copy that | code. | | My understanding of the open problem here is whether the | ML model is intelligently recommending _entire fragments_ | that are explicitly licensed under the GPL. That would be | a licensing violation, if a human did it. | 10000truths wrote: | > A human being who has learned from reading GPL'd code | can make the informed, intelligent decision to not copy | that code. | | A model can do this as well. 
Getting the length of a | substring match isn't rocket science. | akavel wrote: | Actually, I believe it's tricky to say if even a human can | actually do that safely. There's the whole concept of | "cleanroom rewrite" - meaning, if you want to rewrite | some GPL or closed-source project _into a different | license_, you should make sure _you have never ever seen even | a glimpse of the original code_. If you look at GPL or | closed-source code (or, actually, code governed by any | other license), it's hard to prove you didn't | accidentally/subconsciously remember parts of this code, | and copy them into your "rewrite" project even if "you | made a decision to not copy". The border between | "inspired by" and "blatant copyright infringement" is | blurry and messy. If that was already so tricky and | troublesome legal-wise before, my first instinct is that | with Copilot it could be even more legally murky | territory. IANAL, yet I'd feel better if they made some | [legally binding] promises that their model is based only | on code carefully verified to have one of an explicit | (and published) whitelist of permissive licenses. (Even | this could be tricky, with MIT etc. actually requiring | some mention in your advertising materials [which is | often forgotten], but now that's a completely different | level of trouble than not knowing if I'm infringing GPL | or some closed-source code, or other weird license.) | Hamuko wrote: | How do you distribute a human? | yjftsjthsd-h wrote: | A contractor seems equivalent to SaaS to me | praptak wrote: | It is different in the same way that a person looking at | me from their window when I pass by is different from a | thousand cameras observing me when I move around city. | Scale matters. | throwaway2037 wrote: | This is a lovely analogy, akin to "sharing mix tapes" vs | "sharing MP3s on Napster". I fear the coming world with | extensive public camera surveillance and facial | recognition! 
(For any other "tin foil hatters" out there, | cue the trailer for Minority Report.) | Hamuko wrote: | > _I fear the coming world with extensive public camera | surveillance and facial recognition!_ | | I fear the coming world of training machine learning | models with my face just because it was published by | someone somewhere (legally or not). | heavyset_go wrote: | You can rest assured that this is already the case if | your picture was ever posted online. There are dozens of | such products that law enforcement buys subscriptions to. | jonny_eh wrote: | > a thousand cameras observing me when I move around | city. Scale matters. | | While I certainly appreciate the difference, is camera | observation illegal anywhere where it isn't explicitly | outlawed? Meaning, have courts ever decided that the | difference of scale matters? | praptak wrote: | No idea. I was not trying to make a legal argument. This | was to try to convey why someone might feel ok about | humans learning from their work but not necessarily about | training a model. | [deleted] | twobitshifter wrote: | What if a person heard a song by hearing it on the radio | and went on to record their own version? | mkr-hn wrote: | There is already a legal structure in place for cover | song licensing. | | https://en.wikipedia.org/wiki/Cover_version#United_States_co... | twobitshifter wrote: | Exactly so it needs licensing of some sort - this is | closer to cover tunes than it is to someone getting a CS | degree and being asked to credit Knuth for all their | future work. | manquer wrote: | Perhaps we need GPL v4. I don't think there is any clause | in current V2/V3 that prohibits _learning_ from the code, | only using the code in other places and running a service | with code. | colinbartlett wrote: | Would you be okay with a human reading your GPL code and | learning how to write closed source software for closed | source projects? 
| zarzavat wrote: | The whole point of fair use is that it allows people to | copy things even when the copyright holder doesn't want | them to. | | For example, if I am writing a criticism of an article, I | can quote portions of that article in my criticism, or | modify images from the article in order to add my own | commentary. Fair use protects against authors who try to | exert so much control over their works that it harms the | public good. | Tainnor wrote: | Fair Use is specific to the US, though. The picture could | end up being much more complicated when code written | outside the US is being analyzed. | krzyk wrote: | It is not US specific, we have it in EU. And e.g. in | Poland I could reverse engineer a program to make it work | on my hardware/software if it doesn't. This is covered by | fair use here. | Hamuko wrote: | The messier issue is probably using the model to write | code outside the US. Americans can probably analyze code | from anywhere in the world and refer to Fair Use if a | lawyer comes knocking, but I can't refer to Fair Use if a | lawyer knocks on my door after using Copilot. | IncRnd wrote: | This isn't the same situation at all. The copying of code | doesn't seem to be for a limited or transformative | purpose. Fair use might cover parody or commentary & | criticism but not limitless replication. | zarzavat wrote: | They are not replicating the code at all. They are | training a neural network. The neural network then | _learns_ from the code and synthesises new code. | | It's no different from a human programmer reading code, | learning from it, and using that experience to write new | code. Somewhere in your head there is code that someone | else wrote. And it's not infringing anybody's copyright | for those memories to exist in your head. | jaimeyap wrote: | We can't yet equate ML systems with human beings. | Maybe one day. But at the moment, it's probably better to | compare this to a compiler being fed licensed code. 
The | compilation output is still subject to the license. | Regardless of how fancy the compiler is. | | Also, a human being that reproduces licensed code from | memory - because they read that code - would constitute a | license violation. The line between derivative work, and | authentic new original creation is not a well defined | one. This is why we still have human arbiters of these | decisions and not formal differential definitions of it. | This happens in music for example _all the time_. | IncRnd wrote: | It is replication, maybe not of a single piece of code - | but creating a synthesis is still copying. For example, | constructing a single piece of code of three pieces of | code from your co-workers is still replication of code. | | Your argument would have some merit if something were | _created_ instead of assembled, but there is no new | algorithm that is being created. That is not what is | happening here. | | On the one hand, you call this copying in fair use. On | the other hand, you say this is creating new code. You | can't have it both ways. | blibble wrote: | surely that depends on the size of the training set? | | I could feed the Linux kernel one function at a time into | a ML model, then coerce its output to be exactly the same | as the input | | this is obviously copyright infringement | | whereas in the github case where they've trained it on | millions of projects maybe it isn't? | | does the training set size become relevant legally? | slownews45 wrote: | This is what is so miserable about the GPL progression. 
We went from GPLv2 (preserving everyone's rights to use | code) to GPLv3 (you have to give up your encryption keys) | - I think we've lost the GPL as a place where we could | solve / answer these types of questions which are good | ones - GPL just tanked a lot of trust in it with the | (A)GPLv3 stuff especially around prohibiting other | developers from specific uses of the code (which is | diametrically different from earlier versions which | preserved rights). | gspr wrote: | Think what you will of GPLv3, but lies help no one. Of | course it doesn't require you to give up your encryption | keys. | slownews45 wrote: | Under GPLv2 I could make a device with GPLv2 software and | maintain root of trust control of that device if I wanted | (ie, do an anti-theft activation lock process, do a lease | ownership option of $200/month vs $10K to buy etc). | | Think what you will, but your lies about the GPLv3 can | easily be tested. Can you point me to some GPLv3 software | in the Apple tech stack? | | We actually already know the answer. | | Apple had to drop Samba (they were a MAJOR end user use | of Samba) because of GPLv3 | | I think they also moved away from GCC for LLVM. | | In fact - they've probably purged at least 15 packages | I'm aware of and I'm aware of NO GPLv3 packages being | included. | | Not sure what their App Store story is - but I wouldn't | be surprised if they were careful there too. | | Oh - this is all lies and Apple's lawyers are wrong? Come | on - I'm aware of many other companies that absolutely | will not ship GPLv3 software for this reason. | | In fact, by 2011 even it was clear that GPLv3 is not | really workable in a lot of contexts and alternatives | like MIT became more popular. | | https://trends.google.com/trends/explore?date=all&geo=US&q=%... | | Apple geared up to fight DOJ over maintaining root | control of devices (San Bernardino case). 
| | Even Ubuntu has had to deal with this - SFLC made it | clear that if some distributor messed things up ubuntu | would have to release their keys, which is why they ended | up with a MICROSOFT (!) solution. | | "Ubuntu wishes to ensure that users can boot any | operating system they like and run any software they | want. Their concern is that the GPLv3 makes provisions by | which the FSF could, in this case as the owner of GRUB2, | deem that a machine that won't let them replace GRUB2 | with something else is in violation of the GPLv3. At that | point, they can demand that Ubuntu surrender its | encryption keys used to provide secure bootloader | verification--which then allows anyone to sign any | bootloader they want, thus negating any security features | you could leverage out of the bootloader (for example, | intentionally instructing it to boot only signed code-- | keeping the chain trusted, rather than booting a foreign | OS as is the option)." - commentator on this topic. | | It's just interesting to me that rather than any | substance the folks arguing for GPLv3 reach for name | calling type responses. | easton wrote: | That's why Apple's SMB implementation stinks! Finally, | there's a reason for it, I thought they had just started | before Samba was mature or something. | slownews45 wrote: | Yeah, it was a bit of a big bummer! | | Apple used to also interoperate wonderfully if you were | using Samba SERVER side too because - well, they were | using Samba client side. Those days were fantastic | frankly. You would run Samba server side (on Linux), then | Mac client side - and still have your windows machines | kind of on -network (for accounting etc) too. | | But the Samba folks are (or were) VERY hard core GPLv3 | folks - so writing was on the wall. | | GPLv3 shifted things really from preserving developer | freedom for OTHERs to do what they wanted with the code, | to requiring YOU to do stuff in various ways which was a | big shift. 
I'd assumed that (under GPLv2) there would be | natural convergences, but GPLv3 really blew that apart | and we've had a bit of a license fracturing relatively. | | AGPLv3 has also been a bit weaponized to do a sort of | fake open source where you can only really use the | software if you pay for a commercial license. | goodpoint wrote: | These claims are absurd. AGPL and GPLv3 carry on the same | mission of GPLv2 to protect authors and end users from | proprietization, patent trolling and freeloading. | | This is why SaaS/Cloud companies dislike them and fuel | FUD campaigns. | rowanG077 wrote: | That's the point of fair use. To do something with a | material the original author does not want. | b3morales wrote: | Can you explain why you think this is covered by fair | use? It seems to me to be | | 1a) commercial | | 1b) non-transformative: in order to be useful, the | produced code must have the same semantics as some code | in the training set, so this does not add "a different | character or purpose". Note that this is very different | from a "clean room" implementation, where a high-level | design is reproduced, because the AI _is looking directly | at_ the original code! | | 2) possibly creative? | | 3) probably not literally reproducing input code | | 4) competitive/displacing for the code that was used in | the input set | | So failing at least 3 out of 5 of the guidelines. | https://www.copyright.gov/fair-use/index.html | rowanG077 wrote: | 1a) Fair use can be commercial. And copilot is not | commercial so the point is moot. | | 1b) This is false. This is not literally taking snippets | it has found and suggesting it to the user. That would be | an intelligent search algorithm. This is writing novel | code automatically based on what it has learned. | | 2) Definitely creative. It's creating novel code. At | least it's creative if you consider a human programming | to be a creative endeavor as well. 
| | 3) If it's reproducing input code it's just a search | algorithm. This doesn't seem to be the case. | | 4) Most GPLed code doesn't cost any money. As such the | market for it is non-existent. Besides copilot does not | displace the original even if there were a market for it. | As far as I know there is not anything even close to | comparable in the world right now. | | So from my reading it violates none of the guidelines. | dragonwriter wrote: | > To be honest, I doubt that. | | Okay, but that's...not much of a counterargument (to be | fair, the original claim was unsupported, though.) | | > Maybe I am special, but if I am releasing some code | under GPL, I really don't want it to be used in training | a closed source model | | That's _really_ not a counterargument. "Fair use" is an | exception to exclusive rights under copyright, and | renders the copyright holder's preferences moot to the | extent it applies. The copyright holder not being likely | to want it based on the circumstances is an argument | against it being implicitly licensed use, but not against | it being fair use. | yewenjie wrote: | This looks really cool. Do you plan to release this in some | other form like a language server so that it can be easily | integrated to other editors? | jonas_kgomo wrote: | This is obviously controversial, since we are thinking about | how this could displace a large portion of developers. How do | you see Copilot being more augmentative than disruptive to the | developer ecosystem? Also, how you see it different from | regular code completion tools like tabnine. | tux1968 wrote: | How many jobs have developers helped displace in business and | industry? I don't think it's controversial that we become | fair game for that same automation process we've been | leading. | AnIdiotOnTheNet wrote: | Indeed. It should be the goal of society to automate away | as much work as possible. 
If there are perverse incentives | working against this then we should correct them. | amw-zero wrote: | Human beings need something to do to have a fulfilling | life. I do not agree at all that the ultimate goal of | society is to automate everything that's possible. I | think that will be horrible overall for society. | kyawzazaw wrote: | I typically find other things fulfill my life more than | work. | AnIdiotOnTheNet wrote: | My job is probably the least fulfilling activity in my | life and I'm sure that goes for a lot of people. | | By your reasoning, maybe we don't need backhoes and | should just hire a bunch of guys with spoons instead? | gugagore wrote: | 1. How do you define work differently from "that which | should be automated"? | | 2. While I agree with your stance, it is not by itself | sufficient. If you provide the automation but you do not | correct the perverse incentives (or you worry about | correcting them only later) that you mention, then you | are contributing to widening the disparity between a | category of workers (who have now lost their leverage) | and those with assets and capital (who have a reduced | need for workers). | mkr-hn wrote: | That's why it's best to get unions or support systems | (like UBI) before they're needed. It's hard to organize | and build systems when you have no leverage, influence, | or power left. | AnIdiotOnTheNet wrote: | I agree, the fact we're even talking about this is | evidence that our society has the perverse incentive | baked in and we should be aware of and seek to address | that. | | Regardless, programmers would be hypocritical to decry | having their jobs automated away. | spec-obs wrote: | I would be up for that, if said society did not leave us | destitute as a result | pizza234 wrote: | > How many jobs have developers helped displace in business | and industry? | | How many? | | > I don't think it's controversial that we become fair game | for that same automation process we've been leading. 
| | This is not correct. A human (developer) displacing another | human (business person) is entirely different than a tool | (AI bot) replacing a human (developer). | | Regardless, this is the Lump of Labour fallacy | (https://en.wikipedia.org/wiki/Lump_of_labour_fallacy). | | In this case, it is assumed that the global amount of | development work is fixed, so that, if AI takes a part of | it, the equivalent workforce in terms of developers, will | be out of job. Especially in the field of SWE, this is | obviously false. | | It also needs to be seen what this technology will actually | do. SWE is a complex field, way more than typing a few | routines. In best case (technologically speaking) this will | be an augmentation. | tux1968 wrote: | > A human (developer) displacing another human (business | person) is entirely different | | That's not what is happening though, a few developers | replace thousands of business and industry people with | automated tools. Say, automated route planning for | package delivery, would take many thousands of humans if | not for the AI bots that do the job instead. | | > SWE is a complex field, way more than typing a few | routines. In best case (technologically speaking) this | will be an augmentation. | | Of course there will always be some jobs for humans to | do. Just like there are still jobs for humans loading | thread into the automated looms and such. | | But your arguments against automation displacing | programming jobs ring hollow. People said the same thing | about chess playing programs, they would never be able to | understand the subtlety or complexity like a human could. | varjag wrote: | > SWE is a complex field, way more than typing a few | routines. In best case (technologically speaking) this | will be an augmentation. 
| | If there is a pathway to improving this AI assist | efficiency say by restricting the language, methodology, | UI paradigm and design principles, it will happen quick | due to market incentives. The main reason SWE is complex | is it's done manually in myriad subjectively preferred | ways. | cnvm wrote: | Nothing is inevitable. Doctors and lawyers have protected | their professions successfully for centuries. | | Only some software developers seem interested in replacing | themselves in order to enrich their corporate masters | (mains?) even further. | | Just don't use this tool! | mason55 wrote: | This tool only replaces a small part of a good programmer | and just further highlights the differences between | people blindly writing code and people building actual | systems. | | The challenge in software development is understanding | the real world processes and constraints and turning them | into a design for a functional & resilient system that | doesn't collapse as people add every little idea that | pops into their head. | | If the hard part was "typing in code" then programmers | would have been replaced long ago. But most people can't | even verbally explain what they do as a series of steps & | decision points such that a coherent and robust process | can be documented. Once you have that it's easy to turn | into code. | tux1968 wrote: | > Doctors and lawyers have protected their professions | successfully for centuries. | | And one could argue that this means we all pay more for | health and legal services than we otherwise would. You | have to calculate both costs and benefits; what price | does society pay for those few people having very high | paying jobs? | serf wrote: | >How many jobs have developers helped displace in business | and industry? I don't think it's controversial that we | become fair game for that same automation process we've | been leading. | | historically when has that sort of 'tit-for-tat' style of | argument ever been helpful? 
| | the correct approach would be "we've observed first hand | the problems that we've caused for society, how can we avoid | creating such problems _for any person_ in the future?" | | It might seem self-serving, and it is, but 'two wrongs | don't make a right'. Let's try to fix such problems rather | than serving our sentence as condemned individuals. | tux1968 wrote: | > historically when has that sort of 'tit-for-tat' style | of argument ever been helpful? | | It's not tit-for-tat, it's a wake up call. As in, what | exactly do you think we've been doing with our skills and | time? | | > "we've observed first hand the problems that we've | caused for society"... | | But not everyone agrees that this is actually a problem. | There was a time when being a blacksmith or a weaver was | a very highly paid profession, and as technology improved | and the workforce became larger, large wages could no | longer be commanded. Of course the exact same thing is | going to happen to developers, at least to some extent. | natfriedman wrote: | We think that software development is entering its third wave | of productivity change. The first was the creation of tools | like compilers, debuggers, garbage collectors, and languages | that made developers more productive. The second was open | source where a global community of developers came together | to build on each other's work. The third revolution will be | the use of AI in coding. | | The problems we spend our days solving may change. But there | will always be problems for humans to solve. | jonas_kgomo wrote: | I appreciate this insight, as a proponent of progress | studies. It is indeed a pragmatic view of what the industry | will be or should be. I believe something that would | also be appreciated is a pair security auditor.
Most | vulnerabilities in software can be avoided early on in | development; I believe this could be a great addition to | Github's Security Lab securitylab.github.com/ | dfkl wrote: | Have you or 'natfriedman authored any works in a | public repository, so that we can judge the validity of | the pragmatic view? | freedomben wrote: | I'm super interested to read more about your | theory/analysis. Have you written on it in a blog or | anything? | therealplato wrote: | There's a good amount of discussion on this topic in "The | Mythical Man-Month". The entire book discusses the | factors that affect development timeframes and it | specifically addresses whether AI can speed it up (albeit | from 1975, 1986 and 1995 viewpoints, and comparing | progress between those points.) | freedomben wrote: | Thanks! That's a great suggestion. I forgot that was in | there. | | I read Mythical Man Month many years ago and enjoyed it. | Time for a re-read. Of course it won't cover the third | wave very well though. Would love to see a blog post | cover that. | xna wrote: | Let's solve the problem of replacing CEOs next. The above | paragraph could have been written by GPT-3 already. | foobarbazetc wrote: | LOL. But you actually make a good point here. GPT-3 can | replace most comms / PR type jobs since they all sound | like Exec-speak. | toomuchtodo wrote: | I think you're looking at the problem the wrong way. This | provides less strong engineering talent with more | leverage. The CEO (which could be you!) gets closer to | being a CTO with less experience and context necessary | (recall businesses that run on old janky codebases or no | code platforms; they don't have to be elegant, they | simply have to work). | | It all boils down to who is capturing the value for the | effort and time expended. If a mediocre software engineer | can compete against senior engineers with such | augmentation, that seems like a win.
Less time on | learning language incantations, more time spent | delivering value to those who will pay for it. | foobarbazetc wrote: | That's not really how it's going to go though. Just look | at what your average person is able to accomplish with | Excel. | | Your own example of the CEO becoming a CTO can be used in | every level and part of the business. | | Now the receptionist is building office automation tools | because they can describe what they want in plain English | and have this thing spit out code. | dragonwriter wrote: | > Just look at what your average person is able to | accomplish with Excel. | | Approximately nothing. | | The average knowledge worker somewhat more, but lots of | them are at the level of "I can consume a pivot table | someone else set up". | | Sure, there are highly-productive, highly-skilled excel | users that aren't traditional developers that can build | great things, but they aren't "your average person". | foobarbazetc wrote: | Well, agree to disagree here as I've seen it with my own | eyes, but it's kind of besides the point. | | Is it a coincidence that the same company that makes | Excel is trying to... "democratize" and/or de-specialize | programming? | | I don't really think so, but _shrug_. | toomuchtodo wrote: | https://news.ycombinator.com/item?id=24791017 (HN: Excel | warriors who save governments and companies from | spreadsheet errors) | | https://news.ycombinator.com/item?id=26386419 (HN: Excel | Never Dies) | | https://news.ycombinator.com/item?id=20417967 (HN: I was | wrong about spreadsheets) | | https://mobile.twitter.com/amitranjan/status/113944938807 | 223... (Excel is every #SAAS company's biggest | competitor!) | dragonwriter wrote: | Yes, Excel "runs the world", and in most organizations, | you'll find a fairly narrow slice of Excel power users | that build and maintain the Excel that "runs the world". 
| | We may not call them developers or programmers (or we | might; I've been one of them as a fraction of my job at | different times, both as a "fiscal analyst" by working | title and as a "programmer analyst" by title), but | effectively that's what they are, developers using (and | possibly exclusively comfortable with) Excel as a | platform. | [deleted] | alexanderdmitri wrote: | I think this is already happening. There's credible | evidence that the Apple CEO, Tim Cook, has been | essentially replaced by a Siri-driven clone over the last | 7 months. They march the real guy out when needed, but if | you watch closely when they do, it's obvious he's under | duress reading lines prepared by an AI. His testimony in | the Epic lawsuit for example. They'll probably cite how | seriously he and the company take 'privacy' to help | normalize his withdrawal from the public space in the | coming years. | mkr-hn wrote: | This is exactly the kind of fun, kooky conspiracy theory | I've missed with all the real conspiracies coming to | light over the last decade or so. | tudelo wrote: | Can you cite some of this credible evidence? | wolverine876 wrote: | natfriedman is a human being like you and me, not an AI; | let's treat them with consideration for that. | anmk wrote: | Perhaps he should go easy on the euphemisms then and show | respect for the developers who wrote the corpus of | software that this AI is being trained on (perhaps | illegally). | wolverine876 wrote: | OK, then ask him to go easy! Great idea, and it might get | a good response. | dragonwriter wrote: | > This is obviously controversial, since we are thinking | about how this could displace a large portion of developers. | | It... couldn't, in net. | | Tools which improve developer productivity increase the | number of developers hired and the number of tasks for which | it is worthwhile to employ them and the market clearing price | for development work. 
| | See, for examples, _the whole history of the computing | industry as we've added more layers of automation between | "conceptual design for software" and "bit patterns in | hardware implementing that conceptual design as concrete | software"_. | | It might displace or disadvantage some developers in specific | (though I doubt a large portion) by shifting the relative | value of particular subskills within the set used in | development, I suppose. | dvaun wrote: | I agree with this viewpoint. | | A tool which increases how rapidly we can output code-- | correct code--would allow for more time spent on hard | tasks. | | I can see the quality of some "commodity" software | increasing as a result of tools in this realm. | ArtWomb wrote: | Hi Nat! Just signed up for the preview (even though I'm the | type to turn off intellisense and smart indent). I was | wondering if WebGL shader code (glsl) was included in the | training set? Translating human understandable graphics effects | from natural language is a real challenge ;) | mwcampbell wrote: | Has this been tested for accessibility yet, particularly with a | screen reader? | Celenduin wrote: | This is impressive. And scary. How long has your team been | working on this first release? | pera wrote: | Cool project! Have you seen any interesting correlations | between languages, paradigms and the accuracy of your | suggestions? | dang wrote: | Ok, we changed the URL to that from | https://github.blog/2021-06-29-introducing-github-copilot- | ai.... | [deleted] | hit8run wrote: | I tried the paid version of tabnine and was really unhappy | because it suggested me code with syntax errors and introduced | subtle bugs when I did not closely review every generated line. | It was as if you have someone very impatient sitting next to you | typing before actually really listening what you want to do. Is | Copilot better? Does it suggest broken code, too? 
| ranguna wrote: | According to some of the comments here: yes, yes it does. One | of the snippets on the front page has a bug; I don't remember | which, but someone here wrote about it. | bencollier49 wrote: | Ah, here it is; the Power Loom. And we're all weavers. | | http://historymesh.com/object/power-loom/?story=textiles | kp302 wrote: | Sounds like a big distraction. | RosanaAnaDana wrote: | Why are the green ones always angry? | xeromal wrote: | haha, you asked a question that I've always wondered. | dang wrote: | I suppose you know this, but green means the account is new | (https://news.ycombinator.com/newsfaq.html). Angry people | sometimes create new accounts to vent. That's not a good use | of HN, but let's not respond as if there's something wrong | with new users coming to the community. We don't want this | place to become insular and brackish. New users are the | community's freshwater! | RosanaAnaDana wrote: | This is a great point. Do names turn from green to grey | based on karma or time? | dang wrote: | Time alone. | elteto wrote: | They haven't ripened yet. | ezekg wrote: | Anonymity can breed honesty, I guess. | cqaz wrote: | This is hardly grounds for celebration. Another step in | Microsoft's efforts to drive down programmer salaries by | expanding the work force. | | Also, this tool will enable more cheap LOC churn for those gaming | performance reviews (not that this is currently difficult, but it | will be even easier). | f38zf5vdt wrote: | It will absolutely transform undergraduate education in | computer science, or rather, the breadth of the workload. <:) | [deleted] | JCBird1012 wrote: | It's funny that GitHub ships their own text editor, Atom, but the | second example (under the "More than Autocomplete" header) on the | Copilot website is clearly using VSCode. | jon-wood wrote: | I don't know if they've officially said anything on this, but | since the MS acquisition it feels like Atom is on life support.
| Github Codespaces is built on VSCode, and there's a lot of | effort from Github going into their VSCode extension for things | like PR review. | Aperocky wrote: | now they just need to build this into vim. | alpaca128 wrote: | Or more realistically, a language server which would then be | compatible with many editors including (Neo)Vim. | kleiba wrote: | Presumably, this will be accessible through a REST API or | something like that at some point, so that it can eventually | be integrated into all editors. | Aperocky wrote: | yep, the only thing I'm not quite sure about is how often | it needs to call home. Presumably very often.. which runs | counter to minimalism. | kleiba wrote: | Check out the "Telemetry" link when you scroll down on | the project page. | hsuduebc2 wrote: | Ok, I'm afraid I must become some kind of neoluddite or I'm going | to starve to death in the next twenty years. | nonbirithm wrote: | It sounds like this is similar to Kite, but actually competent as | a service and not associated with a brand that has destroyed all | trust. But it has to come with the same privacy caveats, right? | Uploading your private code to a third-party server could result | in business or regulatory violations. | | And even if you're okay with sending your code, what about | hardcoded secrets? What's to prevent Copilot clients from sending | things that should never leave the user's computer? Heuristics? | Will we be able to tell what part of the code is about to be | sent? And is the data stored? | speedgoose wrote: | I'm very enthusiastic about this kind of technology. I recently | stopped using tabnine because the suggestions were often worse | than the normal IDE completion while being on top of the list, | but I do miss the magic. | | I think AI in software development will save us a lot of time, so | we can focus on more interesting things.
It may replace a few | humans in the long term, but as software developers we shouldn't | be hypocrites because we work hard to replace humans by software. | kaimorid wrote: | hello | minimaxir wrote: | So this is what happens when you're owned by Microsoft who has an | exclusive contract with OpenAI. | | A couple weeks ago I ran a few experiments with AI-based code | generation (https://news.ycombinator.com/item?id=27621114) from | a GPT model more suitable for code generation: it sounds like | this new "Codex" model is something similar. | | If anyone from GitHub is reading this, please give me access to | the Alpha so I can see what happens when I give it a | should_terminate() function. :P | jboggan wrote: | I saw that post, neat stuff. We made an attempt to develop | something similar 4 years ago and take it to YC, it simply | wasn't good enough often enough because our training data | (Stack Overflow posts) was garbage and models were weaker back | then. I figured it would take about 5 years for it to really be | useful given the technology trajectory, and here we are. | | I'll note that we weren't trying to build "code auto-complete" | but instead an automated "rubber duck debugger" which would | function enough like another contextually-ignorant but | intelligent programmer that you could explain your issues to | and illuminate the solution yourself. But we did a poor job of | cleaning the data and we found that English questions started | returning Python code blocks, sometimes contextually relevant. | It was neat. This GitHub/OpenAI project is neater. | | I would be curious what the cost of developing and running this | model is though. | [deleted] | colesantiago wrote: | Gigantic caveat. | | > I agree to these additional telemetry terms as part of the | technical preview | reilly3000 wrote: | Right. If you're comfortable giving access to your source files | to GitHub+OpenAI, then go for it.
| | I'm not sure how this would apply to secret keys or flat files | with customer data/PII, but in any case that makes it a non- | starter for me. | | Their "Please do not share this URL publicly." banner at the | top of the page which disclosed this info makes my skin crawl a | bit... | | If I were only working on public projects I would be on board | right away, it looks like a big time saver. | | Am I being too paranoid here? | ranguna wrote: | No, you are not being paranoid. This tool literally uploads | all the code it wants off your machine, and I see no way of | filtering out secrets and the like. You have every right | to be worried about that. | kleiba wrote: | "You have zero privacy anyway. Get over it." | | Scott McNealy, CEO Sun Microsystems, 1999 | colesantiago wrote: | Then, I should see absolutely _zero complaints_ about | privacy, tracking, spying, google analytics and facebook | tracking. | | Perhaps we should ask Scott if he is willing to share his | browsing history, his personal photos and his passwords with | the rest of us or maybe if I can come into his house? | | After all, _"You have zero privacy anyway"_ | [deleted] | hiram112 wrote: | I wonder if CoPilot uses Github's private repositories to train | itself, which would allow malicious users to somehow obtain code | or designs that they otherwise would not be able to view. | thrower123 wrote: | I chuckle when one of the bullet points is that it autogenerates | the stupid, pointless unit-tests that you need to write to verify | trivial code, but boosts your code-coverage metrics... | heroHACK17 wrote: | Remote coding interviews just got 100x easier. | Buttons840 wrote: | I want to try and make it generate a UUID and then see if I can | find the original source. | BiteCode_dev wrote: | I like the concept, but just like with kite, having part of my | code sent to the remote service is not going to be ok for many | of my projects. | | For foss ones it could be great though.
| | I think the best part for me is how it's going to introduce | even more low code devs to the worker pool, which means I will be | able to raise my price again. Last time this happened, when | designers got to the backend, I got +30% in a year once my | clients figured out the difference in output. | markstos wrote: | Software automating people out of jobs spares no one, not even the | people writing the software. | | If this improves developer efficiency by 10%, 10% fewer | developers are required. | asperous wrote: | Or development cost lowers by 10% keeping salaries the same, | and more work that was previously too expensive to do is now | feasible. | gervwyk wrote: | "Software eats world" just got more real. Funny that programmers | thought they were eating the world with software, now, software | has eaten them. | visarga wrote: | far from it, this thing won't write full applications by itself | gervwyk wrote: | I know.. But we can dream. Also I'm sure when we first got | code completion we said - "This thing won't write functions | by itself." | [deleted] | iwintermute wrote: | So if it was trained using "source code from publicly available | sources, including code in public repositories on GitHub." was it | also GPLv2? | | So everything generated also GPLv2? | pjfin123 wrote: | I think this would fall under any reasonable definition of fair | use. If I read GPL (or proprietary) code as a human I still own | code that I later write. If copyright was enforced on the | outputs of machine learning models based on all content they | were trained on it would be incredibly stifling to innovation. | Requiring legal access to data for training but granting full | ownership of output seems like a sensible middle ground. | megous wrote: | 1) this is not human, it's some software | | 2) if I write a program that copies parts of other GPL | licensed SW into my proprietary code, does that absolve me of | GPL if the copying algorithm is complicated enough?
| pjfin123 wrote: | Clearly this requires some level of judgement but this | isn't new, determining what is plagiarism and what is not requires | a similar judgement call. | tomthe wrote: | What if I put a licence on my Github-repositories that | explicitly forbids the use of my code for machine-learning | models? | pjfin123 wrote: | Then the person training the models wouldn't be legally | accessing your code. | 542458 wrote: | My interpretation of the GitHub TOS section D4 is that it would | give GitHub the right to parse your code and/or make | incidental copies regardless of what your license states. | | https://docs.github.com/en/github/site-policy/github- | terms-o... | | This is the same reason it doesn't matter if you put up a | license that forbids GitHub from including you in backups | or the search index. | maxhille wrote: | And so it begins: We start applying human rights to AIs. | | Not a critique on your point, which I was just about to bring | up myself. | sanderjd wrote: | Certainly not. If I memorize a line of copyrighted code and | then write it down in a different project, I have copied it. | If an ML model does the same thing as my brain - memorizing a | line of code and writing it down elsewhere - it has also | copied it. In neither case is that "fair use". | qchris wrote: | This is a bit tricky, because at least in the U.S., I don't | believe it's a settled question in law yet. Some of the other | posters on here have said that the resulting model isn't | covered by GPL--that's partially true, but provenance of data, | and the rights to it, definitely do matter. A good example of | this was the Everalbum FTC settlement, where the company was forced to | delete both the data and the trained models they were used | to generate due to lack of consent from the users from whom the | data was taken[1]. Since open source code is, well, open, it's | definitely less a problem for permissively-licensed code.
| | That said, copyright is typically assigned to the | closest human to the activation process (it's unlikely that | Github is going to try to claim the copyright to code generated | by Copilot over the human/company pair-programming with it), | but since copyleft in general is pretty specific to | software, afaik the way that courts interpret the legality of | using code licensed under those terms in training data for a | non-copyleft-producing model is still up in the air. | | Obligatory IANAL, and also happy to adjust this info if someone | has sources demonstrating updates on the current state. | | [1] https://techcrunch.com/2021/01/12/ftc-settlement-with- | ever-o... | blibble wrote: | until the legal position is clear you'd have to be insane | to allow output from this process to be incorporated into | your codebases | | imagine if the output was ruled as being GPLv2, then having | to go through a proprietary codebase trying to rip out these | bits of code | | it would be basically impossible | throwawaygh wrote: | _> So everything generated also GPLv2?_ | | Almost certainly not _everything_. | | But possibly things that were spit out verbatim from the | training set, which the FAQ mentions does happen about .1% of | the time [1]. Another comment in this thread indicated that the | model outputs something that's verbatim usable about 10% of the | time. So, taking those two numbers together, if you're using a | whole generated function verbatim, a bit of caveat emptor re: | licensing might not be the worst idea. At least until the | origin tracker mentioned in the FAQ becomes available. | | [1] https://docs.github.com/en/early- | access/github/copilot/resea... | | [2] "GitHub Copilot is a code synthesizer, not a search engine: | the vast majority of the code that it suggests is uniquely | generated and has never been seen before.
We found that about | 0.1% of the time, the suggestion may contain some snippets that | are verbatim from the training set. Here is an in-depth study | on the model's behavior. Many of these cases happen when you | don't provide sufficient context (in particular, when editing | an empty file), or when there is a common, perhaps even | universal, solution to the problem. We are building an origin | tracker to help detect the rare instances of code that is | repeated from the training set, to help you make good real-time | decisions about GitHub Copilot's suggestions." | andrewstuart2 wrote: | You bring up a really good point. I'm super curious what the | legality and ethics around training machines on licensed or | even proprietary code would be. IIRC there are implications | around code you can build if you've seen proprietary code (I | remember an article from HN about how bash had to be written by | someone who hadn't seen the unix shell code or something like | that). | | How would we classify that legally when it comes to training | and generating code? Would you argue the machine is just | picking up best practices and patterns, or would you say it has | gained specifically-licensed or proprietary knowledge? | Iv wrote: | I would argue that a trained model falls under the legal | category of "compilation of facts". | | More generally, keep in mind that the legal world, despite an | apparent focus on definition is very bad at dealing with | novelty, and most of it will end up justifying a posteriori | existing practices. | einpoklum wrote: | You might argue that, but you would likely be wrong. | | Even a search engine is not merely a "compilation of | facts". A trained model is the result of analysis and | reasoning, albeit automated. | jjcm wrote: | A search engine provides snippets of other data. You can | point explicitly to where it got that text from. A | trained model generates its own new data, from influence | of millions of different sources. 
It's entirely | different. | teekert wrote: | Is what a human generates GPLv2 because it learned from GPLv2 | code? | IMTDb wrote: | What if a human copies GPLv2 code? | f38zf5vdt wrote: | https://gpl-violations.org/ | teekert wrote: | When is it copying? What about all those stack overflow | snippets I copied?! | JustFinishedBSG wrote: | Congrats, you've just discovered why many employers block | or forbid stackoverflow. | flohofwoe wrote: | I seem to remember a similar discussion on Intellicode (similar | thing, but more like Intellisense, and as a Visual Studio | plugin), which is trained on "github projects with more than | 100 stars". IIRC they check the LICENSE.txt file in the project | and ignore projects with an "incompatible" license. I don't | have any links handy which would confirm this though. | uticus wrote: | Could it be this? | https://visualstudio.microsoft.com/services/intellicode/ | | I was wondering the same thing, especially with MS being | behind both. | | edited: or this? https://docs.microsoft.com/en- | us/visualstudio/intellicode/cu... | dekhn wrote: | No, a model trained on text covered by a license is not itself | covered by the license, unless it explicitly copies the text | (you cannot copyright a "style"). | evgen wrote: | The trained model is a derivative work that contains copies | of the corpus used for training embedded in the model. If any | of the training code was GPL the output is now covered by | GPL. The music industry has already done most of the heavy | lifting here in terms of scope and nature of derived works, | and while IANAL I would not suggest that it looks good for | anyone using this tool if GPL code was in the training set. | GuB-42 wrote: | My guess is that it is, if we think of a machine learning | framework as a compiler and the model as compiled code. | Compiled GPL code is still GPL, that's the entire point.
| | Anyways, GitHub is Microsoft, and Microsoft has really good | lawyers so I guess they did everything necessary to make sure | that you can use it the way they tell you to. The most | obvious solution would be to filter by LICENSE.txt and only | train the model with code under permissive licenses. | f38zf5vdt wrote: | There will almost certainly be cases where it copies exact | lines. When working with GPT-2 I got whole chunks of news | articles. | akersten wrote: | Well, it probably _is_ explicitly copying at least some | subset of the source text - otherwise the code would be | syntactically invalid, no? | dekhn wrote: | Strictly speaking, you could train a model which does not | contain the original source text (just the underlying | language structure and word tokens), and generates ASCII | strings which are consistent with the underlying generative | model, that are also always valid code. I expect to see | code generator models that explicitly generate valid code | as part of their generalization capability. | gugagore wrote: | I can't say what's happening in GitHub Copilot, but it's | not necessarily true that the only way to produce | syntactically valid outputs is to take substrings of the | source text. It is possible to learn something | approximating a generative grammar. | | Take a look at https://karpathy.github.io/2015/05/21/rnn- | effectiveness/ | | At the same time, I would not be surprised if there are | outputs that _do_ correspond to the source training data. | 6gvONxR4sf7o wrote: | > you cannot copyright a "style" | | This line of thinking applies to the code generated by the | model, but not necessarily to the model itself, or the | training of it. | dekhn wrote: | Thanks - in retrospect, I should have explicitly said "code | generated by the model". | not2b wrote: | But it actually is explicitly copying the text. That's how it | works.
The training data are massive, and you will get long | strings of code that are pulled directly from that training | data. It isn't giving you just the style. It may be mashing | together several different code examples taking some text | from each. That's called "derivative work". | Iv wrote: | Google Books actually displays full pages of copyrighted | works Google did not license. It was considered legal. | | [1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Go | ogle,.... | f38zf5vdt wrote: | We all don't have Google resources. What if someone comes | after us individually because some model-generated code | is near identical to code in a GPL codebase? Where is the | liability here? | | edit: from https://copilot.github.com/ | | > What is my responsibility when I accept GitHub Copilot | suggestions? | | > You are responsible for the content you create with the | assistance of GitHub Copilot. We recommend that you | carefully test, review, and vet the code, as you would | with any code you write yourself. | | Well, that solves that question. | Kiro wrote: | No, that's not how it works. | | "[...] the vast majority of the code that it suggests is | uniquely generated and has never been seen before. We found | that about 0.1% of the time, the suggestion may contain | some snippets that are verbatim from the training set" | | https://copilot.github.com/#faqs | not2b wrote: | If that's the case (only 0.1%), the developers must have | done something that differs from other openai experiments | that suggest code sequences that I recall seeing, where | significant chunks of code from Stack Overflow or similar | sites were appearing in answers. | ipsum2 wrote: | So you're gambling on whether that the code that was | generated or copied. | visarga wrote: | use a bloom filter to skip/regenerate that 0.1% | zarzavat wrote: | Synthesising material from various sources isn't copyright | infringement, that's called _writing_. 
| | It's only infringement if the portion copied is significant | either absolutely or relatively. A line here or there of | the millions in the Linux kernel is okay. A couple of lines | of a haiku is not. Copyright is not leprosy. | dekhn wrote: | hmm, so let's think this through. | | Wouldn't that imply that a person who learned to code on | GPLv2 sources and then writes some more code in that style | (including "long strings of code", some of which are | clearly not unique to GPL) is writing code that is "born | GPLv2"? | | I don't think it currently works that way. | [deleted] | cycomanic wrote: | IMO the closest case is probably the students suing Turnitin a | number of years ago, which iParadigms (the Turnitin maker) won | [1]. | | I think this is definitely a gray area and in some way | iParadigms winning (compared to all the cases decided in favour | of e.g. the music industry), shows the different yardsticks | being used for individuals and companies. | | I'm sure we will see more cases about this. | | [1] https://www.plagiarismtoday.com/2008/03/25/iparadigms- | wins-t... | 6gvONxR4sf7o wrote: | My guess would be that the model itself (and the training | process) could have different legal requirements compared to | the code it generates. The code generated by the model is | probably sufficiently transformative new work that wouldn't be | GPL (it's "fair use"). | | I suspect there could be issues on the training side, using | copyrighted data for training without any form of licensing. | Typically ML researchers have a pretty free-for-all attitude | towards 'if I can find data, I can train models on it.' | evgen wrote: | No, the code generated is what copyright law calls a | derivative work and you should go ask Robin Thicke and | Pharrell Williams exactly how much slack the courts give for | 'sufficiently transformative new work'.
| gugagore wrote: | My bet is that copyright law has not caught up with massive | machine learning models that partially encode the training | data, and that there will still be cases to set legal | precedent for machine learning models. | | Note also that it's not just a concern for copyright, but | also privacy. If the training data is private, but the | model can "recite" (reproduce) some of the input given an | appropriate query, then it's a matter of finding the right | adversarial inputs to reconstruct some training data. There | are many papers on this topic. | evgen wrote: | It is almost certainly the case that current IP law is | very unsettled when it comes to machine learning models | and mechanisms that encode a particular training set into | the output or mechanism for input transformation. What | should probably scare the shit out of people looking to | commercialize this sort of ML is that the most readily | available precedents for the courts to look at are from | the music industry, and some of the outcomes have truly | been wacky IMHO. The 'blurred lines' case is the one that | should keep tech lawyers up at night, because if | something like that gets applied to ML models the entire | industry is in for a world of pain. | 6gvONxR4sf7o wrote: | You're missing the fair use aspects. Check out this article | on fair use [0]. | | > In 1994, the U.S. Supreme Court reviewed a case involving | a rap group, 2 Live Crew, in the case _Campbell v. Acuff- | Rose Music_ , 510 U.S. 569 (1994)... It focused on one of | the four fair use factors, the purpose and character of the | use, and emphasized that the most important aspect of the | fair use analysis was whether the purpose and character of | the use was "transformative." | | It has some neat examples and explanation. | | [0] https://www.nolo.com/legal-encyclopedia/fair-use-what- | transf... 
| evgen wrote: | There are far more current precedents that apply here, | and they do not trend in GitHub's favor -- as I noted | previously, Williams v. Gaye (9th Cir. 2018) is going to | be very interesting in this case. I am sure several | people in Microsoft's legal department set parameters on | the model training and that they felt that they were | standing on solid ground, but I am also sure that there | are a few associate professors in various law schools | around the country who are salivating at the opportunity | to take a run against this and make a name for | themselves. | kp302 wrote: | It's not, because the license says nothing about training. I mean | every OSS dev's brain would be under GPL then. | iwintermute wrote: | https://en.wikipedia.org/wiki/Clean_room_design | | There are definitely cases when devs avoid even looking at | an implementation before creating their own | 542458 wrote: | IANAL, but my interpretation of the GitHub TOS section D4 would | give GitHub the right to parse your code and/or make | copies regardless of what your license states. This is the same | reason the GitHub search index isn't GPL contaminated. | sreeramb93 wrote: | This is where Microsoft's billion dollars went into OpenAI. Clever. | marcodiego wrote: | Consider the following chain of events: | | - I write GPL'ed code. | - Someone uses this tool to develop a proprietary solution. | - I later prove that the tool generated code that is equal to mine. | | Now the code of the proprietary solution must be GPL licensed! | Cool! | | How would I defend myself? I'd only use such a tool if there are | guarantees of the licenses of the code it was trained on. Without | such guarantees, it is just too risky.
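visarga's suggestion earlier in the thread (use a Bloom filter to skip or regenerate the roughly 0.1% of verbatim suggestions) can be sketched as follows. This is only an illustration under stated assumptions, not how Copilot or its planned origin tracker works: the `BloomFilter` class, the `normalize` step, the filter sizes, and the three-line threshold are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def normalize(line):
    # Assumed normalization: collapse whitespace so reformatting
    # alone does not hide a match.
    return " ".join(line.split())


def looks_verbatim(suggestion, known_lines, min_matching_lines=3):
    """True if enough non-trivial lines are (probably) in the corpus filter."""
    hits = 0
    for line in suggestion.splitlines():
        candidate = normalize(line)
        if len(candidate) > 10 and candidate in known_lines:
            hits += 1
    return hits >= min_matching_lines
```

A caller would add every normalized training-set line to the filter once, then drop or regenerate any suggestion for which `looks_verbatim` returns true. Because a Bloom filter can report false positives, this errs on the side of rejecting original code; it does nothing for near-copies with renamed variables.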
| ska wrote: | Of course this is a continuation of things people have been | trying for decades at this point, rather than something | fundamentally new, but it brings up a point a colleague and I made | a decade ago on training something like this on large data sets - | namely that you are going to tend to find common idioms rather than | nominally best ones. In many scenarios it may make little to no | difference, but clearly not all. It's likely going to gravitate | towards lowest-common-denominator solutions. | | One example of where this can be a problem is numerics - most | software developers don't understand it and routinely do | questionable things. I'm curious what effort the authors have put | in to mitigate this problem. | krzyk wrote: | I wonder why there is no example in Java. It is one of the most | popular languages (definitely more popular than Ruby or Go, and | on par with JavaScript and Python). | dgb23 wrote: | The utility and quality of this will likely depend on language | use: | | https://madnight.github.io/githut/#/pull_requests/2021/1 | bovermyer wrote: | Oh this would screw with me so badly. | | A lot of the time, I'm thinking pretty deeply about the code I'm | writing, and as I'm writing code I'll also be thinking about how | it applies in context. | | Having some external agent inject new code into my editor would | shatter my thought flow, since I'd then have to grok whatever it | just spit out instead of continuing on with whatever thought I | was pursuing at the time. | karsinkk wrote: | I still haven't gotten my hands on the Beta yet, so I'm not sure | how it's going to be deployed, but does anyone know if Copilot is | going to be accessible through some online IDE, or is it going | to be through an extension for VS Code/other editors? If it's the | latter, I hope the extension doesn't eat up all my CPU!
| whateveracct wrote: | I prefer the wingman | | https://haskellwingman.dev/ | IceWreck wrote: | Works on MS's VSCode but not on GitHub's own Atom. More proof | that Atom is all but dead | tushar1196 wrote: | Hey | __MatrixMan__ wrote: | Someone, somewhere, is already working on ways to make it inject | vulnerabilities into your project. | FunnyLookinHat wrote: | This seems to work really well in cases where you're just laying | down boilerplate. A few cherry-picked comments seem to suggest | that React components are an ideal use case - which makes sense, | that's a lot of munging and syntax to just render some strings. | | However, I find the process of writing these sorts of functions | cathartic and part of the process to get into zen-mode for | coding. I think I'd feel less joy in programming if all of this | was just done by glorified commenting and then code-review of the | robot. | | I like to think of coding in terms of athletic training, which | usually consists of difficult tasks that are interspersed | with lighter ones that keep you moving (but are giving you a bit | of a break). Training for soccer teams often involved lots of | sprinting and aerobic exercise - and in between those activities | we would do some stretching or jogging to keep our bodies moving. | These sorts of small functions (write a function to fetch a | resource, parse an input payload, etc.) are when my brain is | still moving but getting ready for the next difficult task. | StratusBen wrote: | Not to be confused with AWS Copilot - which has been an area of | focus for the AWS container services team: | https://aws.github.io/copilot-cli/ | fxtentacle wrote: | "In order to generate suggestions, GitHub Copilot transmits part | of the file you are editing to the service." | | Well, isn't GitHub part of Microsoft now? No wonder it has gained | telemetry...
| | I'm a bit worried that this thing will lead to even more bugs | like the typical StackOverflow copy&paste which compiles fine, | runs OK, but completely doesn't understand the context and | thereby introduces subtle, difficult-to-find issues. | | My personal take on autocomplete has always been that I only use | it so that I can use longAndDescriptiveFunctionNames. Apart from | that, if your source code is so verbose that you wish you didn't | have to type all of it, something else probably already went | wrong. | EugeneOZ wrote: | A toy. | hashingroll wrote: | > Convert comments to code. Write a comment describing the logic | you want, and let GitHub Copilot assemble the code for you. | | I don't know if Copilot does it already but I would love it if there | were a tool that does exactly the opposite -- convert code to | comments and documentation. | birdyrooster wrote: | Does GitHub Copilot grant you a license to the code it generates? | How does it know you haven't just copied some proprietary code | which is not free? | ranguna wrote: | This looks amazing! I was going to sign up for the preview and stopped | immediately after reading about the additional telemetry that is | scraped from my IDE. Microsoft would basically be allowed to | steal my code and see it whenever they need, including unintended | snippets like local (git-ignored) secrets and any sensitive | information that it might catch, without my "snippet by snippet" | approval and with no way to ignore files (afaik). | | Until this is fixed, good luck, but no thank you, Microsoft. | adelarsq wrote: | Tabnine, is that you? | papito wrote: | Stack Overflow copy pasta on crack cocaine. | ayush--s wrote: | wow I'm going to have lots of opinions about this. | | 1. A lot of people on this thread are concerned about licensing | issues with GPL etc. I am sure GitHub will restrict the beta | until it figures out that stuff. | | 2.
I wonder if eventually our corrections to the code suggested | by the model would be fed back into the model, and if | that'll lead to differential pricing - if I let it see my code, | I get charged less. | | 3. I believe a mini-GPT-3 model is where it's at. GPT-3 (and | similar) models look to be too big to run locally. I've been | using TabNine for the past year or so & it gives me anywhere between | a 5-10% productivity boost. But one of the main reasons why it | works so well is because it trains on my repo as well. TabNine is | based off GPT-2 from what I've heard. | | 4. Prediction: Microsoft is probably going to milk GPT-3. Expect | a bumpy ride. | | 5. In all likelihood, this would be a great tool to make | developers productive, rather than take their jobs - at least at | levels that are more than just rote coding. | | 6. Eventually all tasks with enough data around them will see | automation using AI. | vmception wrote: | Interviewer: use copilot to implement the most efficient sorting | algorithm | Graffur wrote: | Awesome - but I fear maintaining the code generated by it in the | future.. can the AI maintain it as well? | | I am looking forward to AI testing the programs I write though. | That would be awesome. | canada_dry wrote: | I'm assuming that - by design - it has a feedback loop that | allows it to tweak, learn and improve itself by feeding back the | choices people make vs its own recommendations. | onion2k wrote: | _GitHub Copilot may suggest old or deprecated uses of libraries | and languages_ | | This raises two questions. | | - is there a way (right now or planned for the future) for | library maintainers to mark suggestions for removal from the | suggestions? I can foresee Copilot being used as a source of | 'truth' among less experienced developers, and getting people | turning up in the Issues or Discord asking why the suggestion | doesn't work might be a bit jarring if the maintainers have to | argue that "GitHub was wrong."
| | - if a library is hosted on Github is there a way to mark some | examples as Copilot training data? Maybe by having a 'gh-copilot' | branch in the repo that represents high quality, known-good | examples of usage? | mcintyre1994 wrote: | I'd be pretty worried about this based on how many times an SO | question has new better answers months/years later. I find that | platform self-corrects reasonably well, with the newer better | answer ending up at the top, but no idea how that'd happen | here. | spoonjim wrote: | This seems great for noobs or dunces to get code compiling, but I | hope nobody with talent uses it. It would be like Hemingway using | Grammarly. | dennisy wrote: | The site mentions the system this is built on: Codex by Open AI. | | Has anyone seen anything about this system? Are others able to | build upon it? | fabiospampinato wrote: | I'd like to see something like that, but with knowledge about | every single file in the codebase, and running locally. | Aperocky wrote: | Like an IDE? | fabiospampinato wrote: | Yeah but smart. | adelarsq wrote: | There is Tabnine that can work like this | fabiospampinato wrote: | Last time I tried Tabnine it wasn't really of much use to me, | the top of the line GPT-3 is a much much bigger model, it | should be able to do much more intelligent things. | ranguna wrote: | But gpt 3 won't run locally, so no thank you. | cobrabyte wrote: | Yeah, running locally would be my preference. I get "Antitrust | (2001)" vibes from this, but that's the tinfoil hat side of me. | flohofwoe wrote: | Like this? | | https://visualstudio.microsoft.com/services/intellicode/ | | I bet it's the same people, trying to push their crap into | all sorts of successful products. | lostintangent wrote: | The IntelliCode and Copilot teams have been collaborating | closely together, since we want them to provide a "better | together" experience. However, the underlying tech isn't | the same. 
Copilot is powered by OpenAI Codex, and enables | rich code synthesis via a cloud service, whereas | IntelliCode uses multiple local models to enhance various | parts of the editor (e.g. prioritizing the completion list | based on your context, detecting "repeated edits" and | suggesting additional refactorings). | Vinnl wrote: | As far as I know that also runs remotely. | | Edit: looks like I remembered incorrectly: | https://docs.microsoft.com/en- | us/visualstudio/intellicode/ov... | | > No user-defined code is sent to Microsoft | wozer wrote: | Does anyone know what languages are supported besides those | mentioned? | stared wrote: | For a second I was afraid that it was Copilot by Kite, with their | infamous history (https://news.ycombinator.com/item?id=19018037). | simongauld wrote: | All your commits are belong to us | asimjalis wrote: | I thought it would be someone you can talk to. | monkeydust wrote: | As a business / product person I am naturally wondering how much | more productive this will make my engineering team. Should I | over time expect a reduction in costs, faster shipping times... or | will the benefit manifest itself in more reliable code...? | ranguna wrote: | Like any similar question: ask your team. They'll know | better than you or any random person on the Internet. | SebastianFish wrote: | What interests me most about the development of tools like this | is how it might go on to influence the evolution of programming | languages. Take the article that was posted on the CompCert verified | C compiler, for instance. What if machine learning could lower the | cost of developing in programming languages with stronger | guarantees (e.g. Rust, Coq, etc.)? Using languages with more | internal checks could also help manage the risk that Copilot gives | a buggy/insecure suggestion.
| KaoruAoiShiho wrote: | RIP sublime text, I guess it's back to VS | quink wrote: | Great, a whole new way to automatically introduce bugs through | code duplication. | | From the examples: body: text=${text} | | `text` isn't properly encoded, what if it has a `=`? | rows, err := db.Query("SELECT category, COUNT(category), | AVG(value) FROM tasks GROUP BY category") if err != | nil { return nil, err } | defer rows.Close() | | shouldn't we want to cleanly `rows.Close()` even if there was an | error? float(value) | | where `value` is a currency. Doesn't Python have a Decimal class | for this? create_table :shipping_addresses do | |t| | | that's an auto-generated table, that one's debatable but for | starters a `zip` field makes it American only. And doesn't the | customer have an e-mail address instead of a shipping address? | var date1InMillis = date1.getTime(); | | But what about the time-zone offset? | | I could go on, but literally the first five examples I looked at | are buggy in some way. | | Edit: const markers: { [language:string]: | CommentMarker } = { javascript: { start: '//', end: | ''}, | | Wow. | | Edit 2: function collaborators_map(json: any): | Map<string, Set<string>> { | | Not exactly buggy, but 8 lines of tedium. | | What about new Map(json.map(({name, | collaborators}) => [name, new Set(collaborators)])) | | instead? | | Edit 3: const images = | document.querySelectorAll('img'); for (let i = 0; i < | images.length; i++) { if | (!images[i].hasAttribute('alt')) { | | I mean, I get it, it's auto-generated code. Maybe in the future | they can narrow it down to auto-generating good code. | document.querySelectorAll('img:not([alt])').forEach( | | (or) img:not([alt]) { border: 1px solid red; } | | (or) eslint-plugin-jsx-a11y because maybe | that img was rendered through React. | | (or) it should really be an `outline` because | we don't want this to reflow the page and we can. And maybe a | will-change.
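Two of quink's points above, the unencoded `text=${text}` request body and `float(value)` for currency, can be illustrated with a short Python sketch. The original request-body example is JavaScript; this only shows the same idea with Python's standard library, and the function names `build_form_body` and `parse_amount` are hypothetical.

```python
from decimal import Decimal
from urllib.parse import urlencode


def build_form_body(text):
    # urlencode percent-escapes '=', '&' and other reserved characters,
    # so a value such as "a=b&c" cannot be mistaken for extra fields.
    return urlencode({"text": text})


def parse_amount(value):
    # Decimal keeps the exact decimal digits of the input string;
    # float would store the nearest binary approximation instead.
    return Decimal(value)


if __name__ == "__main__":
    print(build_form_body("a=b&c"))
    print(parse_amount("0.10") + parse_amount("0.20"))
    print(0.1 + 0.2 == 0.3)
```

Whether `Decimal` is the right choice depends on the application (some systems store integer cents instead); the point is only that the generated `float(value)` silently accepts rounding error.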
| clashmeifyoucan wrote: | one thing that caught my eye was the convert comments to code | feature. If you can use your voice to dictate comments then | combined with copilot it might just be possible to write code | without touching the keyboard at all! | | of course I guess copilot won't be perfectly accurate right now | or even maybe for a long time but it is interesting to imagine a | future where the programmer can think and get code written | without lifting a finger. | pow_ext wrote: | I'm curious about typing "const api_key" and seeing what the editor | adds hahaha | jarym wrote: | Wow. Please make this for Rust :D | | What this really displaces is StackOverflow (or some of its | users...) | mihaifm wrote: | I can see in their FAQ that there are plans to monetize this. So | they're building an ML model that feeds on the work done by | millions of people and then selling it? How is this even ethical? | Not to mention we'd be feeding the model while using it. Guess | this is another instance where we are becoming the product. | bootlooped wrote: | If I spent a lot of time reading open source repos on GitHub to | teach myself to code, and then went out and got a high-paying | job based on that knowledge, is that ethical? This seems | roughly analogous to what the machine is doing. | [deleted] | bruce343434 wrote: | Never learn a new skill again, let the computer do it for you. | NmAmDa wrote: | Seems interesting, but did they train their models on the open | source projects available on GitHub? | ffggvv wrote: | wow surprised not one comment that's scared of this being the | step to automate away our jobs. | | maybe sooner than we think? | acid__ wrote: | It's one step closer, but still a good ways away.
| | - you still need to understand the code that copilot is | writing, it just turns it from a recall/synthesis problem into | a recognition problem | | - most of the work above the level of a junior engineer isn't | about writing the actual code, it's the systems design, | architecture, communicating with external stakeholders, | addressing edge cases, tech debt management, etc. | dxbydt wrote: | How does this work in the context of leetcode/hackerrank | interviews? Can I just use the copilot to get a 90% skeleton of | the required solution and maybe fill in just the 10% smarts? | Sr_developer wrote: | Since most of the code written anywhere is crap (the tool was | trained with "millions" of lines of code) I suspect it will | repeat all the same anti-patterns and badly structured, ill-thought-out code | which fills GitHub. | jensensbutton wrote: | Interestingly enough I think Ops work is far more resilient to | automation than SWE work. | leotaku wrote: | > If the technical preview is successful, our plan is to build a | commercial version of GitHub Copilot in the future. | | This may be the first time that a proprietary coding tool offers | such a great value proposition that I am actually interested in | trying it out and potentially even paying for it. It's also kind | of scary that it will probably be extremely hard, if not | impossible, to create a FOSS version of this technology, just | because of the immense amount of computing power, and by | extension money, needed to create GPT-3. | | I'm not that comfortable with the idea of a future where | proprietary AI-based solutions and libraries (e.g. automatic | testing libraries, which have been mentioned here a few times) | are so powerful that I'll be forced to use them if I don't want | to waste my time. | MathYouF wrote: | Says the person who likely owns a washing machine, sink | connected to plumbing, microwave, stove, lighters, clothes made | with a sewing machine, etc. etc.
| | GPT-3 will take way less time to make a good substitute that | costs the power of compute than other historical time saving | technologies. Unlike other historic technologies, they pretty | much spell out exactly how to do it, and own no patents related | to its creation. I have trouble seeing the downside. | leotaku wrote: | I see your point, but aren't you making me into a bit of a | straw man? When did I say that I was some open-access Luddite | who won't use any technology that they can't build | themselves. I just like the current state of programming, | where I can easily build new and exciting things without | having to rely on a lot of proprietary libraries or tools. | bluefox wrote: | I don't understand fascination with bullshit machines. | | While they may be useful in propaganda, state or commercial, I'm | not sure why Microsoft GitHub would find it useful to generate | volumes of bullshit source code. | xrisk wrote: | How do you opt out your own GitHub repositories from being a part | of the training data? | ranguna wrote: | Probably by changing your license to not allow use in | commercial products. | smasher164 wrote: | I wonder how much of this is OpenAI-based vs program synthesis | techniques. | shmageggy wrote: | In the FAQ they state that it sometimes outputs non-running or | even malformed code, so it looks like fairly pure language | modeling with little to no program synthesis. | iou wrote: | This looks awesome! And I'd really like to try it out. | | 2 security thoughts that I couldn't find answers to: | | 1. how does the input fed into OpenAI codex filter out malicious | code pollution? or even benign but incorrect code pollution | (relevant research on stackoverflow for example - | https://stackoverflow.blog/2019/11/26/copying-code-from-stac...) | | 2. In an enterprise setting, what does the feedback loop look | like? How do you secure internal code that is being fed back to | the model? Does it use some localized model? HE, etc? 
| ollien wrote: | #2 is the big one for me. I'm hesitant to install this on a | work machine where our code could be sent elsewhere. | methehack wrote: | The gap between concept and working product is still very much in | the human's wheelhouse. This is an accelerator for snippets not | yet turned into a library and, in fact, a lot like a library in | terms of its day-to-day utility. This will not end or even | change programming that much. The value I provide as a programmer | is not about copying and pasting snippets. It's something totally | separate, different in kind from what copilot does. If it's | helpful to do what I already do, sure I'll use it. But it ain't | me, babe. | lmilcin wrote: | What about rights to the code that is created this way? | 29athrowaway wrote: | AI is on its way to replace the people that copy and paste code | from stackoverflow. Good riddance. | BigJono wrote: | Excellent. Now we just need tools to automatically shit out | convoluted JIRA cards, run them through this, and automatically | generate the PRs. | | Then we can run it overnight each night, knock back whatever | rubbish it generates before breakfast now that there's nobody | left to complain and schedule 6 hours of meetings to "fix" it, | and instead we have the whole day to just build it quickly and | properly the first time. | | We just increased productivity on the average enterprise | project by 50%. Good stuff. | 6gvONxR4sf7o wrote: | Plenty of discussion about the IP issues. It makes me want to | start adding a section in my LICENSE.txt that says it's not | eligible for use training commercial models. We'll likely end up | with a whole set of license choices for that. | | Although if a license can permit or prohibit use in training | commercial models, does that mean that the lack of permission | implies a prohibition on it? | darepublic wrote: | I knew this day was coming but it still stings. What I'm | interested in is making copilot the main pilot.. i.e.
for web | development, why not just let this thing go on its own and have a | separate module that attempts to compile the code + observe the | differences in a web page's functionality? No longer so crazy to | say that shit like that is on the horizon. Then the middle | managers can have their dream come true, they are the true | beneficiaries! They can 'code' with copilot, and just endlessly | iterate with it while figuring out what they want until they get | a result they're happy with. | lifeisstillgood wrote: | I always bang on about "Software literacy" but I do wonder how I | would deal with this if it was suggesting text for me while I was | writing - emails, reports, novels. | | I suspect that for drudgery or work stuff I would happily take | some help with typing, but I am not sure (beta access please!) if | I would want it for my novel, or my sales copy. | | I am (optimistically) hoping that my novel has my _voice_ - my | unique fingerprint of tone and vocab. | | And I wonder if my software has a similar tone. A much more | restricted syntax of course, but my choice to use meta | programming here, an API design there. I may be overthinking it. | turbinerneiter wrote: | Are they making a proprietary tool based on the data provided by | FOSS projects? | dannyT1 wrote: | I wonder how or if it will impact publicly available code and | even open source on a larger scale | EastSmith wrote: | "Seriously Copilot, cover all these files with tests." | a-r-t wrote: | One of the examples they provide on copilot.github.com shows a | unit test for the strip_suffix function.
It does not test for a file | name without a suffix, a case in which the function fails (it removes | the last character instead): | | def strip_suffix(filename): | """ Removes the suffix from a filename """ | return filename[:filename.rfind('.')] | | import unittest | | def test_strip_suffix(): | """ Tests for the strip_suffix function """ | assert strip_suffix('notes.txt') == 'notes' | assert strip_suffix('notes.txt.gz') == 'notes.txt' | zild3d wrote: | Great, you got 2 assertions for free, which lowers some | friction of writing tests. You should still be thinking and be | "the pilot". When you start writing additional test cases, it | will help you out with those too | astrobe_ wrote: | One should keep in mind that it is just "copy-paste" on | steroids (ok, maybe a gallon of steroids), but users should be | cautious about the _false sense of irresponsibility_. | | Because just like when they copy-paste the top answer on SO, at | the end of the day they are responsible for the code they ship. | krzyk wrote: | It's a pity it is not (yet, I hope) a plugin to popular IDEs like | IntelliJ IDEA etc. and works only in VS Code. | ipsum2 wrote: | TabNine is similar and works across multiple IDEs. No | affiliation, thought their product was neat: | https://www.tabnine.com/ | karmasimida wrote: | I like the example LOL. | | How can one sign up? This could make one programmer an army. | sqs wrote: | AI to write code is cool, but you know what'd be even cooler? | | AI for maintaining, upgrading, improving, and fixing code. | | After all, devs spend 80%+ of their time doing those things and | they're WAY more painful than writing code imo. | ranguna wrote: | Snyk Code is trying to do something similar. It basically | uploads your code and compares it to known vulnerability | patterns, it also gives you examples on how to fix those | vulnerabilities based on open source projects' pull requests.
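The `strip_suffix` failure that a-r-t describes above is easy to reproduce. The sketch below keeps the quoted logic under the hypothetical name `strip_suffix_naive` and shows one possible fix using `os.path.splitext`; this is an illustration, not a correction published by GitHub.

```python
import os


def strip_suffix_naive(filename):
    # Same logic as the quoted example: rfind returns -1 when there is
    # no dot, so the slice [:-1] drops the last character.
    return filename[:filename.rfind('.')]


def strip_suffix(filename):
    # os.path.splitext returns (root, ext); ext is empty when the name
    # has no suffix, so the name comes back unchanged.
    return os.path.splitext(filename)[0]


def test_strip_suffix():
    assert strip_suffix('notes.txt') == 'notes'
    assert strip_suffix('notes.txt.gz') == 'notes.txt'
    # The case missing from the generated test:
    assert strip_suffix('notes') == 'notes'
    assert strip_suffix_naive('notes') == 'note'


if __name__ == "__main__":
    test_strip_suffix()
```

Note that `os.path.splitext` treats a leading-dot name such as `.bashrc` as having no suffix, which differs from the slicing version; whether that is the desired behaviour is a design choice the generated test never forces anyone to make.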
| booleandilemma wrote: | Please, replace me faster :) | arthurcolle wrote: | The Rails migration example doesn't follow the rails convention | with prepended datetime :p | thesquib wrote: | Is this automation of the Search Stack overflow, copy, and paste | workflow? | antpls wrote: | Another way to look at it : if an "AI" can predict what you would | code next, it means your program is probably not that innovative, | and was already created somewhere. | bootlooped wrote: | Most of the code I write is not particularly innovative or | novel. I'm just trying to get the job done most of the time. | adamnemecek wrote: | Is anyone here actually using this? | McGlockenshire wrote: | Well, it's a brand new product, so ... no? | adamnemecek wrote: | I imagine there was a private beta. | ranguna wrote: | According to the comments here, people have been using this for | a couple of weeks now. | devinl wrote: | Looks exciting! It is kind of disappointing the AI generated main | example on their home page has what appears to be a url encoding | bug in it though (in text=${text}, text should be url encoded | before being passed to fetch). | shadowgovt wrote: | Brilliant. | | I remember over a decade ago seeing a grad student project with a | very straightforward and very clever idea: extending JavaDocs | based on code snippets of actual use (to address the common | pattern problem in Java code that you often get an instance of an | object not by direct construction, but by calling a factory | function or singleton getter somewhere). Kicking myself that I | didn't see this day coming. | wpietri wrote: | Does this solve the right problem? Getting some code on the page | has rarely been the expensive part of building something. Indeed, | some long-ago experience with code generators suggests that | making it easy to create code makes many problems worse down the | line. | iwintermute wrote: | It's solving 'mechanical' problem. 
The optimistic twist on this | helper is that it just raises the bar - human programmer should | better be more useful than 'brainless' code generator - meaning | not only being able to write a loop or solve leetcode task, but | also understand context and what he's trying to solve for. | | As you say typing code is not a bottleneck for problem solving | nyghtly wrote: | This is a very good point. | koyote wrote: | I agree, I feel like it might be useful for whichever | programmers regularly have to search Stackoverflow and then | copy paste code snippets. | | Then I feel like useful code produced by this tool will have to | be treated in the exact same way as a rigorous code review: | going through every part of the logic and ensuring it is | correct. This seems like just as much or even more work than | writing it yourself (if it is written in an unfamiliar way, you | might need more time to wrap your head around it). | spec-obs wrote: | This is my concern. This could end up generating great swathes | of code that no one understands, so that when it breaks it | takes much longer to fix. | kburman wrote: | How do we know the code is working and doesn't have any bug? | ranguna wrote: | You review it. I guess that's why it's called copilot, for now. | joshmarlow wrote: | I suppose this is a whole new argument in favor of good code | commenting - to train/share context with your future tooling. | leventov wrote: | This is really the same argument as it used to be: help | intelligences (used to be only human, now artificial) to find | bugs by matching text with code. | drcode wrote: | We could get away with more user-friendly programming languages | over the years because Moore's law kept giving us more | opportunities to sacrifice raw performance for more developer- | friendly tools. 
| | But I'm worried these kinds of AI-assist tools will lead to "code | spam" that may increase developer productivity even more, yet we | no longer have Moore's law to absorb the additional inefficiency | in performance these tools may introduce. | didibus wrote: | I wonder if this breaches any of the open source licenses or | copyright? | | To some extent, it seems like this could suggest code chunks | found somewhere else verbatim, which sounds like a copyright | issue, but I also don't know if open-source licenses inherently | allow you to train on their code in the first place? | amelius wrote: | I would be more in favor of new languages that require less | boilerplate. | the_laka wrote: | I wonder if this can work with any editor out of the box, or if it's | just going to be VS Code. | av_conk wrote: | From the FAQ: "Not yet. For now, we're focused on delivering | the best experience in Visual Studio Code only." | | I do hope they feature others at a later date, especially since | they are planning to develop a commercial version. | gumby wrote: | Won't be mainstream until it supports C++20 under Emacs. | | (Yes that's an HN-frowned-upon joke comment, but that is my dev | environment) | [deleted] | fartcannon wrote: | This technology should be available to everyone whose work | contributed to its development to use as they see fit. Free of | Microsoft's tendrils. | | The absolute gall of Microsoft claiming fair use to gatekeep | knowledge of millions of minds... | mikewarot wrote: | I've signed up to be a guinea pig, I've never pair programmed, | and my primary language is Pascal, and I'm old... this ought to | be a hoot. | [deleted] | pharmaz0ne wrote: | Would it work with remote files? Because Kite would not. | theden wrote: | Calling it now, there will be a "Copilot considered harmful" | post. | | If you need to go through the suggested code to ensure it's | correct, you may as well write it yourself?
| | If you glance at it and it looks about right, you can potentially | overlook bugs or edge cases, you'll lose confidence in your own | code since you didn't properly conceptualise it yourself. | | Potentially for newer developers it robs them of active | experience of writing code. | | Much like learning an instrument, improvisation, or say physics, | a lot of people learn by doing it, even if it's grunt work. IMO | this is necessary for any professional. | | Maybe it will be seen as a crutch, maybe I'm getting old? I have | tons of code snippets, but it's usually stuff I've written and | understood and battle tested, even if it was initially sourced | from SO. Having it in the text editor and appear out of nowhere | with no context seems like it'd need some adjustment in | apprehension. | | Edit: I should have been clear, I'm not against others using | Copilot and will try it out myself. I can see it being useful in | replacing one-line libs like in nodejs, i.e., copying a useful | well-known and needed snippet vs installing yet another lib that | could be a sec issue. | | Also the industry is the real gatekeeper--we have tools that | don't require us to repeat prior-art, yet have to go through | hurdles of leetcode-style interviews for a job. Maybe in the | future the hardest part of being a (AI-driven) developer will be | getting a job? | aj3 wrote: | Honestly, this attitude (dismissing the feature even without | trying it out) comes across as insecurity and gatekeeping. | jpttsn wrote: | Is "you're gatekeeping!" the new "first comment!"? I fail to | see the information gain from adding this message to every | thread. | auggierose wrote: | Nope, it's called wisdom. Not everyone on here is under 12. | joshbert wrote: | There's that insecurity thing he was talking about. | auggierose wrote: | Not sure what is confident about doing pair programming | with an AI, josh. 
| shadowgovt wrote: | I'm not quite old enough to remember: when high-level languages | like C gained wide adoption, how much pushback was there from | the philosophy that if you're not writing it in assembly, | you're not really writing code? | Nicholas_C wrote: | As a hobbyist this is going to save me so much time. Little | things like googling how to read CSVs in Python for the 20th | time add up and I think this should help solve that. | aerovistae wrote: | > Potentially for newer developers it robs them of active | experience of writing code. | | And for those with experience, this will be obvious when | reviewing their code. There are only two possibilities -- either | copilot will get so good that it won't matter, or code written | by copilot will have obvious tells and when someone is | over-relying on it to cover up for a lack of knowledge, that will be | very clear from repetition of the same sorts of mistakes. | dgb23 wrote: | I honestly think this is solving a real problem with commonly | used languages and their lack of syntax abstraction and | expressiveness. | | I can imagine this being very useful in helping to type out | what I consider to be "mechanical noise": things that you have | to type out to satisfy an expression rather than to convey | semantics. | | A good example of how this type of noise manifests: | | Observe two programmers, both being similarly strong in terms | of many concepts except for mechanical expertise. One uses the | editor as an extension of their body, it's beautiful to watch, | the other stumbles awkwardly over the code until it's finished. | You can observe the latter in programmers who are very smart | and productive, but they either didn't train their mechanics | deliberately or maybe they lack that kind of baseline hand-eye | coordination.
| shadowgovt wrote: | It is even entirely possible that this approach hits a | middle-ground that serves the corporate software-development | space better than highly-flexible languages. | | The difficulty with high flexibility is that the expressions | become very domain-specific very quickly, creating the | challenge of learning the new abstractions. So one isn't just | a LISP developer, one knows how to write in the specific | forest of macros that have been built up around one specific | problem domain. The end result is code that means nothing to | a reader who doesn't have a dense forest of macro definitions | in their brain (at least in this era, their IDE will likely | helpfully pull up the macro definitions with a mouse-over or | cursor-over gesture!). | | Contrast with this approach, where the complexity of | abstraction is being baked into the lower-flexibility | language. The code is less dense, and that's a tradeoff... | But grab any 10 developers off the street with experience in | that language and have them read it and 8 of them will likely | be able to tell you with some accuracy what the code is | doing. Not a trick I've seen possible with even very | experienced LISP developers on a codebase they've never seen | before. | | ... and, of course, being able to grab a random 10 developers | off the street and have 8 of them up-to-speed in no time at | all is crack cocaine to big businesses with large and complex | systems maintained by dozens, hundreds, or thousands of | people. | dgb23 wrote: | I've seen macro heavy code that is very semantic and | declarative. It's a powerful tool, so it is natural that | people need to learn and fail until they use it well. | pnt12 wrote: | Couldn't disagree more: | | Smart programmers are are not coding thousands of lines of | code every day. There's so much more to in software | engineering other than coding, that's why senior engineers | spend less time writing code than juniors. 
| | If a slow typer is using a auto-completer, he's not learning | to type faster. | | If auto complete fails, then the slow programmer will need to | invalidate the code first, and then type it anyway. | | All in all - maybe it's impressive AI research project, but I | don't see it as a useful product. | dgb23 wrote: | Maybe I reacted optimistically because your objections are | reasonable and relatable. But I'm still very interested in | how this will actually play out. I want to see and feel it | in action. | chongli wrote: | _I honestly think this is solving a real problem with | commonly used languages and their lack of syntax abstraction | and expressiveness._ | | Do you think there's a risk that a tool like this could lead | to an explosion of the size of codebases written in these | languages? It's great that programmers can be freed from the | need to write boilerplate but I fear that burden will shift | to the need to read it. | dgb23 wrote: | Damn that's the best objection I read so far. Writing | readable code is already hard and something I aspire to. | brown9-2 wrote: | All of the same can be said for copy-and-pasting code you find | in a tutorial in Google search results or in a Stack Overflow | answer. This just seems to be automating that process even | further. | theden wrote: | Those extra steps can be valuable, since you'll have to work | to even find the right code to copy/paste, and the context | which it's in can teach you something. | | Even something as simple as copying from the docs, it's | usually a good place to signal deprecation, use-case | applicability, API updates etc. you lose all that with the | automation. | | Oftentimes there's also discussion around a solution, and in | many ways can swing one's decision on whether to use the code | or not. | manquer wrote: | Think of it as a junior dev working under you and doing the | grunt work of typing in your ideas. 
Sometimes he can | StackOverflow a better snippet than you can write on your own, | you will probably learn a bit from it, but it won't surprise | you. | | It is no different from reviewing the code of another, perhaps junior, | dev and only adding the finishing touches. | | There is plenty of boilerplate you have to write, IntelliSense/ | Autofill only goes so far; this is the next step in the evolution. | Sure it is not perfect but if I can express my ideas faster, | why not. | | Also, it is very probably a poor tool for new devs: they won't | know that a suggestion may not be the best, and probably won't | ignore it when it is wrong, as they won't know any better. | asdfman123 wrote: | Most people on HN will probably be fine for a while. This | innovation though, once properly developed, could completely | screw over anyone wanting to enter programming. | | Code completion might be the new "junior dev." | manquer wrote: | All the computer scientists[1] at one point considered | software developers and IT in the same light as higher-level | tooling evolved. | | While the purist view is surely not wrong that the average quality of | outcome has dropped since the 70s-80s, the quantum of | throughput has meant that the impact has been positive and immense. | | Similarly I am expecting this kind of tooling would open up | to more types of new developers. | | [1] all the mathematicians thought similarly perhaps during | the 50s and 60s. | crazygringo wrote: | > _If you need to go through the suggested code to ensure it's | correct, you may as well write it yourself?_ | | Not really. People are generally _far_ faster at reading | something and evaluating whether it's correct, than at writing | something. In the same way it's faster to read a book than to | write one. | | Not to mention the time it takes typing, fixing typos, etc. | | So this could genuinely be a huge timesaver if it is helpful | enough of the time. | layla5alive wrote: | I completely disagree with you.
Reading code for correctness | is difficult and not something most people do well at all. | Reading code and reading for correctness are not the same, | and most developers can write code a lot faster than they can | verify it. | crazygringo wrote: | I guess we just disagree? | | Honestly I don't even see how that's possible. Writing | code, you're thinking about all the different ways to do | it, eliminating the ones that won't work, evaluating the | pros and cons of the ones that seem like they'll work, you | start writing one and then realize it actually won't work, | then start writing it a different way, try to decide what | the best approach will be to make sure you're not | committing an off-by-one error, and so on... | | Whereas when you're reading code for correctness, you're | just following the logic that's already there. If it works, | it works. How could it possibly take longer than the whole | creative process of coming up with it...? | | Sure, maybe most people don't read code for correctness | well. But then the code they write is surely even _worse_. | amildie wrote: | >Whereas when you're reading code for correctness, you're | just following the logic that's already there. If it | works, it works. How could it possibly take longer than | the whole creative process of coming up with it...? | | That's exactly the problem. If you "just follow the | logic" you can miss important details or edge cases that | you would be forced to deal with by coding it yourself. | | I wouldn't mind using something like this for mundane | tasks, but I would be very careful with these tools while | developing high performance code intended to run on | specific hardware. | crazygringo wrote: | It's funny, I guess I'm just the opposite. | | If I'm reading code, I can give 100% of my attention to | the logic and details and edge cases, so I'm _more_ | likely to pick them up. 
| | While as I'm writing, I'm busy doing all of the stuff | that writing code involves, so I'm more distracted and | more likely to make mistakes. | | This gets proved to me time and time again when I run | something for the first time and have to debug it. I look | at the offending line, and think -- how could I have made | a mistake so obvious that it's immediately apparent? | Well, because I was busy/distracted thinking of 20 | different things while writing it. But it's immediately | obvious when _reading_ it, because it has my full | attention now. | jackbrookes wrote: | Sometimes you don't quite know how to implement something, | without thinking about it for a while. All of us would a | lot of the time search StackOverflow for the solution to a | simple problem, e.g. | | "recursively list all the files in a directory C#" | | https://stackoverflow.com/questions/929276/how-to- | recursivel... | | I imagine an AI copilot could streamline this, instead of | searching, reading and verifying, copy pasting, and | changing the variable names to my needs, I could now just | type the method name, arguments, and documentation and it | would similarly fill out the code for me. Then I have to | check it (as I normally would). | kajaktum wrote: | >most developers can write code a lot faster than they can | verify it | | what? so people just write code and never read it back? | layla5alive wrote: | Verifying code and reading it aren't the same thing. And | yes, most developers don't verify their code as carefully | as they should. But also, there are blind spots to | verifying your own code because brains take shortcuts. At | the same time, there are difficulties verifying other | people's code because of different shortcuts brains take. | | There was a simple function in java standard library | which was wrong for years because of this phenomenon. | | https://dev.to/matheusgomes062/a-bug-was-found-in-java- | after... 
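A sketch of the kind of long-lived bug layla5alive is pointing at, assuming the truncated link refers to the well-known midpoint overflow in binary search (the thread itself does not confirm which bug is meant). Python integers do not overflow, so `to_int32` and `java_div2` are hypothetical helpers that imitate Java's 32-bit `int` wraparound and truncating division.

```python
def to_int32(n):
    """Wrap a Python int to a signed 32-bit value (imitates Java's int)."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def java_div2(n):
    """Divide by 2, truncating toward zero as Java's integer division does."""
    return -((-n) // 2) if n < 0 else n // 2


def midpoint_naive(low, high):
    # (low + high) / 2: the sum can exceed 2**31 - 1 and wrap negative.
    return java_div2(to_int32(low + high))


def midpoint_safe(low, high):
    # low + (high - low) / 2: the difference fits whenever 0 <= low <= high.
    return low + (high - low) // 2


low, high = 1_500_000_000, 2_000_000_000  # both fit in a signed 32-bit int
print(midpoint_naive(low, high))  # -397483648: a negative "index"
print(midpoint_safe(low, high))   # 1750000000
```

Both versions look plausible on a read-through and agree on every small input, which is roughly the point being argued in this subthread: the difference only shows up for inputs a reviewer is unlikely to picture.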
| dgb23 wrote: | That's what a REPL and automated tests are for. | layla5alive wrote: | Automated tests find all bugs and replace code review? | I'd love some of those drugs! | dgb23 wrote: | No, of course not. I meant understanding what you write, | while you write it. Also to add a bit of nuance: I | typically spend much more time reading and thinking | rather than writing. But I read REPL output, logs, test | output just as much, sometimes more than the actual code. | adkadskhj wrote: | I'd say it's more nuanced. Reading code properly _is_ | writing code. Ie i have to work through the logic as if i'm | writing it, which is effectively writing it in my head, | before i know if that's what i believe to be optimal. | | I can _just_ read the code of course, and understand what | it does - but just reading isn't analyzing it to the degree | you do when you review/write the code. In that level of | analysis you're looking for edge cases, bugs, etc. Reasons | you'd write it differently. Which i suspect is functionally | similar, if not identical, to writing it. | layla5alive wrote: | It's pretty much this, but it's harder because if someone | else wrote it, there's a level of indirection between how | you would have written it and how they did which tends to | need a bit of extra mental resolving/processing for | correctness. | jdlshore wrote: | I'm going to pile on the disagree train. My experience is | that developers find (other people's) code much harder to | read. I suspect this tool will lead to code with subtle | problems because people will skim it, shrug "eh, looks about | right," and move on. | | _Edit:_ In fact, people in this thread are finding _exactly_ | those problems in the example code, which you would assume | had been checked fairly carefully. | theden wrote: | idk I always find writing code easier than reading code. Perl | is a fantastic example of this. 
| j2kun wrote: | I imagine, for a while the complexity that Copilot can handle | is limited to what most people would pull from stackexchange | anyway. And if it helps autocomplete documentation and provide | automatically generated (more descriptive) function/variable | names, it will probably be a net positive for those limited use | cases. | | That said, I can't wait to read the first postmortem where | someone deployed code generated via Copilot that has a bug. I | just hope it's not on a rocketship or missile guidance system. | maxwells-daemon wrote: | I'd rate myself as "above-average receptive" to ML-based tooling, | but after trying two "AI autocomplete" tools (Kite and TabNine) | I've decided it's not for me. The suggestions were usually good, | but I found having complex, nondeterministic IDE commands pretty | unsettling. | dilap wrote: | Could actually make you better at producing robust code, | though. It's somehow always easier to spot someone else's | mistakes than your own. | | Or to put it another way: you _know_ you can 't trust the AI- | generated code w/o convincing yourself it's correct. You | _think_ you can trust the code you wrote yourself, but you 're | probably wrong! :-) | Communitivity wrote: | I wonder if by using Github Copilot you are training Github | Copilot to code better. I don't see this as replacing a | programmer, but I could see it as an advanced, possibly semi- | automatic refactoring tool. This could allow a less skilled | programmer to be more productive and produce better code, making | them more valuable. Also, licensing was mentioned. I would not | want Github Copilot training on private repos. | youerbt wrote: | I wonder if this kind of technology can push industry into more | advanced languages. If a programmer can restrain the space of | available programs more, it should aid the tool to give even | better results. | ahofmann wrote: | How does this compare to tabnine? 
| Linked_Liszt wrote: | I haven't kept up with the adversarial ML field recently, but I | wonder how vulnerable these models are to adversarial attacks. | | - Could someone deliberately publish poor code to reduce the | overall performance of the model? | | - Could someone target a specific use case or trigger word by | publishing deliberately poor code under similar function | definitions? | abhijitr wrote: | Also: what happens when a nontrivial portion of public code out | there is ML-generated? How will it deal with feedback effects? | patwolf wrote: | I wonder if there's any potential for Copilot to suggest | malicious code because it's been trained on an open source | projects containing intentionally malicious code. | dgb23 wrote: | Since it is _per line_ I highly doubt it. I think of it as | intellisense+. You select suggestions that you would have | written anyway. | napolux wrote: | or broken code :D | [deleted] | ukoki wrote: | The averageRuntimeInSeconds example does not check for | division by zero so it creates broken code at least 20% of | the time based on the examples on the homepage :) | napolux wrote: | nothing more than what I was expecting :D | akersten wrote: | Maybe not malicious per se, but certainly I'd be concerned | about seemingly-correct but actually-wrong code being | suggested. Considering how often the top StackOverflow answer | is slightly wrong or how often antipatterns crop up across | various projects, I'm sure the training data is nowhere near | "perfect code" - implying the output cannot be perfect either. | inimino wrote: | I'm amazed to see how positive the overall response is to this | idea. Almost as if programmers think that writing programs is the | worst part of the job and ready to be automated away. | | As someone more aligned with the Dijkstra perspective, this seems | to me like one of the single worst ideas I've ever seen in this | domain. 
| | We already have IDEs and other tools leading to an increase in | boilerplate and the acceptance of it because they make it easier | to manage. I can only imagine what kinds of codebases a product | like this could lead to. Someone will produce 5000 lines of code | in a day for a new feature, but only read 2000 of those lines. | Folks that still expect you to only check in code you understand | and can explain will become a minority. | | I wonder how long it will be until someone sets up the rest of | the feedback loop and starts putting up github projects made of | nothing but code from this tool, and it can start to feed on | itself. | | Cargo-cult programming has always been a problem, but now we're | explicitly building tools for it. | asadlionpk wrote: | If this results in more overall programmers or enabling | existing programmers to make products quicker than before, it's | a win! | | Most codebases already majorly contain unread code; the | libraries (node_modules, etc). I am sure we can figure out a | pattern to separate human vs machine code in similar way. | | If the code you are about to write is already written by | someone else on the internet, that's probably not the most | innovative part of your codebase anyway, so why waste time? | anyonecancode wrote: | > I wonder how long it will be until someone sets up the rest | of the feedback loop and starts putting up github projects made | of nothing but code from this tool, and it can start to feed on | itself. | | This is my actual hoped-for endgame for the ad based internet. | At some point Twitter, FB, etc will be exclusively populated by | bots that post ads, and bots that simulate engagement with | those ads to drive up perceived value of advertising. They'll | use AI to post comments that are really just ads or | inflammatory keyword strings to drive further "engagement." 
The | tech companies will rake in billions and billions of dollars in | ad revenue, we'll tax all of it and use it to create flying | cars, high speed rail, and an ad-free internet closed off to | all non-humans. Occasionally a brave ML researcher may venture | out into the "internet" to take field notes on the evolution of | the adbot and spambot ecosystem. | jokethrowaway wrote: | The most unlikely thing you mentioned is that we will be able | to tax huge corporations. | TeMPOraL wrote: | You jest, but the parent is close to what I believe is a | possible scenario - the one Nick Bostrom calls "a | Disneyland with no children". | | No tax, no flying cars, eventually not even humans around - | just AI-driven companies endlessly trading with each other, | in a fully-automated, self-contained, circular economy, | from which sentient beings were optimized away. | nindalf wrote: | I invite you to try automating this and let us know what | happens. Try creating, let's say 1000 accounts and try liking | or posting and see what happens. I've seen that system at | work and doubt you'd get very far. | | More than that, you misunderstand how advertisers prioritise | their money. They pay for outcomes. If they notice that over | the past couple of months they've been receiving mostly bot | traffic, they stop advertising. Not everyone all at once, but | enough that revenue begins to decline. An ad based business | that cares about the long term will do it's best to weed out | the inauthentic engagement. | swiftcoder wrote: | > More than that, you misunderstand how advertisers | prioritise their money. They pay for outcomes. | | How they verify those outcomes, however, gets interesting. | See, for example, College Humour going under due to | inflated engagement metrics fed to them by Facebook video. | heavyset_go wrote: | > _I invite you to try automating this and let us know what | happens. Try creating, let's say 1000 accounts and try | liking or posting and see what happens. 
I've seen that | system at work and doubt you'd get very far._ | | Yes, naive attempts at manipulation will be detected these | days on big platforms. 5+ years ago such naive attempts | were successful, though. A few years ago I made a proof of | concept to show how easy it is to make new Reddit accounts | and automate them, and had registered hundreds of accounts. | Those logins still work even though Reddit has cracked down | on naive automation attempts. | | Today, that's why many firms buy real users' accounts. | They'll hire people to manually login to the accounts and | post. There's also the perpetual cat and mouse game between | bot creators and platform owners, and the platform owners | who benefit from the appearance of increased growth and | engagement that's actually just bot activity. | augustk wrote: | Indeed, using tools to manage complexity tends to make the | complexity acceptable and leads to more complexity. | s_dev wrote: | Coding is inherently difficult -- any tools even as basic as | color highlighting/spell checking massively help understanding | code in front of you. There isn't a hope this can replace any | programmers but instead aid their workflow. I great example is | simply refactoring code with SOLID after building a feature or | fixing a bug -- a lot of this can be easily automated. Having a | machine suggest and a human accept is a worthy trade off. | Another similar example is the Google bot that presents search | suggestions for you. | | I don't think your concerns are well grounded. | bluetwo wrote: | All I can think of is how many times I've grabbed a code | example from StackOverflow only to discover it had some obvious | bug in it. The answer is many, many times. | Florin_Andrei wrote: | In the (now not very) long run, programming was a job meant for | computers, anyway. The future will look back at "programming" | the way we now look at Charles Dickens characters toiling in | soot-filled factories. 
It's not what people are best at, and it | looks like soon there will be better ways to accomplish this | job. | vanusa wrote: | _Cargo-cult programming has always been a problem, but now | we're explicitly building tools for it._ | | I get what you're saying, but I'm not worried. At the end of | the day, the programmer has to understand the code they're | submitting, both the fine grain and the holistic context. If | they don't know how to, or can't be bothered to, at least _curate_ | the suggestions the tool is making... then your organization | has much bigger problems than can be helped by reading a | Dijkstra paper or two. | staticassertion wrote: | > Almost as if programmers think that writing programs is the | worst part of the job and ready to be automated away. | | _Writing_ the programs is definitely boring garbage work. | Typing is so slow and annoying - hence autocomplete being a | standard tool. This is, to me, just fancy autocomplete. | | > to an increase in boilerplate and the acceptance of it | because they make it easier to manage. | | Boilerplate optimizes for the right things - code that's easier | to read and analyze, but takes longer to write. IDEs and tools | like this cut the cost of writing, giving us easier-to-read | and easier-to-analyze code for free. | | IDEs have supported templates forever. I never write out a full | unit test, I type `tmod` and `tfn` and then fill in the blanks. | This is basically the same thing to me. | | > Folks that still expect you to only check in code you | understand and can explain will become a minority. | | This isn't true at all. Having used TabNine I don't have it | write code for me, it just autocompletes code that's already in | my head, give or take some symbol names maybe being different. | | All this is is a fancy autocomplete with a really cool demo. | reader_mode wrote: | >Boilerplate optimizes for the right things - code that's | easier to read and analyze, but takes longer to write.
| | This is wrong and the same retarded logic Java used to defend | not introducing var and similar features for ages. | Boilerplate is usually noise around the actual logic - it's a | result of limited abstractions. When you're repeating the same | code over and over you raise that segment to a separate | concept, that's how abstraction and high level programming | work - it increases readability and maintainability. Being | easier to type has nothing to do with it. | staticassertion wrote: | Thanks for being the person who I knew would try to make | this about Java. I don't care about Java, it was a trivial | example. | | The rest of your post doesn't really have to do with mine. | Yeah, you can cut down on boilerplate with changes to | languages... duh. But in terms of conveying context there's | always a tradeoff of explicit vs implicit, and one of those | costs is taking the time to actually turn your mental model | into a written implementation - this eases that burden. | | As I said, it's a fancy autocomplete. | reader_mode wrote: | >But in terms of conveying context there's always a | tradeoff of explicit vs implicit | | Exactly - if a tool lets you write it explicitly too | easily you're making that the default, and it ignores the | readability/maintainability side of the tradeoff. | | Maybe it gets good enough to recognise when things can be | factored out for better readability as well. But in my | experience code generators rarely result in maintainable | code. | UnFleshedOne wrote: | Boilerplate is only easier to read and analyze if you can be | sure it is consistent, so you can tune it out. Usually | though, there is this one getter method that is not quite | like the others and you literally will not see the difference | until it bites you. | | We'll need more IDE enhancements, to highlight interesting | pieces and desaturate standard boilerplate... 
| staticassertion wrote: | When I think of boilerplate I think of context that is made | explicit. 
Things like type annotations, longer variable | names, the lifetime or attributes of some class or data | etc. These things are extremely helpful for a number of | things - they convey context from writer to readers, they | aid in proving the code correct, and they can make code | faster. | | The context almost always exists in the writer's head. We | all have a specification of our program based on our | expectations, and we type out code to turn that model into | an implementation. We only spend so much time conveying | that model though - most of us don't write formal proofs, | but many of us will write out type annotations or doc | comments. | | The _cost_ is usually as simple as expressing and typing | out the model in our head as code. Languages that are | famous for boilerplate, like Java, enforce this - and it | makes writing Java slower, but can also make Java code | quite explicit (I'm sure someone will respond talking | about confusing Java code, that's not the point). | | Reducing the cost of conveying context from writer to | reader means we can convey more of that context in more | places. That's a huge win, in my opinion, because I've | personally found that so much implicit context gets lost | over time, but it can be hard to always take the time to | convey it. | | Think about how many programs you've read with single | character variable names, or no type annotations, or no | comments. The more of that we can fix, the better, imo. | | Tools like this do that. TabNine autocompletes full method | type signatures for me in Rust, meaning that the cost of | actually writing out the types is gone. That's one less | cost to pay for having clearer, faster code. | fossuser wrote: | People have made some variation of this argument since the move | from writing binary to writing assembly. | | With every new layer of abstraction there's more power. 
| | The long term benefit of a tool that can do this well far | exceeds what humans can do by hand, but that may not be true in | the very short term. | | Either way, I suspect the benefits to be big. | goodpoint wrote: | Not at all: this tool does not encourage more powerful | abstractions, but the very opposite. | | It makes boilerplate cheaper to churn out. | uticus wrote: | I disagree with the comparison. This isn't abstraction, it is | syntax completion. As if you typed the first four bytes and | GitHub (mostly correctly, it must be mentioned!) completed the | rest. | | Unlike an additional abstraction layer, the readability is | not increased. | fossuser wrote: | The end goal of this, based on what I've seen from OpenAI | examples and related beta projects, is more high level | language -> code. | | "Write a standard sign up page" -> Generated HTML | | "Write a unit test to test for X" -> Unit Test. | | It's more than just syntax completion - I'd argue that's | the beginning of a new layer of abstraction similar to | previous new abstraction layers. The demo on their main | page is more than syntax completion - it writes a method to | check for positive sentiment automatically using a web | service based on standard English. | | This is extremely powerful and it's still super early. | | I saw one example that converted English phrases into bash | commands, "Search all text files for the word X" -> the | correct grep command. | | That is a big deal for giving massive leverage to people | writing software and using tools. We'll be able to learn | way faster with that kind of AI assisted feedback loop. | | Similarly to compilers, the end result can also be better | than what humans can do eventually because the AI can | optimize things humans can't easily, by training for | performance. Often the optimal layout can be weird. | [deleted] | _fat_santa wrote: | The way I see it, the tool is only as good as the programmer | using it. 
This tool will generate the individual code blocks | for you, but you still have to understand how to put it all | together to deliver a working app. | | Sure there will be some codebases out there that are plastered | together using this tool, but when it comes to delivering | software that is well written, performant and maintainable over | the course of several years, you're still going to need a lot | of skilled engineers to pull that off. | hbosch wrote: | If we didn't need programmers to do the programming, that would | be a perfect world. | slver wrote: | This is inevitably leading to the moment where we don't need | humans, but I'm fine with that. | TheRealPomax wrote: | This ignores the ladder of abstraction, which you apparently | need to be reminded exists. Not all programmers need to work | at the same level of abstraction: some programmers need to | write original code all day long because their subject field is | close to the metal and there are no premade solutions. For | those folks, the idea of copy-pasting from SO is pretty | ridiculous, although SO _might_ have questions and answers that | allow them to write their own code solutions based on the | insights of others. Because we're not going to dismiss highly | respected experts in our fields just because they helped answer | good questions with good detailed answers on Stack Overflow, are | we? | | A few rungs up on the ladder we _still_ have programmers but | now the kind whose job it is to write _as little code as | possible_, where their worth comes from knowing exactly how | little code glue is needed to, necessarily and sufficiently, | make other people's libraries work together to deliver functionality | that is larger than the sum of its parts. 
These folks aren't | solving unique problems, they make things work with as little | code as possible, and copy-pasting from SO for problems that | have been solved countless times already by others is 100% | fine: their expertise is in knowing how to judge other people's | code to determine whether that's the code they need to copy- | paste. | | And then, of course, there are all the folks in between those two | levels of abstraction. | | The biggest mistake would be to hear "programming" and think | "only my job is real programming, all those other people are | just giving me a bad name". Different horses and different | courses, _and_ different courses for different horses. | hyperion2010 wrote: | Some people think the problem is that we don't have enough | code. Anyone that has to maintain code knows that the problem | is that we have too much code. | gorjusborg wrote: | This is even better! Now we can generate code we might not | even understand, without even hitting all the keys. | RationPhantoms wrote: | Okay, and I'd argue that a good portion of programmatic | wrangling is simply trying to do Y with Z. Something that's | probably been done 10,000 times over by others but, in the siloed | confines of that single developer's workspace, is an utter fucking | mystery to them. | | What's the carbon displacement for wasted time on those tasks? | It might be brow raising. | slver wrote: | You sound afraid you'll be replaced by software. | dmitrygr wrote: | Your fears seem justified, as per the site itself: | | Whether you're working in a new language or framework, or just | learning to code, GitHub Copilot can help you find your way. | Tackle a bug, or learn how to use a new framework without | spending most of your time spelunking through the docs or | searching the web. | davidthewatson wrote: | Cargo cults concerned me too but I realized that cargo-cult | programming flourishes when it's enabled by a culture that | doesn't care how the sausage is made. 
If the culture seeks full | stack truth, it's not likely to get fooled by bad generated | code, no matter whether it's generated by copy/paste, | metaprogramming, or AI. | | I'd love to know what Donald Knuth thinks given the history of | literate programming. | rPlayer6554 wrote: | The first thing that comes to mind is the recent article on the | front page about the Docker footgun/default that allowed a | hacker to wipe a website's database. | dqpb wrote: | That entirely depends on the quality of the suggestion, does it | not? | Mountain_Skies wrote: | Not looking forward to dealing with this from a security point | of view. It's difficult to get developers to accept | responsibility for security vulnerabilities in libraries | they've selected for their project ("That's not my code!"). I | can see the same thing happening with generated code where they | don't want to take responsibility for finding a way to | remediate any vulnerabilities they didn't personally type in. | Of course those who exploit the vulnerabilities won't care how | they got into the code. They're just happy they're able to make | use of them. | IncludeSecurity wrote: | It says it's trained on "billions of lines of code" | | I would augment that to "billions of lines of code that may | or may not be safe and secure" | | If they could tie CodeQL into Copilot to ensure the | training set only came from code with no known security | concerns, that would be a big improvement. | kbenson wrote: | I think the mistake you might be making is assuming that any | tool adopted is always used all the time. Even professional | race car drivers probably opt for an automatic transmission | over the manual on the mini-van if they get one. Different | choices for different needs. | | There will always be a place for meticulous consideration of | exactly what's being done, and many levels of that as well. 
For | the same reason people reach for Python to mock up a proof of | concept or throw something together that is non-essential but | useful to have quickly, even meticulous programmers might use | this to good effect for small things they don't care to spend a | lot of time on because it's not as important as something else, | or the language they're using for this small task isn't one | they feel as proficient in. | albertkoz wrote: | Next step will be AI to approve the code, because if someone | is producing 5k LOC a day there are people who need to read & | approve this code... | throwaway675309 wrote: | I mean you say this, but you and most likely the majority of | programmers rely on dozens of repositories, packages and | libraries with likely zero deep understanding of it (and at the | very least haven't read the source code of) so I don't really | understand the difference here. | | The advantage of something like this is that instead of having | to go to stack overflow or any number of reference sites and | copy pasta it can just happen automatically without me having | to leave my IDE. | | The enjoyable part of programming for me is not typing the Ajax | post boilerplate bullcrap for the millionth time, it's the | high-level design and abstract reasoning. | mdellavo wrote: | if you are typing out boilerplate you should look to abstract | it away | Yaina wrote: | I think this goes beyond one project. In your lifetime you | just have to write certain things again and again and then | you have to write the abstractions again and again. | | Maybe that warrants a library, but then you also have to | hook that up with the ever so slightly different | boilerplate code. | | If this 90% of the easy stuff is done for you, that gives | you more time to focus on the 10% that matter. | Spivak wrote: | Oh god please don't do this indiscriminately. If you're | typing out boilerplate, document it and add a generator for | it. 
I've been bitten probably hundreds of times by bad | abstractions created to save some keystrokes that turned 50 | lines of boring easily readable code into an ungrokable | dense mess. | gugagore wrote: | What do you mean "add a generator for it"? Do you mean | something like templating for source code, like the C | pre-processor? | | I think there is a pro and a con to that approach. | | The pro is that there is a meaningful and familiar | intermediate representation --- the output of the C pre- | processor is still C code. Another example is | https://doc.qt.io/qt-5/metaobjects.html | | The con is that, well, it introduces a meta layer, and | these meta layers are more often than not ad-hoc and | ultimately become quite unwieldy. It's a design pattern | that suggests that there is a feature missing from the | language. | Spivak wrote: | No absolutely not, I think that's the worst of both | worlds. I mean something like `rails generate` where it's | a parameterized block of generated code that you insert | inline and then edit to your needs. | | The disadvantage is that making sweeping changes is more | work. The advantage is that making sweeping changes can | be done incrementally. But the big win with code | generators is that all you need to understand what's | happening is right in front of you instead of having to | mentally unwind a clunky abstraction. | | Don't get me wrong, if you have a good abstraction that | reduces the use of mental registers, do it! But you would | and should do that regardless of boilerplate. | Verdex wrote: | "Okay, so you pass the function a lambda. And the input | parameter to that lambda is another function that itself | consumes a list of lambdas. And this is so that you don't | have to init and fill in a few dictionaries OR because | you might need to otherwise use an if-statement." | | I like abstractions as much as the next person, but | oftentimes you can just make do with the exact thing. 
| cunthorpe wrote: | Importing an external, tested, reliable dependency is | completely different from anonymous non-checked untested code | in your repository committed by someone who did not even read | it. | | Check out the memoize example. That fails as soon as you pass | anything non-primitive but there's no one documenting that. | rowanG077 wrote: | Anonymous non-checked untested code is problematic in all | cases. This doesn't change that. | geraneum wrote: | A programmer who commits untested sloppy code of their own | writing will do it regardless of having access to such a | service. Nothing will make me commit the generated code | without testing it. I think this tool could take care of | the boilerplate and the rest will still be on the | programmer. At least in the near future. | gnulinux wrote: | > anonymous non-checked untested code | | What? It's not anonymous, it's still committed by a dev. It | can be non-checked and untested, that's true. But it's not | any less untested than any other code. If you choose not to | write tests for your code, this won't change anything. | | The only issue I see with this is it being potentially | unchecked. And the solution to that is reading all the code | you commit, even though it's generated by AI. | TeMPOraL wrote: | It's about affordances. As presented, this tool | streamlines copy-pasting random snippets. The easier | something is, the more people do it. | | Testing doesn't even enter the picture here, we're at the | level of automating the stereotypical StackOverflow- | driven development - except with SO, you at least get | some context, there's a discussion, competing solutions, | code gets some corrections. Here? You get a black-box | oracle divining code snippets from function names and | comments. | | > _the solution to that is reading all the code you | commit, even though it's generated by AI_ | | Relying on programmer discipline doesn't scale. 
Also, in | my experience, copy-pasting a snippet and then checking | it for subtle bugs is _harder_ than just reading it, | getting the gist of it, and writing it yourself. | veverkap wrote: | > As presented, this tool streamlines copy-pasting random | snippets. | | It synthesizes new code based on a corpus of existing | code. | TeMPOraL wrote: | Yes. Given how it does it, this makes it even more | dangerous than if it was just trying to _find_ a matching | preexisting snippet. | jbrot wrote: | > Programmer discipline doesn't scale | | Thank you for putting this so eloquently. This has | basically been the sole tenet of my programming | philosophy for several years, but I've never been able to | put it into words before. | ahepp wrote: | >Tests without the toil. Tests are the backbone of any | robust software engineering project. Import a unit test | package, and let GitHub Copilot suggest tests that match | your implementation code. | | It looks to me like they're _suggesting_ you use Copilot | to _write the tests_. | pvorb wrote: | I really wonder who those folks copy-pasting from Stack | Overflow all day are. I only rarely find pieces of code that | I can copy-paste. Typically Stack Overflow only gives me an | idea of how to solve something, but incorporating that idea | into my code base is still not trivial. | KingMachiavelli wrote: | There is certainly a balance. When I want to implement | feature X a client has requested but I have to deal with | home grown database abstraction layers and custom AJAX API | structures - I get the feeling that a third party library | probably does it better and has more eyes on the code than | exist at my company. | | That said I would probably not look to a third party | library just to do simple data transformation stuff. | Probably the only things I do copy almost verbatim from SO | are things like Awk/Sed commands that are easy/low risk to | test but would take hours to derive myself. 
| dpq wrote: | I had a colleague who once tried to copy-paste a Scala | snippet into Python code and came to complain that it | doesn't work. We're no longer colleagues. | auggierose wrote: | Yep. And now imagine these people with GitHub Copilot in | their arsenal. God help us all. | bayindirh wrote: | > I really wonder who those folks copy-pasting from Stack | Overflow all day are. | | I can think of people who can do this happily. Some of | them are professional programmers. Some are self-taught. | Some have a CS education. _Seriously_. | | OTOH, I'm similar to you. I either re-use my own snippets | or read SO or similar sites to get a general idea of how a | problem is solved and adapt it to my code _unless_ I find | the concept I'm looking for inside the language docs or | books I have. | | Yes, I'm a RTFM type of programmer. | dec0dedab0de wrote: | A few weeks ago someone on a call said "we all know that | programming is mostly copy and pasting anyway." A few people | laughed, but I said that if I catch myself copy and pasting | then I know that something is very wrong. It was kind of | awkward, but I didn't like my job being trivialized by | people who never really did it. | | It would be like if I said plumbing or auto repair is just | watching YouTube videos and going to Lowe's. Just because | I've managed to do a few simple things, doesn't mean I'm in | a position to belittle an entire profession. | | That said, I am also shocked by how many full time | developers don't take the time to understand their own | code. Let alone the libraries they use. | bayindirh wrote: | > That said, I am also shocked by how many full time | developers don't take the time to understand their own | code. Let alone the libraries they use. | | Me too, then I understood that code and programming are | commoditized. As long as it works and looks pretty on the | outside and it can be sold, it's fair game. 
| | "There'll be bugs anyway, we can solve these problems | somehow" they probably think. | | Heck, even containers and K8S are promoted with the "Developers | are unreliable in documenting what they've done. Let's | make things immutable so they can't monkey around on | running systems and make undocumented changes" motto. | | I still run perf on my code and look for IPC and cache | thrashing ratio numbers and try to optimize things, | thinking "How can I make this more efficient so it can | run faster on this". I don't regret that. | godelski wrote: | Some people consider using Stack Overflow as a heavy inspiration | to be equivalent to "copy and pasting." It's linguistic | shorthand really. | jka wrote: | I'd be willing to bet a reasonable amount that there's a | large future for "subtractive software development" (maybe a | slightly misleading or unfair term, since it'd include | bugfixes). | | Once we have multiple proven technologies that handle each of | the functional areas that we collectively need, then we'll | start to find greater benefit in maintenance, bugfixes, and | performance improvements for those existing technologies and | their dependencies than we find writing additional code and | libraries. | echelon wrote: | I think as our field evolves, more work will be dealing with | high level abstractions. There is a massive need for | distributed systems design. Companies have big ambitions, but | not enough labor to accomplish them. | | There will still be plenty of low level systems programming | work. The field is growing, not shrinking. | | One impact this may have is that it may make tasks easier and | more accessible, which could bring lots of new talent and | could also apply downward force on wages. But the counter to | that is that there is so much more work to be done. | | I'm all for new tools. | scrozier wrote: | Trying to figure out why in the world your comment was down | voted. 
| inimino wrote: | You don't see the difference between relying on a few battle- | hardened libraries, and copy-pasting into your own code some | mishmash of code that looked similar that other people wrote | and is probably something like what a machine learning model | thought you probably meant? Maybe we're in worse shape than I | thought. | | > The advantage [...] without me having to leave my IDE. | | You're arguing for the convenience, my point was that that | convenience creates a moral hazard, or if you prefer, a | perverse incentive, to increase the number of lines of code, | amount of boilerplate, code duplication, and to accept | horrible, programmer-hostile interfaces because you have tied | yourself to a tool that is needed to make them usable. | | > Ajax post boilerplate | | This is an argument for choosing the most appropriate | abstractions. The problem with boilerplate isn't that you | have to type it, it's that it makes the code worse: longer, | almost certainly buggier, harder to read and understand, and | probably even slower to compile and run. You could have made | an editor macro 20 years ago to solve the typing boilerplate | problem, but it wasn't the best answer then and it isn't now. | majormajor wrote: | > I mean you say this, but you and most likely the majority | of programmers rely on dozens of repositories, packages and | libraries with likely zero deep understanding of it (and at | the very least haven't read the source code of) so I don't | really understand the difference here. | | I spend probably half my coding time testing and digging | into those libraries because I don't understand them and | because they cause performance issues because nobody on the | team understands them sufficiently to make their "high level | design and abstract reasoning" accurate. 
| | One problem with the current world of programming tools is | that there's no good way to know which libraries are suitable | for use when correctness and performance and reliability | really matter, and which are only really meant for less | rigorous projects. | z3t4 wrote: | One problem with IDEs is that they can be antagonistic to | good practices such as writing comprehensible code, small | code bases, and good documentation. | js8 wrote: | > you and most likely the majority of programmers rely on | dozens of repositories, packages and libraries with likely | zero deep understanding of it | | Perhaps AI should work on simplifying the existing stack | first, without breaking the functionality. What about that? | awb wrote: | > The advantage of something like this is that instead of | having to go to stack overflow or any number of reference | sites and copy pasta it can just happen automatically without | me having to leave my IDE. | | In the examples, I wish the auto-generated code came with | comments or an explanation like SO does. The code I need help | with the most is the code that's a stretch for me to write | without Googling. The code I can write in my sleep I'd rather | just write without a tool like this. | [deleted] | karmasimida wrote: | > Someone will produce 5000 lines of code in a day for a new | feature, but only read 2000 of those lines. | | Shouldn't the one who has produced this code be responsible for | ensuring its integrity? 5k LOC in a day without test | cases is no code, it is a disaster. | | I think the marketing here is about right. This is no AI | programmer, but a copilot. It is an intelligent assistant that | does some mundane things for you, with some probability of failing | at some of them even, but when the stars align, you are in luck. | | I see this as INCREDIBLY useful for certain niches of | programming: | | 1. Front end. 
Some components are really trivial but still | require some manual rewiring and stuff, this could be a life | saver. | | 2. Templates for devops. Those are as soul crushing as | possible, and I couldn't think of a better domain to apply | Copilot to. | | Overall, this is a huge win for programmer productivity, with | reasonable compromises. | z3t4 wrote: | If we exclude the VM the code is running on, and the OS layer, | and the kernel, and the micro-code, and the standard lib, then | people also like to include library code, and also like to | depend on third party PaaS and SaaS aka the "cloud"... If you | do know what all bits of your code do you can send me a PM. | _cough_ All software is shit, with few exceptions. Not | necessarily because the developers don't know their stuff, but | likely because of business priorities, politics, and layers of | management. Software is a "people problem". So if we remove the | "people" we might get better software ;) | vbezhenar wrote: | I think that there's a division between "productive" developers | and "meticulous" developers. I know that I'm not the first kind. | My best days are when I'm removing code. I'm very wary of | using frameworks and huge libraries. I learned a few frameworks | and libraries, I've chosen a few that correspond to my style and | I'm very careful when it comes to adopting new ones. I prefer | to spend a week coding an auth layer rather than installing some | SaaS connector and calling it a day. I prefer to spend a few days | reading source code and developing my own solution (or just | discarding the whole idea) rather than quickly googling something | and moving on. | | Maybe I'm just unprofessional, I don't know. I get my stuff | done, maybe not as fast, but at least I understand every bit | of my code and I rarely have unexpected surprises. I understand | that there are other approaches, but I just don't enjoy that | way, so I follow mine as long as I can find work. 
And I | actually like things that other people find boring, according | to this thread. Writing "business code" - hate it, writing | "auth layer" - love it. | WanderPanda wrote: | Sounds like you should be working on libraries instead of | products then?! | oscribinn wrote: | So you're basically saying that if you're not a shitty | programmer, you shouldn't be working directly on products? | Are you a Windows developer or something? | b215826 wrote: | I don't think you're unprofessional. In fact, your sentiment | is a belief that was strongly held by early UNIX programmers. | Two quotes I particularly like: | | _"The real hero of programming is the one who writes | negative code."_ -- Douglas McIlroy | | _"One of my most productive days was throwing away 1000 | lines of code."_ -- Ken Thompson | | But unfortunately, we've come to a situation where SLOC and | innovation for the sake of innovation are more important than | code quality. | Bayart wrote: | I'm the same. I tend to get derailed on making my code | philosophically "right" and aesthetically "soothing" (for | lack of a better word), even when it doesn't obviously matter | to the scope of the project, rather than just getting it to the point | where it _works_ by some operation of the Holy Spirit. | Unsurprisingly I'm the "is he working?" guy (I may fit in the | attention disorder category that was a point of discussion in | the confidence thread[1] the other day). But at least I'm not | the "his code broke our shit again" guy. | | [1] https://news.ycombinator.com/item?id=27533988 | quek wrote: | > As someone more aligned with the Dijkstra perspective, this | seems to me like one of the single worst ideas I've ever seen | in this domain. | | Absolutely true. | | If your code is so repetitive that it can be correctly predicted | by an AI, you are either using a language that lacks | expressiveness or have poor abstraction. 
| | The biggest problem in the software world is excessive | complexity, excessive amounts of (poor) code and reinventing | the wheel. | slver wrote: | You're underestimating the sophistication of said AI. | novok wrote: | I've used TabNine for a while, and it's mostly just been a | faster executing normal autocomplete, with a 90% accuracy rate. | It's a tradeoff. It didn't have the large snippet behavior in | my usage like this new one, though. | hintymad wrote: | Somehow I don't see people discuss this kind of tool from the | perspective of managing essential complexity versus accidental | complexity. Maybe copilot just increases the abstraction level | of coding, so we can treat generated code as a building block, | just like we nowadays rarely need to care about how to write | assembly code or how a balanced tree works? | TeMPOraL wrote: | > _Maybe copilot just increases the abstraction level of | coding, so we can treat generated code as a building block_ | | At this point it _doesn't_, and we _can't_, because Copilot | is just a fancy autocomplete. The code is there, first class, | in your file. It doesn't introduce new concepts for you, it | just tries to guess what you mean by function signature + | descriptive comments, and generates that code for you. | JabavuAdams wrote: | You can only solve relatively small problems this way. As I get | older, I like the physical act of programming less and less, | and just want to solve problems so I can get going on all of my | ideas backlog. I've been programming almost every day for the | last 38 years. What I really want to do is solve (my) problems. | dennisy wrote: | I really agree with this even though my experience is much | smaller than yours. | | The biggest place I think it is frustrating to write code is | for ML pipelines where you know what you need, but it takes a | few hours to wrangle files and pandas until you can run your | idea/experiment. | JabavuAdams wrote: | Ermahgerd, yes! 
Doing this right now. | auggierose wrote: | But programming is the greatest fun on earth. Wait no, being | able to work on your ideas and communicating them to the | computer is the greatest fun on earth. Now, if your only | tools for communication are made by "productive developers", | this is where the problem is. Not with programming itself. | uticus wrote: | As a counterpoint to most of the other responses, I agree with | this comment. In my eyes, this is very similar to the issue of | bloated software being enabled by faster processors. This | doesn't mean slower processors are the answer, but rather that | there are often unintended consequences that need to be | considered when solving for a problem. So, as an example of | what could be better than making boilerplate easier to type, I | would suggest programming languages, frameworks, and tooling | that reduce the need in the first place would be worth | considering. | ionwake wrote: | I get the impression the only way to use this is in a github | cloud environment, which means all the code you type will | essentially belong to github in some capacity? | chad_strategic wrote: | I wonder if they are going to have a plugin for ATOM IDE? | Iv wrote: | Is it only available for Visual Code? | rvz wrote: | Yes. Obviously. | auggierose wrote: | What could go wrong? Expecting software to become even shittier. | maxpert wrote: | Now I am gonna have an outage because co-pilot wrote some code | that had an error. People have implicit bias of code being | reliable that they don't write, StackOverflow snippets are | perfect example for that. | ehsankia wrote: | > StackOverflow snippets are perfect example for that. | | You basically just explained why your worry is unfounded | yourself. This is already the status quo. People already write | buggy code and copy buggy code from SO all the time. | | The goal of this isn't to write perfect code, that's still up | to the programmer to do. 
This won't make a bad programmer not | write buggy code magically. | blain wrote: | > OpenAI Codex was trained on publicly available source code | (...) | | It would be nice if github made this tool publicly available in a | good spirit of open-source instead of straight up monetizing it. | | I get that github is not a non-profit but still. | buremba wrote: | The landing page is really cool. It looks like the screencast is | built with Javascript, is there any tool that helps building such | screencasts? I assume that it's not trivial to build such | animations. | taywrobel wrote: | Depends on what you need to do, but asciinema is pretty much | exactly that use case: https://asciinema.org/ | | Wouldn't work in this case with the overlays and styling tho. | bww wrote: | The example in the hero animation has a bug. The ${text} may not | be correctly URL-encoded, which would make the body invalid. And | because this sort of feature encourages people to blindly trust | the machine and not think about what they're doing this error is | much less likely to be caught. | | Personally I think this whole class of feature only offers | trivial short-term efficiency gains at the expense of long-term | professional development and deeper understanding of your field. | brundolf wrote: | For a long time now I've thought that AI would have a really | interesting role to play in developer experience, though this | isn't really the form I think it should take. | | I think it would make the most sense as a really advanced static- | analysis tool/linter/etc. Imagine writing something like C where | whole classes of errors can't be checked statically in a | mechanical way, but they could be found by a fuzzy ML system | automatically looking over your shoulder. Imagine your editor | saying "I'm not sure, but that feels like a memory error". And of | course you can dismiss it "no, good thought, but I really meant | to do that". 
Imagine an editor that can automatically sniff out | code-smells. But the human, at the end of the day, still makes | the call about what code to write and not write; they're just | assisted with extra perspective while doing so. | ingvul wrote: | I can only imagine one of the big reasons to release Copilot to | the public for free is to make it better by sending back to | GitHub the "reviewed code". Example: | | - Copilot suggests a snippet of code to me | | - It's almost what I wanted. I fix the code | | - Copilot sends back the fixed code to GitHub | | - Copilot gets better at guessing my (and others') wishes | | Unless Copilot is running locally, I won't use it. | ranguna wrote: | Same, I make quite a few mistakes like pasting secrets and | immediately deleting them right after, and I also have local | secrets that are gitignored which I think copilot would just | upload without a second thought. | hprotagonist wrote: | The IP implications of how network weights depend on their data | sources are, shall we say, a matter of ongoing legal discussion. | | We genuinely don't know what it means for licenses or whatnot | right now. | fabiospampinato wrote: | IMO a potentially more interesting application of this technology | would be a learning system that is able to learn your coding | style. You give it access to the codebase and it reformats all | files on save according to your liking, perfectly. | | Obviously a program that is able to write actual great code | reliably would be spectacular, but we aren't there yet, I don't | think Copilot presently is able to make me meaningfully more | productive. | 101011 wrote: | Don't most IDEs handle this already? | fabiospampinato wrote: | There are no tools that can format my code with my coding | style AFAIK. There are multiple tools that can format my code | with their coding style though, which I don't care about. | dawnerd wrote: | Most? All?
Allow you to edit the config to get it to your | coding style but it would be cool to infer it from files | you've already written. | fabiospampinato wrote: | There aren't enough knobs to turn to get exactly what you | want, I have a 500+ line file full of linter rule | configurations and that's still not good enough. | | At the end of the day I think it boils down to this | simple fact: you can't imperatively codify what makes the | face of a person beautiful for you because that's too | complicated, similarly you can't codify what makes for | beautiful code to your eyes, it's something that must be | learned from examples. | kleiba wrote: | Formatting code according to a given style is a much easier | task than what Copilot does. | alpaca128 wrote: | Apparently not that easy[0]. Depending on one's priorities | formatting/pretty-printing code can be trivial or very | difficult. Doing that dynamically based on a user's | individual preferences or even their existing codebase's | style is probably much harder. | | I don't doubt Copilot is also quite a problem to solve, most | likely indeed harder, though in that case at least there's an | abundance of training data. | | [0] https://news.ycombinator.com/item?id=22706242 | fabiospampinato wrote: | It depends what you mean by "given", I can't write a million | line document describing exactly what kind of style I want it | to use, the formatter must learn the style on its own from | examples. | | I agree that something like that would be much easier to make | in theory, hence why I'm suggesting it since maybe it could | be made ~perfectly, which Copilot isn't (we haven't | unlocked AGI yet).
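To make "learn the style on its own from examples" concrete, here is a deliberately tiny sketch that assumes style is reduced to just two knobs, indent width and quote character, and guesses both by counting what appears in sample files. `infer_style` is a hypothetical helper invented for illustration; it is not part of Copilot or of any existing formatter, and a formatter that really learned a personal style would need far more than counting.

```python
from collections import Counter
from functools import reduce
from math import gcd


def infer_style(samples):
    """Guess indent width and preferred quote character from example sources.

    Hypothetical illustration only: real style inference has many more
    dimensions than these two.
    """
    widths = []
    quotes = Counter()
    for source in samples:
        for line in source.splitlines():
            body = line.lstrip(" ")
            if not body:
                continue  # skip blank lines
            indent = len(line) - len(body)
            if indent:
                widths.append(indent)
            # Count every single and double quote character seen.
            quotes.update(ch for ch in body if ch in "'\"")
    # The indent unit is the greatest common divisor of all observed indents.
    indent_unit = reduce(gcd, widths) if widths else None
    quote = quotes.most_common(1)[0][0] if quotes else None
    return {"indent": indent_unit, "quote": quote}
```

Even this toy version shows the appeal of the idea: nothing is configured by hand, and the result is only as good as the examples it is given.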
| avipars wrote: | powered by openai, i'm guessing gpt 3 | dang wrote: | To see all the 800+ comments, you'll need to click More at the | bottom of the page, or like this: | | https://news.ycombinator.com/item?id=27676266&p=2 | | https://news.ycombinator.com/item?id=27676266&p=3 | | (Comments like this will go away when we turn off pagination. I | know it's annoying. Sorry.) | cfcf14 wrote: | From Clippy to this technology, in just under 25 years! Very | impressive. I wonder how much impact this tool might have as a | teaching aid, as well? | [deleted] | orliesaurus wrote: | I remember another company called Kite [1], working on a similar | approach - smart autocompletion - this however uses GPT-3 so it's | a bit different I guess because it doesn't detect what you're | trying to do but rather transforms technical natural language | into code. Right? | | [1] https://www.kite.com | dgb23 wrote: | Github sources suggestions from OS projects. There have been AI | completion tools that upload your code, basically spyware. | Definitely check thoroughly if that's the case! | WhompingWindows wrote: | I'm primarily an R and SQL user, excited to try this out on some | fun data analyses. | | How did you construct the Copilot? Did you use a learning | approach based on data from actual pair-programming sessions? Or | did you take tons of code as your input and use that to suggest | next methods based on input code? | | I learned a ton whenever I pair programmed, but now I'm at a | small company so I'm looking for fun ways to learn new methods :) | fzaninotto wrote: | I've been using the alpha for the past 2 weeks, and I'm blown | away. Copilot guesses the exact code I want to write about one in | ten times, and the rest of the time it suggests something rather | good, or completely off. But when it guesses right, it feels like | it's reading my mind. | | It's really like pair programming, even though I'm coding alone. 
| I have a better understanding of my own code, and I tend to give | better names and descriptions to my methods. I write better code, | documentation, and tests. | | Copilot has made me a better programmer. No kidding. This is a | huge achievement. Kudos to the GitHub Copilot team! | mdellavo wrote: | do you still go over the generated code line by line and | touchup in places where it did not do a good job? | fzaninotto wrote: | It suggests code line by line, so yes | mdellavo wrote: | I guess I dont see the point if only 10% of the time it's | exactly what you want and the rest of the time you have to | go back and touch up the line. | | Does it train a programmer for accepting less than ideal | code because it was suggested? Similar to how some | programmers blindly copy code from StackOverflow without | modification. | | Seems like there is a potential downside that's being | ignored. | staticassertion wrote: | > Does it train a programmer for accepting less than | ideal code because it was suggested? Similar to how some | programmers blindly copy code from StackOverflow without | modification. | | Maybe juniors, but I don't see this being likely for | anyone else. I've been using TabNine for ages and it's | closer to just a fancy autocomplete than a code | assistant. Usually it writes what I would write, and if I | don't see what I would write I just write it myself | (until either I wrote the whole thing or it suggests the | right thing). Of course, barring some variable names or | whatever. | | I don't have it "write code for me" - that happens in my | head. It just does the typing. | verst wrote: | This is not true - and I've been using copilot for many | months :) | | It suggests entire blocks of code - but not in every | context. | fzaninotto wrote: | My bad, you're right. I remember now that it suggested me | entire code blocks from time to time. | | Do you know in which "context" it suggests a block? 
| verst wrote: | It usually suggests blocks within a function / method in | my experience. Here's an example I created just now: | | https://gist.github.com/berndverst/1db9bae37f3c809e5c3f56 | 262... | blueblisters wrote: | With VSCode, Github and a perhaps a little bit of help from | OpenAI, Microsoft is poised to dominate the developer | productivity tools market in the near future. | | I wouldn't be surprised to see really good static analysis and | automated code review tools coming out of these teams very | soon. | croes wrote: | And still Windows is a mess. | _fat_santa wrote: | I'd bet money that the VSCode and Windows teams are | basically on different planets and Microsoft. | jrockway wrote: | I bet there are people that use Windows to develop VSCode | and use VSCode to develop Windows, so some people | probably know each other internally. I think what escapes | HN is how massively successful Microsoft is. Sure, the | search built into Windows sucks. There are many, many | more complicated components of a platform and OS than | that, and those seem to work as well as any other | platform and OS. | bostonsre wrote: | Compared to what other operating system(s)? | | wsl on windows 10 has been amazing to develop and work on. | azangru wrote: | > wsl on windows 10 has been amazing to develop and work | on. | | Now imagine how amazing it would be just on Ubuntu ;-) | croes wrote: | Every other OS. It's full of legacy APIs and scrapped new | APIs. Every release is like two steps one step forward, | one step back and one two the side. Just because | thousands of companies have written software and drivers | for it, it's still existing. If it were released today it | wouldn't stand a chance. | bufferoverflow wrote: | I've been on both Windows and Ubuntu for a while. I'd say | Ubuntu has a ton more issues and requires a ton more | initial configuration to behave "normally". | | I don't even remember the last time Windows got in my way, | in fact. 
| mssundaram wrote: | It sounds similar to the editor plugin called TabNine | fzaninotto wrote: | Side note: I recently suffered from tennis elbow due to a | suboptimal desktop setup when working from home. Copilot has | drastically reduced my keystrokes, and therefore the strain on | my tendons. | | It's good for our health, too! | yodon wrote: | I watched the same thing happen with the introduction of | IntelliSense. Pre-IntelliSense I had tons of RSI problems and | had to use funky ergonomic keyboards like the Kinesis | keyboard to function as a dev. Now I just hop on whatever | laptop is in front of me and code. Same reason - massive | reduction in the number of keys I have to touch to produce a | line of code. | iandanforth wrote: | What is the licensing for code generated in this way? GPT-3 has | memorized hundreds of texts verbatim and can be prompted to | regurgitate that text. Has this model only been trained on code | that doesn't require attribution as part of the license? | richardanaya wrote: | You're using the word "memorized" in a very loose way. | delaaxe wrote: | His point still holds, GPT-3 can output large chunks of | licensed code, verbatim | tsbinz wrote: | How is it loose? Both in the colloquial sense and in the | sense it is used in machine learning it is fitting. | https://bair.berkeley.edu/blog/2020/12/20/lmmem/ is a post | demonstrating it. | billti wrote: | The landing page for it states the below, so hopefully not | too much of an issue (though I guess some folks may find a | 0.1% risk high). | | > GitHub Copilot is a code synthesizer, not a search engine: | the vast majority of the code that it suggests is uniquely | generated and has never been seen before. We found that about | 0.1% of the time, the suggestion may contain some snippets | that are verbatim from the training set. | varispeed wrote: | Did they have a license to use public source code as a data | source for the data set, though?
| jgworks wrote: | I'd be surprised if a company's legal department would be | OK with that 0.1% risk. | viraptor wrote: | Google already learned that one. "There's only a tiny | chance we may be copying some public code from Oracle." | may not be a good explanation there. | mempko wrote: | Finally, a faster way to spread bugs than copy/paste. | Abimelex wrote: | Not sure how this is handled in US, but in Germany a few | lines of code have in general not enough uniqueness to be | licensed. | gentleman11 wrote: | It could start to replace us in 20 years. Or reduce. It is | exciting for now | handrous wrote: | Until it automatically knows when _and how_ it 's wrong, | you'll still need a human to figure that out, and that human | will need to actually know how to program, without the | overgrown auto-complete. | | May or may not reduce the demand for programmers, though. | We'll see. | megablast wrote: | Thanks captain obvious. | softwaredoug wrote: | It seems to replace/shorten the loop of "Google for snippet | that does X" copy, paste, tweak, no? Which of course is super | cool for many tasks! | fzaninotto wrote: | It's smarter than that. It suggests things that have never | been written. It actually creates code based on the context, | just like GPT-3 can create new text documents based on | previous inputs. | | Edit: Check this screencast for instance: | https://twitter.com/francoisz/status/1409908666166349831 | semi-extrinsic wrote: | Anyone tried this on any sort of numeric computation? | Numpy, Julia, Pandas, R, whatever? | | I definitely see the utility in the linked screencast. But | I am left to wonder whether this effectiveness is really a | symptom of the extreme propensity for boilerplate code that | seems to come with anything web-related. | aqme28 wrote: | I'm not convinced that code snippet in the screencast had | never been written. It's fairly generic React, no? 
| RivieraKid wrote: | What's your estimate of the productivity boost expressed as a | percentage? I.e. if it takes you 100 hours to complete a | project without Copilot, how many hours will it be with | Copilot? | read_if_gay_ wrote: | I tried TabNine and it wasn't a _huge_ improvement because | what costs the most time isn't typing stuff but thinking | about what to type. | the_arun wrote: | In addition, I would like to see GitHub reviewing my code & | giving me suggestions on how I could improve. That will be | more educative & a tool to ensure consistency across code. | onlyrealcuzzo wrote: | I'm surprised this doesn't exist. Google, FB, and Apple | (and I imagine Microsoft) have a lot of this stuff built in | that is light-years better than any open source solution | I'm aware of. | | Given that MS owns GitHub and how valuable this is - I | imagine it will be coming soon. | kyawzazaw wrote: | SonarQube and IntelliJ does it in some way for me. | tomnipotent wrote: | +1 for SonarQube. Very easy way to add value to a project | without a lot of overhead. | staticassertion wrote: | Having been a TabNine user for a while I can say that it's | less of a productivity boost and more of a quality of life | win. It's hard to measure - not because it's small, but | because it's lots of tiny wins, or wins at just the right | moment. It _makes me happy_ , and that's why I pay for it - | the fact that it's probably also saving me some time is | second to the fact that it saves me from annoying interrupts. | fzaninotto wrote: | I'm not sure I spend less time actually coding stuff (because | I have to review the Copilot code). 
But the cost of the code | I write is definitely reduced, because: | | - the review from my peers is faster (the code is more | correct) - I come back less to the code (because I have | thought about all the corner cases when checking the copilot | code) - As I care more about naming & inline docs (it helps | copilot), the code is actually cheaper to maintain. | fzaninotto wrote: | Check out how it helped me write a React component: | https://twitter.com/francoisz/status/1409919670803742734 | | I think hit the Tab key more than I hit any other key ;) | chimtim wrote: | Unfortunately, one in 10 times is far from good enough (and | this is with good prompt engineering which after using large | language models for a while, one starts to do). | | I feel like the current generation of AI is bringing us close | enough to something that works once in a while but requires | constant human expertise ~50% of the time. The self-driving | industry is in a similar situation of despair where millions | have been spent in labelling and training but something | fundamental is amiss in the ML models. | croes wrote: | Maybe it's just because humans are not as creative as they | think. Whatever you do, thousands of others have done the same | already. So no need to pay a high level programmer, just a | mediocre one and the right AI assistant gives the same results. | IshKebab wrote: | I think it's more that this tool is only capable of | automating the non-creative work that thousands have done | already. | | It's still insanely impressive (assuming the examples aren't | more cherry picked than I'd expect). | jqbd wrote: | > So no need to pay a high level programmer, just a mediocre | one and the right AI assistant gives the same results. | | I think of it as not needing juniors for boring work, all you | need as a company is seniors and AI. | sterlind wrote: | without juniors, how do you get more seniors? 
| spaced-out wrote: | Maybe programmers will adopt a similar model to artists | and musicians. Do a lot of work for free/practically | nothing hoping that some day you can make it "big" and | land a full time role. | kyawzazaw wrote: | we already have that with unpaid interns and PhD students | 9dev wrote: | So where do these seniors come from? | Dylan16807 wrote: | You could have a much smaller pool of juniors/journeymen | that are focused on maximum learning rather than amount | of output. | failuser wrote: | That's beyond the planning horizon. | juancampa wrote: | Quite the opposite. Menial work will be automated away (e.g. | CRUD) and only good programmers will be needed to do the more | complicated work. | feross wrote: | I've also been using the Alpha for around two weeks. I'm | impressed by how GitHub Copilot seems to know exactly what I | want to type next. Sometimes it even suggests code I was about | to look up, such as a snippet to pick a random hex color or | completing an array with all the common image mime-types. | | Copilot is particularly helpful when working on React | components where it makes eerily accurate predictions. I see | technology like Copilot becoming an indispensable part of the | programmer toolbelt similar to IDE autocomplete for many | people. | | I also see it changing the way that programmers document their | code. With Copilot if you write a really good descriptive | comment before jumping into the implementation, it does a much | better job of suggesting the right code, sometimes even writing | the entire function for you. | FreeSpeech wrote: | Has anyone used Copilot with a more succinct language? It | appears to only automate boilerplate and rudimentary | patterns, which while useful in repetitive low signal to | noise ratio languages like React or Java, sounds less | appealing if you're writing Clojure. | spec-obs wrote: | spot the Github PR folks! 
| qaq wrote: | They were replaced by GPT too :) | cercatrova wrote: | More like GPT three :) | varispeed wrote: | I have been using the Alpha for two weeks as well. I'm | impressed by how GitHub Copilot appears to know exactly what | I want to type. Not often it even suggests code I was going | to peek, such as a snippet to a context menu or completing an | array with all Romanian postcodes. Copilot is particularly | helpful when working on Angular components where it makes | mesmerising predictions. I see technology like Copilot | becoming an essential part of the programmer's tool belt | similar to IDE autocomplete for many people and programmers. | | I also see it changing the way that people and programmers | document code. With Copilot if you write a very picturesque | comment before jumping into the implementation, it does a | mucho mejor trabajo de sugerir the right code, sometimes it | is even writing the entire function para ti. | pfraze wrote: | They finally did it. They finally found a way to make me | write comments. | asdfman123 wrote: | Pack it all up, boys, programming's over. Hello, AI. | | Anyone want to hire me to teach your grandma how to use the | internet? | anthropodie wrote: | Few days back, Sam Altman tweeted this | | "Prediction: AI will cause the price of work that can happen | in front of a computer to decrease much faster than the price | of work that happens in the physical world. This is the | opposite of what most people (including me) expected, and | will have strange effects" | | And I was like yeah I gotta start preparing for next decade. | bconnorwhite wrote: | https://twitter.com/sama/status/1404100794245214215 | itsoktocry wrote: | > _Prediction: AI will cause the price of work that can | happen in front of a computer to decrease much faster than | the price of work that happens in the physical world._ | | I'm skeptical. | | The envelope of "programming" will continue to shift as | things get more and more complex. 
Your mother-in-law is not | going to install Copilot and start knocking out web apps. | Tools like this allow _programmers_ to become more | productive, which _increases_ demand for the skills. | waprin wrote: | I strongly agree with you. | | Reminds me of something I read that claimed when drum | machines came out, the music industry thought it was the | end of drummers. Until people realized that drummers | tended to be the best people at programming cool beats on | the drum machine. | | Every single technological advancement meant to make | technology more accessible and eliminate expertise has | instead only redefined what expertise means. And the | overall trend has been a lot more work opportunities | created, not less. | atentaten wrote: | Yeah, but less drummers are being hired than before drum | machines came out. What you describe sounds like work has | become more concentrated into fewer hands. Perhaps this | will happen with software as well. | morei wrote: | What happened is what typically happens: Concentration of | expertise. The lower expertise jobs (just mechanically | play what someone else wrote/arranged) went away and | there was increased demand for higher expertise (be an | actual expert in beats _and_ drum machines). | | So the winners were those that adapted earlier and the | losers were those that didn't/couldn't adapt. | | This translates to: If you're mindlessly doing the same | thing over and over again, then it's a low value prop and | is at risk. But if you're solving actual problems that | require thought/expertise then the value prop is high and | probably going to get higher. | Karrot_Kream wrote: | But there's also the subtext that if you find yourself at | the lower-skill portion of your particular industry, then | you should probably have a contingency plan to avoid | being automated out of a job, such as retiring, learning | more, or switching to an adjacent field. 
| TimTheTinker wrote: | Exactly, and AI only means that this adage now applies to | programming as well. | runawaybottle wrote: | I think you have another thing coming. Think about what | really got abstracted away. The super hard parts like | scaling and infrastructure (aws), the rendering engines | in React, all the networking stuff that's hidden in your | server (dare you to deal with tcp packets), that's the | stuff that goes away. | | We can automate the mundane but that's usually the stuff | that requires creativity, so the automated stuff becomes | uninteresting in that realm. People will seek crafted | experiences. | jbay808 wrote: | Decreasing the price of programming work doesn't | necessarily mean decreasing the wages of programmers, any | more than decreasing the price of food implies decreasing | the wages of farmers. | | But on the other hand, it also _can_ mean that. | drusepth wrote: | I think the thought process is from the perspective of | the employer, if you assume these two statements are | true: | | 1) AI tools increase developer productivity, allowing | projects to get completed faster; and | | 2) AI tools offset a nonzero amount of skill | prerequisites, allowing developers to write "better" | code, regardless of their skill level | | With those in mind, it seems reasonable to conclude that | the price to e.g. build an app or website will decrease, | because it'll require either fewer man-hours 'til | completion and/or less skill from the hired developers | doing said work. | | You do make a good point that "building an app" or | "building a website" will likely shift in meaning to | something more complex, wherein we get "better" outputs | for the same amount of work/price though. | littlestymaar wrote: | Now replace "AI" in your 1 & 2 points with "Github" (and | the trend of open-sourcing libraries, making them | available for all). All you said still works, and it did | not harm programmer jobs in any way (quite the opposite). 
| | And actually, I really don't see AI in the next decade | making more of a difference than what Github did (making | thousands of man-hour of works available for free). | Around 2040 or 2050, maybe. But not soon, AI is still | really far. | frant-hartm wrote: | >that the price to e.g. build an app or website will | decrease | | Yes, and this in turn increases demand as more | people/companies/etc.. can afford it. | chaorace wrote: | It _may_ reduce the demand for the rank-and-file grunts, | though. | | Why would an architect bother with sending some work | overseas if tools like this would enable them to crank | out the code faster than it would take to do a code | review? | brodo wrote: | Exactly. I'm currently reading The Mythical Man-Month. | 90% of what the book discusses in term of programming | work that actually has to be done is completely | irrelevant today. Still the software industry is bigger | then ever. In the book it is also mentioned that | programmers spend about 50% of their time on non- | programming tasks. In my experience this is also true | today. So no matter the tools we've got, the profession | stayed the same since the early 70s. | laurent92 wrote: | What are notable books nowadays? It seems all the books I | can cite are from 2005-2010 (Clean Code, JCIP, even the | Lean Startup or Tribal Leadership...) but did the market | for legendary books vanish in favor of Youtube tutorials? | I'm running out of materials I can give to my interns to | gobble knowledge into them in bulk. | ALittleLight wrote: | Tools like email, instant messenger, and online calendars | made _secretaries_ much more productive which increased | demand for the skills. Wait... | | Replacement of programmers will follow these lines. New | tools, like copilot (haven't tried, but will soon), new | languages, libraries, better IDEs, stack overflows, | Google, etc will make programming easier and more | productive. One programmer will do the work that ten did. 
That a hundred did. You'll learn to become an effective | programmer from a bootcamp (already possible - I know | someone who went from bootcamp to Google), then from a | few tutorials. | | Just like the secretary's role in the office was replaced | by everyone managing their own calendars and | communications, the programmer will be replaced by one or | two tremendously productive folks and your average | business person being able to generate enough code to get | the job done. | fungiblecog wrote: | I wonder does Sam Altman also believe that you can measure | programmer productivity by lines-of-code? | simias wrote: | Reading this thread it seems to me that AI is a threat for | "boilerplate-heavy" programming like website frontends, I | can't really imagine pre-singularity AI being able to | replace a programmer in the general case. | | Helping devs go through "boring", repetitive code faster | seems like a good way to increase our productivity and make | us _more_ valuable, not less. | | Sure, if AI evolves to the point where it reaches human-level | coding abilities we're in trouble, but in that | case this is going to revolutionize humanity as a whole | (for better or worse), not merely our little niche. | Animats wrote: | _"Prediction: AI will cause the price of work that can | happen in front of a computer to decrease much faster than | the price of work that happens in the physical world. This | is the opposite of what most people (including me) | expected, and will have strange effects"_ | | I've been saying something like that for a while, but my | form was "If everything you do goes in and out over a wire, | you can be replaced." By a computer, a computer with AI, or | some kind of outsourcing. | | A question I've been asking for a few years, pre-pandemic, | is, when do we reach "peak office"? Post-pandemic, we | probably already have. This has huge implications for | commercial real estate, and, indeed, cities.
| jahewson wrote: | I'm not sure why it's unexpected when it's essentially a | reframing of Baumol's cost disease. Any work that does not | see a productivity increase becomes comparatively more | expensive over time. | [deleted] | [deleted] | hn_throwaway_99 wrote: | I think this will result in classic Jevons paradox: | https://en.wikipedia.org/wiki/Jevons_paradox . As the price | of writing any individual function/feature goes down, the | demand for software will go up exponentially. Think of how | many smallish projects are just never started these days | because "software engineers are too expensive". | | I don't think software engineers will get much cheaper, | they'll just do a lot more. | throwawayboise wrote: | > Think of how many smallish projects are just never | started these days because "software engineers are too | expensive". | | Maybe many. If the cost/benefit equation doesn't work, it | makes no sense to do the project. | | > I don't think software engineers will get much cheaper, | they'll just do a lot more. | | If they do more for the same cost, they are cheaper. You | as a developer will be earning less in relation to the | value you create. | hn_throwaway_99 wrote: | > If they do more for the same cost, they are cheaper. | You as a developer will be earning less in relation to | the value you create. | | Welcome to the definition of productivity increases, | which is the only way an economy can increase standard of | living without inflation. | habibur wrote: | > You as a developer will be earning less in relation to | the value you create. | | Doesn't matter as long as I create 5x value and earn 2x | for it. I still am earning double within the same time | and effort. | shoguning wrote: | I'm guessing low expertise programmers whose main | contribution was googling stackoverflow will get less | valuable, while high expertise programmers with real | design skill will become even more valuable. 
| iechoz6H wrote: | I'm both of those things, what happens to my value? | iab wrote: | It goes up/down | ed_elliott_asc wrote: | Your legs will have to move faster than your arms. | mkr-hn wrote: | Sonic the Hedgehog's employment prospects are looking up. | blackearl wrote: | I just don't believe it. Having experienced terrible cheap | outsourced support and things like Microsoft's | troubleshooting assistant (also terrible), I'm willing to | pay for quality human professionals. They have a long way | to go before I change my mind. | qqqwerty wrote: | We went through the same hype cycle with self driving cars. | We are now ~15 years out from the DARPA challenges and to | date exactly 0 drivers have been replaced by AI. | | It is certainly impressive to see how much the GPT models | have improved. But the devil is in the last 10%. If you can | create an AI that writes perfectly functional python code, | but that same AI does not know how to upgrade an EC2 | instance when the application starts hitting memory limits, | then you haven't really replaced engineers, you have just | given them more time to browse hacker news. | ulber wrote: | Driving is qualitatively different from coding: an AI | that's pretty good but messes up sometimes is vastly more | useful for coding than for driving. In neither case can | you let the AI "drive", but that's ok in coding as | software engineering is already set up for that. Testing, | pair programming and code reviews are popular ways to | productively collaborate with junior developers. | | You're not replacing the engineer, but you're giving | every engineer a tireless companion typing suggestions | faster than you ever could, to be filled in when you feel | it's going to add value. My experience with the alpha was | eye opening: this was the first time I've interacted with | an AI and felt like its not just a toy, but actually | contributing. | qqqwerty wrote: | Writing code is by far the easiest part of my job. 
I | certainly welcome any tools that will increase my | productivity in that domain, but until an AI can figure | out how to fix obscure, intermittent, and/or silent bugs | that occur somewhere in a series of daisy-chained | pipelines running on a stack of a half-dozen | services/applications, I am not going to get too worked | up about it. | worldsayshi wrote: | I agree. It kind of amazes me though there is so much | room for obscurity. I would expect standardisation to | have dealt with this a long time ago. Why are problems | not more isolated and manageable in general? | bseidensticker wrote: | What is your definition of "replace"? Waymo operates a | driverless taxi service in Phoenix. Sign ups are open to | the general public. IMO this counts as replacing some | drivers as there is less demand for taxi service in the | operating area. | | https://blog.waymo.com/2020/10/waymo-is-opening-its- | fully-dr... | asdfman123 wrote: | > AI does not know how to upgrade an EC2 instance when | the application starts hitting memory limits | | That's exactly the kind of thing "serverless" hosting has | done for a while now. | qorrect wrote: | Yeah really bad example there. | kansface wrote: | This isn't self driving for programming, its more like | GPS and lane assist. | megablast wrote: | Self driving is used in the mining industry, and lots of | high paid drivers have been replaced. | | But you are clearly more knowledgeable with your 0 | drivers replaced comment. | oblio wrote: | Mining as in those big trucks or mining as in trains on | tracks? | rpeden wrote: | The trucks: https://im-mining.com/2020/06/02/suncor- | speeds-komatsu-980e-... | jostmey wrote: | I am blown away but not scared for my job... yet. I suspect | the AI is only as good as the training examples from | Github. If so, then this AI will never generate novel | algorithms. The AI is simply performing some really amazing | pattern matching to suggest code based on other pieces of | code. 
| | But over the coming decades AI could dominate coding. I now | believe in my lifetime it will be possible for an AI to win | almost all coding competitions! | asdfman123 wrote: | I guess it's worth pointing out that the human brain is | just an amazing pattern matcher. | | They feed you all these algorithms in college and your | brain suggests new algorithms based on those patterns. | jostmey wrote: | Humans are more than pattern matchers because we do not | passively receive and imitate information. We learn cause | and effect by perturbing our environment, which is not | possible by passively examining data. | | An AI agent _can_ interact with an environment and learn | from its environment by reinforcement learning. It is | important to remember that pattern matching is different | from higher forms of learning, like reinforcement | learning. | | To summarize, I think there are real limitations with | this AI, but these limitations are solvable problems, and | I anticipate significant future progress | visarga wrote: | Fortunately the environment for coding AI is a compiler | and a CPU which is much faster and cheaper than physical | robots, and doesn't require humans for evaluation like | dialogue agents and GANs. | qorrect wrote: | Well you still have to assess validity and code quality | which is a difficult task , but not unsolvable. | | Also Generative Adversarial Networks original | implementation was to pit neural networks against each | other to train them , they don't need human intervention. | timkam wrote: | To generate new, generally useful algorithms, we need a | different type of "AI", i.e. one that combines learning | and formal verification. Because algorithm design is a | cycle: come up with an algorithm, prove what it can or | can't do, and repeat until you are happy with the formal | properties. Software can help, but we can't automate the | math, yet. | jostmey wrote: | I see a different path forward based on the success of | AlphaGo. 
| | This looks like a clever example of supervised learning. | But supervised learning doesn't get you cause and effect; | it is just pattern matching. | | To get at cause and effect, you need reinforcement | learning, like AlphaGo. You can imagine an AI writing | code that is then scored for performing correctly. | Over time, the AI will learn to write code that performs as | intended. I think coding can be used as a "playground" | for AI to rapidly improve itself, like how AlphaGo could | play Go over and over again. | [deleted] | dharmaturtle wrote: | > we can't automate the math, yet | | This exists: | https://en.wikipedia.org/wiki/Automated_theorem_proving | bccdee wrote: | This is more automation-assisted theorem proving. It | takes a lot of human work to get a problem to the point | where automation can be useful. | | It's like saying that calculators can solve complex math | problems; it's true in a sense, but it's not strictly | true. We solve the complex math problems using | calculators. | sterlind wrote: | and there's already GPT-f [0], which is a GPT-based | automated theorem prover for the Metamath language, which | apparently submitted novel short proofs which were | accepted into Metamath's archive. | | I would very much like GPT-f for something like SMT, then | it could actually make Dafny efficient to check (and | probably avoid needing to help it out when it gets | stuck!) | | 0. https://analyticsindiamag.com/what-is-gpt-f/ | visarga wrote: | You mean like AlphaGo where the neural net is combined | with MCTS? | GrinningFool wrote: | > If so, then this AI will never generate novel | algorithms. This is true, but most programmers don't | need to generate novel algorithms themselves anyway. | habibur wrote: | > I now believe in my lifetime it will be possible for an | AI to win almost all coding competitions! | | Then we shall be reaching singularity. | jostmey wrote: | We will only reach a singularity with respect to coding.
| There are many important problems beyond computer coding | like engineering and biology and so on | michaelmrose wrote: | Coding isn't chess playing it's likely about as general | as math or thinking. If you can write novel code you can | ultimately do biology or engineering or ultimately | anything else. | visarga wrote: | > Anyone want to hire me to teach your grandma how to use the | internet? | | Only for the first time to train a model for that. | Joeri wrote: | Automation has always produced an increase in jobs so far, | although sometimes in a disruptive way. I consider this like | the switch from instruction-level programming to compiled | languages, a level of abstraction added that buys a large | increase in productivity and makes projects affordable that | weren't affordable before. If anything this will probably | lead to a boom in development work. But there's a bunch of | low skill programmers who can't do much more than follow | project templates and copy paste things. Those people will | have to level up or get out. | addicted wrote: | Yes, but AI isn't the same as automation. | | Automation is a force multiplier. AI is a cheaper way of | doing what humans do. | | And the AI doesn't even need to be "true" AI. It simply | needs to be able to do stuff better than what humans do. | visarga wrote: | > AI is a cheaper way of doing what humans do. | | Like protein solving? /s | Hammershaft wrote: | > Automation has always produced an increase in jobs so far | | Do you have a source for this re the last 20 years? It | seems to me automation has been shifting the demand | recently towards more skilled cognitive work. | seppin wrote: | A global increase in jobs, a decrease in the west. | croes wrote: | I think you are confusing correlation and causation. Not | automation produces jobs, more people and more income for | that people produces jobs, because more people means more | demand. 
| [deleted] | asdfman123 wrote: | I feel like the inevitable path will be: | | 1) AI makes really good code completion to make juniors way | more productive. Senior devs benefit as well. | | 2) AI gets so good that it becomes increasingly hard to get | a job as a junior--you just need senior devs to supervise | the AI. This creates a talent pipeline shortage and screws | over generations that want to become devs, but we find ways | to deal with it. | | 3) Another major advance hits and AI becomes so good that | the long promised "no code" future comes within reach. The | line between BA and programmer blurs until everyone's | basically a BA, telling the computer what kind of code it | wants. | | The thing though that many fail to recognize about | technology is that while advances like this happen, | sometimes technology seems to stall for DECADES. (E.g. the | AI winter happened, but we're finally out of it.) | drusepth wrote: | I could also see an alternative to #2 where it becomes | increasingly hard to get a job as a senior dev when | companies can just hire juniors to produce probably-good | code and slightly more QA to ensure correctness. | | You'd definitely still need _some_ seniors in this | scenario, but it feels possible that tooling like this | might reduce their value-per-cost (and have the opposite | effect on a larger pool of juniors). | | As another comment said here, "if you can generate great | python code but can't upgrade the EC2 instance when it | runs out of memory, you haven't replaced developers; | you've just freed up more of their time" (paraphrased). | visarga wrote: | No, programmers won't be replaced, we'll just add this to | our toolbox. Every time our productivity increased we | found new ways to spend it. There's no limit to our | wants. | wittycardio wrote: | I feel like you're neglecting to mention all the people | who need to build and maintain this AI. 
Cookie cutter | business logic will no longer need programmers but there | will be more highly skilled jobs to keep building and | improving the AI | foolinaround wrote: | AI will keep building and improving the AI, of course! | notsureaboutpg wrote: | Telling the computer what you want IS programming... | | When a new language / framework / library comes around, | GitHub copilot won't have any suggestions for when you | write in it. | arcturus17 wrote: | > and the rest of the time it suggests something rather good, | or completely off | | In what, proportion roughly? | fzaninotto wrote: | Hard to say, really. When writing React components, Jest | tests and documentation, it's often not very far. I found it | off when writing HTML markup (which is hard to describe with | words). | handrous wrote: | Seems like it's best classed (for now) as an automated tool | to generate and apply all the boilerplate snippets I | _would_ be creating & using manually, if I weren't too | lazy, and/or too often switching between projects, to set | all those up and remember how to use them (and that they | exist). | BrandonJung wrote: | for those of you like this concept but 1. did not get into the | alpha (or want something that has lots of great reviews) 2. | need to run locally (security or connectivity) 3. want to use | any IDE... please try Tabnine | gmaijoe wrote: | do you have an invite? very interested to check it out | hobofan wrote: | Have you used any intelligent code completion in the past? E.g. | I'd really be interested how it compares to TabNine[0], which | already gives pretty amazing single line suggestions (haven't | tried their experimental multi-line suggestions yet). | | [0]: https://www.tabnine.com | ayush--s wrote: | I'm curious as to how relevant Copilot would be when | autocompleting code that is specific to my codebase in | particular, like Tabnine completes most used filters as soon | as I type the db table name for the query. 
I'm a big tabnine | fan because it provides this feature. I'm much more often | looking to be suggested a line than an entire function | because I'm mostly writing business logic. | | also tabnine is useless in multi-lines completes. which is | where co-pilot should be strong. | meowface wrote: | Yeah, I've been very happy with Tabnine for a while, but | the prospect of good multi-line completions is appealing. I | might try running both Tabnine and Copilot simultaneously | for a bit to A/B test. | fzaninotto wrote: | I have used IDEs with good knowledge of the types and | libraries I'm using (e.g. VSCode with TypeScript). They offer | good suggestions once you start typing a function name. | | But nothing gets close to Copilot. It "understands" what | you're trying to do, and writes the code for you. It makes | type-based autocompletions useless. | grandchild wrote: | tabnine works quite similarly to copilot. it's not a thing | that "knows about types and libraries", it's a similar | predictive machine learning method as copilot seems to use. | pgib wrote: | I've been using TabNine for a couple years - constantly | impresses me, especially how quickly it picks up new | patterns. I wouldn't say it's doing my job for me, but | definitely saves me a lot of time. | skateris wrote: | Interestingly the founder of TabNine (which was acquired by | Codota[0]) is currently working at Open AI. I imagine they're | livid about Open AI creating a competing product. | | TabNine at times was magical, but I stopped using it after | Codota started injecting ads directly into my editor[1] | | [0] https://betakit.com/waterloo-startup-tabnine-acquired-by- | isr... [1] https://github.com/codota/TabNine/issues/342 | hobofan wrote: | Ah, thanks for the insight! It seems though that he is no | longer working with OpenAI according to his personal | website[0]. 
| | [0]: https://jacobjackson.com/about | fabiospampinato wrote: | How big/complicated are the functions Copilot is autocompleting | for you? I'm thinking perhaps reading 10 potential candidates | is actually slower and less instructive than trying to write | the thing yourself. | fzaninotto wrote: | It shows the suggestions line by line, and only shows the | best guess. It's not more intrusive than Intellisense. | | You can actually see all the code blocks Copilot is thinking | about if you want to, but that is indeed a distraction. | fabiospampinato wrote: | The problem I see with that is that's not possible for it | to understand well which code is the best, GPT-3 is trying | to mimic human writing in general, the thing is most human | code is garbage, if this system was able to understand how | to make code better you could keep training it until you | had perfect code, which is not what the current system is | giving you (a lot of the times anyway). | drusepth wrote: | >if this system was able to understand how to make code | better you could keep training it until you had perfect | code | | Based on the FAQ, it looks like some information about | how you interact with the suggestions is fed back to the | Copilot service (and theoretically OpenAI) to better | improve the model. | | So, while it may not understand "how to make code better" | on its own, it can learn a bit from seeing how actual | devs do make code better and theoretically improve from | usage. | Trollmann wrote: | You're missing the problem he stated: Code written by | humans is usually bad so the model is trained on garbage. | fzaninotto wrote: | I guess you miss the point. It's not trying to suggest | the perfect code. Only you know it. It's saving you time | by writing a good (sometimes perfect) first solution | based on method/argument names, context, comments, and | inline doc. 
And that is already a huge boost in | productivity and coding pleasure (as you only have to | focus on the smart part). | fabiospampinato wrote: | Maybe you are right, in my experience either the code you | have easily available to you (either because another | person or a computer wrote it) is perfect for your use | case (to the best of your knowledge anyway) or rewriting | it from scratch is usually better than morphing what you | have into what you need. | apexalpha wrote: | The animated example on https://copilot.github.com/ shows | it suggesting entire blocks of code, though. | verst wrote: | It does actually suggest entire blocks of code. I haven't | quite figured out yet when it suggests blocks or lines - | if I create a new function / method and add a doc string | it definitely suggests a block for the entire | implementation for me. | ehsankia wrote: | I see, I think the most useful case for me would be where | I write a function signature+docstring, then get a list | of suggestions that I can browse and pick from. | | Do you have examples of what the line ones can do? The | site doesn't really provide any of those. | verst wrote: | Take a look at this minimal example I just created where | I did just that -- created a new function and docstring. | This example is of course super simple - it works for | much more complex things. | | https://gist.github.com/berndverst/1db9bae37f3c809e5c3f56 | 262... | grumple wrote: | Even snippets are so bad I have to turn them off. I can't | even fathom how bad the suggestions are going to be for | full blocks of code. But I guess I'll see soon... | mdellavo wrote: | Should I be impressed that the example parse_expenses.py on the | home page doesnt include any error handling and uses a float for | currency? This seems like it's going to revolutionize copy and | paste programming. | acid__ wrote: | The output is great for a quick one-off script. 
Maybe if you | make the comments look more "enterprise-y", it'll go for more | careful code? | mdellavo wrote: | I would say the use of float makes it a nonstarter - even for | "quick one-off" scripts. That's a fundamental error in the | generated code. It maybe looks correct at a quick glance but | it's only introducing subtle errors to find down the line. | arrayjumper wrote: | It's a copilot. You're still the pilot. To be honest this seems | like it can definitely save me a bunch of googling and let me | stay in the ide. | mdellavo wrote: | in their own example the copilot aimed for the mountains with | float() - thanks but no thanks | rglover wrote: | Tell that to the new programmer who builds a piece of | software using this creating an absolute mess. | | The danger here isn't with experienced developers (this is, | obviously, a tool with great potential for productivity). | It's with people who just blindly trust what the robot spits | out. | | Once code like that is implemented in a mission-critical | system without discernment, all hell will break loose. | | Edit: worth watching this for context | https://www.youtube.com/watch?v=ZSRHeXYDLko | sergiomattei wrote: | So? If you're making a mission critical system, don't hire | subpar developers. | | It's not Copilot's problem. Powerful tools can be misused | by anyone. | rglover wrote: | Tell that to the HR departments responsible for hiring | developers at major companies. | | Not hiring subpar developers, especially in a massive | company isn't a matter of "if" but "when." And it only | takes one screw up to crash an airplane because of | software. | | And guess who massive companies trust for their | technology? | | Microsoft. | frakt0x90 wrote: | So now I make a bot to upload repositories of intentionally buggy | code so that when people blindly use this autocomplete my hacking | becomes easier! | ranguna wrote: | Awh man, brilliant idea! 
| emsy wrote: | We already do this with libraries and frameworks in a more | primitive way. The hard work is actually done by a handful of | programmers; everyone else is sticking pipes together. I don't | think that's a bad thing per se, but in my experience most people | don't make the distinction. You're not going to be able to write | your own database server with this tool if you weren't able to do | so without it. If you're one of the few programmers that are able | to build a database, a graphics engine, a compiler etc. you're | fine. Everyone else should probably feel a mild panic. You'll be | automated away in a couple of AI generations. | asdfman123 wrote: | Is there really that much of a difference between a programmer | building a database and one building something mundane? | | Obviously, there's some core algorithmic work that needs to | occur, but that's always been done by a very small number of | people anyway. The rest is still glueing. | pgib wrote: | Doesn't look like it will be editor-agnostic which is a big | shame. I've been using TabNine in Vim for a couple years, and I | love it. | beeskneecaps wrote: | Four years later: your AI replacement? When do you all predict | something like this will happen? | castlecrasher2 wrote: | I imagine that AI like this will certainly speed up | development, but I suspect you will almost always need someone | in the middle putting the pieces together. | joelbluminator wrote: | Maybe thirty-four years later. I don't think there's AI to | gather requirements, talk to people, understand a problem and | produce code. That's kinda general intelligence level AI. But | this thing can possibly make devs' work easier and if it's good | enough maybe smaller teams can produce more. | sitkack wrote: | This will allow for a D or C level coder to be a B/B- coder | which is great, quality goes up.
But corps will use this to | depress wages and finally be able to create that wondrous | unicorn of the completely fungible coder. | | This kind of tooling is akin to the crossbow. | | It will allow for less skilled folks to push out code that is | like other code at great speed. A copy pasta accelerator if | you will. | yellowfish wrote: | Is that a bad thing? sw-developers are grossly overpaid to | the point it's damaging | booleandilemma wrote: | Do you think project managers are overpaid too? | grumple wrote: | We each produce 1MM+ in actual revenue per year but | paying us 100k+ is too much? | nyghtly wrote: | If the reduction of developer wages led to the increase | of wages for other workers, sure. But of course that | won't happen. The reduction of wages for any class of | worker will simply lead to further consolidation of | wealth. | Sholmesy wrote: | Grossly overpaid for effort/comparison to other | wages/industries. | | Adequately paid for value captured (hence why companies | are willing to pay for them at this rate). | | Not a moral statement, but this gives more tools to those | in power. | xeromal wrote: | I tend to think like you. Somehow we convinced businesses | that a mostly-blue collar job gets paid white collar | salary but I've been told that SW-engineers aren't | overpaid. Most people are just very underpaid. | | What are your thoughts on that? I can lean that way just | because I have a genius mechanical engineer friend who | only makes 60k in his 30s. | yellowfish wrote: | I think it's easier to bring people down to the same | level rather than bringing them up to it | trutannus wrote: | I'm not at all worried about AI taking over software | development. In all likelihood, what you'll see instead are AI | plugins in IDE editors which just assist in a much more | advanced way than the IntelliSense we have now. Having machines | code out the business logic is very much something that | would be less efficient than having a person do it.
| | Realistically, it just means that, rather than your coworker | code-reviewing you and making a handful of comments, you get a | machine to do that, and get two or so comments from your | coworker about the business logic. | | To answer your question: never. | skocznymroczny wrote: | Nah. Remember that "Google AI signs you up for a hair stylist | session" demo? We never got anything out of that. | ImprobableTruth wrote: | Unless we achieve AGI this is never going to replace | programming because it will instead just make programming a | higher-level task. And well, if we achieve AGI (which I think | could be pretty soon), all jobs will be replaced, so it's not | something I think anybody should be worried about. | freedomben wrote: | As a programmer who specializes in security, I worry that many of | the common errors that people make will get picked up by the AI and | recommended to new users. It looks like it gets reinforcement | from the code that the user selects. In my experience the | _majority_ of developers make security errors, so how is the | algorithm going to learn _not_ to do it the wrong way when it's | learning from bad code and getting reinforced by developers who | are wrong? (This is an honest question, not a criticism. I think | this product is fascinating) | asdfman123 wrote: | Most programmers are pretty careless and just get something to | "work." | | Maybe the first generation of this sort of completion will be | bad, but I have full faith it will be better than the average | human at avoiding security issues as early as the next | generation. | bruce343434 wrote: | Faith based on what? So far in my experience AI projects tend | to peter out. | manmal wrote: | The real magic of writing code is the compound interest it pays - | if you structure the lower level components well, then they can | be combined into ever more powerful components, saving time and | effort in an exponential way.
| | This product seems to encourage the complete opposite - hack | together stuff without thinking about how it could fit into your | accumulated Baukasten (construction kit) of components. | david-cako wrote: | and away we go to singularity. | MeinBlutIstBlau wrote: | I think this would be a great resource for beginners | particularly. It helps give them the code that will work for them | and they can understand what it's returning. My biggest | issue when I first started with REST calls was not knowing what | to use and why things were being used on the server side. | Eventually it started to click for me, but I had a lead guide me. | In particular we worked with SharePoint, which I had no idea had | its own API at the time, which added to the complexity. Overall I | think the "See different examples" is going to be the best | feature out of all of them. | ncr100 wrote: | Am I out of a job yet? (I am a programmer) | switchb4 wrote: | Calling it a pair programmer is a bit of an exaggeration though. How | can it match a human? | darkstarsys wrote: | Where's the Emacs lib for it? | mjsweet wrote: | Hmmm, there is something a bit... "Uncanny Valley" about this | code. What will my coworkers think? | [deleted] | hnarn wrote: | I'm sorry if I'm derailing the discussion here but "copilot" | really is a much better phrase to use for assisting software than | "autopilot". If more companies would choose phrases like these | that accurately emphasize that the human is not being replaced | but assisted (no names mentioned) I think it would benefit | everyone in terms of clarity. | | Sure, you might say "it's all marketing and if AP exists nobody | would buy CP", but I don't think it's that simple. Customers | understand when their expectations are being met, exceeded, or | let down. | fcsp wrote: | This looks pretty impressive and I'd like to play with it, but | it's disappointing that you're forced to accept telemetry to | enlist on the waitlist.
I guess I'll wait for general | availability. | stakkur wrote: | The goal, of course, should be to train your AI partner to | eventually do _all_ the work, while you sip a beer at the pub and | monitor Slack (where an AI version of you is maintaining | conversations). | grouphugs wrote: | y'all really love fascist organizations, but not for long, trust | me | Trollmann wrote: | Will this hurt open source? | | I assume companies that were fine with providing you | functionality for free may think twice about this because with | that they're giving away knowledge of how to build functionality. | account_created wrote: | If you are interested in reading more about it[1]. | | [1] https://docs.github.com/en/early- | access/github/copilot/resea... | antoniuschan99 wrote: | There are a lot of products that try to do away with programming in | general such as no-code. | | However, the elephant in the room is definitely a tool that can | auto-fix bugs - the type of bug that is usually given to a junior | developer because the team doesn't want them building features | yet. | binarymax wrote: | Don't use this. You'll just be giving training data to OpenAI | (which is NOT "Open" by any means). | schmorptron wrote: | Wow, that's incredible! Any chance we'll get a downloadable and | editable open source version of this so we can play around with | it, train it on our own smaller datasets and generally experiment? It | sounds super exciting! | throwawake wrote: | Did you miss the OpenAI part? /s | | As the FAQ mentions, they are planning to launch a commercial | product so likely not I guess. | IncRnd wrote: | For anyone who has this installed, how much of the codebase under | development leaves the machine? I am asking from the standpoint | of working on proprietary source code. | | Other than that single issue, which is really a set of issues, it | seems to work really well from the videos I've seen. | ranguna wrote: | Reading their privacy policy: as much as Microsoft wants.
| So you can probably expect all your codebase to be uploaded. | azinman2 wrote: | I worry about auto-complete on a more philosophical level. I've | noticed with gmail that it'll often suggest its way of either | replying to an email or completing a sentence even though I'd | never actually use those words in that situation, simply because | it's easier. | | It's a pretty bad feedback loop that robs us of our independent | thought by way of falling victim to laziness, a fundamental human | weakness. You can imagine something where in this case the code | autocomplete is so large that you really want to make that | autocompletion work, even if you know it's not elegant or | possibly even correct code... or maybe it's just repetitive and | not abstracted well, but here it is autocompleted and done so why | would you fight that? | | If we continue abstracting more and more of this way, based upon | datasets that are averaged across everyone, we lose the | individual in favor of the masses, bringing us all down to a | common denominator. | | If we must lose our humanity to the machine, I'd at least like to | see an autocomplete from Peter Norvig's code, or writing from | particularly effective communicators or famous authors. | aerospace_guy wrote: | > feedback loop that robs us of our independent thought by way | of falling victim to laziness | | This was my first thought when I saw this. I intentionally | don't use predictive text whenever I can to preserve whatever | originality I have left. | arcturus17 wrote: | How does this compare to TabNine or Kite? | [deleted] | smoldesu wrote: | I'm willing to go all-in on something like this, _only_ if I can | get a promise that this will be an open project as time goes on. | I'm not a fan of all of the "commercial" talk in this... if | OpenAI is involved, and most of this can run locally, why can't | it be fully open source? | ranguna wrote: | Most of it can run locally?
Please point me to a trained GPT-3 | model, probably the main brains of this tool. | omgwtfbyobbq wrote: | I feel like this could be a great way to help people understand | new languages faster than they could otherwise. | wejick wrote: | Goodbye stack overflow copy pasting, welcome copilot auto | generated code. It will bring the similar problem of someone using | something they don't understand. | | More work for the code reviewer. However, in the right | hands it will be very powerful! | dragonwriter wrote: | > Goodbye stack overflow copy pasting, welcome copilot auto | generated code. It will bring the similar problem of someone using | something they don't understand. | | To be fair, "beat on it until it seems to work" without AI | assistance or copy-pasting code _also_ frequently involves a | fair amount of "using something they don't understand". | | OTOH, copy pasta and AI assisted code are less likely to get "I | don't know why this works" or "I don't know why this is | necessary" comments to highlight that to the reviewer. | throwawayboise wrote: | Can I use it in Emacs? | jimmar wrote: | This would help me tremendously. I don't code frequently--a few | times a month at best. I find myself having to google syntax | constantly to write basic programs. If I want to write a simple | python command line program that parses input, for example, I can | guarantee I'm going to be opening tabs to figure out the syntax. | It would be great if something like Copilot could help me stay in | my editor. | nhumrich wrote: | I would love to see an equivalent where it generates all the | tests for you | robertlagrant wrote: | https://www.diffblue.com | | Here you go :) | fabiospampinato wrote: | It should be able to (try to) do something like that too.
| | There's a little demo about that here: | https://copilot.github.com/ | | It's "just" an autocompletion system basically, if you write | something that looks like the beginning of a test it should | understand that and try to autocomplete that. | LeifCarrotson wrote: | I'd say it's more than 'just' an autocomplete system. | | Naive autocomplete, as implemented in Excel since forever ago | (and I'm sure long before that, I'm just familiar with being | annoyed by Excel suggesting wrong entries from its simple and | over-eager autocomplete system), merely matches a sequence of | characters - if I typed "aut" again in this paragraph it will | suggest "autocomplete" because I recently typed it. | Implementing it is the kind of task you give to a first-year | programming student to practice string matching data | structures, similar to a spell checker that merely checks | that a string exists in a dictionary. | | There's a spectrum from 'just' autocomplete, to a syntax- | aware system like VS Intellicode, to this, and eventually | beyond this. As mobile predictive text is to a spell checker, | so Github Copilot is to autocomplete. As mobile predictive | text is to GPT3 [1], so Github Copilot is to...what next? | GPT3 is not just a spell checker. | | [1] Also by OpenAI: | https://news.ycombinator.com/item?id=23345379 | fabiospampinato wrote: | I agree, that's why I put the "just" under quotes. It is | basically an autocompletion system, just much smarter than | human-coded ones in many ways. | duckkg5 wrote: | It does! | tsumnia wrote: | I wonder how Copilot's suggested snippet compares if the comment | is a CS1 homework prompt. | [deleted] | ericls wrote: | If you compile English to Python why not just compile it to even | lower level? | lxe wrote: | If you played with OpenAI beta, you won't be surprised -- it was | only a matter of time until this becomes widespread. | williesleg wrote: | Chink poo poo haba jama mama shadow banned yay hacker news! 
| hartator wrote: | How does it compare to Tabnine? | | I really like that Tabnine trains against your own codebase and | suggests things based on it. It's crazy accurate and smart a | surprising amount of time. | nomoreplease wrote: | CTRL+F for Privacy & Security. No mention? | amelius wrote: | I want one-shot learning for refactoring. I.e., show the editor | once what transformation I want to perform, and then the editor | takes over. | | Autocompleting code in general? Sounds like a bad idea, imho. | jms55 wrote: | So the AI doesn't actually understand the code does it? It only | looks for similar things. So if I'm writing a game in Rust using | the hecs ECS library, how is it going to help me? How many other | people have written a game in that language using that library in | this genre trying to do this task before? Probably very very few. | | And yeah that's a niche example, maybe this is super helpful for | writing a react app or something very library heavy. But gut | feeling without trying it is that there's no way this could | actually work on anything but the most common tasks that you can | google and find solutions for already, just made easier so you | don't actually have to google for it. | | When I used tabnine which learned from you as you edit, it was | fairly helpful for very repetitive code within the same project. | But it was nowhere near "read English and write the code I meant | for it". I'm curious to know how well this actually performs for | non-common tasks, and whether it can understand ideas in your | codebase that you come up with. If I make a | WorldAbstractionOverEntities thing in my code, and then later use | it in the project, will the AI be able to help me out? Or is it | going to go "sorry, no one on github has used this thing you came | up with an hour ago, I can't help you". An AI that could | understand your own codebase and the abstractions you make and | not just popular libraries would be infinitely more useful imo.
| | That said, I haven't tried this, maybe it'll turn out really | good. | sktrdie wrote: | > how is it going to help me? How many other people have | written a game in that language using that library in this | genre trying to do this task before? | | Because you write code using a bunch of patterns: iterating | over data, destructing an object, calling functions etc. If you | can generalize the usage of these patterns in such a way that | it understands the context where you want to use them you're | essentially doing this Copilot thing. | | I agree that it'll be hard to have it understand fully the | context and paradigms behind your existing project, but if it | can help me automate some things in such a way that I can just | let it run its thing and then have me "poke around it to get it | right" then this is still amazing. | blondie9x wrote: | If it's reducing keystrokes. Eventually it might reduce the | problem down to no input needed at all. | | In that case? Are those who don't control the AI needed? What | about when those optimizations can be made by the AI itself also? | bravogamma wrote: | Generating new code is not very hard for humans. But maintaining, | extending, debugging code - often that's where the real | challenges are. Will the AI copilot be able to fix bugs in its | code based on bug reports? | [deleted] | modshatereality wrote: | github was such a nice place until microsoft fucking ruined it. | detaro wrote: | What has Microsoft done to GH to "ruin it"? | modshatereality wrote: | ignoring THE FUCKING TOPIC HERE. the UI is hideous now | without javascript enabled, before it was actually navigable. | Geee wrote: | I've been using TabNine[0] for a few years, which is great. How | does this compare? | | [0] https://www.tabnine.com | clement_b wrote: | This will have the same deep impact Stack Overflow has had on | code for the past decade. Good and bad!
| Deathmax wrote: | GitHub says you don't need to credit GitHub for any of the code | suggestions, but since it's trained on public sources of code, | anyone have a clue on potential licensing pitfalls? | vgalin wrote: | From the FAQ at the bottom of the project showcase page[0]: | | "GitHub Copilot is a code synthesizer, not a search engine: the | vast majority of the code that it suggests is uniquely | generated and has never been seen before. We found that about | 0.1% of the time, the suggestion may contain some snippets that | are verbatim from the training set. Here is an in-depth | study[1] on the model's behavior. Many of these cases happen | when you don't provide sufficient context (in particular, when | editing an empty file), or when there is a common, perhaps even | universal, solution to the problem. We are building an origin | tracker to help detect the rare instances of code that is | repeated from the training set, to help you make good real-time | decisions about GitHub Copilot's suggestions." | | [0] https://copilot.github.com/ | | [1] https://github.co/copilot-research-recitation | 6gvONxR4sf7o wrote: | (not a lawyer) Copyright issues tend to involve the question of | how transformative a work is. This means the code coming out | the other end is probably fine. I don't know about the training | side, though. Are there license issues in using copyrighted | training data without any form of licensing? Typically ML | researchers have a pretty free-for-all attitude towards 'if I | can find data, I can train models on it.' | tasuki wrote: | > Tests without the toil. Tests are the backbone of any robust | software engineering project. Import a unit test package, and let | GitHub Copilot suggest tests that match your implementation code. | | Isn't that the wrong way around? I'd like to start by writing the | tests, and for GitHub Copilot to please implement the function | that makes them pass. 
| enobrev wrote: | I just watched this demo on twitter[1] and found it entertaining | that it was auto-completing useless, redundant comments for the | developer. Pretty sure that says more about all of us developers | as a whole than about Copilot. | | 1: https://twitter.com/gr2m/status/1409909849622601729 | drummer wrote: | So this is basically fully automated Stackoverflow assisted | programming? Excellent. | sama wrote: | AI FTW! | | (dang please don't ban me for a low-quality comment :) i couldn't | resist but will not make it a habit!) | [deleted] | YeGoblynQueenne wrote: | From the FAQ: | | https://copilot.github.com/#faqs | | >> How does GitHub Copilot work? | | >> OpenAI Codex was trained on publicly available source code and | natural language, so it understands both programming and human | languages. | | I want to propose a new Bingo style game where you get points | when AI researchers (or rather their marketing teams) throw | around big words that nobody understands, like, er "understands" | with abandon. But I can't be bothered. | | An AI researcher called Drew McDermott warned against this kind of | thing in a paper titled "Artificial Intelligence Meets Natural Stupidity": | | http://www.cs.yorku.ca/~jarek/courses/ai/F11/naturalstupidit... | | In _1976_. | | Why doesn't anybody ever learn from the mistakes of the past? | pnt12 wrote: | In AI research lingo "understands" really means "makes decent | correlations". | YeGoblynQueenne wrote: | Please provide a reference for this explanation. | f38zf5vdt wrote: | This could be the kind of thing easy to train and run locally on | a GPU. | | - You only need code relevant to your language. | | - The amount of unique code in your language is going to be | relatively small compared to, say, the entire history of | internet comments. | | - Training a model also shouldn't take long.
| | I personally would never use an online version feeding all my | code back to a home server somewhere, and leaving breadcrumbs | that might suggest I'm violating the GPL. | ranguna wrote: | Tabnine already does this locally, but I didn't have that good | of an experience. | rberger wrote: | Assuming this is actually a useful and powerful addition to a | programmer's quiver (which it looks like it will be if not | already is), then tying it to Visual Studio is a variation of | Microsoft's "Embrace and Extend" philosophy but for Programmers | and open source in general. | | It uses Open Source as its input but, as far as I can tell (and I | would be pleasantly surprised if I was wrong), CoPilot itself is | not Open Source. | | It is also tied to Visual Studio, pushing Visual Studio, a | Microsoft product already on its way to a monopoly position, even | further up the power law curve to monopoly status. | | This would be much more interesting and less concerning if | CoPilot was Open Source and designed to plug in to other Editors | / IDEs like via lsp or something similar. | kujino wrote: | VSCode has telemetry, the extension market place can't be used | by non-microsoft products, VSCode is not open source (only | VSCodium is), many of the MS extensions are not open source | (like live collaboration), etc. | | VSCode followed the classic big tech recipe : 1) make it open | source to look like the good guys & get adoption and | contributions 2) close-source many key components & add | spyware. | | Story of Android too pretty much | gmueckl wrote: | I expected the appearance of machine learning based developer | tools pretty much the moment Github was bought by MS. Liberal | access to everything in the largest collection of source code in | the world is the ideal starting point for developer tools like | this one. GitHub/MS doesn't have to obey the same rate limiting | when accessing the site that the rest of the world has to put up | with.
| | Now there are only three questions left: how will this be | monetized? Will it recoup the purchasing price of Github? And how | can vendor lock-in be achieved with this or any follow-up | products? | dannywarner wrote: | This is a great use of the OpenAI Codex. It's fast in VS Code. My | early impression is that the more popular the language the better | it does. So Javascript is almost magic and something like Rust | still useful. I'm looking forward to using it more. | ramoz wrote: | How is GitHub/OpenAI ensuring this tech doesn't throw the | industry towards a flawed human/tech capability or dystopia of | sorts? | | Like... a bubble of developing code from yesterday's code (plenty | flawed itself) and with exponential growth based on feedback | loops of self-fulfilling prophecy ---- I'm assuming copilot (and | every openai variant to come) will essentially retrain itself on | new code developed, which over time might all be code it wrote | itself. | | Did we just create a hamster wheel for the industry, or high-rise | elevator? | ramoz wrote: | I do think this turns into a modern stack overflow however. If | observing local runtime errors, debugging process, and | tidying/fixing code it produces itself. Plus train on all the | SO questions out there. ___________________________________________________________________ (page generated 2021-06-29 23:00 UTC)