[HN Gopher] Analyzing the legal implications of GitHub Copilot
       ___________________________________________________________________
        
       Analyzing the legal implications of GitHub Copilot
        
       Author : HNCommenterAD
       Score  : 149 points
       Date   : 2021-07-15 16:08 UTC (6 hours ago)
        
 (HTM) web link (fossa.com)
 (TXT) w3m dump (fossa.com)
        
       | darau1 wrote:
       | I honestly thought the great-gitlab-exodus indicated that people
       | saw this coming a hundred miles away.
        
       | heavyset_go wrote:
       | > _"If you look at the GitHub Terms of Service, no matter what
       | license you use, you give GitHub the right to host your code and
       | to use your code to improve their products and features," Downing
       | says. "So with respect to code that's already on GitHub, I think
       | the answer to the question of copyright infringement is fairly
       | straightforward."_
       | 
       | GitHub's Terms of Service doesn't override licensing terms.
        
         | invokestatic wrote:
         | Actually, it does. When you upload code to Github, you are
         | effectively "dual licensing" the code to them under Github's
         | terms. Github is not bound by any other license you may have
         | applied to your code, because it did not agree to those
         | terms. It only agreed to the terms spelled out in the Terms of
         | Service. Of course, there are edge cases in which you could
         | upload code to Github that you do not own, and I do not know
         | the answer for those.
        
         | lakecresva wrote:
         | It doesn't have to, the rights you grant Github when you agree
         | to the ToS and upload your work exist independent of any rights
         | you might grant as part of the repo's license.
        
       | sampo wrote:
       | > Downing thinks there's a strong case that Copilot uses said
       | code in a transformative manner, which would support a fair use
       | argument that there is no copyright infringement.
       | 
       | Fair use seems to be a legal concept that mostly only exists in
       | the anglosphere. How will this be in the many other countries,
       | then?
       | 
       | https://en.wikipedia.org/wiki/Fair_use
        
       | Rolpa wrote:
       | Here's an inquiry for those more knowledgeable about IP law than
       | myself: what's the state of the law regarding training an AI on
       | copyrighted material besides code? I was debating this with
       | someone in relation to the high definition texture packs for old
       | games people have been making using models such as ESRGAN - do
       | these infringe the copyright of the rights holders of the
       | original assets? Or are they considered sufficiently
       | transformative to be considered an original work?
        
       | mdasen wrote:
       | The problem with GitHub Copilot is that you never quite know
       | where the suggestion comes from.
       | 
       | As the article notes, longer and more complex blocks of code are
       | most likely copyrightable.
       | 
       | > GitHub reports that Copilot is mostly producing brand-new
       | material, only regurgitating copies of learned code 0.1% of the
       | time.
       | 
       | For me, the issue is one of risk. Let's say that you have 100
       | developers at your company making software for you and they
       | decide that Copilot is great. 1 in 1,000 suggestions is
       | regurgitated code verbatim. Let's say that only 1 in 10 of those
       | suggestions is long and complex enough that it warrants
       | copyright protection. Within a week, you'd have to
       | assume that you have dozens of copyrighted pieces of code in your
       | codebase. The big issue is that you now don't know where the code
       | came from and which pieces might be direct copies. It opens up a
       | bit of a can of worms for a company looking to avoid risk.
       | 
       | I think one of the pieces that might get overlooked is someone
       | trying to weaponize Copilot. For example, Wikipedia has seen
       | people upload creative-commons licensed media to Wikipedia and
       | then become very litigious against people who might be slightly
       | off in the attribution requirements. Attribution requirements are
       | often more complicated than just "provide whatever attribution
       | you think makes sense." The images are legitimately creative-
       | commons licensed, but if someone doesn't provide the correct
       | attribution, they sue them. This attribution can include the
       | documentation of the modifications made, author, link, link to
       | the license (which I think a lot of people forget), copyright
       | notice, etc.
       | 
       | https://news.ycombinator.com/item?id=27606035
       | 
       | I don't think most people are looking to be copyright trolls.
       | However, Copilot offers a neat little way to potentially inject
       | your code into other people's programs. Will people start
       | searching for uses of their code and use it as a form of
       | copyright trolling? I don't think most people will, but we've
       | seen it happen with patents and images.
       | 
       | If you have a hundred engineers each creating dozens of
       | Copilot-suggested blocks per day, we're talking around a million blocks
       | in a year. I don't think the odds of any individual suggested
       | block being a problem are high. The issue is when you start
       | scaling that up. If we're talking about a large company, the risk
       | can start getting large. You don't know where the code came from
       | and it starts getting likely that verbatim pieces of someone
       | else's code are finding their way into your codebase.
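       | 
       | A back-of-the-envelope sketch of that scaling argument, in
       | Python. The 1-in-1,000 verbatim rate and the 1-in-10
       | copyrightable share are the figures used above; 40 suggestions
       | per engineer per day and 250 working days per year are assumed
       | values standing in for "dozens" and "a year".

```python
# Rough estimate of verbatim, copyrightable suggestions entering a codebase.
# Every constant below is an assumption, not a measured value.
ENGINEERS = 100
SUGGESTIONS_PER_ENGINEER_PER_DAY = 40  # assumed stand-in for "dozens"
WORKING_DAYS_PER_YEAR = 250            # assumed
VERBATIM_ONE_IN = 1000                 # reported 0.1% regurgitation rate
COPYRIGHTABLE_ONE_IN = 10              # the guess used in this comment


def estimate(days):
    """Return (total suggestions, expected risky blocks) over `days` working days."""
    total = ENGINEERS * SUGGESTIONS_PER_ENGINEER_PER_DAY * days
    risky = total / (VERBATIM_ONE_IN * COPYRIGHTABLE_ONE_IN)
    return total, risky


if __name__ == "__main__":
    for label, days in (("week", 5), ("year", WORKING_DAYS_PER_YEAR)):
        total, risky = estimate(days)
        print(f"per {label}: {total} suggestions, ~{risky:.0f} risky blocks")
```

       | With these assumed inputs, that is about a million suggestions
       | and roughly 100 verbatim, copyrightable blocks per year, or
       | about 2 per week.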
       | 
       | Does Copilot offer enough value to offset this risk? Will future
       | versions of Copilot make sure that the suggestions are
       | sufficiently different from the training source? Heck, there can
       | even be chicken-and-egg problems where someone claims copyright
       | on a block of code that was generated by Copilot and you then
       | have to prove that your identical code generated by Copilot isn't
       | an infringement. Can you prove "yes, Copilot would have generated
       | that code block before you pushed your code into Github" when
       | they claim "Copilot only generated that code block for you
       | because of our code on Github"? It might not even be a company
       | that's evil doing this. Large companies often have no idea what
       | different parts of the company are doing - especially several
       | years later.
       | 
       | One thing I want to make clear is that this isn't just about
       | cases that would win on their merits. One of the big parts of the
       | Wikipedia discussion on overly-litigious uploaders is that it can
       | cost a lot more to fight infringement claims than they're asking.
       | If someone slaps your startup with a $250 "you stole my
       | copyrighted code" claim, do you hire a lawyer at an hourly rate
       | that might cost more, risk a trial costing tens of thousands, and
       | risk a judgement against you? Or do you pay them off with a small
       | amount of money to make it go away? I'm not saying this is a good
       | situation. I'm just noting that it definitely exists and trolls
       | can try and come after you at the worst times like when you're
       | trying to raise funding. Do you decide to fight it when you're
       | trying to IPO? Do you let the IPO price sink by a few percent and
       | lose you lots of money when they're just looking for $5,000 to go
       | away?
       | 
       | It just seems like adding a lot of risk.
        
       | tvirosi wrote:
       | If this really counts as fair use it turns into a giant loophole
       | to steal any IP you want. Just create a website with a github-
       | like TOS, upload some disney copyrighted pictures to it, train a
       | GAN super overfitted on the images, and then claim mickey mouse
       | as your own.
        
         | invokestatic wrote:
         | The legal system is generally pretty nuanced, considering
         | things such as intent and purpose. In this particular case, it
         | doesn't really matter how the new work was generated or
         | created. I don't really think that would be very relevant. The
         | most important factors would be how similar the new work was to
         | the original work, the intent, and how the new work affects the
         | value of the original.
         | 
         | Your proposal is just so substantively different from Copilot
         | that I don't see how the arguments for Copilot would apply.
        
         | 6gvONxR4sf7o wrote:
         | You can't claim mickey mouse as your own, but you can exploit
         | all the labor that went into creating all the work you're
         | training on. Point a generative model at someone else's labor
         | and now it'll do that for you. It seems like the person whose
         | labor is being used should be somehow compensated, or at least
         | have some say in its use.
        
         | sobellian wrote:
         | The lawyer's argument is that Copilot's query system is
         | transformative. If you assemble Copilot's output to replicate a
         | copyrighted work, then even if Copilot isn't infringing, you
         | are by taking that work out of context. The burden is on the
         | owner to ensure they don't infringe.
        
         | erhk wrote:
         | You can't just upload copyrighted photos. You don't own them.
        
         | tompazourek wrote:
         | This is interesting.
         | 
         | But I think you'd violate Disney's copyright by uploading their
         | pictures to the website.
         | 
         | To make it work, Disney would have to upload the pictures
         | themselves and agree to the TOS.
        
           | heavyset_go wrote:
           | Is GitHub making sure that license terms are being met when
           | they train Copilot on hosted code? Because anyone can rehost
           | code that they don't have the rights to, and it seems like
           | GitHub will still train Copilot on it.
        
             | tompazourek wrote:
             | If someone is rehosting code that they don't have copyright
             | to, it's like if someone would upload a pirated movie to
             | YouTube.
             | 
             | YouTube will still make money from it for some time
             | (selling ads, luring customers in, ...), then the copyright
             | holder asks YouTube to take it down, and then they take it
             | down.
             | 
             | The difference is that open source authors don't care that
             | much about that. But maybe now they will when they see what
             | GitHub is doing...
        
               | heavyset_go wrote:
               | > _YouTube will still make money from it for some time
               | (selling ads, luring customers in, ...), then the
               | copyright holder asks YouTube to take it down, and then
               | they take it down._
               | 
               | YouTube isn't publishing derivative work from the videos
               | it hosts, though, like Microsoft is doing with Copilot
               | and GitHub.
               | 
               | If Copilot was trained on material it doesn't have the
               | license to, it can potentially output that unlicensed
               | code it was trained on, like in this example[1].
               | 
               | Copilot could serve up copyrighted work in the same way
               | YouTube does, but the analogy isn't complete, because
               | YouTube itself isn't a derivative work in the same way
               | Copilot's model is a derivative of the data it was
               | trained on.
               | 
               | [1]
               | https://twitter.com/mitsuhiko/status/1410886329924194309
        
       | 6gvONxR4sf7o wrote:
       | Regardless of whether it's fair use, copilot wouldn't be possible
       | without the enormous amount of person-hours of work that has gone
       | into writing the code it was trained on. There should be some
       | kind of compensation for the content creators when their work is
       | used to train models. The fair use argument is that "I could see
       | it" is enough to justify no compensation and no say in how their
       | work is used.
       | 
       | Legal? Probably. Should we do better? Probably.
       | 
       | At the very least, it should be opt-in. We'll probably need new
       | IP law to make this kind of thing opt-in.
        
       | dekhn wrote:
       | These agree with my conclusions: it's fair use or permitted by
       | license, but it remains untested in court (as the GPL does, in a
       | larger sense).
       | 
       | I guess in about 5 years we'll see Softbank v. GitHub CoPilot in
       | the supreme court deciding whether ML can make transformative
       | work.
        
         | shadilay wrote:
         | Oftentimes legal issues are more a question of finding a
         | plaintiff with enough money to sue rather than the letter of
         | the law.
        
         | kyrra wrote:
         | Google v. Oracle took 11 years to reach a conclusion.
        
         | wolverine876 wrote:
         | Why do we accept that the courts are so slow? I don't
         | understand why there isn't a drive to reform courts by
         | accelerating outcomes by an order of magnitude, and by making
         | outcomes not depend on wealth.
        
           | sokoloff wrote:
           | A court needs a case. A case requires at least one litigant
           | who is willing to go all the way to the extent of forcing a
           | court hearing (rejecting settlements along the way and
           | risking that a court will decide against them). I don't see
           | the courts as being the rate-limiting factor when we're
           | contemplating licenses that are 30 years old last month (GPL
           | v2)
        
           | [deleted]
        
           | gnopgnip wrote:
           | The courts being slow to change and respecting precedent is
           | a feature. It should be on Congress to change the law and
           | amend the Copyright Act.
        
           | pc86 wrote:
           | You're assuming that the speed of a court case has something
           | to do with the wealth of the participants?
        
       | AdamJacobMuller wrote:
       | "no matter what license you use, you give GitHub the right to
       | host your code and to use your code to improve their products and
       | features"
       | 
       | I contribute my code to X project outside of github (say on a
       | mailing list) under the explicit understanding that my code is
       | under GPL (say GPLv3 to be specific). If someone later uploads my
       | code to github and github uses my code to train their ML model in
       | violation of GPLv3, isn't the point that the person who uploaded
       | my code to github is in violation of GPL by giving it to someone
       | else under less restrictive terms?
       | 
       | Does this mean that the github terms of service are perhaps
       | fundamentally incompatible with uploading copyleft-style (or
       | perhaps specifically only GPLv3 level) restrictive licenses?
       | 
       | And, if so, probably they always were but nobody cared until now.
        
       | tyingq wrote:
       | _"If you look at the GitHub Terms of Service, no matter what
       | license you use, you give GitHub the right to host your code and
       | to use your code to improve their products and features," Downing
       | says. "So with respect to code that's already on GitHub, I think
       | the answer to the question of copyright infringement is fairly
       | straightforward."_
       | 
       | I don't know if it's really that straightforward. The TOS
       | includes snippets like this in that area:
       | 
       | _"This license does not grant GitHub the right to sell Your
       | Content. It also does not grant GitHub the right to otherwise
       | distribute or use Your Content outside of our provision of the
       | Service"_
       | 
       | I'm omitting other language, but if you read that area of the
       | TOS, they seem to have purposefully scoped down their license-to-
       | use for hosting, backups, etc.
        
         | lindenksv85 wrote:
         | They technically don't distribute any code. They show it to you
         | in a hosted environment. It's the user that causes a
         | distribution. They "provide it as part of the service."
        
           | tyingq wrote:
           | I'm not sure how sending it over the wire to Visual Studio
           | doesn't count as distribution. Distribution without
           | attribution, reference to where it came from, how it's
           | licensed, etc.
        
             | dheera wrote:
             | Conceptually it's not particularly any different from
             | distributing it to your web browser. They basically just
             | turned Visual Studio into a fancy Github browser that has
             | some editing features.
        
               | inlined wrote:
               | I disagree. They're the API host, so they're the
               | distributor as well as the user agent
        
           | zufallsheld wrote:
           | In the same vein, streaming sites don't distribute movies;
           | they show them to you in a hosted environment. And this
           | argument did not hold up in court.
        
             | lindenksv85 wrote:
             | OSS license obligations mostly kick in upon distribution,
             | hence this is a pivotal concept in this context. It's also
             | important because of the language in the TOS that says the
             | code won't be used outside the service. The stuff related
             | to streaming is kind of unrelated here because movies
             | aren't under copyleft licenses and so the question of
             | whether or not there was distribution there is not
             | relevant. The question is whether or not the copyright
             | holder's monopoly rights were violated, and those include the
             | right of public performance, public display, as well as
             | distribution. They would have violated other copyright
             | rights even without a finding of distribution.
        
             | [deleted]
        
         | tomrod wrote:
         | Further, what if I branched the code from something hosted
         | outside of Github -- and failed to follow proper attribution?
         | 
           | This is a huge legal mess and it's not being used to IMPROVE
         | Github products and ops, it IS the Github product.
        
           | kmeisthax wrote:
           | It's actually fairly difficult to remove attribution from a
           | Git repository. It's embedded in each commit. You'd have to
           | rewrite the entire project history - something far different
           | from just "branching" a repo.
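           | 
           | A minimal sketch of why that is, in Python. It hashes
           | commit objects the way Git's SHA-1 object format does; the
           | author names, the fixed timestamp, and the use of the
           | empty tree are illustrative assumptions. Because each
           | commit records its parent's id, changing the author of one
           | commit changes the id of every commit after it.

```python
import hashlib

# Id of Git's empty tree, used here only as a placeholder tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
TIMESTAMP = "1626364800 +0000"  # assumed fixed timestamp for the example


def commit_id(tree, parent, author, message):
    """Compute the SHA-1 id of a commit object with the given fields."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {TIMESTAMP}")
    lines.append(f"committer {author} {TIMESTAMP}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def history(first_author):
    """Build a two-commit chain and return (root id, child id)."""
    root = commit_id(EMPTY_TREE, None, first_author, "initial commit")
    child = commit_id(EMPTY_TREE, root, "Bob <bob@example.com>", "second commit")
    return root, child


if __name__ == "__main__":
    original = history("Alice <alice@example.com>")
    rewritten = history("Mallory <mallory@example.com>")
    # Both ids differ, even though the second commit's own fields are unchanged.
    print(original[0] != rewritten[0], original[1] != rewritten[1])
```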
        
         | matmann2001 wrote:
         | Technically, they aren't distributing "your" code. It's
         | laundered through their machine learning algorithm first.
        
           | hansvm wrote:
           | Is that actually different legally from an "ML" algorithm
           | that xors the code with the same garbage 1M times or
           | otherwise does something expensive to implement a noop?
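           | 
           | For concreteness, a toy version of that no-op in Python.
           | The function names and the key are hypothetical, and this
           | says nothing about how Copilot works: XORing with the same
           | key an even number of times hands back the input
           | byte-for-byte, however much compute is spent.

```python
def xor_with(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key (key must be non-empty)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def launder(data: bytes, key: bytes, rounds: int = 1_000_000) -> bytes:
    """Apply the same XOR `rounds` times; an even count is an expensive no-op."""
    for _ in range(rounds):
        data = xor_with(data, key)
    return data


if __name__ == "__main__":
    source = b"int main(void) { return 0; }"
    # A smaller even round count keeps the demo fast; the result is the same.
    print(launder(source, b"garbage", rounds=1000) == source)
```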
        
         | grawprog wrote:
         | > to use your code to improve their products and features
         | 
         | >It also does not grant GitHub the right to otherwise
         | distribute or use Your Content
         | 
         | I'm curious. Copilot isn't actually part of github. It's a
         | plugin for Visual Studio. Wouldn't that mean copilot is
         | distributing code hosted on github, outside of github? You
         | can't use copilot without visual studio.
         | 
         | How is this not Microsoft just parasitizing all the code hosted
         | on github to make visual studio better? Which, as far as I know,
         | despite being owned by Microsoft, is not actually part of
         | github.
        
           | LeifCarrotson wrote:
           | I think it all depends on who 'they' are and what 'their
           | products' are. The definition says:
           | 
           | > _'GitHub,' 'We,' and 'Us' refer to GitHub, Inc., as well
           | as our affiliates, directors, subsidiaries, contractors,
           | licensors, officers, agents, and employees._
           | 
           | Does that also include OpenAI? Does it include the Visual
           | Studio team? All of Microsoft?
           | 
           | The license granted by users to Github is:
           | 
           | > _We need the legal right to do things like host Your
           | Content, publish it, and share it. You grant us and our legal
           | successors the right to store, archive, parse, and display
           | Your Content, and make incidental copies, as necessary to
           | provide the Service, including improving the Service over
           | time. This license includes the right to do things like copy
           | it to our database and make backups; show it to you and other
           | users; parse it into a search index or otherwise analyze it
           | on our servers; share it with other users; and perform it, in
           | case Your Content is something like music or video._
           | 
           | > _This license does not grant GitHub the right to sell Your
           | Content. It also does not grant GitHub the right to otherwise
           | distribute or use Your Content outside of our provision of
           | the Service..._
           | 
           | IANAL, but naively, Github appears to be a code hosting
           | platform. If they need to analyze my code to make it work
           | with Git and with their code hosting features, that makes
           | sense. For example, they might have a feature to prevent
           | inadvertent commits of private keys, and would need to parse
           | my code to do so. Maybe my code contains stuff that doesn't
           | work with their generic private-key-finding parser, and they
           | need to specifically run a subset of my code on their
           | platform through their parser in a debugger to fix the
           | feature. That's a sensible license to grant to a code hosting
           | platform, they're not a no-knowledge encrypted storage
           | provider.
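           | 
           | A minimal sketch of what such a private-key check might
           | look like, in Python. This is a hypothetical illustration,
           | not GitHub's implementation; the function name and the
           | choice to match only PEM-style headers are assumptions.

```python
import re

# Matches PEM headers such as "-----BEGIN RSA PRIVATE KEY-----" or
# "-----BEGIN PRIVATE KEY-----". Other key formats are not covered.
PEM_PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def find_private_keys(text):
    """Return the 1-based line numbers that contain a PEM private-key header."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PEM_PRIVATE_KEY.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "config = load()\n"
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "not-a-real-key\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    print(find_private_keys(sample))
```

           | Even a check this small has to read the file contents,
           | which is the kind of analysis the license grant covers.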
           | 
           | They don't appear to be a software vendor that sells code to
           | other private parties for use in closed-source applications.
           | Their license appears to specifically deny them the right to
           | sell snippets of my code to others.
           | 
           | I suspect, however, that this isn't a black-and-white factual
           | issue, rather, one for a court to decide. One could probably
           | hire an attorney to argue any possible angle on the legality
           | of Copilot. And by a similar mechanism to the "Winner's
           | curse", the company who developed a tool like Copilot would
           | always have been one where their internal counsel advised
           | them that what they were doing was totally legal.
        
           | lindenksv85 wrote:
           | The "services" is that which is provided by GitHub. "GitHub"
           | is defined to include all of its affiliates, including
           | Microsoft.
        
         | nomoreplease wrote:
         | And "and to use your code to improve their products and
         | features," does not explicitly include "or to create new
         | products and features".
         | 
         | CoPilot is a NEW product, not an existing product (Github
         | itself) that the ToS gives permission to improve.
        
         | matmann2001 wrote:
         | Technically, they aren't distributing Your Content. It's
         | laundered through their machine learning model first.
        
           | tyingq wrote:
           | With degrees of laundered varying from _"copied verbatim"_
           | to _"minor things like symbol names changed"_ to _"actually
           | transformed significantly"_.
        
         | [deleted]
        
         | BeefWellington wrote:
         | An interesting thought experiment around this whole topic: If I
         | were to take all the scripts of profitable films rated G or PG
         | and train an AI on it, generate a bunch of scripts, then made
         | movies out of those scripts, would I lose in court?
         | 
         | Tangibly, how is this AI method substantially different from
         | non-clean-room implementations?
         | 
         | In terms of business use, it seems incredibly risky to me to
         | even just *use* GitHub since their license agreement/ToS permit
         | them to use my code to improve their tools which now apparently
         | includes tooling where it may copy your code wholesale as
         | someone else's suggestion.
        
           | kmeisthax wrote:
           | No thought experiment needed. If I watch a bunch of movies
           | and then make my own movie, whether or not I lose in court
           | depends on if the movie I made is at least "substantially
           | similar" to any movie I happened to watch - or, in other
           | words, had "access" to. That's a fact-intensive thing that
           | juries usually decide on a case-by-case basis.
           | 
           | The difference between that and having an AI do it is
           | probably low. My gut instinct is that using an AI constitutes
           | "access" to the AI's training corpus, so if it spits out
           | something at least substantially similar to that corpus, then
           | I'm infringing if I use that output. If it _doesn't_
           | constitute access, then a copyright owner would have to prove
           | "striking similarity", which would really only cover things
           | like using Copilot to spit out fragments of old Quake code
           | verbatim.
           | 
           | Clean-room is a way of arguing down the level of access that
           | you have to something that you want to make a non-infringing
           | copy of. It usually requires having actual attorneys review
           | everything the clean-room engineers get to see, and stripping
           | out the parts that are actually copyrightable. Merely
           | training an ML system on input as a way to only have access
           | to the uncopyrightable parts of that input probably wouldn't
           | work.
           | 
           | Pretty much every Internet service is going to have similar
           | clauses to GitHub's; because anything else would basically be
           | a "click here to make me liable for copyright infringement"
           | button. In fact, I wouldn't be surprised that merely running
           | something like GitHub but without a ToS would still give you
           | similar levels of implied license over whatever people push
           | to your server.
        
         | kzrdude wrote:
         | What about open source projects where the uploader and github
         | users are not the only copyright holders? As a user I can't
         | grant github any random license for the code, if I maintain for
         | example Linux or python or any other old project there.
         | 
         | The ONLY available terms are those given by the license,
         | surely?
        
           | lindenksv85 wrote:
           | If you are putting up code on GitHub to which you don't have
           | all the rights you're actually in violation of their TOS and
           | you are violating the rights of other copyright holders. I
           | understand this is common and may not violate community norms
           | or expectations but it is technically a license violation on
           | multiple fronts. Contributors who add to existing GitHub
           | projects are providing the same license to GitHub as the
           | project maintainer though per the TOS.
        
             | rightbyte wrote:
             | I guess the Github Copilot authors did not handpick
             | projects they checked were legitimately put on Github. So
             | they are accomplices in that case.
        
               | filoleg wrote:
               | YouTube doesn't really handpick things that get put on
               | their platform either, beyond very basics and whatever
               | automated tools they have to cover that.
               | 
               | Beyond that, that's what DMCA takedown requests are for.
               | Github would only be an accomplice in that case, if they
               | got a legitimate DMCA takedown request and chose to
               | completely ignore it.
        
             | jolmg wrote:
             | > If you are putting up code on GitHub to which you don't
             | have all the rights you're actually in violation of their
             | TOS and you are violating the rights of other copyright
             | holders.
             | 
             | I can't find where in the TOS it says that you must "have
             | all the rights [to the code]". It just says that you must
             | not violate copyright nor other laws.[1] FOSS licenses by
             | definition permit redistribution, so uploading to GitHub
             | seems to be in-line with the license granted by the
             | copyright holders.
             | 
             | What are the violations you mention?
             | 
             | > Contributors who add to existing GitHub projects are
             | providing the same license to GitHub as the project
             | maintainer though per the TOS.
             | 
             | Sure, but that's not the only way. If you contribute to a
             | FOSS project elsewhere, those changes go under the same
             | license of the project. Whoever you pass those changes to
             | has liberty to redistribute per the terms of the license.
             | The TOS is unneeded to legally redistribute FOSS-licensed
             | projects with GitHub.
             | 
             | The TOS saying that you must grant GitHub these permissions
             | is only to protect GitHub in cases where people upload
             | projects without licenses.
             | 
             | [1] in addition to content restrictions, like no porn.
        
             | sudosysgen wrote:
             | But legally, they can't provide such a license. So GitHub
             | can't have that license, surely, because they never had the
             | legal authority to bestow it upon Github.
        
               | lindenksv85 wrote:
               | That was a problem before copilot though. And copyright
               | holders have and will continue to have the right to send
               | DMCA take-down notices if they like.
        
               | jacoblambda wrote:
               | But the thing to note is that a user can have a right to
               | distribute (as with GPL) but does not necessarily have
               | the rights to the license.
               | 
               | So if the user uploads the source to GitHub, they agree
               | to the terms (which they may not actually have the rights
               | to) but that isn't equivalent to the rights owner giving
               | GitHub the rights to distribute the source under a
               | different license.
               | 
               | The TOS can only modify those distribution terms (if it
               | even can be found to be legally binding) if the user
               | uploading the source is the rights owner which in so many
               | cases is not the case.
        
               | josephh wrote:
               | I think the bigger question is whether GitHub will be
               | able to honor DMCA requests that pertain to copyrighted
               | materials showing up in Copilot's suggestions.
        
               | Animats wrote:
               | A third party who finds their GPL code on Github but is
               | not themselves a user of Github has a right of action.
               | They're not bound by Microsoft's terms.
        
               | lakecresva wrote:
               | I'm not sure that someone who published their work under
               | the GPL hasn't thereby given consumers the right to put
               | the repo on github. If the rights Github asks for in
               | their ToS can be construed as a subset of the rights
               | granted by the GPL, Github is just another GPL licensee.
               | Unless they violate the conditions of the license,
               | they're just utilizing their GPL rights.
        
               | eitland wrote:
               | > Github is just another GPL licensee. Unless they
               | violate the conditions of the license, they're just
               | utilizing their GPL rights.
               | 
               | And here is exactly the problem.
               | 
               | GitHub seems to be copying copyrighted code left and
               | right _and pretending they made it!_
               | 
               | No attribution, no license.
               | 
               | They are of course allowed to let their AI study the
               | code, but as "employer" of that AI GitHub/Microsoft has a
               | responsibility if that AI breaks copyright right and left
               | and they as a company pretend the code is theirs to give
               | away.
        
               | AdamJacobMuller wrote:
               | > is not themselves a user of Github
               | 
               | Is it that widely scoped? Can't we narrow it to "A third
               | party who finds their GPL code on Github but has not
               | uploaded that specific code to Github themselves has a
               | right of action limited to that specific code."
               | 
               | Just because I created a github account once and agreed
               | to the TOS doesn't mean that I agree to let others upload
               | my code to github. Where would that scope end? Could
               | someone steal code off my computer which I've never
               | published, put it on Github, and have that be OK because
               | I once signed up for a github account? Clearly a
               | contrived example, but still.
        
             | kzrdude wrote:
             | Today is the first time I've considered that, but it's
             | certainly something we should think about. If big projects
             | moved on this, I think github would take notice and "issue
             | a clarification".
        
         | btilly wrote:
         | _I don't know if it's really that straightforward._
         | 
         | It gets worse. To the extent that it is that straightforward,
         | the correct takeaway is that you do not have permission to
         | include someone else's GPLed code in your Github repository.
         | 
         | And that to the extent that GitHub relies on that permission in
         | using the code that they host, they are liable for potential
         | copyright claims from copyright owners that they have no
         | relationship with, who never gave GitHub permission to use that
         | code.
         | 
         | I therefore think that GitHub should do some careful thinking
         | about how much they can rely on a ToS to do as they want with
         | the copyleft code that they host. And I further think that
         | people who host GPLed projects should ask whether GitHub is
         | where they should be hosting those projects.
         | 
         | (Insert the mandatory, "I am not a lawyer and this is not legal
         | advice.")
        
           | [deleted]
        
         | phkahler wrote:
         | Yeah, I don't think bettering their products includes verbatim
         | incorporation of code into those products.
         | 
         | Also, for the part about small snippets being
         | non-copyrightable, I would suggest looking at the Google/Oracle
         | case. Google was found guilty of infringement for a very small
         | number of lines, but the award to Oracle was IIRC rather a joke
         | (something like one dollar, indicating it was infringing but
         | largely irrelevant).
        
           | zja wrote:
           | The Supreme Court found Google's use to be fair, not
           | infringement.
        
             | jcelerier wrote:
             | the supreme court did not reconsider the previous judgment
             | on the 9 lines of sorting algorithm being copied, which was
             | _not_ considered fair use
        
       | MrStonedOne wrote:
       | >"If you look at the GitHub Terms of Service, no matter what
       | license you use, you give GitHub the right to host your code and
       | to use your code to improve their products and features," Downing
       | says. "So with respect to code that's already on GitHub, I think
       | the answer to the question of copyright infringement is fairly
       | straightforward."
       | 
       | Not as straightforward as they think, though.
       | 
       | If a code project used (a)gpl code found elsewhere on the
       | internet in their repo, and another user took the project and
       | hosted it on github, the tos cannot give github a license to use
       | the code outside of the license given by (a)gpl. Even if github
       | thinks they have one, that won't shield them from legal
       | liability, nor will it shield co-pilot users from being legally
       | compelled to (a)gpl their code if a court case were won on those
       | grounds.
       | 
       | The github tos is basically a non-factor in this case.
        
       | SethTro wrote:
       | I find the first argument, that if your project is on GitHub
       | then they have the right to train on it, weak. Plenty of projects
       | are hosted elsewhere but have been mirrored by random users (i.e.
       | not the copyright holder) to GitHub.
        
         | kbenson wrote:
         | I think they have a right to train on it, but not to present
         | portions verbatim. Do you have a right to look at a bunch of
         | open source code and come to conclusions about good programming
         | practices? Are you prevented from knowing that a specific
         | library in a language is good/common for a specific task
         | because you see others using it?
         | 
         | That's analogous to training, where there are associations
         | between things, in my mind. I don't think that means they can
         | provide licensed code verbatim though, just as you should not
         | copy GPL code directly out of a Github repo and paste into your
         | own private commercial code base.
        
           | heavyset_go wrote:
           | You're taking the machine "learning" metaphor literally. A
           | human being learning something is not analogous to training
           | an ML model. Training models is more analogous to compilation
           | or lossy encoding or compression.
        
             | michaelpb wrote:
             | The biggest mistake of the ML field is its metaphorical
             | naming. So many people seem to be taking Artificial
             | Intelligence, Machine Learning, Neural Networks etc
             | literally. They don't do this for other concepts in coding
             | (eg for an absurd example, no one is arguing we ride a CPU
             | "bus" to work), but with ML algos it's a free-for-all.
             | Grandiose naming conventions might be good for extracting
             | VC money but it's also seriously confusing people.
        
             | kbenson wrote:
             | I'm thinking more "association" than "learning", and in
             | both cases.
             | 
             | If an algorithm of some sort scans a bunch of repos
             | regarding video encoding and decoding and sees a lot of
             | ffmpeg use, it might associate ffmpeg with video encoding
             | and decoding, and decide to present some info about ffmpeg
             | and a _generic_ snippet to include ffmpeg as a library and
             | initialize it if it associates the current project with
             | that.
             | 
             | If I have perused a few encoding or decoding repos at some
             | point and I think of the current project as having to do
             | with encoding or decoding of video, I might immediately
             | think ffmpeg even if I've never used it in a project as a
             | library because I remembered seeing it in projects that
             | used it, and look for some initialization code.
             | 
             | In what ways are these materially different? What makes the
             | random conceptual associations in my head from what I've
             | seen previously different than an algorithm that collects
             | the same?
             | 
             | > Training models is more analogous to compilation or lossy
             | encoding or compression.
             | 
             | And learning in people isn't? Isn't all knowledge
             | transference in people analogous to lossy encoding and
             | compression?
             | 
             | I don't know about you, but in college I don't remember
             | regurgitating sections of "Advanced Programming in the UNIX
             | Environment" to complete assignments, I remember studying
             | it, internalizing parts of it on a conceptual level (as
             | well as remembering specific fairly small chunks almost
             | exactly), and using that to solve problems or answer
             | questions or make associations.
             | 
             | I'm not saying ML and learning in humans are the same. I
             | do think for the very specific case presented here in how
             | it's used, there are some parallels. Feel free to disabuse
             | me of that notion if you have evidence that contradicts it
             | though. I'm not wedded to that position, but I would want
             | to see arguments to the contrary before abandoning it.
        
       | gdsdfe wrote:
       | well a lot of people in here don't like these conclusions
        
       | legerdemain wrote:
       | Somewhat tangentially, Kate Downing is also the person who
       | somewhat recently campaigned to raise awareness of the crisis in
       | affordable housing in the Bay Area and Palo Alto in particular,
       | and wrote a viral editorial after giving up and moving to the
       | more affordable Santa Cruz.[1]
       | 
       | [1] https://news.ycombinator.com/item?id=12288306
        
       | guitarbill wrote:
       | It would be nice if we moved from a copyright discussion to an
       | ethical one, since it could be years until the law is even
       | tested.
       | 
       | Is it ethical to do this, when some licenses are clearly chosen
       | because of e.g. attribution or sharing improvements? Did
       | Microsoft/GitHub consider the ethical implications, for example a
       | chilling effect on code being open sourced in future (i.e. people
       | choosing not to open source stuff so it doesn't get gobbled up by
       | Copilot et al.)?
        
         | dmitrygr wrote:
         | I wonder if one could enforce a license's "this code may not be
         | used to train any ML model of any sort for any reason without
         | prior permission".
        
           | progbits wrote:
           | I've been wondering the same, though I've found basically no
           | discussion on this topic.
           | 
           | Let's ignore whether GPL or whatever license allows GitHub to
           | do this - let the lawyers sort this out. Instead we should
           | focus on whether it is possible to legally prevent such
           | behavior via license.
           | 
           | In other words, where is my GPLv4 with anti-ml clause?
        
             | EamonnMR wrote:
             | FSF and SFC and OSI if you're listening, this would be very
             | nice.
        
         | lindenksv85 wrote:
         | I think it's important to recognize that most ML models will
         | not be built on top of copyleft material. They will mostly use
         | data that we as users have voluntarily provided to someone at
         | some point and to which that platform now claims ownership. So
         | we need to think long and hard about whether or not we believe
         | any of these models should receive any copyright protection at
         | all and in a much broader context. I think if we insist on
         | claiming that copilot is copyrightable itself and should be
         | under GPL then we have totally capitulated with respect to all
         | other use cases in a way that actually further protects
         | incumbent advantage for large companies and which deprives
         | everyone else of any benefit or remuneration for their own
         | data. You're basically saying it's ok for companies to
         | privatize the collective knowledge of all of humanity. I'm not
         | on board with that.
        
           | guitarbill wrote:
           | I don't know if by "You're basically saying" you mean me
           | specifically, but if you do, you're dead wrong. I'm not ok
           | with this at all. However, I'm not so stupid as to think that I,
           | as a non-IP lawyer, can make sense of the current legal situation
           | (which is what copyright is: law) or even propose new laws.
           | 
           | However, as a dev I can think about it and say "to me, this
           | is immoral and unethical", and refuse to use Copilot, not
           | work for any company that uses Copilot, not use
           | GitHub/Microsoft products, pull code from GitHub (if I had
           | any), and decide not to open source stuff in future. Ethics
           | has always been underemphasised in software compared to other
           | engineering disciplines.
           | 
           | Generally, non-technical people are (more) impacted by ML,
           | but in this case it's us as developers and our open source
           | communities. So I hope devs will give it some thought this
           | time. And if this leads to devs thinking about ML more
           | carefully in general, great. Things don't have to be illegal
           | to be unethical.
        
             | lindenksv85 wrote:
             | I didn't mean you specifically. I think the ethical
             | conversation is more interesting but I also think that
             | people will feel different if, say, the Linux Foundation
             | releases its own version of copilot and it's not just one
             | company reaping the rewards of all that code. And I'd like
             | to make it easy for other competitors to do exactly that.
             | It will be harder for them to do that if we think that the
             | models themselves are copyrightable. I don't think
             | something like copilot is going to make anyone think twice
             | 5 yrs from now any more than we think twice about something
             | like google autocomplete or google search thumbnail images.
             | I think stuff like copilot if properly tuned won't be
             | providing a substitute for whole GPL projects. I don't
             | think OSS communities will be damaged by this in any way.
             | In fact those same oss communities are going to be some of
             | the biggest users of these sorts of tools just like they
             | use stackoverflow today.
        
             | erhk wrote:
             | Github is not required to open source your work
        
         | erhk wrote:
         | You can open source with Git without using github. You can self
         | host.
        
           | xdennis wrote:
           | If it's publicly available, there's no guarantee that
           | Microsoft won't gobble up your code.
        
         | kube-system wrote:
         | The ethical discussion certainly has its merits, but the legal
         | discussion is very relevant for those of us who do not want to
         | be part of the legal test case.
        
       | jefftk wrote:
       | _> If you look at the GitHub Terms of Service, no matter what
       | license you use, you give GitHub the right to host your code and
       | to use your code to improve their products and features. So with
       | respect to code that's already on GitHub, I think the answer to
       | the question of copyright infringement is fairly
       | straightforward._
       | 
       | This doesn't sound right. Alice writes code, and releases it
       | under some restrictive license (GPL, something source-available,
       | etc). Bob uploads it to GitHub, correctly labeled as GPL.
       | Regardless of GitHub's TOS, Bob isn't able to give GitHub any
       | additional rights to the code beyond what Alice gave him.
       | 
       | I think the later discussion about whether this falls under Fair
       | Use is the important question.
        
         | pc86 wrote:
         | If Bob is unable to give GitHub the rights that GitHub demands,
         | then it means Bob was unable to lawfully upload the code to
         | GitHub in the first place. You're making an argument that Bob
         | violated GitHub's terms, not that GitHub is violating Alice's
         | (though that may also be true).
        
           | mikeryan wrote:
           | You're right that Bob's the infringer here and not GitHub.
           | But I'm not sure where that would place the derived work
           | that's eventually used.
           | 
           | Which is the point of the article a bit. GitHub is likely not
           | infringing but it's also not absolving the end user of any
           | infringement. Neat trick.
           | 
           | That being said I think the risk is minimal enough that I'd
           | be pretty comfortable with using it.
        
           | eitland wrote:
           | I learned here on HN that contracts are supposed to be a
           | "meeting of minds".
           | 
           | And in EU, as a consumer, you can pretty much ignore most
           | EULAs because they aren't valid if they break EU consumer
           | protections.
           | 
           | Now if your interpretation is correct the idea of a meeting
           | of minds falls completely on its face.
           | 
           | And, as a lot of individuals also upload their projects to
           | GitHub, GitHub is on shaky ground there as well.
           | 
           | I think most EULAs have clauses like these in them but we are
           | always told that it is because of crazy American lawyers and
           | nothing to worry about.
           | 
           | If Microsoft decides to prove once and for all that we should
           | worry about ridiculously broad claims in EULAs I think it
           | will be hard for GitHub to continue to operate in more sane
           | jurisdictions.
        
           | kuratkull wrote:
           | Violating the TOS is not illegal; Bob would be in breach and
           | GitHub could take measures against Bob's account. Github
           | would be violating Alice's copyright license though, and that is
           | legally enforceable.
        
             | lindenksv85 wrote:
             | Sort of. DMCA protects service providers against copyright
             | infringement claims related to stuff uploaded to their
             | services by third parties. So long as they adhere to DMCA
             | requests, they're not violating copyright law themselves.
        
               | bigwavedave wrote:
               | > Sort of. DMCA protects service providers against
               | copyright infringement claims related to stuff uploaded
               | to their services by third parties. So long as they
               | adhere to DMCA requests, they're not violating copyright
               | law themselves.
               | 
               | This is probably an extremely stupid question as I'm
               | neither a lawyer nor an ML dev (merely a humble backend
               | developer), but let's say that the above situation
               | applies and that Github has taken down Bob's repo as per
               | Alice's DMCA request. However, let's say that in between
               | Bob uploading the offending code and Alice submitting the
               | DMCA request, Github used Bob's repo as part of a
               | training set for Copilot. Now that they've complied with
               | the takedown request, does Github have to restore Copilot
               | to an earlier state that hadn't yet been trained by Bob's
               | repo? Does this question even make sense since I only
               | know the absolute barest bones of ML?
        
               | rented_mule wrote:
               | Also not a lawyer, but I've been around ML for a while.
               | The question makes perfect sense to me!
               | 
               | It takes some amount of time to comply with a takedown
               | notice. For example, time passes between receiving
               | Alice's notice and taking down Bob's repo.
               | 
               | I would expect Copilot's model(s) to be retrained
               | periodically in order to remain relevant. The next
               | retraining could exclude Alice's code. That might be a
               | longer window than the case of the repo takedown, but as
               | long as it doesn't take _too_ long they might be okay?
               | 
               | There are incremental training approaches that evolve
               | models over time rather than completely retraining them.
               | In my experience, complete retraining is a far more
               | common approach because the highly path dependent nature
               | of incremental training can lead to outcomes that are
               | hard to manage. For example, what if you discover bad
               | training data like repos that collect anti-patterns? Or
               | Alice's takedown notice? You typically want your models
               | to be able to "unsee" things and that's hard with purely
               | incremental training. Even when incremental approaches
               | are used, there is often an occasional complete
               | retraining to overcome such issues.
               | 
               | To be clear, I have no idea what training approach is
               | used for Copilot.
        
           | jefftk wrote:
           | GitHub's primary business is hosting open source software.
           | There's no way they are going to claim that every user who
           | uploaded code without owning the full rights is violating
           | their TOS.
        
             | dogleash wrote:
             | >There's no way they are going to claim that every user who
             | uploaded code without owning the full rights is violating
             | their TOS.
             | 
             | If they're taken to court and that part of the TOS is
             | relevant to the issue, then yes, they can and will argue
             | exactly that.
        
               | eitland wrote:
               | Well then let's hope the judges apply the same standards
               | as when criminals claim they weren't aware that the money
               | they got was being laundered through them.
        
               | jefftk wrote:
               | That would destroy their business.
        
           | ipaddr wrote:
           | That's moving the goalposts after the fact.
           | 
           | What about all of the code that existed before Microsoft
           | purchased them and before new license language was
           | introduced?
        
             | mikeryan wrote:
             | TOS language is usually not grandfathered in.
        
             | lindenksv85 wrote:
             | All of these terms of service have an assignment provision
             | that allows the provider to assign the agreement to an
             | acquirer. So the license you gave to GitHub moves to
             | Microsoft (though here the license likely remains with
             | GitHub because they are an independent subsidiary). All of
             | these agreements also say they can unilaterally change
             | terms whenever. The terms are generally always broad enough
             | to cover these circumstances.
        
       | fhajm wrote:
       | This lawyer does not understand GitHub. Half of the code is
       | uploaded by third parties who do not hold any copyright.
       | 
       | These people either think that for ideological reasons everything
       | should be on GitHub or they want Google links to their companies.
       | 
       | Furthermore "improve their services" reasonably only applies to
       | their core service that was present _when people agreed to the
       | TOS_ and not to some new code laundering AI.
       | 
       | It is frightening that this matter could be decided by such
       | lawyers in the US. People should just all leave GitHub, then
       | Microsoft can play with its own AI and enjoy the silence.
        
         | rhdunn wrote:
         | Then there's the issue of any project that uses Copilot. For
         | example, if a developer of proprietary software uses this and
         | it is later found that the code matches GPL code, they would be
         | liable. Likewise if an open source project uses code under a
         | different license or proprietary code via this.
         | 
         | Looking at the source code or the function and variable names
         | in binaries, you cannot tell if Copilot is used or not, so
         | there isn't a functional difference between someone copying
         | that code or Copilot copying it.
        
       | coding123 wrote:
       | I just keep flagging this. It's getting over analyzed into the
       | ground. Sue them if you want but it's just a waste of time to
       | keep talking about this.
        
       | wolverine876 wrote:
       | > As we mentioned, GitHub trains Copilot on numerous pieces of
       | public code, many of which are covered by strong copyleft
       | licenses (i.e. GPL v2, GPL v3). Copyleft licenses require that
       | derivative works (of the copyleft-licensed code) must carry the
       | same license as the original code.
       | 
       | Even when no GPL v2/3 code is quoted by Copilot, is using the
       | code for training a non-free product allowed under the license?
       | Under the license, is Copilot therefore now licensed GPL v2/3?
       | The GPL code was certainly used to create a critical, integral
       | part of Copilot, and to create its output.
       | 
       | If I understand correctly, GPL v2/3 were designed to prevent non-
       | free products from being parasites on FOSS code, taking and not
       | giving. If that's the spirit of GPL, Copilot seems to clearly
       | violate it.
        
         | invokestatic wrote:
         | When you upload code to Github, you agree to license it to them
         | under Github's terms, and not whatever license the software is
         | typically distributed under. You are effectively "dual
         | licensing" software by uploading it to Github, whether you
         | realize it or not. Of course, there are edge cases in which you
         | don't have the rights to license the software to Github, but in
         | those cases, I don't have the answer.
        
       | jcelerier wrote:
       | > "To the extent you see a piece of suggested code that's very
       | clearly regurgitated from another source -- perhaps it still has
       | comments attached to it, for example -- use your common sense and
       | don't use those kinds of suggestions."
       | 
       | how is "use common sense" even remotely a meaningful thing
        
         | rjzzleep wrote:
         | Especially coming from a supposed lawyer.
        
         | scintill76 wrote:
         | I thought this was weird too. Why are comments the dividing
         | line? Because they sound like a human? How do we know Copilot
         | won't regurgitate an exact copy of human code that doesn't have
         | comments?
         | 
         | It's kinda surprising Copilot even reads and outputs comments.
        
       | pedrocr wrote:
       | Given this fair use argument that the work is probably
       | transformative enough, here's what I'll be doing next. I'll take
       | the Windows and Office binaries, run them through a decompiler
       | and then train a neural network on that output. This sequence of
       | steps should be at least as transformative of Microsoft's
       | copyright as what Copilot is doing with the open-source corpus,
       | probably much more so. I will then use that neural network to
       | write patches for ReactOS and WINE. Since those projects are very
       | wary of interaction with Microsoft copyrighted works, could
       | Microsoft Legal please publicly state their assurance that all
       | this is perfectly legal use of their copyrights? Maybe that would
       | help convince people.
        
         | EamonnMR wrote:
         | Might be faster to generate verbatim copies of Disney IP.
        
         | invokestatic wrote:
         | I've heard something similar in response to Copilot in another
         | thread (something like offering a sum of money to Github if
         | they train their model exclusively on the Windows NT source
         | code). But I think the legal theory here is that Copilot is
         | trained on many thousands of sources. If Copilot was trained on
         | a single source, or even a small handful of sources, the
         | derivative work claim becomes much stronger. When trained on
         | many sources, it becomes much harder to claim that it's a
         | derivative of another work.
         | 
         | Take for example a human. If I studied a bunch of different
         | open-source projects, learned techniques from them, and
         | implemented them in my own projects, is that a derivative work?
         | Probably not. But if I were to reverse engineer Windows and
         | implement the techniques I saw in ReactOS, that's where it
         | seems issues start to arise.
        
           | pedrocr wrote:
           | So I just need to decompile Oracle's database and a few other
           | commercial products as well and I'm good? Is Microsoft legal
           | happy if I do Windows+Office+OpenSourceCorpus? I'd take that
           | statement as well. Or even if they just do that themselves
           | and train Copilot on their internal source code just as they
           | do with the public open-source corpus. That would be a strong
           | statement as well.
        
           | breischl wrote:
           | >If I studied a bunch of different open-source projects,
           | learned techniques from them, and implemented them in my own
           | projects, is that a derivative work? Probably not.
           | 
           | That's pretty unclear actually. If it's quite close to the
           | original work, it is derivative. Even though you have
           | probably been "trained" on quite a few different codebases
           | over the years. Hence the existence of clean-room
           | implementations, wherein the people building a new
           | implementation have never seen the original.
           | 
           | Also, given that code that has been passed through a
           | biological network (ie, brain) can constitute infringement,
           | it seems obvious that code passed through a mechanical one
           | could too. Maybe not in every case, but it certainly seems
           | plausible.
        
           | erhk wrote:
           | Well, I could certainly construct a data set where Windows NT
           | is an atomic outlier and muddy the water with arbitrary
           | inputs to satisfy your irrelevant requirement. Perhaps I'll jam
           | some pictures of cows in, or any number of animal photos.
           | Maybe even some classical literature. Hell, maybe I even just
           | jam a shit ton of JavaScript in. That's code, right?
        
           | heavyset_go wrote:
           | You're taking the machine learning metaphor literally.
           | Training an ML model is not the same thing as a human being
           | learning off of material.
           | 
           | A human being can understand abstract concepts and reason
           | about them based on material they learn from. An ML model is
           | a statistical model that is closer to compilation or lossy
           | encoding or compression.
           | 
           | Often, ML models can encode their training data verbatim in
           | the model itself, which is exactly what happened with Copilot
           | and this example[1].
           | 
           | [1] https://twitter.com/mitsuhiko/status/1410886329924194309
        
           | klyrs wrote:
           | > When trained on many sources, it becomes much harder to
           | claim that it's a derivative of another work.
           | 
           | Sounds good in theory, until it starts producing snippets
           | verbatim from uniquely-identifiable sources.
        
         | ghoward wrote:
         | I am doing something very similar:
         | https://twitter.com/GavinDHoward/status/1415380847537135620 .
         | We'll see if they answer.
        
       | modeless wrote:
       | > If you look at the GitHub Terms of Service, no matter what
       | license you use, you give GitHub the right to host your code and
       | to use your code to improve their products and features
       | 
       | Sure, that's fine if the author of the code chooses to upload it
       | to GitHub. But what if they don't, and then someone else does? If
       | I take an AGPL project that someone else wrote and upload it to
       | GitHub, does that grant GitHub the right to use the code "to
       | improve their products and features" which are closed source? I
       | don't have the right to relicense the code, and neither does
       | GitHub, so clearly not.
        
       | _ph_ wrote:
       | I think the discussions somewhat miss an important point. IANAL,
       | but I think if a young programmer reads a lot of source code on
       | GitHub, and based on this reading becomes a better programmer,
       | this is a fair use of copyrighted material and pretty much
       | independent of the license. If I read any book and learn the
       | corresponding language, this isn't a copyright violation of the
       | book either. A violation starts when I begin to quote from that
       | book or the programmer takes snippets from the programs they read.
       | 
       | The problem is, I don't think you can really claim that Copilot
       | learned to program. While some of the output seems to be
       | something new, most of the time it looks more like a
       | recomposition of learned fragments, if not even longer pieces of
       | verbatim code taken from copyrighted material. We have seen
       | examples of this. At that point, it becomes a copyright
       | discussion, probably determined by the volume of copyrighted
       | material reproduced. Which, by the way, is always the risk if a
       | human uses certain training material. The better one is at
       | memorizing things, the greater the risk.
       | 
       | Or, to put it the other way around: if Copilot used its
       | "knowledge" of programs to advise the programmer, for example by
       | pointing out potential errors, without reproducing anything it
       | used for learning, it would be fine. But that is not how it works.
        
         | jozvolskyef wrote:
         | If a person who's never seen a goat looks at a million
         | copyrighted images of goats and draws a goat, are they
         | committing copyright infringement?
         | 
         | What if an algorithm does the same? The result is 'a
         | recomposition of learned fragments' in either case.
        
       | aaron695 wrote:
       | Do we have any proof that Copilot works?
       | 
       | I assume it's a pile of rubbish that's currently fooling YouTube
       | hype-based programmers and followers. Has any OK but real
       | programmer used it solidly for a week yet and wanted to keep going?
       | 
       | This is tied to the legal argument.
       | 
       | If Copilot works (which I cannot believe it would), it changes
       | many legal points. Garbage spewing out copyrighted code is
       | different from something that 'understands' copyrighted code.
       | 
       | And who cares about copyright if it's like all other hype-based
       | AI currently, unusable in the real world? All of HN currently
       | seems to be bikeshedding around the legal side. Does no one
       | program anymore?
        
       | Zambyte wrote:
       | Google Books is a great parallel to make with Microsoft's
       | Copilot. The key differences between the two are
       | 
       | A) Google Books produces verbatim results 100% of the time, while
       | Microsoft's Copilot produces verbatim results some N > 0% of the
       | time (with some % of results greater than N that would be
       | considered a derivative if a human wrote it), and
       | 
       | B) Google Books doesn't make the claim that you own the copyright
       | to any greater than 0% of the search results, while Microsoft's
       | Copilot makes the claim that you own the copyright to 100% of the
       | results.
        
         | rjzzleep wrote:
         | If you copy a quote from Google Books you still have to
         | attribute the original author. It's not magically your text
         | just because it was hosted on Google Books. Why do they even
         | compare these two?
         | 
         | You can compare Github itself with Google Books, but not
         | copilot.
        
           | ghaff wrote:
           | >If you copy a quote from Google Books you still have to
           | attribute the original author.
           | 
           | Mostly because if you don't do so, that's a plagiarism issue
           | which the law mostly doesn't concern itself with except
           | insofar as an attributed quote, unless the length is truly
           | excessive, is likely to be seen as Fair Use while an
           | unattributed quote, especially if it's more than a minimal
           | snip, is not. (IANAL, etc.)
        
       | Animats wrote:
       | If you want to kill GitHub Copilot, start posting useful snippets
       | of code which contain security backdoors, and wait for Copilot to
       | put them in something.
        
       | swhalen wrote:
       | Does the fair use exemption (or an equivalent) exist in all
       | countries?
        
         | xdennis wrote:
         | Obviously not. Neither does DMCA, but that doesn't stop the USA
         | from enforcing it worldwide.
        
       ___________________________________________________________________
       (page generated 2021-07-15 23:02 UTC)