[HN Gopher] We've filed a lawsuit against GitHub Copilot
       ___________________________________________________________________
        
       We've filed a lawsuit against GitHub Copilot
        
       Author : iworshipfaangs2
       Score  : 444 points
       Date   : 2022-11-03 20:30 UTC (2 hours ago)
        
 (HTM) web link (githubcopilotlitigation.com)
 (TXT) w3m dump (githubcopilotlitigation.com)
        
       | iworshipfaangs2 wrote:
       | It's also a class action,
       | 
       | > behalf of a proposed class of possibly millions of GitHub
       | users...
       | 
       | The appendix includes the 11 licenses that the plaintiffs say
       | GitHub Copilot violates:
       | https://githubcopilotlitigation.com/pdf/1-1-github_complaint...
        
       | CobrastanJorji wrote:
       | As a non-lawyer, I am very suspicious of the claim that
       | "Plaintiffs and the Class have suffered monetary damages as a
       | result of Defendants' conduct." Flagrant disregard for copyright?
       | Sure, maybe. The output of the model is subject to copyright? Who
        | knows! But the copyright holders being damaged in some way?
       | Seems doubtful. The best argument I could think of would be
       | "GitHub would have had to pay us for this, and they didn't pay
       | us, so we lost money," but that'd presumably work out to pennies
       | per person.
        
         | toomuchtodo wrote:
          | The parallels to music sampling are somewhat humorous. Where is
          | the line between fair use and misappropriation? To be discovered!
        
           | schappim wrote:
           | Soon we'll have to use Mechanical Turk[0] to identify
            | existing open-source code similar to what Girl Talk did with
           | "Feed the Animals"[1].
           | 
            | Unrelated, how is it that Mechanical Turk was never truly
           | integrated w/ AWS?
           | 
           | [0] https://www.mturk.com/
           | 
           | [1] https://waxy.org/2008/09/girl_turk/
        
         | citilife wrote:
         | Say I produce a licensed library. Someone can pay me $5/year
         | per license. I keep the code private and compile the code
         | before sending it to customers.
         | 
          | If co-pilot is trained on my code base (which was private) and
          | then reproduces near replicas of my code, which they then sell
          | for $5/year...
         | 
         | Well, I'm eligible for damages.
        
           | sigzero wrote:
           | I don't believe it does anything with private repos and that
           | isn't what is being alleged.
        
             | mdaEyebot wrote:
             | It's the license that matters, not whether the code is
             | visible on Microsoft's website.
             | 
             | Code which anybody can view is called "source available".
             | You aren't necessarily allowed to use the code, but some
             | companies will let their customers see what is going on so
             | they can better integrate the code, understand performance
             | implications, debug and fix unexpected issues, etc. The
             | customers would probably face significant legal risks if
             | they took that code and started to sell it.
             | 
             | "Open source" code implies permission to re-use the code,
             | but there is still some nuance. Some open-source licenses
             | come with almost no restrictions, but others include
             | limiting clauses. The GPL, for example, is "viral": anybody
             | who uses GPL code in a project must also provide that
             | project's source code on request.
             | 
             | What do you think the chances are that Microsoft would
             | surrender the Copilot codebase upon receipt of a GPL
             | request?
        
           | yawnxyz wrote:
           | I don't think this is possible for co-pilot to do?
           | 
           | (If it was, please tell me how, since that would save me
           | $5/year across multiple libraries..!)
        
           | cheriot wrote:
           | > that then reproduces near replica's of my code
           | 
           | Copying a few lines is not the same as copying the whole
           | thing. Sharing quotes from a book is not copyright
           | infringement.
        
             | test098 wrote:
             | > Sharing quotes from a book is not copyright infringement.
             | 
             | It is if I take those quotes and publish them as my own in
             | my own book.
        
             | heavyset_go wrote:
             | If your intent is to create a competing product for profit,
             | chances are that won't be found as fair use, given that
             | determining fair use depends on intent and how the content
             | is used.
             | 
             | Using clips from a movie in a movie review is probably fair
             | use.
             | 
              | Using clips from a movie in a knock-off of that movie for
             | profit? Probably not fair use if it's not a parody.
             | 
             | Copilot is not like a movie reviewer using clips to review
             | a movie. Copilot is like a production team for a movie
             | taking clips from another movie to make a ripoff of that
             | movie and selling it.
        
             | bawolff wrote:
              | I don't think that's comparable. For starters, it's not just
              | the length of a quote that makes it fair use, but the way
             | quotes are used i.e. to engage in commentary.
        
           | joxel wrote:
           | But that isn't what is being alleged
        
         | TheCoelacanth wrote:
         | Aren't there statutory damages for copyright infringement, i.e.
         | there is a presumption that each work infringed is worth at
         | least a certain amount without proving actual damages?
        
         | kube-system wrote:
         | Those damages are enumerated on pages 50-52. Remember,
         | "damages" is being used in a legal sense here -- for a non-
         | lawyer, you can interpret it more like "a dollar value on
         | something someone did that was wrong". This is more broad than
         | the colloquial use of the word.
         | 
         | Sometimes damages are statutory, i.e. they have a fixed dollar
         | amount written right into the law. This lawsuit references one
         | such law: https://www.law.cornell.edu/uscode/text/17/1203
        
         | belorn wrote:
         | The common practice in copyright cases is to calculate damages
         | based on the theoretical cost that the infringer would have
          | paid if they had bought the rights in the first place. This
          | method was used during the Pirate Bay case to calculate damages
          | caused by the site's founders.
         | 
         | They did not actually calculate damages in terms of lost movie
          | tickets or estimated vs. actual sales numbers of sold game
          | copies. When it came to pre-releases, where such a product
          | wouldn't have been sold legally in the first place, they simply
         | added a multiplier to indicate that the copyright owner
         | wouldn't have been willing to sell.
         | 
          | For software code, another practice I have read about is to use
          | the man-hours that rewriting the copyrighted code would cost.
          | Using such calculations, they would likely estimate the man-hours
          | based on the number of lines of code and multiply that by the
          | average salary of a programmer.
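          | 
          | As a very rough illustration of the man-hours approach described
          | above, here is a minimal sketch; every figure and variable below
          | is a hypothetical assumption, not a number from any actual case:
          | 
          |       // Hypothetical damages estimate: man-hours to rewrite the
          |       // copied code, priced at an average programmer salary.
          |       const linesOfCode = 10000;   // assumed size of the copied work
          |       const linesPerHour = 10;     // assumed rewrite productivity
          |       const hourlyRate = 75;       // assumed average hourly rate, USD
          | 
          |       const manHours = linesOfCode / linesPerHour;  // 1,000 hours
          |       const estimate = manHours * hourlyRate;       // 75,000 USD
          | 
          |       console.log(`Estimated damages: $${estimate}`);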
        
           | pmoriarty wrote:
           | _" Using such calculations they would likely estimate the man
           | hours based on number of lines of code and multiply that with
           | the average salary of a programmer."_
           | 
           | The average salary of a programmer in which country?
           | 
           | So much programming is outsourced these days, and in some
           | places programmers are very cheap.
        
             | imoverclocked wrote:
             | Probably in the place where GitHub copilot is used and the
             | location of the authority of the court.
        
             | belorn wrote:
             | This is just my guess, but I think the intention from the
             | judges is not to actually calculate a true number. The
             | reason they used the cost of publishing fees in the
             | piratebay case was likely to illustrate how the court
             | distinguished between a legal publisher vs an illegal one.
             | The legal publisher would have bought the publishing
             | rights, and since piratebay did not do this, the court uses
             | those publishing fees in order to illustrate the
             | difference.
             | 
             | If the court wanted to distinguish between Microsoft using
             | their own programmers to generate code vs taking code from
             | github users, then the salary in question would likely be
             | that of Microsoft programmers. It would then be used to
              | illustrate what legal training data would look like
              | compared to illegal training data.
        
           | whiddershins wrote:
           | I believe there are statutory damages or penalties in many
           | cases. At least with music and images.
        
           | karaterobot wrote:
           | The one thing we can say with complete certainty is that most
           | programmers who had their code used without permission will
           | not receive very much money at all if this class action
           | lawsuit is decided in their favor.
        
             | mike_d wrote:
             | I don't care about the money. I support this because it
             | will establish case law that other companies can't ignore
             | licenses as long as they throw AI somewhere in the chain.
             | 
             | If "I took your code and trained an AI that then generated
             | your code" is a legal defense, the GPL and similar licenses
             | all become moot.
        
               | bastardoperator wrote:
               | But that's not what's happening here. Also, you grant
               | GitHub a license.
               | 
               | https://docs.github.com/en/site-policy/github-
               | terms/github-t...
               | 
               | "You grant us and our legal successors the right to
               | store, archive, parse, and display Your Content"
               | 
               | Copilot displays content. Case closed.
        
               | mike_d wrote:
               | Feel free to keep reading the next line down:
               | 
               | "This license does not grant GitHub the right to sell
               | Your Content. It also does not grant GitHub the right to
               | otherwise distribute or use Your Content"
        
             | heavyset_go wrote:
             | I don't want money, I want the terms of my licenses to be
             | adhered to.
        
             | sqeaky wrote:
              | Money likely isn't the main goal (maybe it is for the
              | lawyers); these are open source repos. Maybe they didn't
              | consent to have their code used as training data, and that
              | seems like the kind of thing consent should be needed for.
              | Maybe the AI spitting out copied snippets without
              | attribution is a violation of open source licensing.
        
           | michaelmrose wrote:
            | So for iseven, can we go by how much a student might accept,
            | say $20 an hour, multiply that by the one minute required to
            | create it, and offer them 33 cents?
        
       | bpodgursky wrote:
        
       | Yahivin wrote:
       | Copilot does include the licenses...
       | 
       | Start off a comment with // MIT license
       | 
       | Then watch parts of various software licenses come out including
       | authors' names and copyrights!
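        | 
        | A minimal sketch of the kind of prompt described; the completion
        | shown is only illustrative (standard MIT license wording plus a
        | placeholder), not a captured Copilot output:
        | 
        |       // MIT license
        |       // Copyright (c) <year> <author name from the training data>
        |       // Permission is hereby granted, free of charge, to any
        |       // person obtaining a copy of this software and associated
        |       // documentation files (the "Software"), ...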
        
       | machiste77 wrote:
       | bruh, come on! you're gonna ruin it for the rest of us
        
       | r3trohack3r wrote:
       | I'm not confident in this stance - sharing it to have a
       | conversation. Hopefully some folks can help me think through
       | this!
       | 
       | The value of copyleft licenses, for me, was that we were fighting
       | back against the notion of copyright. That you couldn't sell me a
       | product that I wasn't allowed to modify and share my
       | modifications back with others. The right to modify and
        | redistribute transitively through the software license gave a
       | "virality" to software freedom.
       | 
       | If training a NN against a GPL licensed code "launders" away the
       | copyleft license, isn't that a good thing for software freedom?
       | If you can launder away a copyleft license, why couldn't you
       | launder away a proprietary license? If training a NN is fair use,
       | couldn't we bring proprietary software into the commons using
       | this?
       | 
       | It seems like the end goal of copyleft was to fight back against
       | copyright, not to have copyleft. Tools like copilot seem to be an
       | exceptionally powerful tool (perhaps more powerful than the GPL)
       | for liberating software.
       | 
       | What am I missing?
        
         | zeven7 wrote:
         | The only thing you're missing is that some people lost the plot
          | and think it _is_ all about copyleft.
        
         | jhkl wrote:
        
         | flatline wrote:
          | Nobody is laundering away proprietary licenses, because that
         | code is not open source and not in public github repos. And OSS
         | capabilities are now present in copilot, which is neither free
         | nor open. Furthermore these contributions are making their way
         | into proprietary code and the OSS licensing becomes even
         | further watered down. This is the epitome of what copyleft is
         | against!
        
           | TheCoelacanth wrote:
           | Code published on Github is not necessarily open source.
           | There is a lot of code there that has no particular license
           | attached, which means that all rights are reserved except for
           | those covered in the Github TOS, which I believe just covers
           | viewing the code on Github.
        
             | jacooper wrote:
              | Copilot includes all public repos on GitHub, so this
              | includes source-available and proprietary code too.
        
           | yjk wrote:
           | Indeed, the ability to 'launder away' proprietary licenses
           | when source is available means that companies in the future
           | (that would otherwise provide source under a non-permissive
           | license) will shift in favour of not providing source code at
           | all.
        
           | r3trohack3r wrote:
           | I'm not sure this is true. Proprietary source code gets
           | leaked and that can be used to train a NN. I find it likely
           | that Copilot was trained against at least one non-OSS code
           | base hosted on GitHub.
           | 
           | Second, if copyright is being laundered away we can get
           | increasingly clever with how we liberate proprietary
           | software. Today, decompiling and reverse engineering is a
           | labor intensive process. That's the whole point of "open
           | source" - that working in source is easier than working in
           | bytecode. Given the hockey-stick of innovation happening in
            | AI right now, I'd be surprised if we don't see AI-assisted
           | disassembly happening in the next decade. If you can go from
           | bytecode to source code, that unlocks a lot. Even more so if
           | you can go from bytecode to source code and feed that into a
           | NN to liberate the code from its original license.
        
         | an1sotropy wrote:
         | I think (1) you're mainly missing that copyleft vs non-copyleft
         | is actually irrelevant for the copilot case. You also (2) may
         | be missing the legal footing of copyleft licenses.
         | 
         | (1) The problem with copilot is that when it blurps out code X
         | that is arguably not under fair use (given how large and non-
         | transformed the code segment is), copilot users have no idea
         | who owns copyright on X, and thus they are in a legal minefield
         | because they have no idea what the terms of licensing X are.
         | 
         |  _Copilot creates legal risk regardless of whether the
         | licensing terms of X are copyleft or not._ Many permissive
         | licenses (MIT, BSD, etc) still require attribution (identifying
          | who owns copyright on X), and copilot screws you out of doing
          | that too.
         | 
         | (2) Whatever legal power copyleft licenses have, it is
         | ultimately derived from copyright law, and people who take FOSS
         | seriously know that. The point of "copyleft" licenses is to use
         | the power of copyright law to implement "share and share alike"
         | in an enforceable way. When your WiFi router includes info
         | about the GPL code it uses, that's the legal of power of
         | copyright at work. The point of copyleft licenses is _not_ to
         | create a free-for-all by  "liberating" code.
        
         | swhalen wrote:
         | > It seems like the end goal of copyleft was to fight back
         | against copyright, not to have copyleft.
         | 
         | Whether this was the original motivation depends on whom you
         | are asking.
         | 
         | You may disagree, but the "Free Software" movement (RMS and the
         | people who agree with him) essentially wants everything to be
         | copyleft. The "Open Source" movement is probably more aligned
         | with your views.
        
         | MrStonedOne wrote:
        
         | adgjlsfhk1 wrote:
         | the problem is you can't launder copyrighted code with this
         | because you don't see the copyrighted code in the first place.
        
         | thomastjeffery wrote:
         | It looks like you're missing the entire purpose of copyleft vs
         | public domain.
         | 
         | The point is that copyleft source code cannot be used to
         | improve proprietary software. That limitation is enforced with
         | copyright.
         | 
         | Proprietary software is closed source. You can't train your NN
         | on it, because you can't read it in the first place.
         | 
         | If someone takes your open source code and incorporates it into
         | their proprietary software, then they are effectively using
         | your work for their _private_ gain. The entire purpose of
         | copyleft is to compel that person to  "pay it forward", by
         | publishing their code as copyleft. This is why Stallman is a
         | _proponent_ of copyright law. Without copyright, there is no
         | copyleft.
        
           | Gigachad wrote:
           | Copyleft wouldn't need to exist without copyright because
           | there would be no proprietary software to fight against.
           | 
           | Sure, there would be software with code not published, but if
           | it was ever leaked which it often is, you could do whatever
           | you want with it.
           | 
           | But in a world where copyright does exist, copyleft is a tool
           | to fight back.
        
             | thomastjeffery wrote:
             | Yes, but we aren't here talking about whether copyright
             | should exist. We're talking about whether Copilot violates
             | it.
        
               | Gigachad wrote:
               | I'm replying to the comment that RMS supports copyright.
               | I don't believe he does, I believe he would rather it not
               | exist at all but since it does, you have to make use of
               | it.
        
           | r3trohack3r wrote:
           | > If someone takes your open source code and incorporates it
           | into their proprietary software, then they are effectively
           | using your work for their private gain.
           | 
           | And then if we can close that loop by taking their
           | proprietary software and feeding it into a NN to re-liberate
           | it isn't that a net win for software freedom?
           | 
           | Today crossing the sourcecode->bytecode veil effectively
            | obfuscates the implementation beyond most humans' ability to
           | modify the software. Humans work best in sourcecode. Nothing
           | saying our AI overlords won't be able to work well in
           | bytecode or take it in the other direction.
           | 
           | I guess what I'm saying is, today a compiler is a one-way
           | door for software freedom. Once it goes through the compiler,
           | we lose a lot of freedom without a massive human investment
           | or the original source code. Maybe that door is about to
            | become a two-way door, with copyright law supporting moving
           | back and forth through that door?
        
             | thomastjeffery wrote:
             | > And then if we can close that loop by taking their
             | proprietary software
             | 
             | From where? They aren't publishing it. That's literally the
             | meaning of proprietary.
        
               | xigoi wrote:
               | That's not the meaning of proprietary, but otherwise
               | you're right.
        
         | bjourne wrote:
         | You can "launder" away the license of any source code you have
          | copied simply by deleting it! No snazzy neural network needed.
          | The litigants' argument is that this is what GitHub CoPilot
         | does. It allows others to publish derivative works of
         | copyrighted works with the license deleted. Given that it
         | apparently is trivial to get CoPilot to spit out nearly
         | verbatim copies of the code that it was trained on, I don't
          | think it satisfies the "transformative" requirement of the
          | (American) fair use doctrine.
        
           | cactusplant7374 wrote:
           | Is stable diffusion any different when including a famous
           | artwork or artist in the prompt? The images produced are
           | eerily similar to training data.
        
             | Taniwha wrote:
              | probably not and likely open to similar lawsuits - this is
             | not really a bad thing
        
               | cactusplant7374 wrote:
               | It seems like the ideal way to proceed is to make the AI
               | output unique and creative. Perhaps that requires AGI
               | because currently the model has no understanding of art.
        
         | krono wrote:
         | Farmers plant their crops out in the open too. Should Boston
         | Dynamics be allowed to have their robots rob those fields empty
         | and sell the produce without having to at least pay the farmer?
         | They'd be walking and plucking just like any human would be.
         | 
         | Some source code might be published but not open source
         | licensed. At least some such code has been taken with complete
         | disregard of their licenses and/or other legal protections, and
         | it's impossible to find and properly map out any similar
         | violations for the purposes of a legal response.
        
       | bergenty wrote:
        
       | abouttyme wrote:
       | I suspect this will be the first of many lawsuits over training
       | data sets. Just because it is obscured by artificial neural
       | networks doesn't mean it's an original work that is not subject
       | to copyright restrictions.
        
         | ketralnis wrote:
            | Yeah yeah my code produces the complete works of Mickey Mouse
            | but it's okay because _algorithms_!
        
           | m00x wrote:
           | Copyright is different than patent law and license law.
        
           | judge2020 wrote:
           | I don't know why we're treating it as anything less than a
           | human brain. A human can replicate a painting from memory or
            | a picture of Mickey Mouse, and that would likely be copyright
           | infringement, but they could also take a drawing of Mickey
            | Mouse sitting on the beach and give him a bloody knife &
           | some sunglasses and it'd likely be fair use of the original
           | art.
           | 
           | The AI can copy things if it wants, but it can also modify
           | things to the point of being fair use, and it can even create
           | new works with so little of any particular work that it's
            | effectively creativity on the same level as humans when they
           | draw something that popped into their heads.
        
       | jeffhwang wrote:
        | Wow, this is an interesting iteration in the ongoing divide
        | between "East Coast code" vs. "West Coast code" as defined by
       | Lessig. For background, see https://lwn.net/Articles/588055/
        
       | SighMagi wrote:
       | I did not see that coming.
        
       | brookst wrote:
       | I wonder if the plaintiffs' code would stand up to scrutiny of
       | whether any of it was copied, even unintentionally, from other
       | code they saw in their years of learning to program? I know that
       | I have more-or-less transcribed from Stack Overflow/etc, and I
       | have a strong suspicion that I have probably produced code
       | identical to snippets I've seen in the past.
        
         | zach_garwood wrote:
         | But have you done so on an industrial scale?
        
           | brookst wrote:
           | I'm just one person! Give me a team of 1000 and I'll get
           | right on that.
        
       | bilsbie wrote:
       | Laws need to change to match technology.
       | 
        | Did you know that before airplanes were invented, common law said
        | you owned the air above your land all the way to the heavens?
        
         | m00x wrote:
         | Can you explain what damages you incur from Copilot?
        
           | jacooper wrote:
            | People not following your license? And not making their
            | derived works available under the same license, as I require?
        
       | 0cf8612b2e1e wrote:
       | Is there any amount of public data/code/whatever I can make an
       | offline backup of today in the event this gets pulled?
        
         | kyleee wrote:
         | That's what I am wondering, as a contingency plan so at least a
         | replica service can be created if copilot shuts down.
        
       | bugfix-66 wrote:
       | Ask HN: I want to modify the BSD 2-Clause Open Source License to
       | explicitly prohibit the use of the licensed software in training
       | systems like Microsoft's Copilot (and use during inference). How
        | should the third clause be worded?
        | 
        |       The No-AI 3-Clause Open Source Software License
        | 
        |       Copyright (C) <YEAR> <COPYRIGHT HOLDER>
        |       All rights reserved.
        | 
        |       Redistribution and use in source and binary forms, with
        |       or without modification, are permitted provided that the
        |       following conditions are met:
        | 
        |       1. Redistributions of source code must retain the above
        |          copyright notice, this list of conditions and the
        |          following disclaimer.
        | 
        |       2. Redistributions in binary form must reproduce the
        |          above copyright notice, this list of conditions and
        |          the following disclaimer in the documentation and/or
        |          other materials provided with the distribution.
        | 
        |       3. Use in source or binary forms for the construction or
        |          operation of predictive software generation systems is
        |          prohibited.
        | 
        |       THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
        |       CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
        |       WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
        |       WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
        |       PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
        |       COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
        |       DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
        |       CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
        |       PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
        |       DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        |       CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
        |       CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
        |       OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
        |       SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
        |       DAMAGE.
       | 
       | https://bugfix-66.com/f0bb8770d4b89844d51588f57089ae5233bf67...
        
         | kochb wrote:
         | For this clause to have any positive effect, you need to 1) be
         | willing to pursue legal action against violators and 2)
         | actually notice that the clause has been violated.
         | 
         | Such language must be carefully written. What is the definition
         | of "construction" and "operation" in a legal context? What is a
         | "predictive software generation system"? That's a very specific
         | use case, you sure you covered everything you want to prohibit?
         | 
         | You've inserted your clause in such a way that this dependency
         | cannot be used in any way to build anything similar to a
         | "predictive software generation system", even with attribution,
         | as it would fail clause 3.
         | 
         | You have to consider that novel licenses make it difficult for
         | any party that respects licenses to use your code. It is
         | difficult to make one-off exceptions, especially when the text
         | is not legally sound. So adoption of your project will be
         | harmed.
         | 
         | So if you are serious about this license, you need a lawyer.
        
         | [deleted]
        
         | an1sotropy wrote:
         | IANAL, and I'm no fan of copilot, but I wonder if this kind of
         | clause (your #3) is going to fly: you're preemptively
         | prohibiting certain kinds of reading of the code (when code is
         | read by the ML model in training). But is that something a
         | license can actually do?
         | 
         | The legal footing that copyright gives you, on which licensing
         | rests, certainly empowers you to limit things about how others
         | may _redistribute_ your work (and things derived from it), but
         | does it empower you to limit how others may _read_ your work?
            | As a ridiculous example, I don't think it would be enforceable
         | to have a license say "this code can't be used by left-handed
         | people", since that's not what copyright is about, right?
        
           | bugfix-66 wrote:
           | The license conditionally permits (i.e., controls)
           | "redistribution and use in source and binary forms".
           | 
           | I think we can constrain use with the third clause.
           | 
           | My question is, how should we word that clause?
        
             | an1sotropy wrote:
             | Licenses get to set terms of redistribution. But training
             | of the ML model -- the thing described by your #3 -- is
              | _not_ redistribution (imho). So maybe it's as
             | unenforceable as saying left-handed people can't read your
             | code.
             | 
             | The redistribution happens later, either when copilot
             | blurps out some of your code, or when the copilot user then
             | distributes something using that code (I'm curious which).
             | At that point, whether some use of your code is infringing
             | your license doesn't depend on the path the code took, does
             | it? (in which case #3 is moot)
        
               | bugfix-66 wrote:
                | The BSD license also controls "use", not just
                | "redistribution":
                | 
                |       Redistribution and use in source and binary
                |       forms, with or without modification, are
                |       permitted provided that the following conditions
                |       are met:
                | 
                | That's word-for-word BSD license.
                | 
                | The only change I made is adding clause 3:
                | 
                |       3. Use in source or binary forms for the
                |          construction or operation of predictive
                |          software generation systems is prohibited.
        
             | bombcar wrote:
             | Many licenses have constraints, whether this wording is the
             | best way to do it is up for discussion, but it's certainly
             | possible to do it.
        
         | ilc wrote:
         | If I read this right, I can't use auto-complete. No thanks.
        
           | tedunangst wrote:
           | Yeah, lol. New rule: code may be used for autocomplete, but
            | only by a pushdown automaton.
        
         | m00x wrote:
         | Get a lawyer since this is nonsense.
        
           | tptacek wrote:
           | Is it? A similarly casual clause in the OCB license prevented
           | OCB from being used by the military for many years (granted,
           | it prevented OCB from being used almost everywhere else,
           | too).
           | 
           | I have no idea if this license language works or doesn't, but
           | this is hardly the least productive subthread on this story.
           | It's concrete and specific, and we can learn stuff from it.
        
             | tedunangst wrote:
             | OCB is a fun case study because they later granted an
             | exception for OpenSSL, but only for software literally
             | named OpenSSL.
        
           | bugfix-66 wrote:
           | It's literally the standard BSD 2-Clause License, word for
            | word, with an additional third clause:
            | 
            |       3. Use in source or binary forms for the construction
            |          or operation of predictive software generation
            |          systems is prohibited.
           | 
           | Hardly nonsense, but obviously you aren't equipped to judge.
           | More about the BSD licenses:
           | 
           | https://en.m.wikipedia.org/wiki/BSD_licenses
        
             | m00x wrote:
             | Yes, that added clause is nonsense. On top of being
             | nonsense, there is significant precedent.
             | 
             | Remember the lawsuit of HiQ labs vs LinkedIn? Scraping, or
             | viewing public data on a public webpage is legal.
             | 
             | https://gizmodo.com/linkedin-scraping-data-legal-court-
             | case-...
        
               | bugfix-66 wrote:
                | If the GPL can defeat Copilot, we need a more permissive
               | MIT/BSD-style license to do the same.
        
               | tptacek wrote:
               | This does seem like a pretty compelling rebuttal, since
               | the preceding comment suggests that GPL does nothing to
               | Microsoft's ability to incorporate code into Copilot's
               | model.
        
               | bugfix-66 wrote:
               | They attempt to exclude GPL code, and fail sometimes.
               | 
               | Eventually Microsoft will succeed in excluding it.
               | 
               | As a law-abiding corporation, they intend to exclude GPL
               | code.
        
             | nverno wrote:
             | How would you ever prove the parameters of a model were
             | generated by specific training data? Couldn't multiple sets
             | of training data produce the same embeddings/parameters? I
             | imagine there could be infinite possible sets of training
             | data that would lead to the same results, depending on the
             | type of predictive software.
        
               | bugfix-66 wrote:
               | Law-abiding companies like Microsoft won't knowingly
               | violate a license.
               | 
               | The GPL defeats Copilot. Microsoft tries to exclude all
               | GPL software from its training.
               | 
               | What we need is a more permissive MIT- or BSD-style
               | license that defeats Copilot.
               | 
               | We just need to convince law-abiding companies that they
               | shouldn't use the code.
        
         | CrazyStat wrote:
         | The legal theory for copilot is that training an ML model is
         | fair use, not that the license allows it. If it _is_ fair use
          | then you can't prohibit it by license, no matter what you put
         | in your license.
        
       | ISL wrote:
       | Can anyone with Copilot access give a short summary of its
        | response to the prompts:
        | 
        |       function force=Gmmr2Array(mass1, mass2)
        | 
        | and
        | 
        |       function [force, torque]=pointMatrixGravity(array1,array2)
        | 
        | ?
       | 
       | I'd love to know if some of my GPL v3 code [1, 2] has landed in
       | the training set
       | 
       | [1]
       | https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...
       | 
       | [2]
       | https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...
        
         | [deleted]
        
         | jm4 wrote:
         | I got something a bit different from the other person who
         | responded to you.
         | 
          |       function force=Gmmr2Array(mass1, mass2) {
          |           var G = 6.67408e-11;
          |           var m1 = mass1;
          |           var m2 = mass2;
          |           var r = 1;
          |           var force = G * m1 * m2 / (r * r);
          |           return force;
          |       }
          | 
          |       function [force, torque]=pointMatrixGravity(array1,array2) {
          |           var G = 6.67408e-11;
          |           var m1 = array1(1);
          |           var m2 = array2(1);
          |           var r = norm(array1(2:4)-array2(2:4));
          |           var force = G * m1 * m2 / (r * r);
          |           var torque = cross(array1(2:4)-array2(2:4), force);
          |           return [force, torque];
          |       }
        
         | jmcphers wrote:
          | First prompt:
          | 
          |       var G = 6.67e-11;
          |       var force = G * mass1 * mass2 / distance * distance;
          |       return force;
          | 
          | Second prompt:
          | 
          |       % pointMatrixGravity - Calculates the gravitational force
          |       % and torque on a point mass due to a matrix of point
          |       % masses.
          |       %
          |       % [force, torque]=pointMatrixGravity(array1,array2)
          |       %
          |       %   Inputs:
          |       %       array1 - 3xN array of point masses
          |       %       array2 - 3xN array of point masses
          |       %   Outputs:
          |       %       force - 3x1 array of gravitational force
          |       %       torque - 3x1 array of gravitational torque
          |       %
          |       %   Author: David J. R. MacKay,
          |       %   email:
        
           | ridiculous_fish wrote:
           | FYI if you use the "Open GitHub Copilot" command in VSCode
           | you will get up to 10 different outputs for the same prompt.
           | 
            | Interesting that my results were different than yours!
        
         | ridiculous_fish wrote:
         | For Gmmr2Array:
         | https://gist.github.com/ridiculousfish/9a25f5f778d98ecd81099...
         | 
         | For pointMatrixGravity:
         | https://gist.github.com/ridiculousfish/af05137a4090e92de3a97...
        
       | solomatov wrote:
        | The most important part of this is not whether the lawsuit will
        | be won or lost by one of the parties, but whether fair use
        | applies to machine learning and language models. There's a good
        | chance that it gets to the Supreme Court and sets a defining
        | precedent that future entrepreneurs can use to know what's
        | possible and what's not.
       | 
       | P.S. I am not a lawyer.
        
       | layer8 wrote:
       | Copilot reminds me of the Borg: You will be assimilated. We will
       | add your technological distinctiveness to our own. Resistance is
       | futile.
        
       | an1sotropy wrote:
       | Seems important to point out that the announcement on this page
       | (https://githubcopilotlitigation.com/) is a followup to
       | https://githubcopilotinvestigation.com/ previously discussed
       | here: https://news.ycombinator.com/item?id=33240341 (with 1219
       | comments)
        
       | Entinel wrote:
       | I don't have a comment on this personally but I want to throw
        | this out there because every time I see people criticizing
        | Copilot or Dall-E, someone always says "BUT ITS FAIR USE!" Those
        | people don't seem to grasp that "Fair Use" is a defense. The
        | burden is not on me to prove what you are doing is not fair use;
        | the burden is on you to prove what you are doing is fair use.
        
         | [deleted]
        
       | buzzy_hacker wrote:
       | Copilot has always seemed like a blatant GPL violation to me.
        
         | m00x wrote:
         | Care to explain in legal terms why this stance is qualified?
        
           | buzzy_hacker wrote:
           | You may convey a work based on the Program, or the
           | modifications to produce it from the Program, in the form of
           | source code under the terms of section 4, provided that you
           | also meet all of these conditions:
           | 
            | a) The work must carry prominent notices stating that you
            | modified it, and giving a relevant date.
            | 
            | b) The work must carry prominent notices stating that it is
            | released under this License and any conditions added under
            | section 7. This requirement modifies the requirement in
            | section 4 to "keep intact all notices".
            | 
            | c) You must license the entire work, as a whole, under this
            | License to anyone who comes into possession of a copy. This
            | License will therefore apply, along with any applicable
            | section 7 additional terms, to the whole of the work, and all
            | its parts, regardless of how they are packaged. This License
            | gives no permission to license the work in any other way, but
            | it does not invalidate such permission if you have separately
            | received it.
           | 
           | ----
           | 
           | I don't see how one could argue that training on GPL code is
           | not "based on" GPL code.
        
       | xchip wrote:
       | LOL we look like taxi drivers fighting Uber.
       | 
       | If Kasparov uses chess programs to be better at chess maybe we
       | can use copilot to be better developers?
       | 
       | Also, anyone, either a person or a machine, is welcome to learn
       | from the code I wrote, actually that is how I learnt how to code,
        | so why would I stop others from doing the same?
        
         | jacooper wrote:
         | No human perfectly reproduces the learning material they used.
          | If that were true, one might as well just hire engineers from
          | Twitter and make a new platform from the code they remember!
        
       | IceWreck wrote:
       | I am not against this lawsuit but I'm against the implications of
       | this because it can lead to disastrous laws.
       | 
        | A programmer can read available but not OSS-licensed code and
        | learn from it. That's fair use. If a machine does it, is it
        | wrong? What is the line between copying and machine learning?
        | Where does overfitting come in?
       | 
       | Today they're filing a lawsuit against copilot.
       | 
        | Tomorrow it will be against Stable Diffusion (or DALL-E, GPT-3,
        | whatever).
        | 
        | And then eventually against Wine/Proton and emulators (are APIs
        | copyrightable?).
        
         | bawolff wrote:
          | > A programmer can read available but not OSS-licensed code and
          | learn from it. That's fair use. If a machine does it, is it
          | wrong?
         | 
         | You can learn from it, but if you start copying snippets or
          | base your code on it to such an extent that it's clear your work
         | is based on it, things start to get risky.
         | 
         | For comparison, people have tried to get around copyright of
         | photos by hiring an illustrator to "draw" the photo, which
         | doesn't work legally. This situation seems similar.
        
           | michaelmrose wrote:
            | Why wouldn't drawing the photo be fair use? Can you cite a
            | case?
        
         | swhalen wrote:
          | > A programmer can read available but not OSS-licensed code and
          | learn from it. That's fair use.
         | 
          | If a human programmer reads someone else's copyrighted code, OSS
         | or otherwise, memorizes it and later reproduces it verbatim or
         | nearly so, that is copyright infringement. If it wasn't,
         | copyright would be meaningless.
         | 
         | The argument, so far as I understand it, is that Copilot is
         | essentially a compressed copy of some or all of the
         | repositories it was trained on. The idea that Copilot is
         | "learning from" and transforming its training corpus seems, to
         | me, like a fiction that has been created to excuse the
         | copyright infringement. I guess we will have to see how it
         | plays out in court.
         | 
         | As a non-lawyer it seems to me that stable diffusion is also on
         | pretty shaky ground.
         | 
         | APIs are not copyrightable (in the US), so Wine is safe (in the
         | US).
        
         | kmeisthax wrote:
          | Wine/Proton are safe because there is controlling 9th Circuit
          | and SCOTUS precedent in favor of reimplementation of APIs.
         | 
         | The reason why those wouldn't apply to Copilot is because they
         | aren't separating out APIs from implementation and just
         | implementing what they need for the goal of compatibility or
         | "programmer convenience". AI takes the whole work and shreds it
         | in a blender in the hopes of creating something new. The _hope_
         | of the AI community is that the fair use argument is more like
         | Authors Guild v. Google rather than Sony v. Connectix.
        
         | cromka wrote:
          | > A programmer can read available but not OSS-licensed code and
          | learn from it. That's fair use. If a machine does it
         | 
         | Quite sure the issue at hand is about the code being copied
         | verbatim without the license terms, not "learning" from it.
        
         | chiefalchemist wrote:
          | Agreed. But it could go the other way as well. Let's say MS /
          | GH wins and the decision establishes an even less healthy /
          | profitable (?) outcome over the long term.
          | 
          | Remember when Napster was all the rage? And then Jobs and Apple
          | stepped in and set an expectation for the value of a song (at
          | 99 cents)? And that made music into the razor and the iPod the
          | much more profitable blades. Sure, it pushed back Napster, but
          | artists - as the creators of the goods - have yet to recover.
         | 
         | I'm not saying this is the same thing. It's not. Only noting
         | that today's "win" is tomorrow's loss. This very well could be
         | a case of be careful what you wish for.
        
         | [deleted]
        
         | belorn wrote:
         | It would be good to have a definitive and simple line for fair
         | use that could be applied to all forms of copyright. Right now
         | fair use is defined by four guidelines:
         | 
         |  _The purpose and character of the use, including whether such
         | use is of a commercial nature or is for nonprofit educational
         | purposes
         | 
         | The nature of the copyrighted work
         | 
         | The amount and substantiality of the portion used in relation
         | to the copyrighted work as a whole
         | 
         | The effect of the use upon the potential market for or value of
         | the copyrighted work._
         | 
         | A programmer who studied in school and learned to code did so
          | clearly for an educational purpose. The nature of the work is
         | primarily facts and ideas, while expression and fixation is
         | generally not what the school is focusing on (obviously some
         | copying of style and implementation could occur). The amount
         | and substantiality of the original works is likely to be so
         | minor as to be unrecognized, and the effect of the use upon the
         | potential market when student learn from existing works would
         | be very hard to measure (if it could be detected).
         | 
          | When a machine does this, are we going to give the same answers?
         | Their purpose is explicitly commercial. Machines operate on
         | expression and fixation, and the operators can't extract the
         | idea that a model should have learned in order to explain how a
          | given output is generated. Machines make no distinction of the
         | amount and substantiality of the original works, with no
         | ability to argue for how they intentionally limited their use
         | of the original work. And finally, GitHub Copilot and other
         | tools like them do not consider the potential market of the
         | infringed work.
         | 
          | APIs are generally covered by the interoperability exception.
          | I am unsure how that is related to copilot or dall-e (and the
          | likes). In the Oracle v. Google case the court also found that
          | the API in question was neither an expression nor a fixation
          | of an idea. A co-pilot that only generated header code could in
         | theory be more likely to fall within fair use, but then the
         | scope of the project would be tiny compared to what exist now.
        
         | andrewmcwatters wrote:
         | GitHub Copilot has been proven to use code without license
         | attribution. This doesn't need to be as controversial as it is
         | today.
         | 
         | If you're using code and know that it will be output in some
         | form, just stick a license attribution in the autocomplete.
         | 
         | In fact, did you know this is what Apple Books does by default?
        | Say, for example, you copy and paste a code sample from _The C
        | Programming Language, 2nd Edition_. What comes out? The code
        | you copied and pasted, plus attribution.
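        | 
        | A minimal sketch of what such an auto-appended attribution could
        | look like; the repository, author, and comment format here are
        | hypothetical illustrations, not Copilot's or Apple Books' actual
        | output:
        | 
        |       function isEven(n: number): boolean {
        |           return n % 2 === 0;
        |       }
        |       // Suggested completion adapted from:
        |       //   github.com/example/some-repo (MIT License)
        |       //   Copyright (c) 2020 Example Author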
        
         | TimTheTinker wrote:
         | At least in legal terms, the difference between humans and
         | machines couldn't be more clear.
        
         | arpowers wrote:
         | In some ways all these AIs are plagiarizing... I think creators
          | should opt in to AI models, as no current license was developed
         | with this in mind.
        
           | grayfaced wrote:
            | Maybe it's time for the Creative Commons licenses to address
            | this. I'm curious if No-Derivatives would already prohibit
            | this?
           | Does the ND language need tweaking? Or do they need a whole
           | new clause.
           | 
            | Edit: I guess they do address it in their FAQ, and I'd
            | summarize it as "Depends if copyright law applies and
            | depends if it's considered derivative".
           | https://creativecommons.org/faq/#artificial-intelligence-
           | and...
        
         | Iv wrote:
         | AI companies are running against the clock to normalize
         | training against copyrighted data.
         | 
         | Let me tell you the story of Google Books, also known as
         | "Authors Guild Inc. v. Google Inc"
         | 
         | https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,...
         | .
         | 
          | In 2004, Google added copyrighted books to its Google Books
          | search engine, which searches among millions of books' text and
          | shows full-page results without any author's authorization. Any
         | sane lawyer of the time would have bet on this being illegal
         | because, well, it most certainly was. And you may be shocked to
         | learn that it is actually not.
         | 
          | In 2005, the Authors Guild sued over this pretty straightforward
          | copyright violation.
         | 
         | Now an important part of the story: IT TOOK 10 YEARS FOR THE
         | JUDGEMENT TO BE DECIDED (8 years + 2 years appeal) during
          | which, well, tech continued its little stroll. Ten years is a
          | lot in the web world; it is even more for ML.
         | 
          | The judgement decided Google's use of the books was fair use.
          | Why? Not because of the law, silly. A common error we geeks
          | make is to believe that the law is like code and that it is an
          | invincible argument in court. No, the court was impressed by
          | the array of people who were supporting Google, calling it an
          | invaluable tool for finding books that actually caused many
          | sales to increase, and therefore the harm the laws were trying
          | to prevent was not happening while a lot of good came from it.
         | 
          | Now the second important part of the story: MOST OF THESE
          | USEFUL USES HAPPENED AFTER THE LITIGATION STARTED. That's the
          | kind of crazy world we are living in: the laws are badly
          | designed and badly enforced, so the way to get around them is
          | to disregard them for the greater good, and hope the tribunal
          | won't be competent enough to be fast, but won't be so
          | incompetent that it fails to understand the bigger picture.
         | 
         | Rants aside, I doubt training data use will be considered
          | copyright infringement if the courts have a similar mindset
          | to the one in 2005-2015. Copyright laws were designed to
          | preserve the authors' right to profit from copies of their
          | work, not to give
         | them absolute control on every possible use of every copy ever
         | made.
        
         | sedatk wrote:
          | > A programmer can read available but not OSS-licensed code and
         | learn from it
         | 
         | Actually, we were forbidden to look at open source code at
         | Microsoft (circa 2009) because it might influence our coding
         | and violate licenses.
        
           | EMIRELADERO wrote:
            | That was out of an abundance of caution, not based on any legal
           | precedent.
           | 
           | In fact, the little precedent that exists over learning from
           | copyrightable code is _in favor_ of it.
           | 
           |  _More important, the rule urged by Sony would require that a
           | software engineer, faced with two engineering solutions that
           | each require intermediate copying of protected and
           | unprotected material, often follow the least efficient
           | solution (In cases in which the solution that required the
           | fewest number of intermediate copies was also the most
           | efficient, an engineer would pursue it, presumably, without
           | our urging.) This is precisely the kind of "wasted effort
           | that the proscription against the copyright of ideas and
           | facts . . . [is] designed to prevent."_ (Sony v. Connectix)
        
           | __alexs wrote:
           | Do the TypeScript team code with their eyes closed?
        
             | eddsh1994 wrote:
             | Have you seen some of that codebase? ;)
        
             | sedatk wrote:
             | Not sure, TypeScript didn't exist back then :)
        
           | kens wrote:
           | Way, way back in 1992, Unix Systems Laboratories sued BSDI
           | for copyright infringement. Among other things, they claimed
           | that since the BSD folks had seen the Unix source code, they
           | were "mentally contaminated" and their code would be a
           | copyright violation. This led to the BSD folks wearing
           | "mentally contaminated" buttons for a while.
        
           | elil17 wrote:
           | That demonstrates that copyright laws are already stifling
           | innovation.
        
             | josho wrote:
             | I don't quite agree. Msft took a conservative approach to
             | copyright to protect their own business.
             | 
             | Meanwhile, open source software has had an immeasurable
             | benefit to society. My computer, TV, phone, light bulb,
             | etc. all benefit from OSS under various licenses, and only
             | a subset use a copyleft-like license.
        
               | elil17 wrote:
               | The fact that the laws are inconsistent and expensive to
               | defend against leads companies like Microsoft to take
               | this conservative approach that slows down progress.
        
             | Someone wrote:
             | It demonstrates that it stifles copying. That may make it
             | easier for the copier to innovate, but doesn't dispute the
             | main argument for having copyright protection: that,
             | without the protection of copyright, the code wouldn't have
             | been written.
        
               | elil17 wrote:
               | I think in the case of open source code, most of it still
               | would have been written if no copyright protections
               | existed.
        
             | saghm wrote:
             | Sure, but given the timetable for changing the law, it
             | still seems pretty reasonable to apply the same standard to
             | Microsoft (and by extension Github) in the meantime
        
             | HWR_14 wrote:
             | That's the goal. To stifle using someone else's work.
             | 
             | Like, copyright laws are also stifling my innovative
             | business creating BluRays of Disney films and selling them
             | on Amazon.
        
               | elil17 wrote:
               | That sucks for little snippets of software though,
               | doesn't it? It's like copyrighting individual dance moves
               | (not allowed under the current system) and forcing
               | dancers to never watch each other to make sure they're
               | never stealing.
        
               | HWR_14 wrote:
               | I mean, it's not like the copyrights are keeping you from
               | doing things. It's stopping you from looking at someone
               | else's source. And it's not like source is easy to
               | accidentally see like dance moves are.
        
               | schleck8 wrote:
               | Copyright laws aren't preventing you from learning
               | cinematography by watching said Disney movies though, and
               | using all their techniques for your own project.
               | 
               | OpenAI did a dirty job though, judging by the cases of the
               | model reproducing code right down to the comments, so I can
               | understand why one would criticize this specific project.
        
             | m00x wrote:
             | Yeah, that's a good argument to fully disprove this being a
             | loss to society, and instead show it as a gain.
        
         | Barrin92 wrote:
         | >A programmer can read available but not oss licensed code and
         | learn from it. Thats fair use.
         | 
         | No it isn't, at least not automatically, which is why license
         | infringement exists at all; the fact that you have a brain
         | doesn't change that and never has. If you reproduce someone's
         | code you can be in hot water, and the same should hold for the
         | operator of a machine.
         | 
         | It's also why the concept of a clean-room implementation
         | exists at all.
        
           | EMIRELADERO wrote:
           | I think the commenter you replied to was talking about using
           | the functional, non-copyrightable elements of the copyrighted
           | code. Clean-room is not even required by case law. There's
           | precedent that _explicitly_ calls it out as inefficient.
           | 
           |  _More important, the rule urged by Sony would require that a
           | software engineer, faced with two engineering solutions that
           | each require intermediate copying of protected and
           | unprotected material, often follow the least efficient
           | solution (In cases in which the solution that required the
           | fewest number of intermediate copies was also the most
           | efficient, an engineer would pursue it, presumably, without
           | our urging.) This is precisely the kind of "wasted effort
           | that the proscription against the copyright of ideas and
           | facts . . . [is] designed to prevent."_ (Sony v. Connectix)
        
         | bdcravens wrote:
         | In most copyright cases, exposure to the material in question
         | is always discussed.
        
         | mkeeter wrote:
         | Wine literally bans contributions from anyone that has seen
         | Microsoft Windows source code:
         | 
         | https://wiki.winehq.org/Developer_FAQ#Who_can.27t_contribute...
        
           | c0balt wrote:
           | Well, they are a special case here, since they don't solve a
           | specific problem or build a program per se, but instead
           | (re)build a program from existing specs. Their explicit goal
           | is to match the behaviour of another piece of software with
           | a translation layer.
           | 
           | Forbidding people who have seen the "source" program is most
           | likely meant to keep their version from sliding from
           | "matching the behaviour" to "behaving the same because it is
           | the same code". It may also be intended as a safeguard so
           | well-intentioned developers don't accidentally break their
           | own (most likely existing) NDAs.
        
         | bogwog wrote:
         | > Today they're filing a lawsuit against copilot.
         | 
         | > Tomorrow it will be against stable diffusion or (dall-e,
         | gpt-3 whatever)
         | 
         | > And then eventually against Wine/Proton and emulators (are
         | APIs copyrightable)
         | 
         | Textbook definition of F.U.D.
        
         | laputan_machine wrote:
         | Genuinely one of the worst takes I've ever read. I'm not
         | against the 'slippery slope' argument in principle, but this
         | example is ridiculous.
        
           | mardifoufs wrote:
           | Slippery slope? Are you familiar with judicial precedent?
           | Being bound to precedents is central to common law legal
           | systems, so I don't think the GP's take was so outlandish.
           | "Slippery slopes" and "whataboutism" might be thought-
           | terminating buzzwords online, but not in front of a judge.
        
             | ImprobableTruth wrote:
             | In what way would this even remotely set a precedent for
             | APIs?
        
         | amelius wrote:
         | > If a machine does it, is it wrong ? What is the line between
         | copying and machine learning ?
         | 
         | What is the difference between a neighbor watching you leave
         | your home to visit the local grocery store and mass
         | surveillance? Where do you draw the line?
         | 
         | It is pretty simple, actually.
        
         | whateveracct wrote:
         | > A programmer can read available but not oss licensed code and
         | learn from it. Thats fair use. If a machine does it, is it
         | wrong ?
         | 
         | Just because both activities are called "learning" does not
         | mean they are the same thing. They are fundamentally,
         | physically different activities.
        
       | adlpz wrote:
       | It feels weird saying this but, for once, I hope the big evil
       | corporation gets to keep selling their big bad product.
       | 
       | I find the pattern matching and repetitive code generation
       | _really_ helpful. And the library autocomplete on steroids, too.
       | 
       | Meh. Tricky subject.
        
         | nrb wrote:
         | Does anyone have a problem with it, so long as the material it
         | trained on was used with explicit permission/licensing and not
         | potentially in violation of copyright?
         | 
         | That's where the line is for it to be suspect IMO.
        
           | adlpz wrote:
           | I guess I'm just afraid that it might not be as good as it is
           | that way.
           | 
           | It's a bit like how GPT-3, Stable Diffusion and all those
           | generative models use extensive amounts of copyrighted
           | material in training to get as good as they do.
           | 
           | In those cases however the output space is so vast that
           | plagiarism is _very_ unlikely.
           | 
           | With code, not so much.
        
             | jacobr1 wrote:
             | GPT-3 and Stable Diffusion might not copy things exactly,
             | but they certainly do copy "style". There are many
             | articles like this:
             | 
             | https://hyperallergic.com/766241/hes-bigger-than-picasso-
             | on-...
             | 
             | The interesting thing is that the names get explicitly
             | attached to these styles. It isn't exactly a copyright
             | issue, but I'm sure it will get litigated regardless.
        
             | bjourne wrote:
             | I think the prompt "GPT-3, tell me what the lyrics for the
             | song Stan by Eminem is" is very likely to output
             | copyrighted material. The same copyrighted material is, of
             | course, already republished without permission on
             | google.com.
        
           | michaelmrose wrote:
           | It being permissively licensed is virtually irrelevant,
           | because only a minority of code is licensed so permissively
           | that you can just do what you like with it. Far more is "do
           | what you like within the scope of the license". The GPL, for
           | example: do with it what you like, so long as any derivative
           | work is also GPL.
        
           | bogwog wrote:
           | This is what I hope comes out of the lawsuit. If a company
           | wants to sell an AI model, they need to own all of the
           | training data. It can't be "fair use" to take other peoples'
           | works at zero cost, and use it to build a commercial product
           | without compensation.
           | 
           | And maybe models trained on public data should be in the
           | public domain, so that AI research can happen without
           | requiring massive investments to obtain the training data.
        
             | bpicolo wrote:
             | > It can't be "fair use" to take other peoples' works at
             | zero cost, and use it to build a commercial product without
             | compensation.
             | 
             | You just described open source software.
             | 
             | That's the whole heart of this lawsuit, and equally
             | Copilot. It was trained on OSS which is explicitly licensed
             | for free use.
        
               | bogwog wrote:
               | Ok, you got me, that wording was lazy on my part. But
               | that's a really bad take of yours:
               | 
               | > It was trained on OSS which is explicitly licensed for
               | free use.
               | 
               | That's not what the lawsuit is about. It's not about
               | money, it's about licensing. OSS licenses have specific
               | requirements and restrictions for using them, and Copilot
               | explicitly ignores those requirements, thus violating the
               | license agreement.
               | 
               | The GPL, for example, requires you to release your own
               | source code if you use it in a publicly-released product.
               | If you don't do that, you're committing copyright
               | infringement, since you're copying someone's work without
               | permission.
        
               | bpicolo wrote:
               | Yeah, and I think that's fair re: licensing. Curious to
               | see how it pans out.
        
               | deathanatos wrote:
               | Most companies building commercial products on top of
               | FOSS _are_ obeying the license requirements. (I have been
               | through due diligence reviews where we had to demonstrate
               | that, for each library/tool/package.)
               | 
               | The same cannot be said for Copilot: there have been
               | prior examples here on HN showing that it can emit large
               | chunks of copyrighted code (without the license).
        
               | [deleted]
        
               | [deleted]
        
               | xigoi wrote:
               | > That's the whole heart of this lawsuit, and equally
               | Copilot. It was trained on OSS which is explicitly
               | licensed for free use.
               | 
               | Most open-source software is not licensed for free use.
               | MIT and GPL, the two most common licenses, both require
               | attribution.
        
         | MrStonedOne wrote:
        
         | dmix wrote:
         | TabNine has absolutely improved my life as a programmer.
         | There's something really rewarding about having a robot read
         | your mind for entire blocks of code.
         | 
         | It's not just functions either, one of the most common things
         | that it helps me with daily is simple stuff like this:
         | 
         | Typing
         | 
         |     const x = {
         |         a: 'one',
         |         b: 'two',
         |         ...
         |     }
         | 
         | And later I'll be typing
         | 
         |     y = [
         |         a['one'],
         |         b[' <-- it auto-completes the rest here
         |     ]
         | 
         | It's really amazing the amount of busy-work typing in
         | programming that a smart pattern matching algo could help with.
        
           | bogwog wrote:
           | I don't think this is a good example of the value of these
           | things. You can just as easily do that same thing with
           | advanced text editor features. Sublime for example supports
           | multi-cursor editing. Just hold alt+shift+arrow keys to add a
           | cursor, then type in the brackets you want. Ctrl+D can be
           | used to select the next occurrence of the current selection
           | with multiple cursors, built-in commands from the command
           | palette can do anything to your current selection (e.g.
           | convert case), etc.
           | 
           | All of that efficiency without having to pay a monthly
           | subscription, wasting electricity on some AI model, and
           | worrying about the legal/moral implications.
        
             | ChrisLTD wrote:
               | Multiple cursors won't do what the parent comment is talking
             | about without a lot more work.
        
               | bogwog wrote:
               | Why? You can copy and paste the entire section, and use
               | multiple cursors to add in the brackets.
               | 
               | going from                    a: 'one',
               | 
               | to                    a['one'],
               | 
               | just requires you to add two brackets and remove the
               | colon. With multiple cursors you can do that exact same
               | operation for all lines in a few keystrokes.
        
               | yamtaddle wrote:
               | It's having to go find the other block you want, copy and
               | paste it, and then set up the multiple cursors and type,
               | versus it just happening automatically without any of
               | that.
        
             | dmix wrote:
             | I've used Vim for over a decade; I know what it can do.
             | 
             | This is automated and happens immediately without you even
             | thinking about it.
             | 
             | You only ever pull out the complicated Vim editing when you
             | have a particular hard task, I'm talking about the small
             | stuff many times a day.
        
       | Cloudef wrote:
       | Unless Copilot spits out complete programs or libraries that are
       | 1:1 copies of someone else's, who cares? Caring about random
       | small code snippets is dumb.
        
       | [deleted]
        
         | [deleted]
        
       | [deleted]
        
       | hu3 wrote:
       | As a GitHub user, is there a way to support GitHub against this
       | lawsuit?
       | 
       | Obviously not financially as Microsoft has basically YES amounts
       | of money.
        
         | michaelmrose wrote:
         | If you had legal expertise and a strong opinion on the matter I
         | suppose you could write a persuasive brief for the
         | consideration of the court. If you have a strong opinion but
         | aren't a legal eagle you could write to your legislators in
         | support of legislation explicitly supporting this use case or
         | organize the support of people more capable in that arena.
         | 
         | If you are opinionated but lazy, no judgement here as I sit
         | here watching TV, you could add a notation at the top of your
         | repos explicitly supporting the usage of your code in such
         | tools as fair use.
         | 
         | Notably, if your code is derivative of other works you have no
         | power to grant permission for such use of code you don't own,
         | so best include some weasel words to that effect. Say:
         | 
         | I SUPPORT AND EXPLICITLY GRANT PERMISSION FOR THE USAGE OF THE
         | BELOW CODE TO TRAIN ML SYSTEMS TO PRODUCE USEFUL HIGH QUALITY
         | AUTOCOMPLETE FOR THE BETTERMENT AND UTILITY OF MY FELLOW
         | PROGRAMMERS TO THE EXTENT ALLOWABLE BY LICENSE AND LAW. NOTHING
         | ABOUT THIS GRANT SHALL BE CONSTRUED TO GRANT PERMISSION TO ANY
         | CODE I DO NOT OWN THE RIGHTS TO NOR ENCOURAGE ANY INFRINGING
         | USE OF SAID CODE.
         | 
         | Years from now when such cases are being heard and appealed ad
         | nauseam a large portion of repos bearing such notices may
         | persuade a judge that such use is a desired and normal use.
         | 
         | You could even make a GPL-esque modification if you were so
         | inclined, where you said: SO LONG AS THE RESULTING TOOLING AND
         | DATA IS MADE AVAILABLE TO ALL.
         | 
         | Note not only am I not your lawyer, I am not a lawyer of any
         | sort so if you think you'll end up in court best buy the time
         | of an actual lawyer instead of a smart ass from the internet.
        
       | m00x wrote:
       | The only people who gain out of class lawsuits are the lawyers.
       | 
       | This person (a lawyer) saw an opportunity to make money and
       | jumped on it like a hungry tiger on fresh meat.
        
         | [deleted]
        
         | tasuki wrote:
         | I have quite a bit of respect for Matthew Butterick. I don't
         | think he's just a lawyer looking to earn a quick buck. He cares
         | about software and wants to make the world a better place.
         | 
         | > But neither Matthew Butterick nor anyone at the Joseph Saveri
         | Law Firm is your lawyer
         | 
         | This is curious. None of them are _my_ lawyers, but surely at
         | least some of them are _someone's_ lawyers? Isn't it wrong to
         | put such a blanket disclaimer on a website which might well be
         | read by their clients?
        
         | alsodumb wrote:
         | This. I've seen so many class action lawsuits where at the
         | end of the day the highest gain per capita always ends up going
         | to the lawyers. Fuck this guy and everyone trying to make money
         | from this.
        
         | alpaca128 wrote:
         | So he gets to make money with his profession while defending
         | OSS licenses? I don't see the big problem.
        
       | cmrdporcupine wrote:
       | If Microsoft is so confident in the legality and ethics of
       | Copilot, and that it doesn't leak or steal proprietary IP... they
       | should go train it on the MS Word and Windows and Excel source
       | trees.
       | 
       | What's that? They don't want to do that? Why not?
        
       | atum47 wrote:
       | Forgive my ignorance, but who is going to benefit from this
       | lawsuit? I have a lot of code on GitHub, can I, for instance,
       | expect a check in the mail in case of a win?
        
         | gpm wrote:
         | (Not a lawyer, so this is really definitely absolutely not
         | legal advice and if you're looking to profit you should speak
         | to a lawyer... for instance the lawyers who just filed the
         | lawsuit)
         | 
         | They're asking for two things, injunctive relief (ordering
         | github/openai/microsoft to stop doing this) and damages.
         | 
         | I suppose the injunctive relief really benefits anyone who
         | doesn't want AI models to exist, because that's what it's
         | asking for.
         | 
         | The damages will go to the members of the class certified for
         | damages, with more going to the lead plaintiffs (those actually
         | involved in the suit) and some going to the lawyers. They're
         | asking for the following class definition for damages
         | 
         | > All persons or entities domiciled in the United States that,
         | (1) owned an interest in at least one US copyright in any work;
         | (2) offered that work under one of GitHub's Suggested Licenses;
         | and (3) stored Licensed Materials in any public GitHub
         | repositories at any time during the Class Period.
        
       | Imnimo wrote:
       | On page 18, they show Copilot produces the following code:
       | 
       |     function isEven(n) {
       |       return n % 2 === 0;
       |     }
       | 
       | They then say, "Copilot's Output, like Codex's, is derived from
       | existing code. Namely, sample code that appears in the online
       | book Mastering JS, written by Valeri Karpov."
       | 
       | Surely everyone reading this has written that code verbatim at
       | some point in their lives. How can they assert that this code is
       | derived specifically from Mastering JS, or that Karpov has any
       | copyright to that code?
        
         | lelandfe wrote:
         | They determined the other `isEven()` function was cribbed from
         | Eloquent Javascript because of matching comments. I wonder if
         | the complaint just left off telltale comments from that
         | Mastering JS one?
        
           | Imnimo wrote:
           | Yeah, the other one I found much more persuasive. The extra
           | comments were unequivocally reproduced from the claimed
           | source. (although that output was from Codex, rather than
           | Copilot).
        
         | bogwog wrote:
         | That seems like a really bad choice of an example for this,
         | but as I haven't read the document I don't have any context
         | beyond what you've posted here, so I have to take your word
         | for it that that's the purpose of this snippet.
         | 
         | However, if you are looking to understand the reasoning behind
         | this lawsuit, there are lots of better examples online where
         | Copilot blatantly ripped off open source code.
        
         | counttheforks wrote:
         | I wrote that exact function the other day, and I've never even
         | heard of that book.
        
           | eddsh1994 wrote:
             | Yep, same. Not in JS, but in Haskell, for the even-Fibonacci
             | Project Euler problem. Something like a million people have
             | submitted right answers for that problem, and assuming half
             | wrote their own filter rather than importing an isEven
             | library, then that's half a million people right there.
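             | 
             | A minimal sketch of what that hand-rolled filter tends to
             | look like (shown in JS rather than Haskell, to match the
             | isEven example above; the names are purely illustrative):
             | 
             |     // Project Euler #2: sum even Fibonacci terms below 4,000,000.
             |     const isEven = (n) => n % 2 === 0;
             | 
             |     let sum = 0;
             |     for (let a = 1, b = 2; a < 4000000; [a, b] = [b, a + b]) {
             |       if (isEven(a)) sum += a;
             |     }
             |     console.log(sum); // 4613732
             | 
             | Countless independently written solutions end up containing
             | that same one-line isEven out of sheer necessity.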
        
             | chowells wrote:
             | You don't need to write your own or import a library for
             | that in Haskell. It's in the Prelude.
        
           | moffkalast wrote:
           | I'd hire a legal team if I were you, the injunction is on the
           | way. /s
        
           | 0cf8612b2e1e wrote:
           | Should have used snake case. Would have avoided legal hot
           | water and established precedent.
        
         | williamcotton wrote:
         | There is no way in hell that isEven is covered by copyright.
         | 
         | "In computer programs, concerns for efficiency may limit the
         | possible ways to achieve a particular function, making a
         | particular expression necessary to achieving the idea. In this
         | case, the expression is not protected by copyright."
         | 
         | https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
         | 
         | Think about how absurd this is. So if Microsoft was the first
         | company to write and publish an isEven function then no one
         | else can legally use it?
        
           | Phrodo_00 wrote:
           | > There is no way in hell that isEven is covered by
           | copyright.
           | 
           | Hey, I said the same thing about APIs, but here we are.
           | 
           | Edit: Actually, the Supreme Court declined to rule on whether
           | APIs are copyrightable, but they did say that if they are,
           | reusing them the way Google reused the Java APIs in Android
           | would fall under fair use. Given that lower courts did think
           | that APIs should be copyrightable, we don't know if they are
           | anymore.
        
           | kevin_thibedeau wrote:
           | There are software patents on bit twiddling operations that
           | people do end up having to work around.
        
             | tiahura wrote:
             | They do because it's cheaper to hire a coder to twiddle
             | than a lawyer to litigate.
        
             | CrazyStat wrote:
             | Patents and copyrights are completely different things.
        
           | eurasiantiger wrote:
           | Does that mean any perfectly optimal function is copyright-
           | free?
        
             | bawolff wrote:
               | Any function devoid of "creativity" is. No choices equals
               | no creativity.
               | 
               | As a note, the same applies to logos. Very simple logos
               | that are only some lines and shapes do not have copyright
               | (in the USA).
        
               | squokko wrote:
               | Logos can still have trademark without having copyright
               | as creativity is not a requirement of trademarks.
        
         | leepowers wrote:
         | It's possible the complaint is using a trivial example to
         | illustrate the type of argument plaintiffs want to make during
         | any trial. A 200-line example is too unwieldy for non-
         | programmers to digest, especially given the formatting
         | constraints of a legal brief.
         | 
         | Look at paragraphs 90 and 91 on page 27 of the complaint[1]:
         | 
         | "90. GitHub concedes that in ordinary use, Copilot will
         | reproduce passages of code verbatim: "Our latest internal
         | research shows that about 1% of the time, a suggestion [Output]
         | may contain some code snippets longer than ~150 characters that
         | matches" code from the training data. This standard is more
         | limited than is necessary for copyright infringement. But even
         | using GitHub's own metric and the most conservative possible
         | criteria, Copilot has violated the DMCA at least tens of
         | thousands of times."
         | 
         | Does distributing licensed code without attribution on a mass
         | scale count as fair use?
         | 
         | If Copilot is inadvertently providing a programmer with
         | copyrighted code, is that programmer and/or their employer
         | responsible for copyright infringement?
         | 
         | There's a lot of interesting legal complications I think the
         | courts will want to adjudicate.
         | 
         | [1]
         | https://githubcopilotlitigation.com/pdf/1-0-github_complaint...
        
         | schleck8 wrote:
         | > Surely everyone reading this has written that code verbatim
         | at some point in their lives
         | 
         | Ironically their Twitter account uses a screenshot from a TV
         | series as profile picture. I wonder how legal that is, even if
         | meant as a joke.
         | 
         | https://twitter.com/saverlawfirm
         | 
         | Edit: It's been changed 2 minutes after I wrote this comment
        
           | zeven7 wrote:
           | This comment is 1 minute old and I only see a plain black
           | profile picture.
           | 
           | Or is your comment itself the joke?
        
             | schleck8 wrote:
             | They changed it, I'm 100 % sure. The profile picture was
             | Saul from Breaking Bad. I assume they read the comments
             | here and changed it in a matter of one or two minutes.
        
           | hdjjhhvvhga wrote:
           | Is there a Wayback Machine for Twitter?
        
         | [deleted]
        
         | nikanj wrote:
         | This reminds me of the SCO vs Linux lawsuits.
        
       | clusterhacks wrote:
       | Did Microsoft use the source code of Windows (in whole or in
       | part) as training input to Copilot?
        
       | renewiltord wrote:
       | It doesn't make sense. If I make a piece of software that curls a
       | random gist and then puts it into your editor, am I infringing,
       | or are you infringing when you run it, or are you infringing when
       | you use that file and distribute it somewhere?
        
         | lbotos wrote:
         | > If I make a piece of software that curls a random gist and
         | then puts it into your editor am I infringing
         | 
         | Depends on the license. If it's MIT and you serve the license,
         | no, you are not infringing at all. A trimmed version of MIT for
         | the relevant bits:
         | 
         | Permission is hereby granted [...] to any person obtaining a
         | copy of this software [..] to use, copy, modify, merge,
         | publish, distribute, sublicense, and/or sell copies of the
         | Software, [...] subject to the following conditions:
         | 
         | The above copyright notice and this permission notice shall be
         | included in all copies or substantial portions of the Software.
         | 
         | > are you infringing when you run it
         | 
         | Depends on the license
         | 
         | > are you infringing when you use that file and distribute it
         | somewhere
         | 
         | Depends on the license
         | 
         | ----
         | 
         | When copilot gives you code without the license, you can't even
         | know!
        
           | renewiltord wrote:
           | Well, `curl` will download a gist without checking its
           | license. So curl is infringing?
        
       | deanjones wrote:
       | This will fail very quickly. The licence that project owners
       | publish with their code on Github applies to third parties who
       | wish to use the code, but does not apply to Github. Authors who
       | publish their code on Github grant Github a licence under the
       | Github Terms: https://docs.github.com/en/site-policy/github-
       | terms/github-t...
       | 
       | Specifically, sections D.4 to D.7 grant Github the right to "to
       | store, archive, parse, and display Your Content, and make
       | incidental copies, as necessary to provide the Service, including
       | improving the Service over time. This license includes the right
       | to do things like copy it to our database and make backups; show
       | it to you and other users; parse it into a search index or
       | otherwise analyze it on our servers; share it with other users;
       | and perform it, in case Your Content is something like music or
       | video."
        
         | mldq wrote:
         | This is the standard content display license that everyone
         | uses. Even in your quoted text I don't see any hint that
         | snippets can be shown without attribution or the code license.
         | 
         | It also says they can't sell the code, which CoPilot is doing.
         | 
         | Also, in a very high number of cases it isn't the author who
         | uploads.
         | 
         | Repeating your line of argumentation (which occurs in every
         | CoPilot thread) does not make it true.
        
           | deanjones wrote:
           | It's irrelevant whether it's standard or not. Again, the
           | terms in the code licence (including attribution) do not
           | apply to Github, because that is not the licence under which
           | they are using the code. You grant them a separate licence
           | when you start using their service.
           | 
           | If someone who isn't the author has uploaded code which they
           | do not have a right to copy, they are liable, not Github.
           | This is also clear from the Github Terms: "If you're posting
           | anything you did not create yourself or do not own the rights
           | to, you agree that you are responsible for any Content you
           | post"
           | 
           | It's almost as if these highly paid lawyers know what they're
           | doing.
        
             | lpolk wrote:
             | You grant them a content display license, not a general
             | code license.
             | 
             | > It's almost as if these highly paid lawyers know what
             | they're doing.
             | 
             | Sure, they wrote the content display license long before
             | CoPilot even existed. Any court will see the intent and not
             | interpret these terms as a code re-licensing.
        
               | deanjones wrote:
               | There is no such thing as a "content display licence" or
               | "general code licence". There is copyright (literally,
               | the right to make copies) which broadly lies with the
               | author, who can then grant other parties a licence to
               | copy their content.
               | 
               | I'm afraid I do not believe your legal expertise is so
               | extensive that you are able to accurately predict the
               | judgement of "any court".
        
             | xigoi wrote:
             | > You grant them a separate licence when you start using
             | their service.
             | 
             | And that license explicitly states that it doesn't give
             | them the right to sell your code.
        
         | klabb3 wrote:
         | > Authors who publish their code on Github grant Github a
         | licence under the Github Terms:
         | https://docs.github.com/en/site-policy/github-terms/github-t...
         | 
         | This sounds unenforceable in the general case. How could github
         | know whether someone pushes their own code or not? Is it a
         | license violation to push someone's FOSS code to github because
         | the author didn't sign up with GH?
        
         | acdha wrote:
         | I don't see that being "quickly" - they'd have to get a judge
         | to agree that passing your code off without attribution for
         | other people to use as their own work is a normal service
         | improvement. Given that it's a separate feature with different
         | billing terms, I'm skeptical that it's anywhere near the given
         | that you're portraying it as.
        
           | deanjones wrote:
           | "Without attribution" is a condition of the licence that
           | applies to third-parties. It is not a condition of the
           | licence that applies to Github.
        
             | TAForObvReasons wrote:
             | It's worth reading the passage in its entirety and how a
             | court would interpret it:
             | 
             | > We need the legal right to do things like host Your
             | Content, publish it, and share it
             | 
             | > This license does not grant GitHub the right to sell Your
             | Content. It also does not grant GitHub the right to
             | otherwise distribute or use Your Content outside of our
             | provision of the Service, except that as part of the right
             | to archive Your Content, GitHub may permit our partners to
             | store and archive Your Content in public repositories in
             | connection with the GitHub Arctic Code Vault and GitHub
             | Archive Program.
             | 
             | If Copilot is straight-up reproducing work, and it is a
             | service that users have to pay to use, then it seems like
             | Copilot is "sell[ing] your content" and thus the license
             | does not apply.
             | 
             | More generally, a court is likely to look at the plain
             | English summary and judge. Copilot is not an integral part
             | of "the service" as developers understood it before Copilot
             | existed.
        
               | deanjones wrote:
               | "as necessary to provide the Service, including improving
               | the Service over time."
        
               | lamontcg wrote:
               | You're trying to play desperate semantic games.
               | 
               | "This license does not grant GitHub the right to sell
               | Your Content" is unambiguously clear.
        
               | deanjones wrote:
               | "desperate semantic games" is actually a reasonable
               | description of the legal process :-)
               | 
               | I'm not sure I agree that anything expressed in a legal
               | contract using natural language is "unambiguously clear".
               | MS / Github's expensively-attired lawyers will no doubt
               | forcefully argue that they are not selling YOUR
               | content, but a service based on a model generated from a
               | large collection of content, which they have been granted
               | a licence to "parse it into a search index or otherwise
               | analyze it on our servers". There may even be in-court
               | discussion of generalization, which will be exciting.
        
         | sigzero wrote:
         | If that is pretty much verbatim under their terms, then yes the
         | lawsuit is going nowhere.
        
       | nullc wrote:
       | I think if this is successful it will be very bad for the open
       | world.
       | 
       | Large platforms like github will just stick blanket agreements
       | into the TOS which grant them permission (and require you
       | indemnify them for any third party code you submit). By doing so
       | they'll gain a monopoly on comprehensively trained AI, and the
       | open world that doesn't have the lever of a TOS will not at all
       | be able to compete with that.
       | 
       | Copilot has seemed to have some outright copying problems,
       | presumably because it's a bit over-fit (perhaps, to work at all,
       | it must be, because it's just failing to generalize enough at
       | the current state of development) --- but I'm doubtful that this
       | litigation could distinguish the outright copying from training
       | in a way that doesn't substantially infringe any copyright
       | protected right (e.g. where the AI learns the 'ideas' rather than
       | verbatim reproducing their exact expressions).
       | 
       | The same goes for many other initiatives around AI training
       | material-- e.g. people not wanting their own pictures being used
       | to train facial recognition. Litigating won't be able to stop it
       | but it will be able to hand the few largest quasi-monopolists
       | like facebook, google, and microsoft a near monopoly over new AI
       | tools when they're the only ones that can overcome the defaults
       | set by legislation or litigation.
       | 
       | It's particularly bad because the spectacular data requirements
       | and training costs already create big centralization pressures in
       | the control of the technology. We will not be better off if we
       | amplify these pressures further with bad legal precedents.
        
       | barelysapient wrote:
       | MSFT to $0 anyone?
        
       | EMIRELADERO wrote:
       | I think it's a great time to explain why this won't hit AI art
       | such as Stable Diffusion, even if GitHub loses this case.
       | 
       | The crux of the lawsuit's argument is that the AI unlawfully
       | _outputs copyrighted material_. This is evident in many tests
       | with many people here and on Twitter even getting _verbatim
       | comments_ out of it.
       | 
       | AI art, on the other hand, is not capable of outputting the
       | images from its training set, as it's not a collage-maker, but an
       | artificial brain with a paintbrush and virtual hand.
        
         | PuddleCheese wrote:
         | These models can actually output images that can be extremely
         | close to the material present in the training data:
         | 
         | - https://i.imgur.com/VikPFDT.png
         | 
         | I also don't know if I would anthropomorphize ML to that
         | degree. It's a poor metaphor and isn't really analogous to a
         | human brain, especially considering our current understanding,
         | or lack thereof, of the brain, and even the limited insight we
         | have into how some of these models work from the people who
         | work on them.
        
         | jrochkind1 wrote:
         | Eh... I don't know. It sounds to me like you are saying that
         | because the code example outputs _exact_ lines, it's a
         | copyright violation; but the image AIs, by their nature, don't
         | output exact copies of even portions of pre-existing images,
         | because that's not how they work.
         | 
         | But I don't think copyright on visual images actually works
         | like that, that it needs to be an _exact_ copy to infringe.
         | 
         | If I draw my own pictures of Mickey Mouse and Goofy having a
         | tea party, it's still a copyright infringement if it is
         | _substantially similar_ to copyrighted depictions of Mickey
         | Mouse and Goofy. (Subject to fair use defenses; I'm allowed to
         | do what would otherwise have been a copyright infringement if
         | it meets a fair use defense, which is also not cut and dried,
         | but if it's, say, a parody it's likely to be fair use. There is
         | probably a legal argument that Copilot is fair use... the more
         | money Github makes on it, the harder that argument is, though;
         | but making money off something is not relevant to whether it's
         | a copyright violation in the first place, although it is
         | relevant to a fair use defense.)
         | 
         | (yes, it might also be a trademark infringement; but there's a
         | reason Disney is so concerned with copyright on mickey
         | expiring, and it's not that they think there's lots of money to
         | be spent on selling copies of the specific Steamboat Willy
         | movie...)
         | 
         | > There is actually no percentage by which you must change an
         | image to avoid copyright infringement. While some say that you
         | have to change 10-30% of a copyrighted work to avoid
         | infringement, that has been proven to be a myth. The standard
         | is whether the artworks are "substantially similar," or a
         | "substantial part" has been changed, which of course is
         | subjective.
         | 
         | https://www.epgdlaw.com/how-can-my-artwork-steer-clear-of-co...
         | 
         | I think Stable Diffusion etc are quite capable of creating art
         | that is "substantially similar" to pre-existing art.
        
           | EMIRELADERO wrote:
           | I believe fair use is the way to go then. SD would definitely
           | qualify, in my opinion.
        
         | solomatov wrote:
         | IMO, the case is exactly the same for copilot and generative
         | models for images. That's why it's so important to have some
         | precedent as a guide for future products.
         | 
         | P.S. I am not a lawyer.
        
         | kmnc wrote:
         | I don't understand this argument... if image AI gets good
         | enough then generating exact copies of its training data seems
         | trivial.
        
       | warbler73 wrote:
       | It seems obvious that AI models are derivative works of the works
       | they are trained on but it also seems obvious that it is totally
       | legally untested whether they are derivative works in the formal
       | legal sense of copyright law. So it should be a good case
       | _assuming_ we have wise and enlightened judges who understand all
       | nuances and can guide us into the future.
        
       | elcomet wrote:
       | This is why we can't have nice things. Copilot is the best thing
       | that has happened in developer tools in a long time; it has
       | increased my productivity a lot. Please don't ruin it.
        
       | rafaelturk wrote:
       | Like everything legally related: this is not about open source
       | fairness or protecting innovation, it's all about making money.
        
       | awestroke wrote:
       | If this leads anywhere I'll be pissed. I love CoPilot.
        
         | yamtaddle wrote:
         | I expect I'd love it but I've been holding off until I find out
         | whether MS lets devs on their core products use it.
         | 
         | If not, it's a pretty clear sign they consider it radioactive.
        
         | an1sotropy wrote:
         | copilot is great, and ignorance is bliss, isn't it
         | 
         | The situation that this lawsuit is trying to save you from is
         | this: (1) copilot blurps out some code X that you use, and then
         | redistribute in some form (monetized or not); (2) it turns out
         | company C owns copyright on something Y that copilot was
         | trained on, and then (3) C makes a strong case that X is part
         | of Y, and that your use of X does not fall under "fair use",
         | i.e. you infringed on the licensing terms that C set for Y.
         | 
         | You are now in legal trouble, and copilot put you there,
         | because it never warned you that X is part of Y, and that Y
         | comes with such and such licensing terms.
         | 
         | Whether we like copilot or not, we should be grateful that
         | this case is seeking to clarify some things that are currently
         | legally untested. Microsoft's assertions may muddy the waters,
         | but that doesn't make law.
        
       | foooobaba wrote:
       | It seems like we should come to an agreement on what the license
       | is intended for, given that the licenses were created in a time
       | before AI like this existed. If the authors did not intend their
       | code to be used like this, should we not respect that? Also, does
       | it make sense to create new licenses which explicitly state
       | whether using the code for AI training is acceptable or not - or
       | are our current licenses good enough?
        
       | herpderperator wrote:
       | The title of the submitted PDF document: "Microsoft Word -
       | 2022-11-02 Copilot Complaint (near final)"[0]
       | 
       | I've noticed this a lot and it's quite funny seeing what the
       | actual filename of the document was. Does this just get included
       | as metadata by default when you export to PDF?
       | 
       | [0]
       | https://githubcopilotlitigation.com/pdf/1-0-github_complaint...
        
         | mirekrusin wrote:
         | They should use github instead of sending "(final, 2nd
         | revision, really final, amended)" emails.
        
           | D13Fd wrote:
           | If only you could, with Word docs. Sadly you can't in any
           | meaningful way.
        
         | tasuki wrote:
         | The typography on that document is not great. Perhaps they
         | should read Matthew Butterick's book?
        
         | senkora wrote:
         | It does, yes. It's very annoying and I have occasionally
         | stripped it off of PDFs I've made, using exiftool.
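         | 
         | For example (just a sketch, and the filename is made up),
         | clearing the leftover Title field looks roughly like:
         | 
         |     exiftool -Title= complaint.pdf
         | 
         | One caveat: exiftool edits PDFs by appending an incremental
         | update, so the old metadata remains recoverable unless the
         | file is rewritten afterwards.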
        
         | bombcar wrote:
         | In Word you can go to document properties or whatever and set
         | the Title and some other fields to control what gets into the
         | PDF.
        
       | SurgeArrest wrote:
       | I hope this case will fail and establish a good precedent for all
       | future AI litigation, and maybe even prevent new suits. Your code
       | is open source - irregardless of license, one might read it as a
       | text book and then remember or even copy snippets and re-use this
       | somewhere else unrelated to the original application. If you
       | don't like this, don't make your code open source. This was
       | happening and is happening independent of any license, all over
       | the world, by the majority of developers. What Copilot and similar
       | tools did was to make those snippets accessible for extrapolation
       | in new applications.
       | 
       | If these folks win - we again throw progress under the bus.
        
         | humanwhosits wrote:
         | > irregardless of license
         | 
         | Hard no. Please stop using open source code if this is how you
         | think of it.
         | 
         | Without licenses being respected, we don't get open source
         | communities.
        
         | vesinisa wrote:
         | Open source does not mean public domain. Open source
         | specifically attaches limitations on how the code may be
         | reused.
        
           | elcomet wrote:
           | There are no limitations on reading the code to learn from
           | it.
        
             | MontagFTB wrote:
             | Perhaps the lawsuit contends that Copilot isn't in fact
             | learning how to code, but is rather regurgitating
             | information it has managed to glean and statistically
             | categorize, without any real understanding as to what it
             | was doing?
        
         | simion314 wrote:
         | > Your code is open source ....
         | 
         | So why can MS screw around only with some licenses, the ones
         | you call "open source"? Your example of a human reading a book
         | would also apply to source-available licenses or decompiled
         | binaries.
         | 
         | I would have been fine if the open source code had been used
         | to create an open model, or if MS had put its ass on the line
         | and also trained the model on all the GitHub code, since they
         | claim there is no copyright issue.
        
         | tfsh wrote:
         | If organisations are going to ignore the licenses attached to
       | my OSS and that's legitimised in law, then that's a
       | surefire way to irreparably damage the open source ecosystem.
        
         | solomatov wrote:
         | The problem is that copyright laws were introduced for a
         | reason, and with thinking similar to yours we might decide to
         | get rid of copyright altogether, which I think is a bad idea.
         | 
         | P.S. I am not a lawyer.
        
         | [deleted]
        
         | Etheryte wrote:
         | > Your code is open source - irregardless of license, one might
         | read it as a text book and then remember or even copy snippets
         | and re-use this somewhere else unrelated to the original
         | application.
         | 
         | Yes, but attribution should still be given. Just because you
         | don't copy-paste someone else's creation doesn't mean you're
         | licensed to use it.
        
           | shagie wrote:
           | Is it the role of the tool (in this case copilot) to include
           | the license information? Or is it the responsibility of the
           | organization using the code to make sure that it wasn't
           | copied from somewhere?
           | 
           | What if, instead of a tool, you had a random consultant do
           | some work, and it was found out that he asked a ton of stuff
           | on Stack Overflow and copied the CC-BY-SA 4.0 answers into
           | his work? What if it was then found out that one of _those_
           | answers was based on copying something from the Linux kernel?
           | Who is responsible for doing the license check on the code
           | before releasing the product?
        
             | alpaca128 wrote:
             | > Or is it the responsibility of the organization using the
             | code to make sure that it wasn't copied from somewhere?
             | 
             | Do you know whether the code you got from Copilot has an
             | incompatible license? No, so if you plan to use Copilot for
             | serious projects you need it to include sources/licenses
             | either way. In fact that would be a very helpful feature as
             | it would let you filter licenses.
        
         | jacooper wrote:
         | No thank you. I put a license to be followed, not to just be
         | disregarded by an AI as "Learning material". No human perfectly
         | reproduces their learning material no matter what, but Copilot
         | does.
        
           | mcluck wrote:
           | You mean to tell me that no one has ever perfectly replicated
           | an example that they read somewhere? There's only so many
           | ways to write AABB collision, fibonacci, or any number of
           | other common algorithms. I'm not saying there aren't things
           | to consider but I'm sure I've perfectly replicated something
           | I read somewhere, whether I'm actively aware of it or not.
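           | 
           | As a concrete illustration (a hedged sketch, not taken from
           | any particular codebase), an AABB overlap check really only
           | has one natural shape:
           | 
           |     // Axis-aligned bounding box overlap; boxes are {x, y, w, h}.
           |     function aabbOverlap(a, b) {
           |       return a.x < b.x + b.w && a.x + a.w > b.x &&
           |              a.y < b.y + b.h && a.y + a.h > b.y;
           |     }
           | 
           | Ask two programmers to write that independently and you'll
           | get nearly character-for-character identical results.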
        
           | IshKebab wrote:
           | So are you ok with it being illegal for humans to learn from
           | copyrighted books unless they have a license that explicitly
           | allows learning? That does not sound like a pleasant
           | consequence.
        
             | alpaca128 wrote:
             | Would you use an AI text generator to write a thesis? No,
             | there's a risk a whole chunk of it will be considered
             | plagiarism because you have no idea what the source of the
             | AI output is, but you know it was trained with unknown
             | copyrighted material. This has nothing to do with the way
             | humans learn, it's about correct attribution.
             | 
             | There is no technical reason why Microsoft can't respect
             | licenses with Copilot. But that would mean more work and
             | less training input, so they do code laundering and excuse
             | it with comparisons to human learning because making AI
             | seem more advanced than it is has always worked well in
             | marketing.
             | 
             | Edit: And where do you draw the line between "learning" and
             | copying? I can train a network to exactly reproduce
             | licensed code (or books, or movies) just like a human can
             | memorize it given enough time - and both of those would be
             | considered a copyright violation if used without correct
             | attribution. If you trained an AI model with copyrighted
             | data you will get copyrighted results with random variation
             | which might be enough to become unrecognizable if you're
             | lucky.
        
             | codyb wrote:
             | I doubt it, but they'd probably be against people quoting
             | copyrighted material verbatim without attribution in their
             | own work after.
        
             | Veen wrote:
             | It's a pleasant consequence for the person who spent years
             | becoming an expert and then writing the book. It's also a
             | pleasant consequence for the people who buy the book, which
             | might not have existed without a copyright system to
             | protect the writer's interests.
        
             | MontagFTB wrote:
             | I think they're taking issue with the unauthorized
             | duplication of copyrighted code. That's distinct from
             | learning how to code (which I don't think anyone would
             | claim Copilot is doing) which people get from reading a
             | book. If you were to read the book only to copy it verbatim
             | and resell it, you're going to have a bad time.
        
             | test098 wrote:
             | Here's the thing - the US has well-established laws around
             | copyright that don't consider learning from books a
             | violation of those copyrights. This lawsuit is intended to
             | challenge Copilot as a violation of licensing and isn't a
             | litigation of "how people learn." Your program stole my
             | code in violation of my license - there's a clear legal
             | issue here.
             | 
             | I'd pose a question to you - would it be okay for me to
             | copy/paste your code verbatim into my paid product in
             | violation of your license and claim that I'm just using it
             | for "learning"?
        
             | bun_at_work wrote:
              | AIs are not humans; no human can read _all_ the code on
              | GitHub. They certainly can't read _all_ the code on GitHub
              | at the scale that MS can, and they are unlikely to be able
              | to extract profits directly from that code in violation of
              | the licensing.
        
       | celestialcheese wrote:
       | Maybe I'm being too cynical, but this feels like it's more a law
       | firm and individual looking to profit and make their mark in
       | legal history rather than an aggrieved individual looking for
       | justice.
       | 
       | Programmer/Lawyer Plaintiff + upstart SF Based Law Firm + novel
       | technology = a good shot at a case that'll last a long time, and
       | fertile ground to establish yourself as experts in what looks to
       | be a heavily litigated area over the next decade+.
        
         | squokko wrote:
         | Just like good people can try to do good things and end up
         | screwing things up badly, bad people can do bad things that
         | have positive effects.
        
           | efitz wrote:
           | I fail to see the positive effect here.
           | 
           | Just like Google's noble but misguided attempt to make all
           | the world's books searchable a few years back, what we have
            | here is IP law getting in the way of a societal good.
           | 
           | Copyright and patent are not natural; they're granted by law
           | "to promote progress in the useful arts". At first glance
           | here it appears that GitHub is promoting progress and the
           | plaintiffs are just rent-seeking.
        
         | undoware wrote:
         | If it wasn't Butterick I wouldn't be interested.
         | 
         | But I write this to you in Hermes Maia
        
         | jedberg wrote:
         | As my lawyer friend told me, a class action lawsuit is a
         | lawyer's startup. A lot of work for little pay with the chance
         | of a huge payout.
        
         | dkjaudyeqooe wrote:
         | But who cares? Who else is willing to fund litigation on this
         | important legal question? The real justice here is declarative
         | and benefits everyone.
         | 
         | No matter who litigates and for what reasons it will be
         | extremely valuable for good precedents to be set around the
         | question of things like Copilot and DALL-E with respect to
         | copyright and ownership. I'd rather have self interested
         | lawyers dedicated to winning their case than self interested
         | corporations fighting this out.
        
         | sam345 wrote:
         | Yes, of course that's what it is. Plaintiffs, if they win, will
         | get a few pennies; the lawyers will get a lot.
        
         | AuryGlenz wrote:
         | I brought a class action suit against Sharp and I was the class
         | representative. They settled. The judge awarded me a whopping
         | $1,000 from the settlement money. From the time I put into it,
         | including 3 or 4 full days in NYC because my deposition
         | coincided with a snowstorm, I didn't exactly come out ahead
         | financially.
         | 
         | Obviously this is different for the reasons you stated, but I
         | didn't want people to think bringing a class action lawsuit
         | forward is a way to get rich. It's a bit of a joke, really.
        
         | varispeed wrote:
         | > rather than an aggrieved individual looking for justice.
         | 
         | How can an aggrieved individual seek justice from a big
         | multinational corporation? That's not possible unless that
         | individual is a retired billionaire wanting to become a
         | millionaire.
        
         | grogenaut wrote:
         | I have a friend from highschool who does class action lawsuits.
         | He spends a very large amount of money funding his suits on
         | things like expert witnesses; only 1 in 5 pays off, so it has
         | to pay off well. His model is similar to venture capital. Most
         | of these cases take 5-7 years to execute, so he basically takes
         | out loans from another lawyer to fund them. His average pay for
         | the last 10 years has been around $140k/year. Some years he
         | makes nothing and pays out a lot, others he makes several
         | million and pays back all the loans. Another way to think of it
         | is like giving money to tax fraud whistleblowers.
         | 
         | Yes he does think of it somewhat like that, establishing
         | himself in an area. However a lot of his work comes from
         | finding people aggrieved by something not them finding him.
        
         | [deleted]
        
         | iudqnolq wrote:
         | One of the core principles of the American system of government
         | is that we outsource enforcement to private parties. Instead of
         | the public needing to fund enforcement with tax dollars private
         | parties undertake risky litigation in exchange for the chance
         | of a big payoff.
         | 
         | There is a reasonable argument that's a horrible system. But it
         | doesn't make sense to criticize the plaintiff looking for a
         | profit - the entire system has been set up such that that's
         | what they're supposed to do. If you're angry about it lobby for
         | either no rules or properly funded government enforcement of
         | rules.
        
           | thaumasiotes wrote:
           | > If you're angry about it lobby for either no rules or
           | properly funded government enforcement of rules.
           | 
           | No, there are plenty of other changes you might want to see.
           | 
           | For example, in the American system, judges are generally not
           | allowed to be aware of anything not mentioned by a party to
           | the case. There is no good reason for this.
        
           | onlycoffee wrote:
           | It's the two words, "government enforcement", that bothers
           | me. If your party is in control the words sound fine,
           | otherwise, they sound ominous.
        
             | nicoburns wrote:
             | Are you against policing? Because that's government
             | enforcement. Admittedly policing in the US is god awful,
             | but I still think most people would rather have it than no
             | police force at all.
             | 
             | Government enforcement of this kind of law is really no
             | different. It wouldn't be the legislature doing it.
        
             | falcolas wrote:
             | In an ideal situation, the enforcement would be managed by
             | boring employees who don't much care who's in power, since
             | they're not appointed.
             | 
             | AKA a vast majority of the non-legislative government
             | workers.
        
           | celestialcheese wrote:
            | That's entirely fair - and I'm not angry, just not convinced
            | by their arguments, especially when the motive is likely not
            | genuine.
           | 
           | As an aside - I'm almost positive MSFT/Github expected this
           | and their legal teams have been prepping for this moment.
           | Copyright Law and Fair Use in the US is so nuanced and vague
           | that anything created involving prior art by big-pocket
           | individuals or corporations will be litigated swiftly.
           | 
           | I expected one of these lawsuits to come first from Getty or
           | one of the big money artist estates against OpenAI or
           | Stability.ai, but Getty and OpenAI seem to be partnering
           | instead of litigating.
        
           | cube00 wrote:
           | Sounds like healthcare
        
           | lovich wrote:
           | > But it doesn't make sense to criticize the plaintiff
           | looking for a profit...
           | 
            | I don't know man, I can simultaneously see the systemic issue
            | that needs to be solved and also critique someone for
            | succumbing to base needs like greed when they don't have the
            | need.
        
             | CobrastanJorji wrote:
             | What they're doing is a service, though. Say that $10
             | million worth of damage against others has been done. If
             | the law firm does not act, the villainous curs who caused
             | that damage get to keep their money and are incentivized to
             | do it again. If the law firm does act and prevails, then
             | the villains lose their ill-gotten gains (in favor of the
             | law firm and, sometimes, to an extent, the injured
             | parties). That's preferable. Not ideal, but certainly
             | better than nothing.
        
               | lovich wrote:
                | That implies it's a service I want, which I have not
                | decided on in this situation. Either way I was more
                | arguing with the other poster's claim that it "didn't
                | make sense" to critique this move, which I think is
                | factually incorrect since I can come up with a few
                | plausible situations where it does make sense.
        
               | ImPostingOnHN wrote:
               | it doesn't imply it's a service you want, but rather, if
               | you do want it, you can opt-in to the service by joining
               | the lawsuit when the time comes
               | 
               | if you feel the class doesn't represent you, you can just
               | not opt-in
        
               | lovich wrote:
               | I perhaps wasn't clear, I meant that I am not sure I want
               | copilot constrained in this way. If I solidify that
               | belief into definitely not wanting copilot constrained,
               | then this would be a negative suit for me
        
             | ssteeper wrote:
             | Is a startup founder looking for a big payout succumbing to
             | greed?
             | 
             | These people are just following incentives.
        
               | lovich wrote:
                | People following financial incentives are being greedy;
                | this is how we got "greed is good" as a phrase.
        
             | MikePlacid wrote:
             | But the need is obviously there. Everyone who produces the
             | following code in a non-university environment - for a fee!
             | - _needs_ to be punished quickly and severely:
             | 
             |  _Based on the given prompt, [Codex] produced the following
              | response:_
              | 
              |     function isEven(n) {
              |       if (n == 0)
              |         return true;
              |       else if (n == 1)
              |         return false;
              |       else if (n < 0)
              |         return isEven(-n);
              |       else
              |         return isEven(n - 2);
              |     }
              | 
              |     console.log(isEven(50));   // - true
              |     console.log(isEven(75));   // - false
              |     console.log(isEven(-1));   // - ??
        
         | glerk wrote:
         | Correct. This is no different than patent trolls weaponizing
         | the justice system for personal gain. Nothing they claim or do
         | is in good faith and they should be treated as bad actors.
        
           | jacobr1 wrote:
           | It is a little different. The first patent troll that blazed
           | the trail gets both more credit (for ingenuity) and blame
           | (for the deleterious impact) in my opinion. I'll give the
           | same internet points to these guys.
        
           | vesinisa wrote:
            | How come? When people contributed code publicly, they
            | attached a license saying how the code may be used. Is
            | training an AI model on this allowed? I think there's a
            | fair, important and novel legal question to be examined
            | here.
           | 
           | Patent trolls usually file lawsuits that are just unmerited,
           | but rely simply on the fact that mounting a defence is more
           | expensive than settling.
        
         | henryfjordan wrote:
         | It can be and is both what you describe and a necessary feature
         | of our adversarial legal system.
         | 
         | Github can't really go to a court by themselves and ask "is
         | this legal?". There is the concept of declaratory relief, but
         | you need to be at least threatened with a lawsuit before that's
         | on the table.
         | 
         | So Github kinda just has to try releasing CoPilot and get sued
         | to find out. The legal system is set up to reward the lawyer
         | who will go to bat against them to find out if it is legal. The
         | plaintiff (and maybe the lawyer, depending on how the case is
         | financed) takes the risk of being wrong, just as Github had to.
         | 
         | It is set up this way to incentivize lawyers to protect
         | everyone's rights.
        
         | heavyset_go wrote:
         | This is a classic example of the ad hominem fallacy. Stating
         | that "they are no angels" doesn't detract from whether they're
         | right or capable of effecting positive legal change.
         | 
         | Frankly, I don't care if anyone makes a name for themselves for
         | doing this. In fact, I applaud them and would happily give them
         | recognition should they be successful.
         | 
         | Similarly, I'd hope that there are opportunities for profit in
         | this space, given that I don't want cheap lawyers botching this
         | case and setting terrible legal precedent for the rest of us.
         | Microsoft has a billion dollar legal team and they will do
         | everything they can to protect their bottom line.
        
       | Cort3z wrote:
       | I'm not a lawyer, but here is why I believe a class action
       | lawsuit is correct;
       | 
       | "AI" is just fancy speak for "complex math program". If I make a
       | program that's simply given an arbitrary input then, thought math
       | operations, outputs Microsoft copyright code, am I in the clear
       | just because it's "AI"? I think they would sue the heck out of me
       | if I did that, and I believe the opposite should be true as well.
       | 
       | I'm sure my own open source code is in that thing. I did not see
       | any attributions, thus they break the fundamentals of open
       | source.
       | 
        | In the spirit of Rick Sanchez: it's just compression with extra
        | steps.
        
         | njharman wrote:
         | Say you read a bunch of code, say over years of a developer
         | career. What you write is influenced by all that. It will
         | include similar patterns, similar code and identical snippets,
         | knowingly or not. How large does a snippet have to be before
         | it's copyrightable? "x"? "x==1"? "if x==1\n print('x is one')"?
         | [obviously, replace with actual common code like if not found
         | return 404].
         | 
         | Do you want to be vulnerable to copyright litigation for code
         | you write? Can you afford to respond to every lawsuit filed by
         | a disgruntled wingbat, or a large corp wanting to shut down an
         | open source / competing project?
        
         | rowanG077 wrote:
         | The brain is also just a "complex math program". Since math is
         | just the language we use to describe the world. I don't feel
         | this argument has any weight at all.
        
           | Supermancho wrote:
           | > The brain is also just a "complex math program".
           | 
           | This is not a fact.
        
             | rowanG077 wrote:
             | Explain yourself. There is not a understood natural
             | phenomenon which we could not capture in math. If you argue
              | the behavior of the brain cannot be modeled using a complex
              | math program, you are claiming the brain is qualitatively
              | different from any mechanism known to man since the dawn of
              | time.
              | 
              | The physics that gives rise to the brain is pretty much
              | known. We can model all the protons, electrons and photons
              | incredibly accurately. It's an extraordinary claim to say
              | the brain doesn't function according to these known
              | mechanisms.
        
               | moralestapia wrote:
               | >Explain yourself.
               | 
               | Why? Burden of proof is on you.
        
               | heavyset_go wrote:
               | > _We can model all the protons, electrons and photons
               | incredibly accurately._
               | 
               | We can't even accurately model a receptor protein on a
               | cell or the binding of its ligands, nor can we accurately
               | simulate a single neuron.
               | 
               | This is one of those hard problems in computing and
               | medicine. It is very much an open question about how or
               | if we can model complex biology accurately like that.
        
               | rowanG077 wrote:
               | I didn't say we can simulate it. There is a massive leap
               | from what I said to being able to simulate it.
        
               | Supermancho wrote:
               | > There is not a understood natural phenomenon which we
               | could not capture in math.
               | 
               | This is a belief about our ability to construct models,
               | not a fact. Models are leaky abstractions, by nature.
               | Models using models are exponentially leaky.
               | 
               | > I didn't say we can simulate it.
               | 
               | Mathematics (at large) is descriptive. We describe matter
               | mathematically, as it's convenient to make predictions
               | with a shared modeling of the world, but the quantum of
               | matter is not an equation. f() at any scale of
               | complexity, does not transmute.
        
               | CogitoCogito wrote:
               | > There is not a understood natural phenomenon which we
               | could not capture in math.
               | 
               | Does the brain fall in into the category of "understood
               | natural phenomenon"? Is it "understood"? What does
               | "understood" mean in this context?
        
               | layer8 wrote:
               | You are confusing the nondiscrete math of physics with
               | the discrete math of computation. Even with unlimited
               | computational resources, we can't simulate arbitrary
               | physical systems exactly, or even with limited error
               | bounds. What a program (mathematical or not) in the
                | Turing-machine sense can do is only a tiny, tiny subset
               | of what physics can do.
               | 
               | Personally I believe it's likely that the brain can be
               | reduced to a computation, but we have no proof of that.
        
               | bqmjjx0kac wrote:
               | > There is not a understood natural phenomenon which we
               | could not capture in math.
               | 
               | If all you have is a hammer...
               | 
               | The nature of consciousness is an open question. We don't
               | know whether the brain is equivalent to a Turing machine.
        
           | lisper wrote:
           | Somewhere in the complex math is the origin of whatever it is
           | in intellectual property that we deem worthy of protection.
           | Because we are humans, we take the complex math done by human
           | brains as worthy of protection _by fiat_. When a painter
           | paints a tree, we assign the property interest in the
           | painting to the human painter, not the tree, notwithstanding
           | that the tree made an essential contribution to the content.
           | The _whole point_ is to protect the interests of humans (to
           | give them an incentive to work). There is no other reason to
           | even entertain the _concept_ of  "property".
        
             | rowanG077 wrote:
             | Creations by AI should obviously be protected by fiat as
             | well. Anything else is a ridiculous double standard that
             | will stifle progress.
        
           | kadoban wrote:
           | The legal world tends to be less interested in these kind of
           | logical gotchas than engineering types would like. I don't
           | see a judge caring about that brain framing at all.
           | 
           | Not to mention, if your brain starts outputting Microsoft
           | copyright code, they're going to sue the shit out of you and
           | win, so I'm not sure how that would help even so.
        
           | yoyohello13 wrote:
           | So if I read the windows explorer source code, then later
           | produced a line for line copy (without referring back to the
           | source). Microsoft couldn't sue me?
        
           | bombolo wrote:
           | > The brain is also just a "complex math program"
           | 
           | Source?
        
             | rowanG077 wrote:
             | The physics that gives rise to the brain is pretty much
             | known. We can model all the protons, electrons and photons
             | incredibly accurately.
        
               | iampuero wrote:
               | I feel like this is a massive oversimplification...
               | 
               | In this answer, you're completely ignoring the massive
               | fact that we cannot create a human brain. Having
               | mathematical models about particles does not mean we have
               | "solved" the brain. Unless you're also believe that these
               | LLMs are actually behaving just like human brains, in
               | that have consciousness, they have logic, they dream,
               | they have nightmares, they produce emotions such as fear,
               | love, anger, that they grow and change over time, that
               | they controls body, your lungs, heart, etc...
               | 
               | You see my point, right? Surely you see that the
               | statement 'The brain is also just a "complex math
               | program"' is at best extremely over-simplistic.
        
               | bqmjjx0kac wrote:
               | > The physics that gives rise to the brain is pretty much
               | known
               | 
               | There is a gaping chasm between observing known physics,
               | and saying it is the _cause_ of consciousness.
               | 
               | You should read this:
               | https://en.wikipedia.org/wiki/Philosophy_of_mind
               | 
                | [ Edit: better link:
                | https://en.wikipedia.org/wiki/Hard_problem_of_consciousness ]
        
           | fsflover wrote:
           | It might be. If your brain generated verbatim someone's code
           | without following its license, you would also break
           | copyright, wouldn't you?
        
           | kyruzic wrote:
           | No it's actually not.
        
         | ugh123 wrote:
         | Attributions are fundamental to open source? I thought having
         | source openly available was fundamental to open source (and
         | allowed use without liability/warranty) as per apache, mit, and
         | other licenses.
         | 
         | If they just stick to using permissive-licensed source code
         | then I'm not sure what the actual 'harm' is with co-pilot.
         | 
         | If they auto-generate an acknowledgement file for all source
         | repos used in co-pilot, and then ask clients of co-pilot to
         | ship that file with their product, would that be enough? Call
         | it "The Extended Github Co-Pilot Derivative Use License" or
         | something.
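         | 
         | (Just to make the idea concrete, a rough sketch of what such an
         | auto-generated acknowledgement file could look like; the repo
         | names, licenses, and file name below are made-up placeholders,
         | not anything Copilot actually produces.)
         | 
         |     # Toy generator for an acknowledgement file built from
         |     # (repo, license) pairs. Everything here is hypothetical.
         |     repos = [
         |         ("github.com/example/libfoo", "MIT"),
         |         ("github.com/example/barutils", "Apache-2.0"),
         |     ]
         | 
         |     with open("COPILOT_ACKNOWLEDGEMENTS.txt", "w") as f:
         |         f.write("Suggestions may derive from these sources:\n\n")
         |         for url, license_name in repos:
         |             f.write(f"- {url} ({license_name})\n")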
        
           | heavyset_go wrote:
           | Attribution and inclusion of copies of licenses are
           | stipulations in almost all of the popular open source
           | licenses, including BSD and MIT licenses.
        
           | Cort3z wrote:
           | People would likely not share any code if they could not
           | trust that their work would be respected, and attributed. So
           | yes, I believe it to be fundamental to open source.
        
             | Aeolun wrote:
             | Maybe researchers that are used to hunting for publications
             | and attributions.
             | 
             | If I'm sharing my code publicly, it's because I want it to
             | be _used_.
        
           | TAForObvReasons wrote:
           | Attributions are fundamental to permissive licenses as well.
           | It's worth reading the licenses in question. MIT:
           | 
           | > The above copyright notice and this permission notice shall
           | be included in all copies or substantial portions of the
           | Software.
           | 
           | This is the "attribution" requirement that even a Copilot
           | trained on only-MIT code would miss.
           | 
           | If it were just about sharing code, there are public domain
           | declarations and variants like CC0 licenses
        
           | neongreen wrote:
           | Apparently they are using GPL-licensed code as well, see
           | https://twitter.com/DocSparse/status/1581461734665367554
           | 
           | After five minutes of googling I'm still not sure if using
           | MIT code requires an attribution, but many people claim it
           | does, see https://opensource.stackexchange.com/a/8163 as one
           | example
        
             | xigoi wrote:
             | From GitHub itself (emphasis mine):
             | 
             | > A short and simple permissive license with conditions
             | only _requiring preservation of copyright and license
             | notices_. Licensed works, modifications, and larger works
             | may be distributed under different terms and without source
             | code.
        
         | drvortex wrote:
         | Your code is not in that thing. That thing has merely read your
         | code and adjusted its own generative code.
         | 
         | It is not directly using your code any more than programmers
         | are using print statements. A book can be copyrighted, the
         | vocabulary of language cannot. A particular program can be
         | copyrighted, but snippets of it cannot, especially when they
         | are used in a different context.
         | 
         | And that is why this lawsuit is dead on arrival.
        
           | Cort3z wrote:
            | Just to be clear: I cannot prove that they have used my code,
            | but for the sake of argument, let's assume so.
           | 
           | They would have directly used my code when they trained the
           | thing. I see it as an equivalent of creating a zip-file. My
           | code is not directly in the zip file either. Only by the act
           | of un-zipping does it come back, which requires a sequence of
           | math-steps.
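            | 
            | (A toy version of the analogy, using Python's zlib in place
            | of a zip file: the stored bytes are no longer readable as
            | code, but a fixed sequence of steps recovers the original
            | exactly.)
            | 
            |     import zlib
            | 
            |     original = b"def is_even(n):\n    return n % 2 == 0\n"
            |     stored = zlib.compress(original)     # unreadable blob
            |     recovered = zlib.decompress(stored)  # the "un-zipping"
            |     assert recovered == original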
        
           | andrewmcwatters wrote:
           | This is demonstrably false. It is a system outputting
           | character-for-character repository code.[1]
           | 
           | [1]: https://news.ycombinator.com/item?id=33457517
        
             | Aeolun wrote:
             | Ok, cool. Presumably that is because it's smart enough to
             | know that there is only one (public) solution to the
             | constraints you set (like asking it to reproduce licensed
             | code).
             | 
              | Now, while you may be able to get it to reproduce one
              | function, reproducing a whole file, let alone the whole
              | repository, seems extremely unlikely.
        
             | naikrovek wrote:
        
               | xigoi wrote:
               | Individual words can't be copyrighted.
        
             | adriand wrote:
             | If I use Photoshop to create an image that is identical to
             | a registered trademark, is the rights violation my fault or
             | Adobe's fault?
        
               | xigoi wrote:
               | Photoshop can't produce copyrighted images on its own.
        
               | metadat wrote:
               | To play devil's advocate: Co-Pilot can't reproduce
               | copyrighted work without appropriate user input.
               | 
               | Just trying to demonstrate a point- this analogy seems
               | flawed.
        
               | heavyset_go wrote:
               | If I draw some eyes in Photoshop, it won't automatically
               | draw the Mona Lisa around it for me.
        
               | metadat wrote:
               | Until you sprinkle a bit of Stable Diffusion V2 or 3 on
               | it..
        
               | kyruzic wrote:
                | No, because that's not a trademark violation in any way.
               | Using GPL code in a non GPL project is a violation of
               | copyright law though.
        
             | pmarreck wrote:
             | It can be modified to not do that (example: mutating the
             | code to a "synonym" that is functionally but not visually
             | identical).
             | 
              | It can also be modified to be opt-in only (only people who
              | permit their code to be learned from can use the product).
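              | 
              | (A toy illustration of the most trivial kind of "synonym"
              | rewrite, using Python's ast module to rename a local
              | identifier: the result is functionally identical but not
              | textually identical. A real system would need far more
              | than renaming, and this says nothing about whether such a
              | rewrite avoids being a derivative work.)
              | 
              |     import ast
              | 
              |     class Rename(ast.NodeTransformer):
              |         def __init__(self, mapping):
              |             self.mapping = mapping
              | 
              |         # Rename variable references.
              |         def visit_Name(self, node):
              |             node.id = self.mapping.get(node.id, node.id)
              |             return node
              | 
              |         # Rename function parameters.
              |         def visit_arg(self, node):
              |             node.arg = self.mapping.get(node.arg, node.arg)
              |             return node
              | 
              |     src = "def is_even(n):\n    return n % 2 == 0\n"
              |     tree = Rename({"n": "candidate"}).visit(ast.parse(src))
              |     print(ast.unparse(tree))
              |     # def is_even(candidate):
              |     #     return candidate % 2 == 0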
        
               | falcolas wrote:
               | Perhaps you are right, and it could be so modified.
               | 
               | Could be, but isn't. And that matters.
        
           | lamontcg wrote:
           | > but snippets of it cannot
           | 
           | Yeah they can, and the whole functions that Copilot spits out
           | are quite obviously covered by copyright.
           | 
           | > especially when they are used in a different context.
           | 
           | That doesn't matter.
        
           | heavyset_go wrote:
            | Neural nets can and do encode and compress the information
           | they're trained on, and can regurgitate it given the right
           | inputs. It is very likely that someone's code is in that
           | neural net, encoded/compressed/however you want to look at
           | it, which Copilot doesn't have a license to distribute.
           | 
           | You can easily see this happen, the regurgitation of training
            | data, in an overfitted neural net.
        
             | naikrovek wrote:
             | > which Copilot doesn't have a license to distribute
             | 
             | when you upload code to a public repository on github.com,
             | you necessarily grant GitHub the right to host that code
             | and serve it to other users. the methods used for serving
             | are not specified. This is above and beyond the license
             | specified by the license you choose for your own code.
             | 
             | you also necessarily grant other GitHub users the right to
             | view this code, if the code is in a public repository.
        
               | eropple wrote:
               | Host _that_ code. Serve _that_ code to other users. It
               | does not grant the right to create _derivative works of
                | that code_ outside the purview of the code's license.
               | That would be a non-starter in practice; see every
               | repository with GPL code not written by the repository
               | creator.
               | 
               | Whether the results of these programs is somehow Not A
               | Derivative Work is the question at hand here, not
               | "sharing". I think (and I hope) that the answer to that
               | question won't go the way the AI folks want it to go; the
               | amount of circumlocution needed to excuse that the _not
               | actually thinking and perceiving program_ is deriving
               | data changes from its copyright-protected inputs is a
                | tell that the folks pushing it know it's silly.
        
               | naikrovek wrote:
               | copilot isn't creating derivative works: copilot users
               | are.
               | 
               | the human at the keyboard is responsible for what goes
               | into the source code being written.
               | 
               | to aid copilot users here, they are creating tools to
               | give users more info about the code they are seeing:
               | https://github.blog/2022-11-01-preview-referencing-
               | public-co...
        
               | heavyset_go wrote:
               | It's served under the terms of my licenses when viewed on
               | GitHub. Both attribution and licenses are shared.
               | 
               | This is like saying GitHub is free to do whatever they
               | want with copyrighted code that's uploaded to their
               | servers, even use it for profit while violating its
               | licenses. According to this logic, Microsoft can
               | distribute software products based on GPL code to users
               | without making the source available to them in violation
               | of the terms of the GPL. Given that Linux is hosted on
               | GitHub, this logic would say that Microsoft is free to
               | base their next version of Windows on Linux without
               | adhering to the GPL and making their source code
               | available to users, which is clearly a violation of the
               | GPL. Copilot doing the same is no different.
        
             | CuriouslyC wrote:
             | This is not necessarily true, the function space defined by
             | the hidden layers might not contain an exact duplicate of
             | the original training input for all (or even most) of the
             | training inputs. Things that are very well represented in
             | the training data probably have a point in the function
             | space that is "lossy compression" level close to the
             | original training image though, not so much in terms of
             | fidelity as in changes to minor details.
        
               | heavyset_go wrote:
               | When I say encoded or compressed, I do not mean verbatim
               | copies. That can happen, but I wouldn't say it's likely
               | for every piece of training data Copilot was trained on.
               | 
               | Pieces of that data are encoded/compressed/transformed,
                | and given the right incantation, a neural net can put
               | them together to produce a piece of code that is
               | substantially the same as the code it was trained on.
               | Obviously not for every piece of code it was trained on,
               | but there's enough to see this effect in action.
        
           | xtracto wrote:
           | Say you publish a song and copyright it. Then I record it and
           | save it in a .xz format. It's not an MP3, it is not an audio
            | file. Say I _split it_ into N chunks and I share it with N
            | different people. Or with the same people, but I share it on
            | N different dates. Say I charge them $10 a month for doing
            | that, and I don't pay you anything.
           | 
           | Am I violating your copyright? Are you entitled to do that?
           | 
            | To make it funnier: say that instead of the .xz, I "compress"
            | it via π compression [1]. So what I share with you is, for
            | each of them, a π index and a data length, from which you
            | can "reconstruct" the audio. Am I illegally violating your
            | copyrights by sharing that?
           | 
           | [1] https://github.com/philipl/pifs
        
             | 2muchcoffeeman wrote:
             | I was thinking of something similar as a counter argument
             | and lo and behold, it's a real thing maths has solved with
             | a real implementation.
        
             | Aeolun wrote:
             | What you are actually giving people is a set of chords that
             | happen to show up in your song, the machine can suggest an
             | appropriate next chord.
             | 
             | It's also smart enough to rebuild your song from the chords
             | _if you ask it to_.
        
               | varajelle wrote:
                | I take your code and I compress it into a tar.gz file.
                | I'll call that file "the model". Then I ask an algorithm
               | (Gzip) to infer some code using "the model". The
               | algorithm (gzip) just learned how to code by reading your
               | code. It just happened to have it memorized in its model.
        
           | moralestapia wrote:
           | Whatever you say man :^)
           | 
           | https://twitter.com/docsparse/status/1581461734665367554
        
           | klabb3 wrote:
           | > Your code is not in that thing. That thing has merely read
           | your code and adjusted its own generative code.
           | 
           | This is kinda smug, because it overcomplicates things for no
           | reason, and only serves as a faux technocentric strawman. It
           | just muddies the waters for a sane discussion of the topic,
           | which people can participate in without a CS degree.
           | 
            | The AI models of today are very simple to explain: it's a
            | product built from code (already regulated, produced by the
            | implementors) and source data (usually works that are
            | protected by copyright and produced by other people). It
            | would be a different product if it hadn't used the training
            | data.
           | 
           | The fact that some outputs are similar enough to source data
           | is circumstantial, and not important other than for small
           | snippets. The elephant in the room is the _act of using_
           | source data to produce the product, and whether the right to
           | decide that lies with the (already copyright protected)
            | creator or not. That's not something to dismiss.
        
             | [deleted]
        
           | NicoleJO wrote:
           | You're wrong. See exposed code.
           | https://justoutsourcing.blogspot.com/2022/03/gpts-
           | plagiarism...
        
         | smoldesu wrote:
         | > "AI" is just fancy speak for "complex math program"
         | 
         | Not really? It's less about arithmetic and more about
         | inferencing data in higher dimensions than we can understand.
         | Comparing it to traditional computation is a trap, same as
         | treating it like a human mind. They're very different under
         | the surface.
         | 
         | IMO, if this is a data problem then we should treat it like
         | one. Simple fix - find a legal basis for which licenses are
         | permissive enough to allow for ML training, and train your
         | models on that. The problem here isn't developers crying out in
         | fear of being replaced by robots, it's more that the code that
         | it _is_ reproducing is not licensed for reproduction (and the
         | AI doesn 't know that). People who can prove that proprietary
         | code made it into Copilot deserve a settlement. Schlubs like me
         | who upload my dotfiles under BSD don't fall under the same
         | umbrella, at least the way I see it.
        
           | Cort3z wrote:
           | Who decides what constitutes an "AI program" vs just a
           | "program"? What heuristic do we look at? At the end of the
           | day, they have an equivalent of a .exe which runs, and
           | outputs code that has a license attached to it.
        
           | heavyset_go wrote:
           | I've been saying AI is computational statistics on steroids
           | for a while, and I think that's an apt generalization of what
           | ML is.
        
           | 2muchcoffeeman wrote:
           | But it all runs on hardware we created and we know exactly
           | what operations were implemented in that hardware. How is it
           | not just math?
        
         | sigzero wrote:
         | > I'm not a lawyer, but
         | 
         | Should have stopped there.
        
           | Cort3z wrote:
           | Why?
        
             | sigzero wrote:
             | Dang it. I was coming back to delete that comment. It was a
             | stupid one.
        
           | operatingthetan wrote:
           | This is not a thread for lawyers to discuss only.
        
         | benlivengood wrote:
         | Humans are just compression with extra steps by that logic.
         | 
         | There's a fairly simple technical fix for codex/copilot anyway;
         | stick a search engine on the back end and index the training
         | data and don't output things found in the search engine.
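         | 
         | (A crude sketch of that kind of filter, assuming a naive
         | whitespace tokenizer and a made-up shingle size; a real system
         | would need proper code tokenization and fuzzier matching.)
         | 
         |     def shingles(code, n=6):
         |         toks = code.split()
         |         if len(toks) <= n:
         |             return {" ".join(toks)} if toks else set()
         |         return {" ".join(toks[i:i + n])
         |                 for i in range(len(toks) - n + 1)}
         | 
         |     def build_index(training_files):
         |         index = set()
         |         for text in training_files:
         |             index |= shingles(text)
         |         return index
         | 
         |     def looks_copied(suggestion, index):
         |         return any(s in index for s in shingles(suggestion))
         | 
         |     # A suggestion that verbatim matches indexed training data
         |     # gets flagged (prints True here).
         |     idx = build_index(["def is_even(n):\n    return n % 2 == 0"])
         |     print(looks_copied("def is_even(n): return n % 2 == 0", idx))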
        
         | cdrini wrote:
         | I haven't heard anyone saying that copilot is legal "just
         | because it's AI." That's a pretty bad faith, reductive, and
         | disingenuous representation. The core argument I've seen is
         | that the output is sufficiently transformative and not straight
         | up copying.
        
         | spiralpolitik wrote:
         | At this point we are back in the territory that the idea and
         | the expression of the idea are inseparable, therefore the
         | conclusion will be that copyright protection does not apply to
         | code.
         | 
         | Personally I think this has the potential to blow up in
         | everyone's faces.
        
           | pevey wrote:
           | If it does end up that way, I feel like the trickle away from
           | github will become a stampede. And that would be unfortunate.
           | Having such a good hub for sharing and learning code is
           | useful, but only if licenses are respected. If not, people
           | will just hunker down and treat code like the Coke secret
           | recipe. That benefits no one.
        
       | VoodooJuJu wrote:
       | As celestialcheese says [1], it seems like a manufactured case
       | for the purpose of furthering someone's legal career rather than
        | seeking redress for any violations made by Copilot.
       | 
       | But I like to put on my conspiracy hat from time to time, and
       | right now is one such time, so let's begin...
       | 
       | Though the motivations behind this case are uncertain, what is
       | certain is that this case will establish a precedent. As we know,
       | precedents are very important for any further rulings on cases of
       | a similar nature.
       | 
        | Could it be the case that Microsoft has a hand in this, trying
        | to lock in an early precedent that favors Copilot in any further
        | litigation against it?
       | 
       | Wouldn't put it past a company like Microsoft.
       | 
       | Just a wild thought I had.
       | 
       | [1] https://news.ycombinator.com/item?id=33457826
        
         | [deleted]
        
       | [deleted]
        
       | 60secs wrote:
       | This is why we can't have nice dystopias.
        
         | [deleted]
        
       | fancyfredbot wrote:
       | If a software developer learns how to code better by reading GPL
       | software and then later uses the skills they developed to build
       | closed source for profit software should they be sued?
        
         | Phrodo_00 wrote:
         | Depends on how closely they reuse the code. Writing it verbatim
         | or nearly? Yes.
        
         | jacooper wrote:
         | A human doesn't perfectly reproduce the same code he learned
         | from.
        
         | buzzy_hacker wrote:
         | Copilot is not a person, it is a piece of software.
        
         | thomastjeffery wrote:
         | If a software developer writes a program to remember a million
         | lines of GPL code, then uses that dataset to "generate" some of
         | that code, then they are essentially violating that license
         | with extra steps.
         | 
         | The extra steps aren't enough to exonerate them. It's just a
         | convoluted copy operation.
         | 
         | It's just like how a lossy encoding of a song is still - with
         | respect to copyright - a copy of that song. The data is totally
         | different, and some of the original is missing. It's still a
         | derivative work. So is a remix. So is a reperformance.
        
       | protomyth wrote:
       | I really feel that Andy Warhol Foundation for the Visual Arts,
       | Inc. v. Goldsmith[0] is going to have a big effect on this type
       | of thing. They are basically relying on their AI magic to make it
       | transformative. I'm starting to think the era of learning from
       | material other people own without a license / permission is going
       | to end quickly.
       | 
       | 0) https://www.scotusblog.com/case-files/cases/andy-warhol-
       | foun...
        
       | sensanaty wrote:
       | I personally hope they win, and win big. Anything that ruins
       | Micro$oft's day is a boon to mine.
        
       | cothrowaway88 wrote:
       | Made a throwaway since I guess this stance is controversial. I
       | could not care less about how copilot was made and what kind of
       | code it outputs. It's useful and was inevitable.
       | 
       | I'm 1000% on team open source and have had to refer to things
       | like tldrlegal.com many times to make sure I get all my software
       | licensing puzzle pieces right. Totally get the argument for why
       | this litigation exists in the present.
       | 
       | Just saying in general my friends I hope you have an absolutely
       | great day. Someone will be wrong on the internet tomorrow, no
       | doubt about it. Worry about something productive instead.
       | 
       | This one has the feel of being nothing more than tilting at
       | windmills in the long run.
        
       | eurasiantiger wrote:
       | Maybe we just need to prompt it to include the proper licenses
       | and attributions. /s
        
         | tmtvl wrote:
         | Eh, I don't mind Copilot being trained on my code as long as it
         | and all projects made using it are licensed under the AGPL.
        
       | karaterobot wrote:
       | Does everybody credit the author when using Stack Overflow code?
       | I have, but don't always. Not that I'm trying to steal, I just
       | don't take the time, especially in personal projects.
       | 
       | This isn't exactly the same thing, but it seems to me that three
       | of the biggest differences are:
       | 
       | 1. Stack Overflow code is posted for people to use it (fair
       | enough, but they do have a license that requires attribution
       | anyway, so that's not an escape)
       | 
       | 2. Scale (true; but is it a fundamental difference?)
       | 
       | 3. People are paying attention in this case. Nobody is scanning
       | my old code, or yours, but if they did, would they have a case?
       | 
       | I dunno. I'm more sympathetic to visual artists who have their
       | work slurped up to be recapitulated as someone else's work via
       | text to image models. Code, especially if it is posted publicly,
       | doesn't feel like it needs to be guarded. I'm not saying this is
       | _correct_ , just saying that's my reaction, and I wonder why it's
       | wrong.
        
       | pmarreck wrote:
       | This will fail. Copilot is too good, and only suggests snippets
       | or small functions, not entire classes for example.
        
       | naillo wrote:
        | I'm kinda sceptical that this goes anywhere, given that they
        | basically say it's your responsibility to vet that whatever
        | copilot outputs doesn't break any copyright (obviously that goes
        | against the promise of it and the PR, but that's the small print
        | that gets them out of trouble).
        
         | heavyset_go wrote:
         | Saying "it's your responsibility to not breach licenses or
         | violate copyright" doesn't absolve your service from breaching
         | licenses and violating copyright itself.
        
           | mdaEyebot wrote:
           | "It is the customer's responsibility to ensure that they only
           | drink the water molecules which come out of their tap, and
           | not the lead ones."
        
           | golemotron wrote:
           | Yet we all use web browsers that copy copyrighted text from
           | buffer to buffer all the time. This doesn't even include all
           | of the copying that ISPs perform.
           | 
           | It might be fair to say that the read performed in training
           | has the same character since no human is involved.
           | 
           | The real copyright violation would be using a derived work.
        
             | heavyset_go wrote:
              | A browser isn't an amalgamation of billions of pieces of
             | other works. A browser executes and renders code it's
             | served.
             | 
             | Copilot's corpus is quite literally tomes of copyrighted
             | work that are encoded and compressed in its neural network,
             | from which it launders that work to create similar works.
             | Copilot itself, the neutral network, is that corpus of
             | encoded and compressed information, you can't separate the
             | two. Copilot stores and distributes that work without any
             | input from rightsholders, and it does it for profit.
             | 
             | A better analogy would be between a browser and a file
             | server filled with copyrighted movies whose operator
             | charges $10/mo for access. The browser is just a browser in
             | this analogy, where the file server is the corpus that
             | forms Copilot itself.
        
             | ginsider_oaks wrote:
             | the actual copying isn't a problem, it's distribution. if i
             | buy access to a PDF i'm not going to get in trouble for
             | duplicating the file unless i send it to someone else.
             | 
             | when someone uploads their copyrighted text to a web page
             | they are distributing it to whoever visits that page. the
             | browser is just the medium.
        
               | golemotron wrote:
               | Is that the legal standard in copyright cases?
        
           | [deleted]
        
         | shoshoshosho wrote:
         | You could argue that it's the individual projects using copilot
         | that are violating here, I guess? Like you can use curl or git
         | to dump some AGPL code into your commercial closed software but
         | no one would (hopefully) blame those tools.
         | 
         | So copilot is fine but anyone using it must abide by the
         | collective set of licenses that it used to write code for
         | you...?
        
         | BeefWellington wrote:
         | If a license requires attribution, and you reproduce the code
         | without attribution using your editor plugin, it seems to me
         | the infringement is on the editor plugin.
         | 
         | Note that even licenses like MIT ostensibly require
         | attribution.
        
         | dmitrygr wrote:
         | So, if i made napster 2.0 and said that it is your job to make
         | sure that you do not download anything copyrighted, that would
         | be ok?
        
           | charcircuit wrote:
           | Yes that would be okay. It would also be okay to create
           | Internet 2.0.
        
           | nicolashahn wrote:
           | That's basically the situation for any torrent client
        
             | yamtaddle wrote:
             | Well, if the trackers also hosted mixed-up blocks of data
             | for all the torrents they tracked and their protection was
             | "LOL make sure you don't accidentally download any of these
             | tiny data blocks in the correct order to reconstruct the
             | copyrighted material they may be parts of _wink_"
        
           | eurasiantiger wrote:
           | Isn't that already how everything on the internet works?
        
             | donatj wrote:
             | I think it's arguably how anything works. You can have a
             | fork, but if you stab it in someone's eye, that's on you.
        
           | donatj wrote:
           | Yep. That's exactly why BitTorrent clients can exist.
        
           | dmalik wrote:
           | You mean like every torrent client that currently exists?
        
           | ketralnis wrote:
           | I think you're looking for consistency that the legal system
           | just doesn't provide. The music industry is more organised
           | and litigious than the software industry and that gives them
           | power that you and I don't have. If you called it "Napster
           | 2.0" specifically you'd probably be prevented from shipping
           | by a preliminary injunction. Is that fair or consistent? No.
           | But it's the world we live in. Programmers want laws to be
           | irrefutable and executable logic but they just aren't.
        
           | [deleted]
        
           | brookst wrote:
           | The legal system takes intent into account.
           | 
           | So if you produce Napster 2.0 to be the best music piracy
           | tool, and you test it for piracy, and you promote it for
           | piracy... you're going to have trouble.
           | 
           | If you produce Napster 2.0 as a general-purpose file-sharing
           | system, let's call it a torrent client, and you can claim no
           | ill intent... you may have trouble, but it's a lot more
           | defensible in court.
           | 
           | I would find it a big stretch to say Github's intent here is
           | to illegally distribute copyrighted code. No judgment on
           | whether the class action has any merit, just saying I would
           | be very surprised if discovery turns up lots of emails where
           | Github execs are saying "this is great, it'll let people
           | steal code."
        
             | kube-system wrote:
             | > I would find it a big stretch to say Github's intent here
             | is to illegally distribute copyrighted code.
             | 
             | Almost everything on GitHub is subject to copyright, except
             | for some very old works (maybe something written by Ada
             | Lovelace?), and US government works not eligible for
             | copyright.
             | 
             | Now, many of the works there are also licensed under
             | permissive licenses, but that is only a defense to
             | copyright infringement if the terms of those licenses are
             | being adequately fulfilled.
        
               | brookst wrote:
               | > Almost everything on GitHub is subject to copyright,
               | 
               | Agreed. Like I said, it's about intent. Can anyone say
               | with a straight face that copilot is an elaborate scheme
               | to profit by duplicating copyrighted work?
               | 
               | I don't think the defense is that it wasn't trained on
               | copyrighted data. It obviously was.
               | 
               | I think the defense is that anything, including a person,
               | that learns from a large corpus of copyrighted data will
               | sometimes produce verbatim snippets that reflect their
               | training data.
               | 
               | So when it comes to copyright infringement, are we moving
               | the goalposts to where merely learning from copyrighted
               | material is already infringement? I'm not sure I want to
               | go there.
        
           | jasonlotito wrote:
           | Now, IANAL, but iirc, that is all 100% okay and legal. In
           | fact, I can even download copyrighted music and movies
           | without issue. So, I don't even need to make sure I don't
           | download anything under copyright.
           | 
           | The issue isn't downloading copyrighted stuff.
           | 
           | Rather, it's making available and letting others download it.
           | That was where you got in trouble.
        
             | heavyset_go wrote:
             | Knowingly downloading copyrighted material, say to get it
             | for free, still violates the rights of the copyright
             | holders. It's just that litigating against members of the
             | public is bad PR and not exactly lucrative, especially when
             | it's likely that kids downloaded the content.
             | 
             | People used to get busted for buying bootleg VHS and DVDs
             | on the street before P2P filesharing was a common thing.
             | Then, early on, people were sued for downloading
             | copyrighted files before rightsholders decided to take a
             | different legal strategy to go after sharers and
             | bootleggers.
        
           | heavyset_go wrote:
           | This is a bad analogy: P2P networks exist that are legal to
           | operate, because Section 230 of the CDA prevents interactive
           | computer services from being held responsible for
           | user-generated content.
           | 
           | What made Napster illegal is that the company did not create
           | its network for fair use of content, but to explicitly
           | violate copyright for profit.
           | 
           | Copilot is like Napster in this case, in that both services
           | launder copyrighted data and distribute it to users for
           | profit.
           | 
           | Copilot is not like other P2P networks that exist to share
           | data that is either free to distribute or can be used under
           | the fair use doctrine. Copilot takes copyrighted content and
           | distributes it to users in violation of licenses; that's its
           | explicit purpose.
           | 
           | It's entirely possible to make a Copilot-like product trained
           | on data that doesn't have restrictive licensing, in the same
           | way it's entirely possible to create a P2P network for
           | sharing files that you have the right to share legally.
        
         | stonemetal12 wrote:
         | If I remember correctly, that only works if you can prove that
         | your system has "substantial non-infringing uses".
        
       | foooobaba wrote:
       | If GitHub or Google indexes source code using a neural net to
       | help you find it, given a query, is that also illegal? If you
       | think of Copilot as something that helps you find code you're
       | looking for, is it all that different, and if so, why?
       | 
       | In this case, wouldn't the users of Copilot be the ones
       | responsible for any copyrighted code they may have accessed
       | using Copilot?
        
         | leni536 wrote:
         | Both services already accept DMCA notices to take content down.
        
           | foooobaba wrote:
           | True, that's another good point.
        
         | lbotos wrote:
         | The crux of the issue: Is the code that is being generated
         | being used in a way that its license allows? That's it. I'm
         | confident that this problem would go away if Copilot said:
         | 
         | //below output code is MIT licensed (source: github/repo/blah)
         | 
         | And yes, the "users" are responsible, but it's possible that
         | Copilot could be implicated in a case depending on how its
         | access is licensed.
         | 
         | Stable Diffusion has this same problem btw, but in visual arts
         | "fair use" is even murkier.
         | 
         | For code, if you could use the code and respect the license,
         | why wouldn't you? Copilot takes away that opportunity and
         | replaces it with "trust us".
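         | 
         | To make the idea concrete, here is a rough TypeScript sketch
         | (purely hypothetical, since Copilot exposes no provenance
         | metadata today) of what an attributed suggestion could look
         | like:
         | 
         |   interface Suggestion {
         |     code: string;
         |     license?: string; // e.g. "MIT" (hypothetical field)
         |     source?: string;  // e.g. a repo URL (hypothetical field)
         |   }
         | 
         |   // Prepend an attribution comment when provenance is known;
         |   // otherwise the user still has to vet the license alone.
         |   function withAttribution(s: Suggestion): string {
         |     if (!s.license || !s.source) return s.code;
         |     const header =
         |       `// License: ${s.license} (source: ${s.source})`;
         |     return header + "\n" + s.code;
         |   }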
        
           | foooobaba wrote:
           | This makes sense; it produces chunks rather than the whole
           | source, whereas a search engine would also give you the
           | license.
        
       | arpowers wrote:
       | The proper way to think about these LLMs is similar to
       | plagiarism.
       | 
       | Seems to me the underlying data should be opt-in from creators,
       | and licenses should be developed that take AI into
       | consideration.
        
       | thesuperbigfrog wrote:
       | How original is the generated code?
       | 
       | Can the generated code be traced back to the code used for
       | training and the original copyrights and licenses for that code?
       | 
       | If so, what attribution(s) and license(s) should apply to the
       | generated code?
        
         | dmitrygr wrote:
         | They demonstrate generated code being _identical_ to some
         | training code.
        
           | Swizec wrote:
           | How many ways are there to write many of the basic algorithms
           | we all use though? Can I copyright "({ item }) =>
           | <li>{item.label}</li>"?
           | 
           | Because I sure have seen that exact code written, from
           | scratch, in many _many_ places.
           | 
           | I guess my question boils down to _"What is the smallest
           | copyrightable unit of code?"_. Because I'm certain suing a
           | novelist for copyright infringement on a character that says
           | "Hi, how are you?" would be considered absurd.
        
             | googlryas wrote:
             | No specific sources to provide, but a lot of analyses were
             | written about this question regarding the Google v. Oracle
             | Java API lawsuit.
        
           | avian wrote:
           | There were well-known examples of Copilot reproducing exact
           | code snippets well before this lawsuit (e.g. Quake's fast
           | inverse square root function). Microsoft dealt with them by
           | simply adding the offending function names to a blocklist.
           | 
           | In other words, if your open source project doesn't have such
           | immediately recognizable code and didn't cause a shitstorm on
           | Twitter, chances are copilot is still happily spewing out
           | your exact code, sans the copyright and license info.
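           | 
           | A toy sketch of that kind of name-based output filter (an
           | assumption about how it roughly works; the real
           | implementation isn't public) would be something like:
           | 
           |   // Names reportedly added after public complaints.
           |   const blockedNames = ["Q_rsqrt"];
           | 
           |   // Suppress a suggestion if it mentions a blocklisted
           |   // name; a verbatim copy under a renamed function sails
           |   // right through.
           |   function isBlocked(code: string): boolean {
           |     return blockedNames.some((n) => code.includes(n));
           |   }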
        
           | m00x wrote:
           | Just like developers have _never_ copy-pasted code from
           | Stack Overflow or GitHub :):):)
        
       | ggerganov wrote:
        
         | omnimus wrote:
         | Always consider that maybe you don't fully understand what it
         | actually does.
        
         | [deleted]
        
         | pvg wrote:
         | _Please don't sneer, including at the rest of the community._
         | 
         | https://news.ycombinator.com/newsguidelines.html
        
         | sirsinsalot wrote:
         | That's not really right.
         | 
         | Copilot isn't just "displaying" something. Copilot has mined
         | the collective efforts of developers in an effort to produce
         | derivative works, without permission, redistributing that
         | value without giving anything back.
         | 
         | It'd be like suing Adobe because Photoshop comes bundled with
         | your holiday photos, without permission, and uses those in a
         | "family photos" filter.
         | 
         | Large-scale mining of value and then selling it without due
         | credit or reward to those you stole that value from is plain
         | theft.
        
       | finneganscat wrote:
        
       | spir wrote:
       | The part of GitHub Copilot to which I object is that it's trained
       | on private repos. Where does GitHub get off consuming explicitly
       | private intellectual property for their own purposes?
        
       | RamblingCTO wrote:
       | lol @ "open-source software piracy"
       | 
       | If I'm being honest I'm a bit annoyed at this. What's the problem
       | and what's the point of this?
        
         | opine-at-random wrote:
         | If you'd ever read even a single one of the licenses to the
         | software I'm sure you use every day, you'd understand. This is
         | such an obvious and pathetic strawman.
         | 
         | I often notice on Hacker News that people don't seem to
         | understand anything about free or open-source software outside
         | of the pragmatics of whether they can abuse the work for free.
        
         | bpodgursky wrote:
         | Lawyers want $$$$.
        
           | RamblingCTO wrote:
           | Yeah, I guess so. This website reads like bullshit bingo
           | from some weird Twitter dude trying to sell you his newest
           | product:
           | 
           | "AI needs to be fair & ethical for everyone. If it's not,
           | then it can never achieve its vaunted aims of elevating
           | humanity. It will just become another way for the privileged
           | few to profit from the work of the many."
           | 
           | Blah blah. Can we get back to the hacking-on-stuff mentality?
        
             | gcmrtc wrote:
             | Looks like that lawyer guy is not new to hacking on stuff:
             | https://matthewbutterick.com/
             | 
             | Not exactly the CV of a Twitter weirdo.
        
               | RamblingCTO wrote:
               | Hah, funny. I've used Pollen before and think I had
               | contact with him a few years ago! The blah blah about AI
               | elevating the world is still bs imho. I still disagree
               | with his views (https://matthewbutterick.com/chron/this-
               | copilot-is-stupid-an...) and this lawsuit.
               | 
               | I wasn't actually talking about him specifically btw when
               | saying "this sounds like a crypto bro from Twitter". The
               | overly enthusiastic AI talk reminded me of that; that's
               | what I wanted to say.
        
         | finneganscat wrote:
        
       | albertzeyer wrote:
       | I really don't understand how there can be a problem with how
       | Copilot works. Any human works in much the same way. A human is
       | trained on lots and lots of copyrighted material. Still, what a
       | human produces in the end is not automatically a derived work of
       | everything that human has seen before.
       | 
       | So, why should an AI be treated differently here? I don't
       | understand the argument for this.
       | 
       | I actually see quite some danger in this line of thinking, that
       | there are different copyright rules for an AI compared to a human
       | intelligence. Once you allow such an arbitrary distinction, AI
       | will get restricted more and more, much more than humans are, and
       | that will just arbitrarily limit the usefulness of AI and
       | effectively be a net negative for humanity as a whole.
       | 
       | I think we must really fight against such an undertaking, and
       | better educate people on how Copilot actually works, so that no
       | such misunderstanding arises.
        
       ___________________________________________________________________
       (page generated 2022-11-03 23:01 UTC)