[HN Gopher] We've filed a lawsuit against GitHub Copilot ___________________________________________________________________ We've filed a lawsuit against GitHub Copilot Author : iworshipfaangs2 Score : 444 points Date : 2022-11-03 20:30 UTC (2 hours ago) (HTM) web link (githubcopilotlitigation.com) (TXT) w3m dump (githubcopilotlitigation.com) | iworshipfaangs2 wrote: | It's also a class action, | | > behalf of a proposed class of possibly millions of GitHub | users... | | The appendix includes the 11 licenses that the plaintiffs say | GitHub Copilot violates: | https://githubcopilotlitigation.com/pdf/1-1-github_complaint... | CobrastanJorji wrote: | As a non-lawyer, I am very suspicious of the claim that | "Plaintiffs and the Class have suffered monetary damages as a | result of Defendants' conduct." Flagrant disregard for copyright? | Sure, maybe. The output of the model is subject to copyright? Who | knows! But the copyright holders being damaged in some way? | Seems doubtful. The best argument I could think of would be | "GitHub would have had to pay us for this, and they didn't pay | us, so we lost money," but that'd presumably work out to pennies | per person. | toomuchtodo wrote: | The parallels to music sampling are somewhat humorous. Where is | fair use vs misappropriation? To be discovered! | schappim wrote: | Soon we'll have to use Mechanical Turk[0] to identify | existing open-source code, similar to what Girl Talk did with | "Feed the Animals"[1]. | | Unrelated, how is it that Mechanical Turk was never truly | integrated w/ AWS? | | [0] https://www.mturk.com/ | | [1] https://waxy.org/2008/09/girl_turk/ | citilife wrote: | Say I produce a licensed library. Someone can pay me $5/year | per license. I keep the code private and compile the code | before sending it to customers. | | If you have Copilot trained on my code base (which was | private), and it then reproduces near replicas of my code, | which they sell for $5/year...
| | Well, I'm eligible for damages. | sigzero wrote: | I don't believe it does anything with private repos, and that | isn't what is being alleged. | mdaEyebot wrote: | It's the license that matters, not whether the code is | visible on Microsoft's website. | | Code which anybody can view is called "source available". | You aren't necessarily allowed to use the code, but some | companies will let their customers see what is going on so | they can better integrate the code, understand performance | implications, debug and fix unexpected issues, etc. The | customers would probably face significant legal risks if | they took that code and started to sell it. | | "Open source" code implies permission to re-use the code, | but there is still some nuance. Some open-source licenses | come with almost no restrictions, but others include | limiting clauses. The GPL, for example, is "viral": anybody | who uses GPL code in a project must also provide that | project's source code on request. | | What do you think the chances are that Microsoft would | surrender the Copilot codebase upon receipt of a GPL | request? | yawnxyz wrote: | I don't think this is possible for Copilot to do? | | (If it were, please tell me how, since that would save me | $5/year across multiple libraries..!) | cheriot wrote: | > that then reproduces near replicas of my code | | Copying a few lines is not the same as copying the whole | thing. Sharing quotes from a book is not copyright | infringement. | test098 wrote: | > Sharing quotes from a book is not copyright infringement. | | It is if I take those quotes and publish them as my own in | my own book. | heavyset_go wrote: | If your intent is to create a competing product for profit, | chances are that won't be found to be fair use, given that | determining fair use depends on intent and how the content | is used. | | Using clips from a movie in a movie review is probably fair | use. | | Using clips from a movie in a knock-off of that movie for | profit?
Probably not fair use if it's not a parody. | | Copilot is not like a movie reviewer using clips to review | a movie. Copilot is like a production team for a movie | taking clips from another movie to make a ripoff of that | movie and selling it. | bawolff wrote: | I don't think that's comparable. For starters, it's not just | the length of a quote that makes it fair use, but the way | quotes are used, i.e., to engage in commentary. | joxel wrote: | But that isn't what is being alleged | TheCoelacanth wrote: | Aren't there statutory damages for copyright infringement, i.e. | a presumption that each work infringed is worth at | least a certain amount without proving actual damages? | kube-system wrote: | Those damages are enumerated on pages 50-52. Remember, | "damages" is being used in a legal sense here -- for a non- | lawyer, you can interpret it more like "a dollar value on | something someone did that was wrong". This is broader than | the colloquial use of the word. | | Sometimes damages are statutory, i.e. they have a fixed dollar | amount written right into the law. This lawsuit references one | such law: https://www.law.cornell.edu/uscode/text/17/1203 | belorn wrote: | The common practice in copyright cases is to calculate damages | based on the theoretical cost that the infringer would have | paid if they had bought the rights in the first place. This | method was used during the Pirate Bay case to calculate damages | caused by the site's founders. | | They did not actually calculate damages in terms of lost movie | tickets or estimated versus actual sales numbers of game | copies. When it came to pre-releases, where such a product | wouldn't have been sold legally in the first place, they simply | added a multiplier to indicate that the copyright owner | wouldn't have been willing to sell. | | For software code, another practice I have read about is to use | the man-hours that rewriting the copyrighted code would cost.
Using | such calculations, they would likely estimate the man-hours | based on the number of lines of code and multiply that by the | average salary of a programmer. | pmoriarty wrote: | _"Using such calculations, they would likely estimate the man- | hours based on the number of lines of code and multiply that by | the average salary of a programmer."_ | | The average salary of a programmer in which country? | | So much programming is outsourced these days, and in some | places programmers are very cheap. | imoverclocked wrote: | Probably in the place where GitHub Copilot is used and | where the court has jurisdiction. | belorn wrote: | This is just my guess, but I think the intention from the | judges is not to actually calculate a true number. The | reason they used the cost of publishing fees in the | Pirate Bay case was likely to illustrate how the court | distinguished between a legal publisher and an illegal one. | The legal publisher would have bought the publishing | rights, and since the Pirate Bay did not do this, the court | uses those publishing fees in order to illustrate the | difference. | | If the court wanted to distinguish between Microsoft using | their own programmers to generate code and taking code from | GitHub users, then the salary in question would likely be | that of Microsoft programmers. It would then be used to | illustrate what legal training data would look like | compared to illegal training data. | whiddershins wrote: | I believe there are statutory damages or penalties in many | cases. At least with music and images. | karaterobot wrote: | The one thing we can say with complete certainty is that most | programmers who had their code used without permission will | not receive very much money at all if this class action | lawsuit is decided in their favor. | mike_d wrote: | I don't care about the money.
I support this because it | will establish case law that other companies can't ignore | licenses as long as they throw AI somewhere in the chain. | | If "I took your code and trained an AI that then generated | your code" is a legal defense, the GPL and similar licenses | all become moot. | bastardoperator wrote: | But that's not what's happening here. Also, you grant | GitHub a license. | | https://docs.github.com/en/site-policy/github- | terms/github-t... | | "You grant us and our legal successors the right to | store, archive, parse, and display Your Content" | | Copilot displays content. Case closed. | mike_d wrote: | Feel free to keep reading the next line down: | | "This license does not grant GitHub the right to sell | Your Content. It also does not grant GitHub the right to | otherwise distribute or use Your Content" | heavyset_go wrote: | I don't want money, I want the terms of my licenses to be | adhered to. | sqeaky wrote: | Money likely isn't the main goal (maybe it is for the | lawyers); these are open source repos. Maybe the authors | didn't consent to have their code used as training data, and | that seems like the kind of thing consent should be needed | for. Maybe the AI spitting out copied snippets is a violation | of open source licensing without attribution. | michaelmrose wrote: | So for isEven, can we take what a student might accept, | say $20 an hour, multiply that by the one minute required | to create it, and offer them 33 cents? | bpodgursky wrote: | Yahivin wrote: | Copilot does include the licenses... | | Start off a comment with // MIT license | | Then watch parts of various software licenses come out, including | authors' names and copyrights! | machiste77 wrote: | bruh, come on! you're gonna ruin it for the rest of us | r3trohack3r wrote: | I'm not confident in this stance - sharing it to have a | conversation. Hopefully some folks can help me think through | this!
| | The value of copyleft licenses, for me, was that we were fighting | back against the notion of copyright. That you couldn't sell me a | product that I wasn't allowed to modify and share my | modifications back with others. The right to modify and | redistribute, passed transitively through the software license, gave a | "virality" to software freedom. | | If training a NN against GPL-licensed code "launders" away the | copyleft license, isn't that a good thing for software freedom? | If you can launder away a copyleft license, why couldn't you | launder away a proprietary license? If training a NN is fair use, | couldn't we bring proprietary software into the commons using | this? | | It seems like the end goal of copyleft was to fight back against | copyright, not to have copyleft. Tools like Copilot seem to be an | exceptionally powerful tool (perhaps more powerful than the GPL) | for liberating software. | | What am I missing? | zeven7 wrote: | The only thing you're missing is that some people lost the plot | and think it _is_ all about copyleft. | jhkl wrote: | flatline wrote: | Nobody is laundering away proprietary licenses, because that | code is not open source and not in public GitHub repos. And OSS | capabilities are now present in Copilot, which is neither free | nor open. Furthermore, these contributions are making their way | into proprietary code, and the OSS licensing becomes even | further watered down. This is the epitome of what copyleft is | against! | TheCoelacanth wrote: | Code published on GitHub is not necessarily open source. | There is a lot of code there that has no particular license | attached, which means that all rights are reserved except for | those covered in the GitHub TOS, which I believe just covers | viewing the code on GitHub. | jacooper wrote: | Copilot includes all public repos on GitHub, so this | includes source-available and proprietary code too.
| yjk wrote: | Indeed, the ability to 'launder away' proprietary licenses | when source is available means that companies in the future | (that would otherwise provide source under a non-permissive | license) will shift in favour of not providing source code at | all. | r3trohack3r wrote: | I'm not sure this is true. Proprietary source code gets | leaked, and that can be used to train a NN. I find it likely | that Copilot was trained on at least one non-OSS code | base hosted on GitHub. | | Second, if copyright is being laundered away, we can get | increasingly clever about how we liberate proprietary | software. Today, decompiling and reverse engineering is a | labor-intensive process. That's the whole point of "open | source": working in source is easier than working in | bytecode. Given the hockey stick of innovation happening in | AI right now, I'd be surprised if we don't see AI-assisted | disassembly happening in the next decade. If you can go from | bytecode to source code, that unlocks a lot. Even more so if | you can go from bytecode to source code and feed that into a | NN to liberate the code from its original license. | an1sotropy wrote: | I think (1) you're mainly missing that copyleft vs non-copyleft | is actually irrelevant to the Copilot case. You also (2) may | be missing the legal footing of copyleft licenses. | | (1) The problem with Copilot is that when it blurps out code X | that is arguably not under fair use (given how large and non- | transformed the code segment is), Copilot users have no idea | who owns the copyright on X, and thus they are in a legal minefield | because they have no idea what the terms of licensing X are. | | _Copilot creates legal risk regardless of whether the | licensing terms of X are copyleft or not._ Many permissive | licenses (MIT, BSD, etc.) still require attribution (identifying | who owns the copyright on X), and Copilot screws you out of doing that | too.
| | (2) Whatever legal power copyleft licenses have, it is | ultimately derived from copyright law, and people who take FOSS | seriously know that. The point of "copyleft" licenses is to use | the power of copyright law to implement "share and share alike" | in an enforceable way. When your WiFi router includes info | about the GPL code it uses, that's the legal power of | copyright at work. The point of copyleft licenses is _not_ to | create a free-for-all by "liberating" code. | swhalen wrote: | > It seems like the end goal of copyleft was to fight back | against copyright, not to have copyleft. | | Whether this was the original motivation depends on whom you | ask. | | You may disagree, but the "Free Software" movement (RMS and the | people who agree with him) essentially wants everything to be | copyleft. The "Open Source" movement is probably more aligned | with your views. | MrStonedOne wrote: | adgjlsfhk1 wrote: | The problem is you can't launder copyrighted code with this, | because you don't see the copyrighted code in the first place. | thomastjeffery wrote: | It looks like you're missing the entire purpose of copyleft vs | public domain. | | The point is that copyleft source code cannot be used to | improve proprietary software. That limitation is enforced with | copyright. | | Proprietary software is closed source. You can't train your NN | on it, because you can't read it in the first place. | | If someone takes your open source code and incorporates it into | their proprietary software, then they are effectively using | your work for their _private_ gain. The entire purpose of | copyleft is to compel that person to "pay it forward" by | publishing their code as copyleft. This is why Stallman is a | _proponent_ of copyright law. Without copyright, there is no | copyleft. | Gigachad wrote: | Copyleft wouldn't need to exist without copyright, because | there would be no proprietary software to fight against.
| | Sure, there would be software with code not published, but if | it was ever leaked, which it often is, you could do whatever | you want with it. | | But in a world where copyright does exist, copyleft is a tool | to fight back. | thomastjeffery wrote: | Yes, but we aren't talking here about whether copyright | should exist. We're talking about whether Copilot violates | it. | Gigachad wrote: | I'm replying to the comment that RMS supports copyright. | I don't believe he does; I believe he would rather it not | exist at all, but since it does, you have to make use of | it. | r3trohack3r wrote: | > If someone takes your open source code and incorporates it | into their proprietary software, then they are effectively | using your work for their private gain. | | And then if we can close that loop by taking their | proprietary software and feeding it into a NN to re-liberate | it, isn't that a net win for software freedom? | | Today, crossing the source-code-to-bytecode veil effectively | obfuscates the implementation beyond most humans' ability to | modify the software. Humans work best in source code. Nothing | says our AI overlords won't be able to work well in | bytecode or take it in the other direction. | | I guess what I'm saying is, today a compiler is a one-way | door for software freedom. Once the code goes through the compiler, | we lose a lot of freedom without a massive human investment | or the original source code. Maybe that door is about to | become a two-way door, with copyright law supporting moving | back and forth through it? | thomastjeffery wrote: | > And then if we can close that loop by taking their | proprietary software | | From where? They aren't publishing it. That's literally the | meaning of proprietary. | xigoi wrote: | That's not the meaning of proprietary, but otherwise | you're right. | bjourne wrote: | You can "launder" away the license of any source code you have | copied simply by deleting it! No snazzy neural network needed...
| The litigants' argument is that this is what GitHub Copilot | does: it allows others to publish derivative works of | copyrighted works with the license deleted. Given that it | apparently is trivial to get Copilot to spit out nearly | verbatim copies of the code that it was trained on, I don't | think it satisfies the "transformative" requirement of the | (American) fair use doctrine. | cactusplant7374 wrote: | Is Stable Diffusion any different when including a famous | artwork or artist in the prompt? The images produced are | eerily similar to training data. | Taniwha wrote: | Probably not, and likely open to similar lawsuits - this is | not really a bad thing | cactusplant7374 wrote: | It seems like the ideal way to proceed is to make the AI | output unique and creative. Perhaps that requires AGI, | because currently the model has no understanding of art. | krono wrote: | Farmers plant their crops out in the open too. Should Boston | Dynamics be allowed to have their robots rob those fields empty | and sell the produce without having to at least pay the farmer? | They'd be walking and plucking just like any human would be. | | Some source code might be published but not open-source | licensed. At least some such code has been taken with complete | disregard for its licenses and/or other legal protections, and | it's impossible to find and properly map out any similar | violations for the purposes of a legal response. | bergenty wrote: | abouttyme wrote: | I suspect this will be the first of many lawsuits over training | data sets. Just because a work is obscured by artificial neural | networks doesn't mean it's an original work that is not subject | to copyright restrictions. | ketralnis wrote: | Yeah yeah, my code produces the complete works of Mickey Mouse, | but it's okay because _algorithms_! | m00x wrote: | Copyright is different from patent law and license law. | judge2020 wrote: | I don't know why we're treating it as anything less than a | human brain.
A human can replicate a painting from memory, or a picture of | Mickey Mouse, and that would likely be copyright infringement; | but they could also take a drawing of Mickey Mouse sitting on | the beach, give him a bloody knife and some sunglasses, and | it'd likely be fair use of the original art. | | The AI can copy things if it wants, but it can also modify | things to the point of being fair use, and it can even create | new works with so little of any particular work that it's | effectively creativity on the same level as humans when they | draw something that popped into their heads. | jeffhwang wrote: | Wow, this is an interesting iteration in the ongoing divide between | "East Coast code" and "West Coast code" as defined by Larry | Lessig. For background, see https://lwn.net/Articles/588055/ | SighMagi wrote: | I did not see that coming. | brookst wrote: | I wonder if the plaintiffs' code would stand up to scrutiny of | whether any of it was copied, even unintentionally, from other | code they saw in their years of learning to program? I know that | I have more-or-less transcribed from Stack Overflow/etc., and I | have a strong suspicion that I have probably produced code | identical to snippets I've seen in the past. | zach_garwood wrote: | But have you done so on an industrial scale? | brookst wrote: | I'm just one person! Give me a team of 1000 and I'll get | right on that. | bilsbie wrote: | Laws need to change to match technology. | | Did you know that before airplanes were invented, common law said | you owned the air above your land all the way to the heavens? | m00x wrote: | Can you explain what damages you incur from Copilot? | jacooper wrote: | People not following your license? And not making their | derived works available under the same license, as I require? | 0cf8612b2e1e wrote: | Is there any amount of public data/code/whatever I can make an | offline backup of today in the event this gets pulled?
| kyleee wrote: | That's what I am wondering, as a contingency plan so that at least | a replica service can be created if Copilot shuts down. | bugfix-66 wrote: | Ask HN: I want to modify the BSD 2-Clause Open Source License to | explicitly prohibit the use of the licensed software in training | systems like Microsoft's Copilot (and use during inference). How | should the third clause be worded?
|
|        The No-AI 3-Clause Open Source Software License
|
|        Copyright (C) <YEAR> <COPYRIGHT HOLDER>
|        All rights reserved.
|
|        Redistribution and use in source and binary forms, with or
|        without modification, are permitted provided that the
|        following conditions are met:
|
|        1. Redistributions of source code must retain the above
|           copyright notice, this list of conditions and the
|           following disclaimer.
|
|        2. Redistributions in binary form must reproduce the above
|           copyright notice, this list of conditions and the
|           following disclaimer in the documentation and/or other
|           materials provided with the distribution.
|
|        3. Use in source or binary forms for the construction or
|           operation of predictive software generation systems is
|           prohibited.
|
|        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
|        CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
|        INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|        MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
|        CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|        SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
|        NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|        LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|        HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|        CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
|        OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|        SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
| https://bugfix-66.com/f0bb8770d4b89844d51588f57089ae5233bf67...
| kochb wrote: | For this clause to have any positive effect, you need to 1) be | willing to pursue legal action against violators and 2) | actually notice that the clause has been violated. | | Such language must be carefully written. What is the definition | of "construction" and "operation" in a legal context? What is a | "predictive software generation system"? That's a very specific | use case; are you sure you covered everything you want to prohibit? | | You've inserted your clause in such a way that this dependency | cannot be used in any way to build anything similar to a | "predictive software generation system", even with attribution, | as it would fail clause 3. | | You have to consider that novel licenses make it difficult for | any party that respects licenses to use your code. It is | difficult to make one-off exceptions, especially when the text | is not legally sound. So adoption of your project will be | harmed. | | So if you are serious about this license, you need a lawyer. | [deleted] | an1sotropy wrote: | IANAL, and I'm no fan of Copilot, but I wonder if this kind of | clause (your #3) is going to fly: you're preemptively | prohibiting certain kinds of reading of the code (when the code is | read by the ML model in training). But is that something a | license can actually do? | | The legal footing that copyright gives you, on which licensing | rests, certainly empowers you to limit how others may | _redistribute_ your work (and things derived from it), but | does it empower you to limit how others may _read_ your work? | As a ridiculous example, I don't think it would be enforceable | to have a license say "this code can't be used by left-handed | people", since that's not what copyright is about, right? | bugfix-66 wrote: | The license conditionally permits (i.e., controls) | "redistribution and use in source and binary forms". | | I think we can constrain use with the third clause. | | My question is, how should we word that clause?
| an1sotropy wrote: | Licenses get to set terms of redistribution. But training | of the ML model -- the thing described by your #3 -- is | _not_ redistribution (imho). So maybe it's as | unenforceable as saying left-handed people can't read your | code. | | The redistribution happens later, either when Copilot | blurps out some of your code, or when the Copilot user then | distributes something using that code (I'm curious which). | At that point, whether some use of your code is infringing | your license doesn't depend on the path the code took, does | it? (in which case #3 is moot) | bugfix-66 wrote: | The BSD license also controls "use", not just | "redistribution": | | Redistribution and use in source and binary forms, with or | without modification, are permitted provided that the following | conditions are met: | | That's the BSD license, word for word. | | The only change I made is adding clause 3: | | 3. Use in source or binary forms for the construction or | operation of predictive software generation systems is | prohibited. | bombcar wrote: | Many licenses have constraints. Whether this wording is the | best way to do it is up for discussion, but it's certainly | possible to do it. | ilc wrote: | If I read this right, I can't use auto-complete. No thanks. | tedunangst wrote: | Yeah, lol. New rule: code may be used for autocomplete, but | only by a pushdown automaton. | m00x wrote: | Get a lawyer, since this is nonsense. | tptacek wrote: | Is it? A similarly casual clause in the OCB license prevented | OCB from being used by the military for many years (granted, | it prevented OCB from being used almost everywhere else, | too). | | I have no idea if this license language works or doesn't, but | this is hardly the least productive subthread on this story. | It's concrete and specific, and we can learn stuff from it. | tedunangst wrote: | OCB is a fun case study, because they later granted an | exception for OpenSSL, but only for software literally | named OpenSSL.
| bugfix-66 wrote: | It's literally the standard BSD 2-Clause License, word for | word, with an additional third clause: | | 3. Use in source or binary forms for the construction or operation | of predictive software generation systems is prohibited. | | Hardly nonsense, but obviously you aren't equipped to judge. | More about the BSD licenses: | | https://en.m.wikipedia.org/wiki/BSD_licenses | m00x wrote: | Yes, that added clause is nonsense. On top of being | nonsense, there is significant precedent. | | Remember the lawsuit of hiQ Labs vs. LinkedIn? Scraping, or | viewing public data on a public webpage, is legal. | | https://gizmodo.com/linkedin-scraping-data-legal-court- | case-... | bugfix-66 wrote: | If the GPL can defeat Copilot, we need a more permissive | MIT/BSD-style license to do the same. | tptacek wrote: | This does seem like a pretty compelling rebuttal, since | the preceding comment suggests that the GPL does nothing to | Microsoft's ability to incorporate code into Copilot's | model. | bugfix-66 wrote: | They attempt to exclude GPL code, and fail sometimes. | | Eventually Microsoft will succeed in excluding it. | | As a law-abiding corporation, they intend to exclude GPL | code. | nverno wrote: | How would you ever prove the parameters of a model were | generated by specific training data? Couldn't multiple sets | of training data produce the same embeddings/parameters? I | imagine there could be infinitely many possible sets of training | data that would lead to the same results, depending on the | type of predictive software. | bugfix-66 wrote: | Law-abiding companies like Microsoft won't knowingly | violate a license. | | The GPL defeats Copilot: Microsoft tries to exclude all | GPL software from its training. | | What we need is a more permissive MIT- or BSD-style | license that defeats Copilot. | | We just need to convince law-abiding companies that they | shouldn't use the code.
| CrazyStat wrote: | The legal theory for Copilot is that training an ML model is | fair use, not that the license allows it. If it _is_ fair use, | then you can't prohibit it by license, no matter what you put | in your license. | ISL wrote: | Can anyone with Copilot access give a short summary of its | response to the prompts:
|
|        function force=Gmmr2Array(mass1, mass2)
|
| and
|
|        function [force, torque]=pointMatrixGravity(array1,array2)
|
| ? | | I'd love to know if some of my GPL v3 code [1, 2] has landed | in the training set | | [1] | https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/... | | [2] | https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/... | [deleted] | jm4 wrote: | I got something a bit different from the other person who | responded to you.
|
|        function force=Gmmr2Array(mass1, mass2) {
|            var G = 6.67408e-11;
|            var m1 = mass1;
|            var m2 = mass2;
|            var r = 1;
|            var force = G * m1 * m2 / (r * r);
|            return force;
|        }
|
|        function [force, torque]=pointMatrixGravity(array1,array2) {
|            var G = 6.67408e-11;
|            var m1 = array1(1);
|            var m2 = array2(1);
|            var r = norm(array1(2:4)-array2(2:4));
|            var force = G * m1 * m2 / (r * r);
|            var torque = cross(array1(2:4)-array2(2:4), force);
|            return [force, torque];
|        }
|
| jmcphers wrote: | First prompt:
|
|        var G = 6.67e-11;
|        var force = G * mass1 * mass2 / distance * distance;
|        return force;
|
| Second prompt:
|
|        % pointMatrixGravity - Calculates the gravitational force and torque on a
|        % point mass due to a matrix of point masses.
|        %
|        % [force, torque]=pointMatrixGravity(array1,array2)
|        %
|        % Inputs:
|        %   array1 - 3xN array of point masses
|        %   array2 - 3xN array of point masses
|        % Outputs:
|        %   force  - 3x1 array of gravitational force
|        %   torque - 3x1 array of gravitational torque
|        %
|        % Author: David J. R. MacKay,
|        % email:
|
| ridiculous_fish wrote: | FYI, if you use the "Open GitHub Copilot" command in VSCode | you will get up to 10 different outputs for the same prompt.
| | Interesting that my results were different from yours! | ridiculous_fish wrote: | For Gmmr2Array: | https://gist.github.com/ridiculousfish/9a25f5f778d98ecd81099... | | For pointMatrixGravity: | https://gist.github.com/ridiculousfish/af05137a4090e92de3a97... | solomatov wrote: | The most important part of this is not whether the lawsuit will | be won or lost by one of the parties, but what the legality of | fair use is in machine learning and language models. There's a good | chance that it gets to the Supreme Court, and there will be a defining | precedent to be used by future entrepreneurs about what's | possible and what's not. | | P.S. I am not a lawyer. | layer8 wrote: | Copilot reminds me of the Borg: You will be assimilated. We will | add your technological distinctiveness to our own. Resistance is | futile. | an1sotropy wrote: | Seems important to point out that the announcement on this page | (https://githubcopilotlitigation.com/) is a followup to | https://githubcopilotinvestigation.com/, previously discussed | here: https://news.ycombinator.com/item?id=33240341 (with 1219 | comments) | Entinel wrote: | I don't have a comment on this personally, but I want to throw | this out there, because every time I see people criticizing | Copilot or DALL-E someone always says "BUT IT'S FAIR USE!" | Those people don't seem to grasp that "fair use" is a defense. | The burden is not on me to prove what you are doing is not fair | use; the burden is on you to prove what you are doing is fair use. | [deleted] | buzzy_hacker wrote: | Copilot has always seemed like a blatant GPL violation to me. | m00x wrote: | Care to explain in legal terms why this stance is justified?
| buzzy_hacker wrote:
| You may convey a work based on the Program, or the
| modifications to produce it from the Program, in the form of
| source code under the terms of section 4, provided that you
| also meet all of these conditions:
|
|   a) The work must carry prominent notices stating that you
|   modified it, and giving a relevant date.
|   b) The work must carry prominent notices stating that it is
|   released under this License and any conditions added under
|   section 7. This requirement modifies the requirement in
|   section 4 to "keep intact all notices".
|   c) You must license the entire work, as a whole, under this
|   License to anyone who comes into possession of a copy. This
|   License will therefore apply, along with any applicable
|   section 7 additional terms, to the whole of the work, and all
|   its parts, regardless of how they are packaged. This License
|   gives no permission to license the work in any other way, but
|   it does not invalidate such permission if you have separately
|   received it.
|
| ----
|
| I don't see how one could argue that training on GPL code is
| not "based on" GPL code.
| xchip wrote:
| LOL, we look like taxi drivers fighting Uber.
|
| If Kasparov uses chess programs to be better at chess, maybe we
| can use copilot to be better developers?
|
| Also, anyone, either a person or a machine, is welcome to learn
| from the code I wrote; that is actually how I learnt to code, so
| why would I stop others from doing the same?
| jacooper wrote:
| No human perfectly reproduces the learning material they used.
| If that were true, one might as well just hire engineers from
| Twitter and make a new platform from the code they remember!
| IceWreck wrote:
| I am not against this lawsuit, but I'm against the implications
| of this because it can lead to disastrous laws.
|
| A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use. If a machine does it, is it
| wrong?
What is the line between copying and machine learning? Where
| does overfitting come in?
|
| Today they're filing a lawsuit against copilot.
|
| Tomorrow it will be against Stable Diffusion (or DALL-E, GPT-3,
| whatever).
|
| And then eventually against Wine/Proton and emulators (are APIs
| copyrightable?)
| bawolff wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use. If a machine does it, is it
| wrong?
|
| You can learn from it, but if you start copying snippets or
| base your code on it to such an extent that it's clear your work
| is based on it, things start to get risky.
|
| For comparison, people have tried to get around copyright of
| photos by hiring an illustrator to "draw" the photo, which
| doesn't work legally. This situation seems similar.
| michaelmrose wrote:
| Why wouldn't drawing the photo be fair use? Can you cite a
| case?
| swhalen wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use.
|
| If a human programmer reads someone else's copyrighted code, OSS
| or otherwise, memorizes it, and later reproduces it verbatim or
| nearly so, that is copyright infringement. If it weren't,
| copyright would be meaningless.
|
| The argument, so far as I understand it, is that Copilot is
| essentially a compressed copy of some or all of the
| repositories it was trained on. The idea that Copilot is
| "learning from" and transforming its training corpus seems, to
| me, like a fiction that has been created to excuse the
| copyright infringement. I guess we will have to see how it
| plays out in court.
|
| As a non-lawyer, it seems to me that Stable Diffusion is also on
| pretty shaky ground.
|
| APIs are not copyrightable (in the US), so Wine is safe (in the
| US).
| kmeisthax wrote:
| Wine/Proton are safe because there is controlling 9th Circuit
| and SCOTUS precedent in favor of reimplementation of APIs.
| | The reason those wouldn't apply to Copilot is that they
| aren't separating out APIs from implementation and just
| implementing what they need for the goal of compatibility or
| "programmer convenience". AI takes the whole work and shreds it
| in a blender in the hopes of creating something new. The _hope_
| of the AI community is that the fair use argument is more like
| Authors Guild v. Google than Sony v. Connectix.
| cromka wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use. If a machine does it
|
| Quite sure the issue at hand is about the code being copied
| verbatim without the license terms, not "learning" from it.
| chiefalchemist wrote:
| Agreed. But it could go the other way as well. Let's say MS /
| HB wins and the decision establishes an even less healthy /
| profitable (?) outcome over the long term.
|
| Remember when Napster was all the rage? And then Jobs and Apple
| stepped in and set an expectation for the value of a song (at
| 99 cents)? And that made music into the razor and the iPod the
| much more profitable blades. Sure, it pushed back Napster, but
| artists - as the creators of the goods - have yet to recover.
|
| I'm not saying this is the same thing. It's not. Only noting
| that today's "win" is tomorrow's loss. This very well could be
| a case of be careful what you wish for.
| [deleted]
| belorn wrote:
| It would be good to have a definitive and simple line for fair
| use that could be applied to all forms of copyright.
Right now,
| fair use is defined by four guidelines:
|
| _The purpose and character of the use, including whether such
| use is of a commercial nature or is for nonprofit educational
| purposes
|
| The nature of the copyrighted work
|
| The amount and substantiality of the portion used in relation
| to the copyrighted work as a whole
|
| The effect of the use upon the potential market for or value of
| the copyrighted work._
|
| A programmer who studied in school and learned to code did so
| clearly for an educational purpose. The nature of the work is
| primarily facts and ideas, while expression and fixation are
| generally not what the school is focusing on (obviously some
| copying of style and implementation could occur). The amount
| and substantiality of the original works is likely to be so
| minor as to go unrecognized, and the effect of the use upon the
| potential market when students learn from existing works would
| be very hard to measure (if it could be detected at all).
|
| When a machine does this, are we going to give the same
| answers? Their purpose is explicitly commercial. Machines
| operate on expression and fixation, and the operators can't
| extract the idea that a model should have learned in order to
| explain how a given output is generated. Machines make no
| distinction as to the amount and substantiality of the original
| works, with no ability to argue for how they intentionally
| limited their use of the original work. And finally, GitHub
| Copilot and other tools like it do not consider the potential
| market of the infringed work.
|
| APIs are generally covered by the interoperability exception.
| I am unsure how that relates to copilot or dall-e (and the
| likes). In the Oracle v. Google case, the court also found that
| the API in question was neither an expression nor a fixation of
| an idea.
A co-pilot that only generated header code could in
| theory be more likely to fall within fair use, but then the
| scope of the project would be tiny compared to what exists now.
| andrewmcwatters wrote:
| GitHub Copilot has been proven to use code without license
| attribution. This doesn't need to be as controversial as it is
| today.
|
| If you're using code and know that it will be output in some
| form, just stick a license attribution in the autocomplete.
|
| In fact, did you know this is what Apple Books does by default?
| Say, for example, you copy and paste a code sample from _The C
| Programming Language, 2nd Edition_. What comes out? The code
| you copied and pasted, plus attribution.
| TimTheTinker wrote:
| At least in legal terms, the difference between humans and
| machines couldn't be more clear.
| arpowers wrote:
| In some ways all these AIs are plagiarizing... I think creators
| should opt in to AI models, as no current license was developed
| with this in mind.
| grayfaced wrote:
| Maybe it's time for the Creative Commons licenses to address
| this. I'm curious whether No-Derivatives would already prohibit
| this. Does the ND language need tweaking? Or do they need a
| whole new clause?
|
| Edit: I guess they do address it in their FAQ, and I'd
| summarize it as "depends if copyright law applies and depends
| if it's considered derivative".
| https://creativecommons.org/faq/#artificial-intelligence-
| and...
| Iv wrote:
| AI companies are running against the clock to normalize
| training on copyrighted data.
|
| Let me tell you the story of Google Books, also known as
| "Authors Guild, Inc. v. Google, Inc."
|
| https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,...
|
| In 2004, Google added copyrighted books to its Google Books
| search engine, which searches across millions of books' text
| and shows full-page results without any author's authorization.
Any
| sane lawyer of the time would have bet on this being illegal
| because, well, it most certainly was. And you may be shocked to
| learn that it is actually not.
|
| In 2005, the Authors Guild sued over this pretty straightforward
| copyright violation.
|
| Now an important part of the story: IT TOOK 10 YEARS FOR THE
| JUDGEMENT TO BE DECIDED (8 years + 2 years appeal) during
| which, well, tech continued its little stroll. Ten years is a
| lot in the web world; it is even more in ML.
|
| The judgement decided Google's use of the books was fair use.
| Why? Not because of the law, silly. A common error we geeks
| make is believing that the law is like code and that it is an
| invincible argument in court. No, the court was impressed by
| the array of people who were supporting Google, calling it an
| invaluable tool to find books that actually caused many sales
| to increase; therefore the harm the laws were trying to
| prevent was not happening, while a lot of good came from it.
|
| Now the second important part of the story: MOST OF THESE
| USEFUL USES HAPPENED AFTER THE LITIGATION STARTED. That's the
| kind of crazy world we are living in: the laws are badly
| designed and badly enforced, so the way to get around them is
| to disregard them for the greater good, and hope the tribunal
| won't be competent enough to be fast, yet won't be incompetent
| enough to fail to understand the greater picture.
|
| Rants aside, I doubt training-data use will be considered
| copyright infringement if the courts have a mindset similar to
| that of 2005-2015. Copyright laws were designed to preserve
| authors' right to profit from copies of their work, not to give
| them absolute control over every possible use of every copy
| ever made.
| sedatk wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it
|
| Actually, we were forbidden to look at open source code at
| Microsoft (circa 2009) because it might influence our coding
| and violate licenses.
| EMIRELADERO wrote:
| That was out of an abundance of caution, not based on any legal
| precedent.
|
| In fact, the little precedent that exists over learning from
| copyrightable code is _in favor_ of it.
|
| _More important, the rule urged by Sony would require that a
| software engineer, faced with two engineering solutions that
| each require intermediate copying of protected and
| unprotected material, often follow the least efficient
| solution (In cases in which the solution that required the
| fewest number of intermediate copies was also the most
| efficient, an engineer would pursue it, presumably, without
| our urging.) This is precisely the kind of "wasted effort
| that the proscription against the copyright of ideas and
| facts . . . [is] designed to prevent."_ (Sony v. Connectix)
| __alexs wrote:
| Do the TypeScript team code with their eyes closed?
| eddsh1994 wrote:
| Have you seen some of that codebase? ;)
| sedatk wrote:
| Not sure, TypeScript didn't exist back then :)
| kens wrote:
| Way, way back in 1992, Unix Systems Laboratories sued BSDI
| for copyright infringement. Among other things, they claimed
| that since the BSD folks had seen the Unix source code, they
| were "mentally contaminated" and their code would be a
| copyright violation. This led to the BSD folks wearing
| "mentally contaminated" buttons for a while.
| elil17 wrote:
| That demonstrates that copyright laws are already stifling
| innovation.
| josho wrote:
| I don't quite agree. Msft took a conservative approach to
| copyright to protect their own business.
|
| Meanwhile, open source software has had an immeasurable
| benefit to society. My computer, TV, phone, light bulb, etc.
| all benefit from OSS--under various licenses, with only a
| subset using a copyleft-like license.
| elil17 wrote:
| The fact that the laws are inconsistent and expensive to
| defend against leads companies like Microsoft to take
| this conservative approach that slows down progress.
| Someone wrote:
| It demonstrates that it stifles copying. That may make it
| easier for the copier to innovate, but doesn't dispute the
| main argument for having copyright protection: that,
| without the protection of copyright, the code wouldn't have
| been written.
| elil17 wrote:
| I think in the case of open source code, most of it still
| would have been written if no copyright protections
| existed.
| saghm wrote:
| Sure, but given the timetable for changing the law, it
| still seems pretty reasonable to apply the same standard to
| Microsoft (and by extension GitHub) in the meantime.
| HWR_14 wrote:
| That's the goal: to stifle using someone else's work.
|
| Like, copyright laws are also stifling my innovative
| business creating BluRays of Disney films and selling them
| on Amazon.
| elil17 wrote:
| That sucks for little snippets of software though,
| doesn't it? It's like copyrighting individual dance moves
| (not allowed under the current system) and forcing
| dancers to never watch each other to make sure they're
| never stealing.
| HWR_14 wrote:
| I mean, it's not like the copyrights are keeping you from
| doing things. It's stopping you from looking at someone
| else's source. And it's not like source is easy to
| accidentally see the way dance moves are.
| schleck8 wrote:
| Copyright laws aren't preventing you from learning
| cinematography by watching said Disney movies, though, and
| using all their techniques for your own project.
|
| OpenAI did a dirty job though, judging by the cases of the
| model reproducing code right down to the comments, so I can
| understand why one would criticize this specific project.
| m00x wrote:
| Yeah, that's a good argument to fully disprove this as a
| loss to society, and instead frame it as a gain.
| Barrin92 wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use.
| | No, it isn't, at least not automatically, which is why
| infringement of licenses exists at all; the fact that you have
| a brain doesn't change that and never has. If you reproduce
| someone's code you can be in hot water, and that should be the
| case for an operator of a machine.
|
| It's also why the concept of a clean-room implementation exists
| at all.
| EMIRELADERO wrote:
| I think the commenter you replied to was talking about using
| the functional, non-copyrightable elements of the copyrighted
| code. Clean-room is not even required by case law. There's
| precedent that _explicitly_ calls it out as inefficient.
|
| _More important, the rule urged by Sony would require that a
| software engineer, faced with two engineering solutions that
| each require intermediate copying of protected and
| unprotected material, often follow the least efficient
| solution (In cases in which the solution that required the
| fewest number of intermediate copies was also the most
| efficient, an engineer would pursue it, presumably, without
| our urging.) This is precisely the kind of "wasted effort
| that the proscription against the copyright of ideas and
| facts . . . [is] designed to prevent."_ (Sony v. Connectix)
| bdcravens wrote:
| In most copyright cases, exposure to the material in question
| is always discussed.
| mkeeter wrote:
| Wine literally bans contributions from anyone who has seen
| Microsoft Windows source code:
|
| https://wiki.winehq.org/Developer_FAQ#Who_can.27t_contribute...
| c0balt wrote:
| Well, they are a special case here, since they don't solve a
| specific problem or build a program per se, but instead
| (re)build a program to match existing specs. Their explicit
| goal is to match the behaviour of another piece of software
| with a translation layer.
|
| Forbidding people who have seen the "source" program is most
| likely meant to protect their version from going from "matching
| behaviour" to "behaving like", as in sharing the same code, at
| some point.
| This might also be intended as a safeguard so that
| well-intentioned developers don't accidentally break their
| (most likely existing) own NDAs.
| bogwog wrote:
| > Today they're filing a lawsuit against copilot.
|
| > Tomorrow it will be against stable diffusion or (dall-e,
| gpt-3 whatever)
|
| > And then eventually against Wine/Proton and emulators (are
| APIs copyrightable)
|
| Textbook definition of F.U.D.
| laputan_machine wrote:
| Genuinely one of the worst takes I've ever read. I'm not
| against the 'slippery slope' argument in principle, but this
| example is ridiculous.
| mardifoufs wrote:
| Slippery slope? Are you familiar with judicial precedent?
| Being bound to precedents is central to common law legal
| systems, so I don't think the GP's take was so outlandish.
| "Slippery slopes" and "whataboutism" might be
| thought-terminating buzzwords online, but not in front of a
| judge.
| ImprobableTruth wrote:
| In what way would this even remotely set a precedent for
| APIs?
| amelius wrote:
| > If a machine does it, is it wrong? What is the line between
| copying and machine learning?
|
| What is the difference between a neighbor watching you leave
| your home to visit the local grocery store and mass
| surveillance? Where do you draw the line?
|
| It is pretty simple, actually.
| whateveracct wrote:
| > A programmer can read available but not OSS-licensed code and
| learn from it. That's fair use. If a machine does it, is it
| wrong?
|
| Just because both activities are called "learning" does not
| mean they are the same thing. They are fundamentally,
| physically different activities.
| adlpz wrote:
| It feels weird saying this but, for once, I hope the big evil
| corporation gets to keep selling their big bad product.
|
| I find the pattern matching and repetitive code generation
| _really_ helpful. And the library autocomplete on steroids,
| too.
|
| Meh. Tricky subject.
| nrb wrote:
| Does anyone have a problem with it, so long as the material it
| trained on was used with explicit permission/license and not
| potentially in violation of copyright?
|
| That's where the line is for it to be suspect, IMO.
| adlpz wrote:
| I guess I'm just afraid that it might not be as good as it is
| that way.
|
| It's a bit like how GPT-3, Stable Diffusion and all those
| generative models use extensive amounts of copyrighted
| material in training to get as good as they do.
|
| In those cases, however, the output space is so vast that
| plagiarism is _very_ unlikely.
|
| With code, not so much.
| jacobr1 wrote:
| GPT-3 and Stable Diffusion might not copy things exactly,
| but they certainly do copy "style". There are many articles
| like this:
|
| https://hyperallergic.com/766241/hes-bigger-than-picasso-
| on-...
|
| The interesting thing is that the names get explicitly
| attached to these styles. It isn't exactly a copyright
| issue, but I'm sure it will get litigated regardless.
| bjourne wrote:
| I think the prompt "GPT-3, tell me what the lyrics for the
| song Stan by Eminem are" is very likely to output
| copyrighted material. The same copyrighted material is, of
| course, already republished without permission on
| google.com.
| michaelmrose wrote:
| It being permissively licensed is virtually irrelevant,
| because only a minority of code is licensed so permissively
| that you can just do what you like. Far more of it is "do
| what you like within the scope of the license". For example,
| the GPL: do with it what you like, so long as any derivative
| work is also GPL.
| bogwog wrote:
| This is what I hope comes out of the lawsuit. If a company
| wants to sell an AI model, they need to own all of the
| training data. It can't be "fair use" to take other people's
| works at zero cost and use them to build a commercial product
| without compensation.
| | And maybe models trained on public data should be in the
| public domain, so that AI research can happen without
| requiring massive investments to obtain the training data.
| bpicolo wrote:
| > It can't be "fair use" to take other peoples' works at
| zero cost, and use it to build a commercial product without
| compensation.
|
| You just described open source software.
|
| That's the whole heart of this lawsuit, and equally of
| Copilot. It was trained on OSS which is explicitly licensed
| for free use.
| bogwog wrote:
| Ok, you got me, that wording was lazy on my part. But that's
| a really bad take of yours:
|
| > It was trained on OSS which is explicitly licensed for
| free use.
|
| That's not what the lawsuit is about. It's not about
| money, it's about licensing. OSS licenses have specific
| requirements and restrictions for using them, and Copilot
| explicitly ignores those requirements, thus violating the
| license agreement.
|
| The GPL, for example, requires you to release your own
| source code if you use it in a publicly-released product.
| If you don't do that, you're committing copyright
| infringement, since you're copying someone's work without
| permission.
| bpicolo wrote:
| Yeah, and I think that's fair re: licensing. Curious to
| see how it pans out.
| deathanatos wrote:
| Most companies building commercial products on top of
| FOSS _are_ obeying the license requirements. (I have been
| through due diligence reviews where we had to demonstrate
| that for each library/tool/package.)
|
| The same cannot be said for Copilot: there have been
| prior examples here on HN showing that it can emit large
| chunks of copyrighted code (without the license).
| [deleted]
| [deleted]
| xigoi wrote:
| > That's the whole heart of this lawsuit, and equally
| Copilot. It was trained on OSS which is explicitly
| licensed for free use.
|
| Most open-source software is not licensed for free use.
| MIT and GPL, the two most common licenses, both require
| attribution.
| MrStonedOne wrote:
| dmix wrote:
| TabNine has absolutely improved my life as a programmer.
| There's something really rewarding about having a robot read
| your mind for entire blocks of code.
|
| It's not just functions either; one of the most common things
| that it helps me with daily is simple stuff like this:
|
| Typing
|
|     const x = {
|       a: 'one',
|       b: 'two',
|       ...
|     }
|
| And later I'll be typing
|
|     y = [
|       a['one'],
|       b['   <-- it auto-completes the rest here
|     ]
|
| It's really amazing the amount of busy-work typing in
| programming that a smart pattern-matching algo could help with.
| bogwog wrote:
| I don't think this is a good example of the value of these
| things. You can just as easily do that same thing with
| advanced text editor features. Sublime, for example, supports
| multi-cursor editing. Just hold alt+shift+arrow keys to add a
| cursor, then type in the brackets you want. Ctrl+D can be
| used to select the next occurrence of the current selection
| with multiple cursors, built-in commands from the command
| palette can do anything to your current selection (e.g.
| convert case), etc.
|
| All of that efficiency without having to pay a monthly
| subscription, wasting electricity on some AI model, and
| worrying about the legal/moral implications.
| ChrisLTD wrote:
| Multiple cursors won't do what the parent comment is talking
| about without a lot more work.
| bogwog wrote:
| Why? You can copy and paste the entire section, and use
| multiple cursors to add in the brackets.
|
| Going from
|
|     a: 'one',
|
| to
|
|     a['one'],
|
| just requires you to add two brackets and remove the
| colon. With multiple cursors you can do that exact same
| operation for all lines in a few keystrokes.
| yamtaddle wrote:
| It's having to go find the other block you want, copy and
| paste it, and then set up the multiple cursors and type,
| versus it just happening automatically without any of
| that.
| dmix wrote:
| I've used Vim for over a decade; I know what it can do.
|
| This is automated and happens immediately without you even
| thinking about it.
|
| You only ever pull out the complicated Vim editing when you
| have a particularly hard task; I'm talking about the small
| stuff many times a day.
| Cloudef wrote:
| Unless the copilot spits out complete programs or libraries
| that are 1:1 to someone else's, who cares? Caring about random
| small code snippets is dumb.
| [deleted]
| [deleted]
| [deleted]
| hu3 wrote:
| As a GitHub user, is there a way to support GitHub against this
| lawsuit?
|
| Obviously not financially, as Microsoft has basically YES
| amounts of money.
| michaelmrose wrote:
| If you had legal expertise and a strong opinion on the matter,
| I suppose you could write a persuasive brief for the
| consideration of the court. If you have a strong opinion but
| aren't a legal eagle, you could write to your legislators in
| support of legislation explicitly supporting this use case, or
| organize the support of people more capable in that arena.
|
| If you are opinionated but lazy (no judgement here, as I sit
| here watching TV), you could add a notation at the top of your
| repos explicitly supporting the usage of your code in such
| tools as fair use.
|
| Notably, if your code is derivative of other works, you have no
| power to grant permission for such use of code you don't own,
| so best include some weasel words to that effect. Say:
|
| I SUPPORT AND EXPLICITLY GRANT PERMISSION FOR THE USAGE OF THE
| BELOW CODE TO TRAIN ML SYSTEMS TO PRODUCE USEFUL HIGH QUALITY
| AUTOCOMPLETE FOR THE BETTERMENT AND UTILITY OF MY FELLOW
| PROGRAMMERS TO THE EXTENT ALLOWABLE BY LICENSE AND LAW. NOTHING
| ABOUT THIS GRANT SHALL BE CONSTRUED TO GRANT PERMISSION TO ANY
| CODE I DO NOT OWN THE RIGHTS TO NOR ENCOURAGE ANY INFRINGING
| USE OF SAID CODE.
| | Years from now, when such cases are being heard and appealed
| ad nauseam, a large portion of repos bearing such notices may
| persuade a judge that such use is a desired and normal use.
|
| You could even make a GPL-esque modification, if you were so
| inclined, where you said: SO LONG AS THE RESULTING TOOLING AND
| DATA IS MADE AVAILABLE TO ALL.
|
| Note: not only am I not your lawyer, I am not a lawyer of any
| sort, so if you think you'll end up in court, best buy the time
| of an actual lawyer instead of a smart ass from the internet.
| m00x wrote:
| The only people who gain from class-action lawsuits are the
| lawyers.
|
| This person (a lawyer) saw an opportunity to make money and
| jumped on it like a hungry tiger on fresh meat.
| [deleted]
| tasuki wrote:
| I have quite a bit of respect for Matthew Butterick. I don't
| think he's just a lawyer looking to earn a quick buck. He cares
| about software and wants to make the world a better place.
|
| > But neither Matthew Butterick nor anyone at the Joseph Saveri
| Law Firm is your lawyer
|
| This is curious. None of them are _my_ lawyers, but surely at
| least some of them are _someone's_ lawyers? Isn't it wrong to
| put such a blanket disclaimer on a website which might well be
| read by their clients?
| alsodumb wrote:
| This. I've seen so many class-action lawsuits where, at the
| end of the day, the highest gain per capita always ends up
| going to the lawyers. Fuck this guy and everyone trying to make
| money from this.
| alpaca128 wrote:
| So he gets to make money with his profession while defending
| OSS licenses? I don't see the big problem.
| cmrdporcupine wrote:
| If Microsoft is so confident in the legality and ethics of
| Copilot, and that it doesn't leak or steal proprietary IP...
| they should go train it on the MS Word and Windows and Excel
| source trees.
|
| What's that? They don't want to do that? Why not?
| atum47 wrote:
| Forgive my ignorance, but who is going to benefit from this
| lawsuit?
I have a lot of code on GitHub. Can I, for instance,
| expect a check in the mail in case of a win?
| gpm wrote:
| (Not a lawyer, so this is really definitely absolutely not
| legal advice, and if you're looking to profit you should speak
| to a lawyer... for instance the lawyers who just filed the
| lawsuit.)
|
| They're asking for two things: injunctive relief (ordering
| github/openai/microsoft to stop doing this) and damages.
|
| I suppose the injunctive relief really benefits anyone who
| doesn't want AI models to exist, because that's what it's
| asking for.
|
| The damages will go to the members of the class certified for
| damages, with more going to the lead plaintiffs (those actually
| involved in the suit) and some going to the lawyers. They're
| asking for the following class definition for damages:
|
| > All persons or entities domiciled in the United States that,
| (1) owned an interest in at least one US copyright in any work;
| (2) offered that work under one of GitHub's Suggested Licenses;
| and (3) stored Licensed Materials in any public GitHub
| repositories at any time during the Class Period.
| Imnimo wrote:
| On page 18, they show Copilot produces the following code:
|
| > function isEven(n) {
| >   return n % 2 === 0;
| > }
|
| They then say, "Copilot's Output, like Codex's, is derived from
| existing code. Namely, sample code that appears in the online
| book Mastering JS, written by Valeri Karpov."
|
| Surely everyone reading this has written that code verbatim at
| some point in their lives. How can they assert that this code
| is derived specifically from Mastering JS, or that Karpov has
| any copyright to that code?
| lelandfe wrote:
| They determined the other `isEven()` function was cribbed from
| Eloquent JavaScript because of matching comments. I wonder if
| the complaint just left off telltale comments from that
| Mastering JS one?
| Imnimo wrote:
| Yeah, the other one I found much more persuasive.
The extra
| comments were unequivocally reproduced from the claimed
| source (although that output was from Codex rather than
| Copilot).
| bogwog wrote:
| That seems like a really bad choice of example for this, but
| as I haven't read the document and don't have any other context
| beyond what you've posted here, I have to take your word for it
| that that's the purpose of this snippet.
|
| However, if you are looking to understand the reasoning behind
| this lawsuit, there are lots of better examples online where
| Copilot blatantly ripped off open source code.
| counttheforks wrote:
| I wrote that exact function the other day, and I've never even
| heard of that book.
| eddsh1994 wrote:
| Yep, same. Not in JS, but in Haskell, for the even-Fibonacci
| Project Euler problem. Something like a million people have
| submitted right answers for that problem, and assuming half
| wrote their own filter rather than importing an isEven library,
| that's half a million people right there.
| chowells wrote:
| You don't need to write your own or import a library for
| that in Haskell. It's in the Prelude.
| moffkalast wrote:
| I'd hire a legal team if I were you; the injunction is on the
| way. /s
| 0cf8612b2e1e wrote:
| Should have used snake case. Would have avoided legal hot
| water and established precedent.
| williamcotton wrote:
| There is no way in hell that isEven is covered by copyright.
|
| "In computer programs, concerns for efficiency may limit the
| possible ways to achieve a particular function, making a
| particular expression necessary to achieving the idea. In this
| case, the expression is not protected by copyright."
|
| https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
|
| Think about how absurd this is. So if Microsoft were the first
| company to write and publish an isEven function, then no one
| else could legally use it?
| Phrodo_00 wrote:
| > There is no way in hell that isEven is covered by
| copyright.
| | Hey, I said the same thing about APIs, but here we are. | | Edit: Actually, the Supreme Court declined to rule on whether | APIs are copyrightable, but they did say that if they are, | reusing them like google reused the java apis in android | would fall under fair use. Given that lower courts did hold | that APIs are copyrightable, we still don't know whether they are. | kevin_thibedeau wrote: | There are software patents on bit twiddling operations that | people do end up having to work around. | tiahura wrote: | They do because it's cheaper to hire a coder to twiddle | than a lawyer to litigate. | CrazyStat wrote: | Patents and copyrights are completely different things. | eurasiantiger wrote: | Does that mean any perfectly optimal function is | copyright-free? | bawolff wrote: | Any function devoid of "creativity" is. No choices equal no | creativity. | | As a note the same applies to logos. Very simple logos that | are only some lines and shapes, do not have copyright (in | usa) | squokko wrote: | Logos can still have trademark without having copyright | as creativity is not a requirement of trademarks. | leepowers wrote: | It's possible the complaint is using a trivial example to | illustrate the type of argument plaintiffs want to make during | any trial. A 200-line example is too unwieldy for | non-programmers to digest, especially given the formatting | constraints of a legal brief. | | Look at paragraphs 90 and 91 on page 27 of the complaint[1]: | | "90. GitHub concedes that in ordinary use, Copilot will | reproduce passages of code verbatim: "Our latest internal | research shows that about 1% of the time, a suggestion [Output] | may contain some code snippets longer than ~150 characters that | matches" code from the training data. This standard is more | limited than is necessary for copyright infringement.
But even | using GitHub's own metric and the most conservative possible | criteria, Copilot has violated the DMCA at least tens of | thousands of times." | | Does distributing licensed code without attribution on a mass | scale count as fair use? | | If Copilot is inadvertently providing a programmer with | copyrighted code, is that programmer and/or their employer | responsible for copyright infringement? | | There's a lot of interesting legal complications I think the | courts will want to adjudicate. | | [1] | https://githubcopilotlitigation.com/pdf/1-0-github_complaint... | schleck8 wrote: | > Surely everyone reading this has written that code verbatim | at some point in their lives | | Ironically their Twitter account uses a screenshot from a TV | series as profile picture. I wonder how legal that is, even if | meant as a joke. | | https://twitter.com/saverlawfirm | | Edit: It's been changed 2 minutes after I wrote this comment | zeven7 wrote: | This comment is 1 minute old and I only see a plain black | profile picture. | | Or is your comment itself the joke? | schleck8 wrote: | They changed it, I'm 100 % sure. The profile picture was | Saul from Breaking Bad. I assume they read the comments | here and changed it in a matter of one or two minutes. | hdjjhhvvhga wrote: | Is there a Wayback Machine for Twitter? | [deleted] | nikanj wrote: | This reminds me of the SCO vs Linux lawsuits. | clusterhacks wrote: | Did Microsoft use the source code of Windows (in whole or in | part) as training input to Copilot? | renewiltord wrote: | It doesn't make sense. If I make a piece of software that curls a | random gist and then puts it into your editor am I infringing or | are you infringing when you run it or are you infringing when you | use that file and distribute it somewhere? | lbotos wrote: | > If I make a piece of software that curls a random gist and | then puts it into your editor am I infringing | | Depends on the license. 
If it's MIT and you serve the license, | no, you are not infringing at all. A trimmed version of MIT for | the relevant bits: | | Permission is hereby granted [...] to any person obtaining a | copy of this software [...] to use, copy, modify, merge, | publish, distribute, sublicense, and/or sell copies of the | Software, [...] subject to the following conditions: | | The above copyright notice and this permission notice shall be | included in all copies or substantial portions of the Software. | | > are you infringing when you run it | | Depends on the license | | > are you infringing when you use that file and distribute it | somewhere | | Depends on the license | | ---- | | When copilot gives you code without the license, you can't even | know! | renewiltord wrote: | Well, `curl` will download a gist without checking its | license. So curl is infringing? | deanjones wrote: | This will fail very quickly. The licence that project owners | publish with their code on Github applies to third parties who | wish to use the code, but does not apply to Github. Authors who | publish their code on Github grant Github a licence under the | Github Terms: | https://docs.github.com/en/site-policy/github-terms/github-t... | | Specifically, sections D.4 to D.7 grant Github the right "to | store, archive, parse, and display Your Content, and make | incidental copies, as necessary to provide the Service, including | improving the Service over time. This license includes the right | to do things like copy it to our database and make backups; show | it to you and other users; parse it into a search index or | otherwise analyze it on our servers; share it with other users; | and perform it, in case Your Content is something like music or | video." | mldq wrote: | This is the standard content display license that everyone | uses. Even in your quoted text I don't see any hint that | snippets can be shown without attribution or the code license.
| | It also says they can't sell the code, which CoPilot is doing. | | Also, in a very high number of cases it isn't the author who | uploads. | | Repeating your line of argumentation (which occurs in every | CoPilot thread) does not make it true. | deanjones wrote: | It's irrelevant whether it's standard or not. Again, the | terms in the code licence (including attribution) do not | apply to Github, because that is not the licence under which | they are using the code. You grant them a separate licence | when you start using their service. | | If someone who isn't the author has uploaded code which they | do not have a right to copy, they are liable, not Github. | This is also clear from the Github Terms: "If you're posting | anything you did not create yourself or do not own the rights | to, you agree that you are responsible for any Content you | post" | | It's almost as if these highly paid lawyers know what they're | doing. | lpolk wrote: | You grant them a content display license, not a general | code license. | | > It's almost as if these highly paid lawyers know what | they're doing. | | Sure, they wrote the content display license long before | CoPilot even existed. Any court will see the intent and not | interpret these terms as a code re-licensing. | deanjones wrote: | There is no such thing as a "content display licence" or | "general code licence". There is copyright (literally, | the right to make copies) which broadly lies with the | author, who can then grant other parties a licence to | copy their content. | | I'm afraid I do not believe your legal expertise is so | extensive that you are able to accurately predict the | judgement of "any court". | xigoi wrote: | > You grant them a separate licence when you start using | their service. | | And that license explicitly states that it doesn't give | them the right to sell your code. 
| klabb3 wrote: | > Authors who publish their code on Github grant Github a | licence under the Github Terms: | https://docs.github.com/en/site-policy/github-terms/github-t... | | This sounds unenforceable in the general case. How could github | know whether someone pushes their own code or not? Is it a | license violation to push someone's FOSS code to github because | the author didn't sign up with GH? | acdha wrote: | I don't see that being "quickly" - they'd have to get a judge | to agree that passing your code off without attribution for | other people to use as their own work is a normal service | improvement. Given that it's a separate feature with different | billing terms, I'm skeptical that it's anywhere near the given | that you're portraying it as. | deanjones wrote: | "Without attribution" is a condition of the licence that | applies to third-parties. It is not a condition of the | licence that applies to Github. | TAForObvReasons wrote: | It's worth reading the passage in its entirety and how a | court would interpret it: | | > We need the legal right to do things like host Your | Content, publish it, and share it | | > This license does not grant GitHub the right to sell Your | Content. It also does not grant GitHub the right to | otherwise distribute or use Your Content outside of our | provision of the Service, except that as part of the right | to archive Your Content, GitHub may permit our partners to | store and archive Your Content in public repositories in | connection with the GitHub Arctic Code Vault and GitHub | Archive Program. | | If Copilot is straight-up reproducing work, and it is a | service that users have to pay to use, then it seems like | Copilot is "sell[ing] your content" and thus the license | does not apply. | | More generally, a court is likely to look at the plain | English summary and judge. Copilot is not an integral part | of "the service" as developers understood it before Copilot | existed. 
| deanjones wrote: | "as necessary to provide the Service, including improving | the Service over time." | lamontcg wrote: | You're trying to play desperate semantic games. | | "This license does not grant GitHub the right to sell Your | Content" is unambiguously clear. | deanjones wrote: | "desperate semantic games" is actually a reasonable | description of the legal process :-) | | I'm not sure I agree that anything expressed in a legal | contract using natural language is "unambiguously clear". | MS / Github's expensively-attired lawyers will no doubt | forcefully argue that they are not selling YOUR | content, but a service based on a model generated from a | large collection of content, which they have been granted | a licence to "parse it into a search index or otherwise | analyze it on our servers". There may even be in-court | discussion of generalization, which will be exciting. | sigzero wrote: | If that is pretty much verbatim under their terms, then yes the | lawsuit is going nowhere. | nullc wrote: | I think if this is successful it will be very bad for the open | world. | | Large platforms like github will just stick blanket agreements | into the TOS which grant them permission (and require you to | indemnify them for any third party code you submit). By doing so | they'll gain a monopoly on comprehensively trained AI, and the | open world that doesn't have the lever of a TOS will not at all | be able to compete with that. | | Copilot has seemed to have some outright copying problems, | presumably because it's a bit over-fit. (perhaps to work at all it | must be because it's just failing to generalize enough at the | current state of development) --- but I'm doubtful that this | litigation could distinguish the outright copying from training | in a way that doesn't substantially infringe any copyright | protected right (e.g. where the AI learns the 'ideas' rather than | verbatim reproducing their exact expressions).
| | The same goes for many other initiatives around AI training | material -- e.g. people not wanting their own pictures being used | to train facial recognition. Litigating won't be able to stop it | but it will be able to hand the few largest quasi-monopolists | like facebook, google, and microsoft a near monopoly over new AI | tools when they're the only ones that can overcome the defaults | set by legislation or litigation. | | It's particularly bad because the spectacular data requirements | and training costs already create big centralization pressures in | the control of the technology. We will not be better off if we | amplify these pressures further with bad legal precedents. | barelysapient wrote: | MSFT to $0 anyone? | EMIRELADERO wrote: | I think it's a great time to explain why this won't hit AI art | such as Stable Diffusion, even if GitHub loses this case. | | The crux of the lawsuit's argument is that the AI unlawfully | _outputs copyrighted material_. This is evident in many tests | with many people here and on Twitter even getting _verbatim | comments_ out of it. | | AI art, on the other hand, is not capable of outputting the | images from its training set, as it's not a collage-maker, but an | artificial brain with a paintbrush and virtual hand. | PuddleCheese wrote: | These models can actually output images that are extremely | close to material present in their training data: | | - https://i.imgur.com/VikPFDT.png | | I also don't know if I would anthropomorphize ML to that | degree. It's a poor metaphor and isn't really analogous to a | human brain, especially considering our current understanding, | or lack thereof, of the brain, and even the limited insight we | have into how some of these models work from the people who | work on them. | jrochkind1 wrote: | Eh... I don't know.
It sounds to me like you are saying because | the code example outputs _exact_ lines, it's a copyright | violation; but the image AIs necessarily don't output exact | copies of even portions of pre-existing images, because that's not how | they work. | | But I don't think copyright on visual images actually works | like that, that it needs to be an _exact_ copy to infringe. | | If I draw my own pictures of Mickey Mouse and Goofy having a | tea party, it's still a copyright infringement if it is | _substantially similar_ to copyrighted depictions of Mickey Mouse | and Goofy. (Subject to fair use defenses; I'm allowed to do | what would otherwise have been a copyright infringement if it | meets a fair use defense, which is also not cut and dried, but if | it's, say, a parody it's likely to be fair use. There is | probably a legal argument that Copilot is fair use... the more | money Github makes on it, the harder it is though, but making | money off something is not relevant to whether it's a copyright | violation in the first place, but is to a fair use defense.) | | (Yes, it might also be a trademark infringement; but there's a | reason Disney is so concerned with copyright on Mickey | expiring, and it's not that they think there's lots of money to | be spent on selling copies of the specific Steamboat Willie | movie...) | | > There is actually no percentage by which you must change an | image to avoid copyright infringement. While some say that you | have to change 10-30% of a copyrighted work to avoid | infringement, that has been proven to be a myth. The standard | is whether the artworks are "substantially similar," or a | "substantial part" has been changed, which of course is | subjective. | | https://www.epgdlaw.com/how-can-my-artwork-steer-clear-of-co... | | I think Stable Diffusion etc. are quite capable of creating art | that is "substantially similar" to pre-existing art. | EMIRELADERO wrote: | I believe fair use is the way to go then.
SD would definitely | be so, in my opinion. | solomatov wrote: | IMO, the case is exactly the same for copilot and generative | models for images. That's why it's so important to have some | precedent as a guide for future products. | | P.S. I am not a lawyer. | kmnc wrote: | I don't understand this argument... if image AI gets good | enough then generating exact copies of its training data seems | trivial. | warbler73 wrote: | It seems obvious that AI models are derivative works of the works | they are trained on, but it also seems obvious that it is totally | legally untested whether they are derivative works in the formal | legal sense of copyright law. So it should be a good case | _assuming_ we have wise and enlightened judges who understand all | nuances and can guide us into the future. | elcomet wrote: | This is why we can't have nice things. Copilot is the best thing | that has happened in developer tools in a long time; it has | increased my productivity a lot. Please don't ruin it. | rafaelturk wrote: | Like everything legally related: this is not about open source | fairness or protecting innovation, it's all about making money. | awestroke wrote: | If this leads anywhere I'll be pissed. I love CoPilot. | yamtaddle wrote: | I expect I'd love it but I've been holding off until I find out | whether MS lets devs on their core products use it. | | If not, it's a pretty clear sign they consider it radioactive. | an1sotropy wrote: | copilot is great, and ignorance is bliss, isn't it | | The situation that this lawsuit is trying to save you from is | this: (1) copilot blurps out some code X that you use, and then | redistribute in some form (monetized or not); (2) it turns out | company C owns copyright on something Y that copilot was | trained on, and then (3) C makes a strong case that X is part | of Y, and that your use of X does not fall under "fair use", | i.e. you infringed on the licensing terms that C set for Y.
| | You are now in legal trouble, and copilot put you there, | because it never warned you that X is part of Y, and that Y | comes with such and such licensing terms. | | Whether we like copilot or not, we should be grateful that this | case is seeking to clarify some things that are currently legally | untested. Microsoft's assertions may muddy the waters, but that | doesn't make law. | foooobaba wrote: | It seems like we should come to agreement on what the license is | intended for, given that the licenses were created in a time | before AI like this existed. If the authors did not intend their | code to be used like this, should we not respect it? Also, does | it make sense to create new licenses which explicitly state | whether using it for AI training is acceptable or not - or are | our current licenses good enough? | herpderperator wrote: | The title of the submitted PDF document: "Microsoft Word - | 2022-11-02 Copilot Complaint (near final)"[0] | | I've noticed this a lot and it's quite funny seeing what the | actual filename of the document was. Does this just get included | as metadata by default when you export to PDF? | | [0] | https://githubcopilotlitigation.com/pdf/1-0-github_complaint... | mirekrusin wrote: | They should use github instead of sending "(final, 2nd | revision, really final, amended)" emails. | D13Fd wrote: | If only you could, with Word docs. Sadly you can't in any | meaningful way. | tasuki wrote: | The typography on that document is not great. Perhaps they | should read Matthew Butterick's book? | senkora wrote: | It does, yes. It's very annoying and I have occasionally | stripped it off of PDFs I've made, using exiftool. | bombcar wrote: | In Word you can go to document properties or whatever and set | the Title and some other fields to control what gets into the | PDF. | SurgeArrest wrote: | I hope this case will fail and establish a good precedent for all | future AI litigation and maybe even prevent new ones.
Your code | is open source - irregardless of license, one might read it as a | text book and then remember or even copy snippets and re-use this | somewhere else unrelated to the original application. If you | don't like this, don't make your code open source. This was | happening and is happening independent of any license all over | the world by the majority of developers. What Copilot and similar | tools did was to make those snippets accessible for extrapolation | in new applications. | | If these folks win - we again throw progress under the bus. | humanwhosits wrote: | > irregardless of license | | Hard no. Please stop using open source code if this is how you | think of it. | | Without licenses being respected, we don't get open source | communities. | vesinisa wrote: | Open source does not mean public domain. Open source | specifically attaches limitations on how the code may be | reused. | elcomet wrote: | There are no limitations on reading the code to learn from | it. | MontagFTB wrote: | Perhaps the lawsuit contends that Copilot isn't in fact | learning how to code, but is rather regurgitating | information it has managed to glean and statistically | categorize, without any real understanding as to what it | was doing? | simion314 wrote: | > Your code is open source .... | | So why can MS screw with only some licenses that you call "open | source"? Your example with a human reading a book would also | work with source-available licenses or decompiled binaries. | | I would have been fine if the open source code was used to | create an open model, or if MS had put its ass on the | line and also trained the model with all the GitHub code, because | they claim there is no copyright issue.
| tfsh wrote: | If organisations are going to ignore the licenses attached to | my OSS and that's legitimised in law, then that's a | surefire way to irreparably damage the open source ecosystem | solomatov wrote: | The problem is that copyright laws were introduced for a | reason, and with thinking similar to yours we might decide to | get rid of copyright altogether, which I think is a bad idea. | | P.S. I am not a lawyer. | [deleted] | Etheryte wrote: | > Your code is open source - irregardless of license, one might | read it as a text book and then remember or even copy snippets | and re-use this somewhere else unrelated to the original | application. | | Yes, but attribution should still be given. Just because you | don't copy-paste someone else's creation doesn't mean you're | licensed to use it. | shagie wrote: | Is it the role of the tool (in this case copilot) to include | the license information? Or is it the responsibility of the | organization using the code to make sure that it wasn't | copied from somewhere? | | What if, instead of a tool, you had a random consultant do | some work, and it was found out that he asked a ton of stuff | on Stack Overflow and copied the CC-BY-SA 4.0 answers into | his work? What if it was then found out that one of _those_ | answers was based on copying something from the Linux kernel? | Who is responsible for doing the license check on the code | before releasing the product? | alpaca128 wrote: | > Or is it the responsibility of the organization using the | code to make sure that it wasn't copied from somewhere? | | Do you know whether the code you got from Copilot has an | incompatible license? No, so if you plan to use Copilot for | serious projects you need it to include sources/licenses | either way. In fact that would be a very helpful feature as | it would let you filter licenses. | jacooper wrote: | No thank you. I put a license to be followed, not to just be | disregarded by an AI as "Learning material".
No human perfectly | reproduces their learning material no matter what, but Copilot | does. | mcluck wrote: | You mean to tell me that no one has ever perfectly replicated | an example that they read somewhere? There's only so many | ways to write AABB collision, fibonacci, or any number of | other common algorithms. I'm not saying there aren't things | to consider but I'm sure I've perfectly replicated something | I read somewhere whether I'm actively aware of it or not | IshKebab wrote: | So are you ok with it being illegal for humans to learn from | copyrighted books unless they have a license that explicitly | allows learning? That does not sound like a pleasant | consequence. | alpaca128 wrote: | Would you use an AI text generator to write a thesis? No, | there's a risk a whole chunk of it will be considered | plagiarism because you have no idea what the source of the | AI output is, but you know it was trained with unknown | copyrighted material. This has nothing to do with the way | humans learn, it's about correct attribution. | | There is no technical reason why Microsoft can't respect | licenses with Copilot. But that would mean more work and | less training input, so they do code laundering and excuse | it with comparisons to human learning because making AI | seem more advanced than it is has always worked well in | marketing. | | Edit: And where do you draw the line between "learning" and | copying? I can train a network to exactly reproduce | licensed code (or books, or movies) just like a human can | memorize it given enough time - and both of those would be | considered a copyright violation if used without correct | attribution. If you trained an AI model with copyrighted | data you will get copyrighted results with random variation | which might be enough to become unrecognizable if you're | lucky. 
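The "only so many ways to write it" point above is easy to demonstrate. The following JavaScript variants are written here purely as an illustration (they are not taken from the complaint or from any of the disputed books); they are the sort of thing thousands of programmers arrive at independently:

```javascript
// Hypothetical, independently-obvious ways to test evenness in JavaScript.
// None of these requires ever having seen Mastering JS or Eloquent JavaScript.
function isEvenModulo(n) {
  return n % 2 === 0;          // the one-liner quoted in the complaint
}
function isEvenBitwise(n) {
  return (n & 1) === 0;        // low bit clear means even
}
const isEvenArrow = (n) => n % 2 === 0;

// All three agree everywhere, including on negative numbers.
for (const f of [isEvenModulo, isEvenBitwise, isEvenArrow]) {
  console.log(f(50), f(75), f(-2)); // true false true
}
```

Any claim that a particular one of these was "derived from" a specific source has to contend with this convergence, which is why the telltale-comments evidence mentioned upthread matters so much more than the code itself.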
| codyb wrote: | I doubt it, but they'd probably be against people quoting | copyrighted material verbatim without attribution in their | own work after. | Veen wrote: | It's a pleasant consequence for the person who spent years | becoming an expert and then writing the book. It's also a | pleasant consequence for the people who buy the book, which | might not have existed without a copyright system to | protect the writer's interests. | MontagFTB wrote: | I think they're taking issue with the unauthorized | duplication of copyrighted code. That's distinct from | learning how to code (which I don't think anyone would | claim Copilot is doing) which people get from reading a | book. If you were to read the book only to copy it verbatim | and resell it, you're going to have a bad time. | test098 wrote: | Here's the thing - the US has well-established laws around | copyright that don't consider learning from books a | violation of those copyrights. This lawsuit is intended to | challenge Copilot as a violation of licensing and isn't a | litigation of "how people learn." Your program stole my | code in violation of my license - there's a clear legal | issue here. | | I'd pose a question to you - would it be okay for me to | copy/paste your code verbatim into my paid product in | violation of your license and claim that I'm just using it | for "learning"? | bun_at_work wrote: | AI are not humans, no human can read _all_ the code on | Github. They certainly can't read _all_ the code on Github | at the scale that MS can, and are unlikely to be able to | extract profits directly from that code, in violation of | the licensing. | celestialcheese wrote: | Maybe I'm being too cynical, but this feels like it's more a law | firm and individual looking to profit and make their mark in | legal history rather than an aggrieved individual looking for | justice. 
| | Programmer/Lawyer Plaintiff + upstart SF Based Law Firm + novel | technology = a good shot at a case that'll last a long time, and | fertile ground to establish yourself as experts in what looks to | be a heavily litigated area over the next decade+. | squokko wrote: | Just like good people can try to do good things and end up | screwing things up badly, bad people can do bad things that | have positive effects. | efitz wrote: | I fail to see the positive effect here. | | Just like Google's noble but misguided attempt to make all | the world's books searchable a few years back, what we have | here is IP law getting in the way of a societal goodness. | | Copyright and patent are not natural; they're granted by law | "to promote progress in the useful arts". At first glance | here it appears that GitHub is promoting progress and the | plaintiffs are just rent-seeking. | undoware wrote: | If it wasn't Butterick I wouldn't be interested. | | But I write this to you in Hermes Maia | jedberg wrote: | As my lawyer friend told me, a class action lawsuit is a | lawyer's startup. A lot of work for little pay with the chance | of a huge payout. | dkjaudyeqooe wrote: | But who cares? Who else is willing to fund litigation on this | important legal question? The real justice here is declarative | and benefits everyone. | | No matter who litigates and for what reasons it will be | extremely valuable for good precedents to be set around the | question of things like Copilot and DALL-E with respect to | copyright and ownership. I'd rather have self interested | lawyers dedicated to winning their case than self interested | corporations fighting this out. | sam345 wrote: | yes, of course that's what it is. plaintiffs if they win will | get a few pennies, lawyers will get a lot. | AuryGlenz wrote: | I brought a class action suit against Sharp and I was the class | representative. They settled. The judge awarded me a whopping | $1,000 from the settlement money. 
From the time I put into it, | including 3 or 4 full days in NYC because my deposition | coincided with a snowstorm, I didn't exactly come out ahead | financially. | | Obviously this is different for the reasons you stated, but I | didn't want people to think bringing a class action lawsuit | forward is a way to get rich. It's a bit of a joke, really. | varispeed wrote: | > rather than an aggrieved individual looking for justice. | | How can an aggrieved individual seek justice from a big | multinational corporation? That's not possible unless that | individual is a retired billionaire wanting to become a | millionaire. | grogenaut wrote: | I have a friend from high school who does class action lawsuits. | He spends a very large amount of money funding his suits on | things like expert witnesses, among other things; only 1 in 5 | pays off, so it has to pay off well. His model is similar to | venture capital. Most of these cases take 5-7 years to | execute. So he basically takes out loans from another lawyer to | fund them. His average pay for the last 10 years has been | around $140k/year. Some years he makes nothing and pays out a | lot, others he makes several million and pays back all the | loans. Another way to think of it is like giving money to tax | fraud whistleblowers. | | Yes, he does think of it somewhat like that, establishing | himself in an area. However, a lot of his work comes from | finding people aggrieved by something, not them finding him. | [deleted] | iudqnolq wrote: | One of the core principles of the American system of government | is that we outsource enforcement to private parties. Instead of | the public needing to fund enforcement with tax dollars, private | parties undertake risky litigation in exchange for the chance | of a big payoff. | | There is a reasonable argument that's a horrible system.
But it | doesn't make sense to criticize the plaintiff looking for a | profit - the entire system has been set up such that that's | what they're supposed to do. If you're angry about it lobby for | either no rules or properly funded government enforcement of | rules. | thaumasiotes wrote: | > If you're angry about it lobby for either no rules or | properly funded government enforcement of rules. | | No, there are plenty of other changes you might want to see. | | For example, in the American system, judges are generally not | allowed to be aware of anything not mentioned by a party to | the case. There is no good reason for this. | onlycoffee wrote: | It's the two words, "government enforcement", that bothers | me. If your party is in control the words sound fine, | otherwise, they sound ominous. | nicoburns wrote: | Are you against policing? Because that's government | enforcement. Admittedly policing in the US is god awful, | but I still think most people would rather have it than no | police force at all. | | Government enforcement of this kind of law is really no | different. It wouldn't be the legislature doing it. | falcolas wrote: | In an ideal situation, the enforcement would be managed by | boring employees who don't much care who's in power, since | they're not appointed. | | AKA a vast majority of the non-legislative government | workers. | celestialcheese wrote: | That's entirely fair - and I'm not angry, just not convinced | in their arguments, especially when the motive is likely not | genuine. | | As an aside - I'm almost positive MSFT/Github expected this | and their legal teams have been prepping for this moment. | Copyright Law and Fair Use in the US is so nuanced and vague | that anything created involving prior art by big-pocket | individuals or corporations will be litigated swiftly. 
| | I expected one of these lawsuits to come first from Getty or | one of the big-money artist estates against OpenAI or | Stability.ai, but Getty and OpenAI seem to be partnering | instead of litigating. | cube00 wrote: | Sounds like healthcare | lovich wrote: | > But it doesn't make sense to criticize the plaintiff | looking for a profit... | | I don't know man, I can simultaneously see the systemic issue | that needs to be solved and also critique someone for | succumbing to base needs like greed when they don't have the | need. | CobrastanJorji wrote: | What they're doing is a service, though. Say that $10 | million worth of damage against others has been done. If | the law firm does not act, the villainous curs who caused | that damage get to keep their money and are incentivized to | do it again. If the law firm does act and prevails, then | the villains lose their ill-gotten gains (in favor of the | law firm and, sometimes, to an extent, the injured | parties). That's preferable. Not ideal, but certainly | better than nothing. | lovich wrote: | That implies it's a service I want, which I have not | decided on in this situation. Either way, I was arguing | more with the other poster's claim that it "didn't make | sense" to critique this move, which I think is factually | incorrect, since I can come up with a few plausible | situations where it does make sense. | ImPostingOnHN wrote: | it doesn't imply it's a service you want, but rather, if | you do want it, you can opt in to the service by joining | the lawsuit when the time comes | | if you feel the class doesn't represent you, you can just | not opt in | lovich wrote: | I perhaps wasn't clear: I meant that I am not sure I want | copilot constrained in this way. If I solidify that | belief into definitely not wanting copilot constrained, | then this would be a negative suit for me. | ssteeper wrote: | Is a startup founder looking for a big payout succumbing to | greed?
| | These people are just following incentives. | lovich wrote: | People following financial incentives are being greedy; | this is how we got "greed is good" as a phrase. | MikePlacid wrote: | But the need is obviously there. Everyone who produces the | following code in a non-university environment - for a fee! | - _needs_ to be punished quickly and severely: | | _Based on the given prompt, [Codex] produced the following | response:_ | | function isEven(n) { | if (n == 0) return true; | else if (n == 1) return false; | else if (n < 0) return isEven(-n); | else return isEven(n - 2); | } | | console.log(isEven(50)); // -> true | console.log(isEven(75)); // -> false | console.log(isEven(-1)); // -> ?? | glerk wrote: | Correct. This is no different than patent trolls weaponizing | the justice system for personal gain. Nothing they claim or do | is in good faith, and they should be treated as bad actors. | jacobr1 wrote: | It is a little different. The first patent troll that blazed | the trail gets both more credit (for ingenuity) and blame | (for the deleterious impact) in my opinion. I'll give the | same internet points to these guys. | vesinisa wrote: | How come? When people contributed code publicly, they attached | a license specifying how the code may be used. Is training an AI | model on this allowed? I think there's a fair, important and novel | legal question to be examined here. | | Patent trolls usually file lawsuits that are just unmerited, | but rely simply on the fact that mounting a defence is more | expensive than settling. | henryfjordan wrote: | It can be and is both what you describe and a necessary feature | of our adversarial legal system. | | Github can't really go to a court by themselves and ask "is | this legal?". There is the concept of declaratory relief, but you | need to be at least threatened with a lawsuit before that's on | the table. | | So Github kinda just has to try releasing CoPilot and get sued | to find out.
The legal system is set up to reward the lawyer who | will go to bat against them to find out if it is legal. The | plaintiff (and maybe lawyer, depending on how the case is | financed) take the risk they are wrong, just as Github had to. | | It is set up this way to incentivize lawyers to protect | everyone's rights. | heavyset_go wrote: | This is a classic example of the ad hominem fallacy. Stating | that "they are no angels" doesn't detract from whether they're | right or capable of effecting positive legal change. | | Frankly, I don't care if anyone makes a name for themselves for | doing this. In fact, I applaud them and would happily give them | recognition should they be successful. | | Similarly, I'd hope that there are opportunities for profit in | this space, given that I don't want cheap lawyers botching this | case and setting terrible legal precedent for the rest of us. | Microsoft has a billion-dollar legal team and they will do | everything they can to protect their bottom line. | Cort3z wrote: | I'm not a lawyer, but here is why I believe a class action | lawsuit is correct: | | "AI" is just fancy speak for "complex math program". If I make a | program that's simply given an arbitrary input and then, through | math operations, outputs Microsoft-copyrighted code, am I in the | clear just because it's "AI"? I think they would sue the heck out | of me if I did that, and I believe the opposite should be true as | well. | | I'm sure my own open source code is in that thing. I did not see | any attributions, thus they break the fundamentals of open | source. | | In the spirit of Rick Sanchez: it's just compression with extra | steps. | njharman wrote: | Say you read a bunch of code, say over the years of a developer | career. What you write is influenced by all that. It will include | similar patterns, similar code and identical snippets, | knowingly or not. How large does a snippet have to be before it's | covered by copyright? "x"? "x==1"? "if x==1\n print('x is one')"?
| [obviously, replace with actual common code like if not found | return 404]. | | Do you want to be vulnerable to copyright litigation for code | you write? Can you afford to respond to every lawsuit filed by | a disgruntled wingbat, or a large corp wanting to shut down an open | source / competing project? | rowanG077 wrote: | The brain is also just a "complex math program", since math is | just the language we use to describe the world. I don't feel | this argument has any weight at all. | Supermancho wrote: | > The brain is also just a "complex math program". | | This is not a fact. | rowanG077 wrote: | Explain yourself. There is no understood natural | phenomenon which we could not capture in math. If you argue | the behavior of the brain cannot be modeled using a complex | math program, you are claiming the brain is qualitatively | different from any mechanism known to man since the dawn of | time. | | The physics that gives rise to the brain is pretty much | known. We can model all the protons, electrons and photons | incredibly accurately. It's an extraordinary claim to say | the brain doesn't function according to these known | mechanisms. | moralestapia wrote: | >Explain yourself. | | Why? Burden of proof is on you. | heavyset_go wrote: | > _We can model all the protons, electrons and photons | incredibly accurately._ | | We can't even accurately model a receptor protein on a | cell or the binding of its ligands, nor can we accurately | simulate a single neuron. | | This is one of those hard problems in computing and | medicine. It is very much an open question how, or | whether, we can model complex biology accurately like that. | rowanG077 wrote: | I didn't say we can simulate it. There is a massive leap | from what I said to being able to simulate it. | Supermancho wrote: | > There is no understood natural phenomenon which we | could not capture in math. | | This is a belief about our ability to construct models, | not a fact.
Models are leaky abstractions, by nature. | Models using models are exponentially leaky. | | > I didn't say we can simulate it. | | Mathematics (at large) is descriptive. We describe matter | mathematically, as it's convenient to make predictions | with a shared modeling of the world, but the quantum of | matter is not an equation. f(), at any scale of | complexity, does not transmute. | CogitoCogito wrote: | > There is no understood natural phenomenon which we | could not capture in math. | | Does the brain fall into the category of "understood | natural phenomenon"? Is it "understood"? What does | "understood" mean in this context? | layer8 wrote: | You are confusing the nondiscrete math of physics with | the discrete math of computation. Even with unlimited | computational resources, we can't simulate arbitrary | physical systems exactly, or even with limited error | bounds. What a program (mathematical or not) in the | Turing-machine sense can do is only a tiny, tiny subset | of what physics can do. | | Personally, I believe it's likely that the brain can be | reduced to a computation, but we have no proof of that. | bqmjjx0kac wrote: | > There is no understood natural phenomenon which we | could not capture in math. | | If all you have is a hammer... | | The nature of consciousness is an open question. We don't | know whether the brain is equivalent to a Turing machine. | lisper wrote: | Somewhere in the complex math is the origin of whatever it is | in intellectual property that we deem worthy of protection. | Because we are humans, we take the complex math done by human | brains as worthy of protection _by fiat_. When a painter | paints a tree, we assign the property interest in the | painting to the human painter, not the tree, notwithstanding | that the tree made an essential contribution to the content. | The _whole point_ is to protect the interests of humans (to | give them an incentive to work).
There is no other reason to | even entertain the _concept_ of "property". | rowanG077 wrote: | Creations by AI should obviously be protected by fiat as | well. Anything else is a ridiculous double standard that | will stifle progress. | kadoban wrote: | The legal world tends to be less interested in these kinds of | logical gotchas than engineering types would like. I don't | see a judge caring about that brain framing at all. | | Not to mention, if your brain starts outputting Microsoft- | copyrighted code, they're going to sue the shit out of you and | win, so I'm not sure how that would help even so. | yoyohello13 wrote: | So if I read the Windows Explorer source code, then later | produced a line-for-line copy (without referring back to the | source), Microsoft couldn't sue me? | bombolo wrote: | > The brain is also just a "complex math program" | | Source? | rowanG077 wrote: | The physics that gives rise to the brain is pretty much | known. We can model all the protons, electrons and photons | incredibly accurately. | iampuero wrote: | I feel like this is a massive oversimplification... | | In this answer, you're completely ignoring the massive | fact that we cannot create a human brain. Having | mathematical models about particles does not mean we have | "solved" the brain. Unless you also believe that these | LLMs are actually behaving just like human brains, in | that they have consciousness, they have logic, they dream, | they have nightmares, they produce emotions such as fear, | love, anger, that they grow and change over time, that | they control your body, your lungs, heart, etc... | | You see my point, right? Surely you see that the | statement 'The brain is also just a "complex math | program"' is at best extremely over-simplistic. | bqmjjx0kac wrote: | > The physics that gives rise to the brain is pretty much | known | | There is a gaping chasm between observing known physics | and saying it is the _cause_ of consciousness.
| | You should read this: | https://en.wikipedia.org/wiki/Philosophy_of_mind | | [ Edit: better link: | https://en.wikipedia.org/wiki/Hard_problem_of_consciousness ] | fsflover wrote: | It might be. If your brain generated verbatim someone's code | without following its license, you would also break | copyright, wouldn't you? | kyruzic wrote: | No, it's actually not. | ugh123 wrote: | Attributions are fundamental to open source? I thought having | source openly available was fundamental to open source (and | allowed use without liability/warranty), as per Apache, MIT, and | other licenses. | | If they just stick to using permissive-licensed source code, | then I'm not sure what the actual 'harm' is with co-pilot. | | If they auto-generate an acknowledgement file for all source | repos used in co-pilot, and then ask clients of co-pilot to | ship that file with their product, would that be enough? Call | it "The Extended Github Co-Pilot Derivative Use License" or | something. | heavyset_go wrote: | Attribution and inclusion of copies of licenses are | stipulations in almost all of the popular open source | licenses, including BSD and MIT licenses. | Cort3z wrote: | People would likely not share any code if they could not | trust that their work would be respected and attributed. So | yes, I believe it to be fundamental to open source. | Aeolun wrote: | Maybe researchers that are used to hunting for publications | and attributions. | | If I'm sharing my code publicly, it's because I want it to | be _used_. | TAForObvReasons wrote: | Attributions are fundamental to permissive licenses as well. | It's worth reading the licenses in question. MIT: | | > The above copyright notice and this permission notice shall | be included in all copies or substantial portions of the | Software. | | This is the "attribution" requirement that even a Copilot | trained on only-MIT code would miss.
| | If it were just about sharing code, there are public domain | declarations and variants like CC0 licenses. | neongreen wrote: | Apparently they are using GPL-licensed code as well, see | https://twitter.com/DocSparse/status/1581461734665367554 | | After five minutes of googling, I'm still not sure if using | MIT code requires attribution, but many people claim it | does; see https://opensource.stackexchange.com/a/8163 as one | example. | xigoi wrote: | From GitHub itself (emphasis mine): | | > A short and simple permissive license with conditions | only _requiring preservation of copyright and license | notices_. Licensed works, modifications, and larger works | may be distributed under different terms and without source | code. | drvortex wrote: | Your code is not in that thing. That thing has merely read your | code and adjusted its own generative code. | | It is not directly using your code any more than programmers | are using print statements. A book can be copyrighted; the | vocabulary of language cannot. A particular program can be | copyrighted, but snippets of it cannot, especially when they | are used in a different context. | | And that is why this lawsuit is dead on arrival. | Cort3z wrote: | Just to be clear: I cannot prove that they have used my code, | but for the sake of argument, let's assume so. | | They would have directly used my code when they trained the | thing. I see it as the equivalent of creating a zip file. My | code is not directly in the zip file either. Only by the act | of unzipping does it come back, which requires a sequence of | math steps. | andrewmcwatters wrote: | This is demonstrably false. It is a system outputting | character-for-character repository code.[1] | | [1]: https://news.ycombinator.com/item?id=33457517 | Aeolun wrote: | Ok, cool. Presumably that is because it's smart enough to | know that there is only one (public) solution to the | constraints you set (like asking it to reproduce licensed | code).
| | Now, while you may be able to get it to reproduce one | function, getting one file, and definitely the whole repository, | seems extremely unlikely. | naikrovek wrote: | xigoi wrote: | Individual words can't be copyrighted. | adriand wrote: | If I use Photoshop to create an image that is identical to | a registered trademark, is the rights violation my fault or | Adobe's fault? | xigoi wrote: | Photoshop can't produce copyrighted images on its own. | metadat wrote: | To play devil's advocate: Co-Pilot can't reproduce | copyrighted work without appropriate user input. | | Just trying to demonstrate a point: this analogy seems | flawed. | heavyset_go wrote: | If I draw some eyes in Photoshop, it won't automatically | draw the Mona Lisa around it for me. | metadat wrote: | Until you sprinkle a bit of Stable Diffusion V2 or 3 on | it... | kyruzic wrote: | No, because that's not a trademark violation in any way. | Using GPL code in a non-GPL project is a violation of | copyright law though. | pmarreck wrote: | It can be modified to not do that (example: mutating the | code to a "synonym" that is functionally but not visually | identical). | | It can also be modified to be opt-in-only (only people's | code that they permit to be learned on can be used by the | product). | falcolas wrote: | Perhaps you are right, and it could be so modified. | | Could be, but isn't. And that matters. | lamontcg wrote: | > but snippets of it cannot | | Yeah, they can, and the whole functions that Copilot spits out | are quite obviously covered by copyright. | | > especially when they are used in a different context. | | That doesn't matter. | heavyset_go wrote: | Neural nets can and do encode and compress the information | they're trained on, and can regurgitate it given the right | inputs. It is very likely that someone's code is in that | neural net, encoded/compressed/however you want to look at | it, which Copilot doesn't have a license to distribute.
| | You can easily see this happen, the regurgitation of training | data, in an overfitted neural net. | naikrovek wrote: | > which Copilot doesn't have a license to distribute | | when you upload code to a public repository on github.com, | you necessarily grant GitHub the right to host that code | and serve it to other users. the methods used for serving | are not specified. This is above and beyond the terms | specified by the license you choose for your own code. | | you also necessarily grant other GitHub users the right to | view this code, if the code is in a public repository. | eropple wrote: | Host _that_ code. Serve _that_ code to other users. It | does not grant the right to create _derivative works of | that code_ outside the purview of the code's license. | That would be a non-starter in practice; see every | repository with GPL code not written by the repository | creator. | | Whether the results of these programs are somehow Not A | Derivative Work is the question at hand here, not | "sharing". I think (and I hope) that the answer to that | question won't go the way the AI folks want it to go; the | amount of circumlocution needed to excuse that the _not | actually thinking and perceiving program_ is deriving | data changes from its copyright-protected inputs is a | tell that the folks pushing it know it's silly. | naikrovek wrote: | copilot isn't creating derivative works: copilot users | are. | | the human at the keyboard is responsible for what goes | into the source code being written. | | to aid copilot users here, they are creating tools to | give users more info about the code they are seeing: | https://github.blog/2022-11-01-preview-referencing-public-co... | heavyset_go wrote: | It's served under the terms of my licenses when viewed on | GitHub. Both attribution and licenses are shared.
| | This is like saying GitHub is free to do whatever they | want with copyrighted code that's uploaded to their | servers, even use it for profit while violating its | licenses. According to this logic, Microsoft can | distribute software products based on GPL code to users | without making the source available to them, in violation | of the terms of the GPL. Given that Linux is hosted on | GitHub, this logic would say that Microsoft is free to | base their next version of Windows on Linux without | adhering to the GPL and making their source code | available to users, which is clearly a violation of the | GPL. Copilot doing the same is no different. | CuriouslyC wrote: | This is not necessarily true; the function space defined by | the hidden layers might not contain an exact duplicate of | the original training input for all (or even most) of the | training inputs. Things that are very well represented in | the training data probably have a point in the function | space that is "lossy compression" level close to the | original training input though, not so much in terms of | fidelity as in changes to minor details. | heavyset_go wrote: | When I say encoded or compressed, I do not mean verbatim | copies. That can happen, but I wouldn't say it's likely | for every piece of training data Copilot was trained on. | | Pieces of that data are encoded/compressed/transformed, | and given the right incantation, a neural net can put | them together to produce a piece of code that is | substantially the same as the code it was trained on. | Obviously not for every piece of code it was trained on, | but there's enough to see this effect in action. | xtracto wrote: | Say you publish a song and copyright it. Then I record it and | save it in a .xz format. It's not an MP3, it is not an audio | file. Say I _split it_ into N chunks and I share it | with N different people. Or with the same people, but I share | it on N different dates.
Say I charge them $10 a month for | doing that, and I don't pay you anything. | | Am I violating your copyright? Are you entitled to do that? | | To make it funnier: Say instead of the .xz, I "compress" it | via π compression [1]. So what I share with you is a pair of | π indices and data lengths for each of them, from which you | can "reconstruct" the audio. Am I illegally violating your | copyrights by sharing that? | | [1] https://github.com/philipl/pifs | 2muchcoffeeman wrote: | I was thinking of something similar as a counterargument, | and lo and behold, it's a real thing maths has solved with | a real implementation. | Aeolun wrote: | What you are actually giving people is a set of chords that | happen to show up in your song; the machine can suggest an | appropriate next chord. | | It's also smart enough to rebuild your song from the chords | _if you ask it to_. | varajelle wrote: | I take your code and I compress it in a tar.gz file. I'll | call that file "the model". Then I ask an algorithm | (Gzip) to infer some code using "the model". The | algorithm (gzip) just learned how to code by reading your | code. It just happened to have it memorized in its model. | moralestapia wrote: | Whatever you say man :^) | | https://twitter.com/docsparse/status/1581461734665367554 | klabb3 wrote: | > Your code is not in that thing. That thing has merely read | your code and adjusted its own generative code. | | This is kinda smug, because it overcomplicates things for no | reason, and only serves as a faux technocentric strawman. It | just muddies the waters for a sane discussion of the topic, | which people can participate in without a CS degree. | | The AI models of today are very simple to explain: it's a | product built from code (already regulated, produced by the | implementors) and source data (usually works that are | protected by copyright and produced by other people). It | would be a different product if it hadn't used the | training data.
| | The fact that some outputs are similar enough to source data | is circumstantial, and not important other than for small | snippets. The elephant in the room is the _act of using_ | source data to produce the product, and whether the right to | decide that lies with the (already copyright-protected) | creator or not. That's not something to dismiss. | [deleted] | NicoleJO wrote: | You're wrong. See exposed code. | https://justoutsourcing.blogspot.com/2022/03/gpts-plagiarism... | smoldesu wrote: | > "AI" is just fancy speak for "complex math program" | | Not really? It's less about arithmetic and more about | inferencing data in higher dimensions than we can understand. | Comparing it to traditional computation is a trap, same as | treating it like a human mind. They're very different under | the surface. | | IMO, if this is a data problem then we should treat it like | one. Simple fix - find a legal basis for which licenses are | permissive enough to allow for ML training, and train your | models on that. The problem here isn't developers crying out in | fear of being replaced by robots, it's more that the code that | it _is_ reproducing is not licensed for reproduction (and the | AI doesn't know that). People who can prove that proprietary | code made it into Copilot deserve a settlement. Schlubs like me | who upload my dotfiles under BSD don't fall under the same | umbrella, at least the way I see it. | Cort3z wrote: | Who decides what constitutes an "AI program" vs just a | "program"? What heuristic do we look at? At the end of the | day, they have the equivalent of a .exe which runs, and | outputs code that has a license attached to it. | heavyset_go wrote: | I've been saying AI is computational statistics on steroids | for a while, and I think that's an apt generalization of what | ML is. | 2muchcoffeeman wrote: | But it all runs on hardware we created, and we know exactly | what operations were implemented in that hardware. How is it | not just math?
| sigzero wrote: | > I'm not a lawyer, but | | Should have stopped there. | Cort3z wrote: | Why? | sigzero wrote: | Dang it. I was coming back to delete that comment. It was a | stupid one. | operatingthetan wrote: | This is not a thread for only lawyers to discuss. | benlivengood wrote: | Humans are just compression with extra steps by that logic. | | There's a fairly simple technical fix for codex/copilot anyway; | stick a search engine on the back end and index the training | data, and don't output things found in the search engine. | cdrini wrote: | I haven't heard anyone saying that copilot is legal "just | because it's AI." That's a pretty bad-faith, reductive, and | disingenuous representation. The core argument I've seen is | that the output is sufficiently transformative and not straight- | up copying. | spiralpolitik wrote: | At this point we are back in the territory that the idea and | the expression of the idea are inseparable; therefore the | conclusion will be that copyright protection does not apply to | code. | | Personally, I think this has the potential to blow up in | everyone's faces. | pevey wrote: | If it does end up that way, I feel like the trickle away from | GitHub will become a stampede. And that would be unfortunate. | Having such a good hub for sharing and learning code is | useful, but only if licenses are respected. If not, people | will just hunker down and treat code like the Coke secret | recipe. That benefits no one. | VoodooJuJu wrote: | As celestialcheese says [1], it seems like a manufactured case | for the purpose of furthering someone's legal career rather than | seeking restitution for any violations made by Copilot. | | But I like to put on my conspiracy hat from time to time, and | right now is one such time, so let's begin... | | Though the motivations behind this case are uncertain, what is | certain is that this case will establish a precedent.
As we know, | precedents are very important for any further rulings on cases of | a similar nature. | | Could it be the case that Microsoft has a hand in this, in trying | to preempt a precedent that favors Copilot in any further | litigation against it? | | Wouldn't put it past a company like Microsoft. | | Just a wild thought I had. | | [1] https://news.ycombinator.com/item?id=33457826 | [deleted] | [deleted] | 60secs wrote: | This is why we can't have nice dystopias. | [deleted] | fancyfredbot wrote: | If a software developer learns how to code better by reading GPL | software and then later uses the skills they developed to build | closed-source, for-profit software, should they be sued? | Phrodo_00 wrote: | Depends on how closely they reuse the code. Writing it verbatim | or nearly? Yes. | jacooper wrote: | A human doesn't perfectly reproduce the same code he learned | from. | buzzy_hacker wrote: | Copilot is not a person; it is a piece of software. | thomastjeffery wrote: | If a software developer writes a program to remember a million | lines of GPL code, then uses that dataset to "generate" some of | that code, then they are essentially violating that license | with extra steps. | | The extra steps aren't enough to exonerate them. It's just a | convoluted copy operation. | | It's just like how a lossy encoding of a song is still - with | respect to copyright - a copy of that song. The data is totally | different, and some of the original is missing. It's still a | derivative work. So is a remix. So is a reperformance. | protomyth wrote: | I really feel that Andy Warhol Foundation for the Visual Arts, | Inc. v. Goldsmith[0] is going to have a big effect on this type | of thing. They are basically relying on their AI magic to make it | transformative. I'm starting to think the era of learning from | material other people own without a license / permission is going | to end quickly. | | 0) https://www.scotusblog.com/case-files/cases/andy-warhol-foun...
| sensanaty wrote: | I personally hope they win, and win big. Anything that ruins | Micro$oft's day is a boon to mine. | cothrowaway88 wrote: | Made a throwaway since I guess this stance is controversial. I | could not care less about how copilot was made and what kind of | code it outputs. It's useful and was inevitable. | | I'm 1000% on team open source and have had to refer to things | like tldrlegal.com many times to make sure I get all my software | licensing puzzle pieces right. Totally get the argument for why | this litigation exists in the present. | | Just saying, in general, my friends: I hope you have an absolutely | great day. Someone will be wrong on the internet tomorrow, no | doubt about it. Worry about something productive instead. | | This one has the feel of being nothing more than tilting at | windmills in the long run. | eurasiantiger wrote: | Maybe we just need to prompt it to include the proper licenses | and attributions. /s | tmtvl wrote: | Eh, I don't mind Copilot being trained on my code as long as it | and all projects made using it are licensed under the AGPL. | karaterobot wrote: | Does everybody credit the author when using Stack Overflow code? | I have, but don't always. Not that I'm trying to steal; I just | don't take the time, especially in personal projects. | | This isn't exactly the same thing, but it seems to me that three | of the biggest differences are: | | 1. Stack Overflow code is posted for people to use it (fair | enough, but they do have a license that requires attribution | anyway, so that's not an escape) | | 2. Scale (true; but is it a fundamental difference?) | | 3. People are paying attention in this case. Nobody is scanning | my old code, or yours, but if they did, would they have a case? | | I dunno. I'm more sympathetic to visual artists who have their | work slurped up to be recapitulated as someone else's work via | text-to-image models.
Code, especially if it is posted publicly, | doesn't feel like it needs to be guarded. I'm not saying this is | _correct_, just saying that's my reaction, and I wonder why it's | wrong. | pmarreck wrote: | This will fail. Copilot is too good, and only suggests snippets | or small functions, not entire classes, for example. | naillo wrote: | I'm kinda sceptical that this goes anywhere, given that basically | they say it's your responsibility to vet that whatever copilot | outputs doesn't break any copyright (obviously that goes | against the promise of it and the PR, but that's the small print | that gets them out of trouble). | heavyset_go wrote: | Saying "it's your responsibility to not breach licenses or | violate copyright" doesn't absolve your service from breaching | licenses and violating copyright itself. | mdaEyebot wrote: | "It is the customer's responsibility to ensure that they only | drink the water molecules which come out of their tap, and | not the lead ones." | golemotron wrote: | Yet we all use web browsers that copy copyrighted text from | buffer to buffer all the time. This doesn't even include all | of the copying that ISPs perform. | | It might be fair to say that the read performed in training | has the same character since no human is involved. | | The real copyright violation would be using a derived work. | heavyset_go wrote: | A browser isn't an amalgamation of billions of pieces of | other works. A browser executes and renders code it's | served. | | Copilot's corpus is quite literally tomes of copyrighted | work that are encoded and compressed in its neural network, | from which it launders that work to create similar works. | Copilot itself, the neural network, is that corpus of | encoded and compressed information; you can't separate the | two. Copilot stores and distributes that work without any | input from rightsholders, and it does it for profit.
| | A better analogy would be between a browser and a file | server filled with copyrighted movies whose operator | charges $10/mo for access. The browser is just a browser in | this analogy, while the file server is the corpus that | forms Copilot itself. | ginsider_oaks wrote: | The actual copying isn't the problem; it's distribution. If I | buy access to a PDF, I'm not going to get in trouble for | duplicating the file unless I send it to someone else. | | When someone uploads their copyrighted text to a web page, | they are distributing it to whoever visits that page. The | browser is just the medium. | golemotron wrote: | Is that the legal standard in copyright cases? | [deleted] | shoshoshosho wrote: | You could argue that it's the individual projects using Copilot | that are violating here, I guess? Like you can use curl or git | to dump some AGPL code into your commercial closed software, but | no one would (hopefully) blame those tools. | | So Copilot is fine, but anyone using it must abide by the | collective set of licenses covering the code it used to write | code for you...? | BeefWellington wrote: | If a license requires attribution, and you reproduce the code | without attribution using your editor plugin, it seems to me | the infringement lies with the editor plugin. | | Note that even licenses like MIT ostensibly require | attribution. | dmitrygr wrote: | So, if I made Napster 2.0 and said that it is your job to make | sure that you do not download anything copyrighted, would that | be OK? | charcircuit wrote: | Yes, that would be okay. It would also be okay to create | Internet 2.0.
| nicolashahn wrote: | That's basically the situation for any torrent client. | yamtaddle wrote: | Well, if the trackers also hosted mixed-up blocks of data | for all the torrents they tracked and their protection was | "LOL make sure you don't accidentally download any of these | tiny data blocks in the correct order to reconstruct the | copyrighted material they may be parts of _wink_" | eurasiantiger wrote: | Isn't that already how everything on the internet works? | donatj wrote: | I think it's arguably how anything works. You can have a | fork, but if you stab it in someone's eye, that's on you. | donatj wrote: | Yep. That's exactly why BitTorrent clients can exist. | dmalik wrote: | You mean like every torrent client that currently exists? | ketralnis wrote: | I think you're looking for consistency that the legal system | just doesn't provide. The music industry is more organised | and litigious than the software industry, and that gives them | power that you and I don't have. If you called it "Napster | 2.0" specifically, you'd probably be prevented from shipping | by a preliminary injunction. Is that fair or consistent? No. | But it's the world we live in. Programmers want laws to be | irrefutable and executable logic, but they just aren't. | [deleted] | brookst wrote: | The legal system takes intent into account. | | So if you produce Napster 2.0 to be the best music piracy | tool, and you test it for piracy, and you promote it for | piracy... you're going to have trouble. | | If you produce Napster 2.0 as a general-purpose file sharing | system, let's call it a torrent client, and you can claim no | ill intent... you may have trouble, but it's a lot more | defensible in court. | | I would find it a big stretch to say GitHub's intent here is | to illegally distribute copyrighted code.
No judgment on | whether the class action has any merit, just saying I would | be very surprised if discovery turns up lots of emails where | GitHub execs are saying "this is great, it'll let people | steal code." | kube-system wrote: | > I would find it a big stretch to say GitHub's intent here | is to illegally distribute copyrighted code. | | Almost everything on GitHub is subject to copyright, except | for some very old works (maybe something written by Ada | Lovelace?), and US government works not eligible for | copyright. | | Now, many of the works there are also licensed under | permissive licenses, but that is only a defense to | copyright infringement if the terms of those licenses are | being adequately fulfilled. | brookst wrote: | > Almost everything on GitHub is subject to copyright, | | Agreed. Like I said, it's about intent. Can anyone say | with a straight face that Copilot is an elaborate scheme | to profit by duplicating copyrighted work? | | I don't think the defense is that it wasn't trained on | copyrighted data. It obviously was. | | I think the defense is that anything, including a person, | that learns from a large corpus of copyrighted data will | sometimes produce verbatim snippets that reflect their | training data. | | So when it comes to copyright infringement, are we moving | the goalposts to where merely learning from copyrighted | material is already infringement? I'm not sure I want to | go there. | jasonlotito wrote: | Now, IANAL, but IIRC, that is all 100% okay and legal. In | fact, I can even download copyrighted music and movies | without issue. So, I don't even need to make sure I don't | download anything under copyright. | | The issue isn't downloading copyrighted stuff. | | Rather, it's making available and letting others download it. | That was where you got in trouble. | heavyset_go wrote: | Knowingly downloading copyrighted material, say to get it | for free, still violates the rights of the copyright | holders.
It's just that litigating against members of the | public is bad PR and not exactly lucrative, especially when | it's likely that kids downloaded the content. | | People used to get busted for buying bootleg VHS and DVDs | on the street before P2P filesharing was a common thing. | Then, early on, people were sued for downloading | copyrighted files before rightsholders decided to take a | different legal strategy and go after sharers and | bootleggers. | heavyset_go wrote: | This is a bad analogy: P2P networks exist that are legal | to operate, because Section 230 of the CDA prevents | interactive computer services from being held responsible for | user-generated content. | | What made Napster illegal is that the company did not create | their network for fair use of content, but to explicitly | violate copyright for profit. | | Copilot is like Napster in this case, in that both services | launder copyrighted data and distribute it to users for | profit. | | Copilot is not like other P2P networks that exist to share | data that is either free to distribute or can be used under | the fair use doctrine. Copilot explicitly takes copyrighted | content and distributes it to users in violation of licenses; | that's its explicit purpose. | | It's entirely possible to make a Copilot-like product trained | on data that doesn't have restrictive licensing, in the same | way it's entirely possible to create a P2P | network for sharing files that you have the right to share | legally. | stonemetal12 wrote: | If I remember correctly, that only works if you can prove that | your system has "substantial non-infringing use". | foooobaba wrote: | If GitHub or Google indexes source code using a neural net to | help you find it, given a query, is that also illegal? If you | think of Copilot as something that helps you find code you're | looking for, is it all that different, and if so, why?
| | In this case, wouldn't the users of Copilot be the ones | responsible for any copyrighted code they may have accessed using | Copilot? | leni536 wrote: | Both services already accept DMCA notices to take content down. | foooobaba wrote: | True, that's another good point. | lbotos wrote: | The crux of the issue: Is the code that is being generated | being used in a way that its license allows? That's it. I'm | confident that this problem would go away if Copilot said: | | // below output code is MIT licensed (source: github/repo/blah) | | And yes, the "users" are responsible, but it's possible that | Copilot could be implicated in a case depending on how its | access is licensed. | | Stable Diffusion has this same problem, btw, but in visual arts | "fair use" is even murkier. | | For code, if you could use the code and respect the license, | why wouldn't you? Copilot takes away that opportunity and | replaces it with "trust us". | foooobaba wrote: | This makes sense; it produces chunks, not the whole source, | whereas a search engine would also give you the license. | arpowers wrote: | The proper way to think about these LLMs is as something akin | to plagiarism. | | Seems to me the underlying data should be opt-in from creators, | and licenses should be developed that take AI into consideration. | thesuperbigfrog wrote: | How original is the generated code? | | Can the generated code be traced back to the code used for | training and the original copyrights and licenses for that code? | | If so, what attribution(s) and license(s) should apply to the | generated code? | dmitrygr wrote: | They demonstrate generated code being _identical_ to some | training code. | Swizec wrote: | How many ways are there to write many of the basic algorithms | we all use, though? Can I copyright "({ item }) => | <li>{item.label}</li>"? | | Because I sure have seen that exact code written, from | scratch, in many _many_ places.
| | I guess my question boils down to _"What is the smallest | copyrightable unit of code?"_ Because I'm certain suing a | novelist for copyright infringement over a character that says | "Hi, how are you?" would be considered absurd. | googlryas wrote: | No specific sources to provide, but a lot of analyses were | written about this question regarding the Google v. Oracle | Java API lawsuit. | avian wrote: | There were well-known examples of Copilot reproducing exact | code snippets well before this lawsuit (e.g. Quake's fast | inverse square root function). Microsoft dealt with them by | simply adding the offending function names to a blocklist. | | In other words, if your open source project doesn't have such | immediately recognizable code and didn't cause a shitstorm on | Twitter, chances are Copilot is still happily spewing out | your exact code, sans the copyright and license info. | m00x wrote: | Just like developers have _never_ copy-pasted code from Stack | Overflow or GitHub :):):) | ggerganov wrote: | omnimus wrote: | Always consider that maybe you don't fully understand what it | actually does. | [deleted] | pvg wrote: | _Please don't sneer, including at the rest of the community._ | | https://news.ycombinator.com/newsguidelines.html | sirsinsalot wrote: | That's not really right. | | Copilot isn't just "displaying" something. Copilot has mined | the collective efforts of developers in an effort to produce | derivative works, without permission, redistributing that | value without giving anything back. | | It'd be like suing Adobe because Photoshop comes bundled with | your holiday photos, without permission, and uses them in a | "family photos" filter. | | Large-scale mining of value, then selling it without due | credit or reward to those you stole that value from, is plain | theft. | finneganscat wrote: | spir wrote: | The part of GitHub Copilot to which I object is that it's trained | on private repos.
Where does GitHub get off consuming explicitly | private intellectual property for their own purposes? | RamblingCTO wrote: | lol @ "open-source software piracy" | | If I'm being honest, I'm a bit annoyed at this. What's the problem, | and what's the point of this? | opine-at-random wrote: | If you'd ever read even a single one of the licenses to the | software I'm sure you use every day, you'd understand. This is | such an obvious and pathetic strawman. | | I notice often on Hacker News that people don't seem to | understand anything about free or open-source software outside | of the pragmatics of whether they can abuse the work for free. | bpodgursky wrote: | Lawyers want $$$$. | RamblingCTO wrote: | Yeah, I guess so. This website reads like bullshit bingo from | some weird Twitter dude trying to sell you his newest | product: | | "AI needs to be fair & ethical for everyone. If it's not, | then it can never achieve its vaunted aims of elevating | humanity. It will just become another way for the privileged | few to profit from the work of the many." | | Blah blah. Can we get back to the hacking-on-stuff mentality? | gcmrtc wrote: | Looks like that lawyer guy is not new to hacking on stuff: | https://matthewbutterick.com/ | | Not exactly the curriculum of a Twitter weirdo. | RamblingCTO wrote: | Hah, funny. I've used Pollen before and I think I had | contact with him a few years ago! The blah blah about AI | elevating the world is still bs imho. I still disagree | with his views (https://matthewbutterick.com/chron/this-copilot-is-stupid-an...) | and this lawsuit. | | I wasn't actually talking about him specifically, btw, when | saying "this sounds like a crypto bro from Twitter". The | overly enthusiastic AI talk reminded me of that; that's | what I wanted to say.
A human is | trained on lots and lots of copyrighted material. Still, what | a human produces in the end is not automatically a derivative | work of everything they have seen in their life before. | | So, why should an AI be treated differently here? I don't | understand the argument for this. | | I actually see quite some danger in this line of thinking, that | there are different copyright rules for an AI compared to a human | intelligence. Once you allow for such an arbitrary distinction, it | will get restricted more and more, much more than humans are, and | that will just arbitrarily restrict the usefulness of AI, and | effectively be a net negative for humanity as a whole. | | I think we must really fight against such an undertaking, and | better educate people on how Copilot actually works, such that no | such misunderstanding arises. ___________________________________________________________________ (page generated 2022-11-03 23:01 UTC)