[HN Gopher] GitHub co-pilot as open source code laundering?
       ___________________________________________________________________
        
       GitHub co-pilot as open source code laundering?
        
       Author : agomez314
       Score  : 859 points
       Date   : 2021-06-30 12:00 UTC (11 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | pabs3 wrote:
        | There isn't that much enforcement of open source license
        | violations anyway. I bet there are lots of places where open
        | source code gets taken, its copyright/license headers stripped
        | off, and the code used in something proprietary, alongside the
        | bog-standard "not releasing code for modified versions of
        | Linux" violation.
        
       | cblconfederate wrote:
        | That's like saying that making a blurry, shaky copy of Star
        | Wars is not derivative but original work. Thing is, the
        | 'verbatimness' of the generated code is positively correlated
        | with the number of parameters used to train the model.
        
       | joshsyn wrote:
       | people worrying about AI. The AI is still shit. lol
        
       | Miner49er wrote:
       | Microsoft should just GPL CoPilot's code and model. They won't,
       | but it would fix this problem, I think.
        
         | jordemort wrote:
         | ...unless they've also ingested code that is incompatible with
         | the GPL and CoPilot ends up regurgitating a mix.
        
       | afarviral wrote:
        | While I think this will continue to amplify current problems
        | around IP, aren't current applied-ML approaches to writing
        | software the equivalent of automating the drawing of leaves on a
        | tree? Maybe a few small branches? But the whole tree, all its
        | roots, how it fits into the surrounding landscape, the overall
        | composition, the intention? If I'm wrong about that, then I
        | picked either a good or a bad time to finally learn programming.
        | There are only so many ways you can do things in each language,
        | though. Just like in the field of music, there are only so many
        | "original" tunes. The concept of IP is incoherent: you don't own
        | patterns (at least not at arbitrary depth), though you may be
        | owed some form of compensation for the billions made off
        | discovering them.
        
         | visarga wrote:
          | You're right, it's only drawing some leaves; the whole tree,
          | or how it relates to the forest, is another thing.
        
       | tsjq wrote:
        | Microsoft: embrace, extend, extinguish.
        
       | karmasimida wrote:
        | Well, this would not be hard to verify.
        | 
        | You can automate the process by feeding CoPilot existing GPL
        | source code and seeing what it comes up with next.
        | 
        | I am sure that at some point it WILL produce exactly the same
        | code snippet as some GPL project, provided you attempt it
        | enough times.
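        | 
        | A sketch of that check (`copilot_complete` and the corpus
        | here are hypothetical; the point is the comparison, not the
        | API):
        | 
        |     import difflib
        | 
        |     gpl_corpus: list[str] = []  # known GPL function bodies
        | 
        |     def similarity(expected: str, actual: str) -> float:
        |         m = difflib.SequenceMatcher(None, expected, actual)
        |         return m.ratio()
        | 
        |     for fn in gpl_corpus:
        |         prompt = fn[: len(fn) // 3]   # first third as prompt
        |         rest = fn[len(fn) // 3 :]     # what "should" follow
        |         out = copilot_complete(prompt)  # hypothetical call
        |         if similarity(rest, out) > 0.95:
        |             print("near-verbatim reproduction found")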
       | 
        | Not sure what the legal interpretation would be, though; it is
        | a pretty gray area.
        | 
        | There will always be risk for CoPilot: if it has digested PII
        | and people find that out, the outcome would be much more
        | interesting to see.
        
         | Enhex wrote:
          | It doesn't have to be exact to be copyright infringement; see
          | non-literal copying. The basic idea is that if you copy-paste
          | code and rename the variables, that doesn't make it new code.
        
           | freshhawk wrote:
           | Yeah, you'd have to assume they are parsing and normalizing
           | this data in some way. There would still be some AST patterns
           | or something similar you could look for in the same way, but
           | it would be much trickier.
           | 
           | Plus considering this is a legal issue ... good luck with
           | "there is a statistically significant similarity in AST
           | outputs related to the most unique sections of this code
           | base" type arguments in court. We're currently at the "what's
           | an API" stage of legal tech understanding.
        
             | int_19h wrote:
             | The real question is whether it constitutes _derived work_
              | , though. And that is not a question of similarity so much
              | as of provenance - if you start with a codebase that is GPL
             | originally, and it gets gradually modified to the point
             | where it doesn't really look anything like the original,
             | it's still a derived work, and is still subject to the
             | license.
             | 
             | Similarity can be used to prove derivation, but it's not
             | the only way to do so. In this case, all the code that went
             | into the model is (presumably) known, so you don't really
             | need any sort of analysis to prove or disprove it. It is,
             | rather, a legal question - whether the definition on the
             | books applies here, or not.
        
         | bencollier49 wrote:
          | This question about the amount of code required to be
          | copyrightable sounds similar to the situation in music
          | copyright, where the bar for what legally counts as
          | plagiarism currently seems to be set too low.
        
         | bencollier49 wrote:
          | Regarding PII, I think you have a very good point. I wouldn't
          | be surprised to see working AWS_SECRET_KEY values appear in
          | there. Indeed, given that copy-paste programmers may not
          | understand the code they're given, it's entirely possible that
          | someone may run code which uses remote resources without even
          | realising it.
        
         | falcolas wrote:
          | As per some of the other Twitter replies, Co-pilot has offered
          | to fill in the GPL disclaimer in new files.
        
       | mtnGoat wrote:
        | Not a fan of this argument.
        | 
        | Musicians, artists, and all kinds of athletes grow by watching,
        | observing, and learning from others. As if all these open source
        | projects got to where they are without looking at how others did
        | things.
        | 
        | I don't think a single function, similar syntax, or a basic
        | check function is worth arguing about; it's not like Co-pilot is
        | stealing an entire code base and just plopping it out by reading
        | your mind and knowing what you want. I know developers who have
        | certainly stolen code and implementation details from past
        | employers, and that was just fine.
        
       | greyman wrote:
       | > github copilot was trained on open source code and the sum
       | total of everything it knows was drawn from that code. there is
       | no possible interpretation of "derivative" that does not include
       | this
       | 
       | I don't understand the second sentence, i.e. where's the proof?
        
       | Dracophoenix wrote:
        | This goes into one of my favorite philosophical topics: John
        | Searle's Chinese Room. I won't go into it here, but the question
        | of whether an AI is actually learning how to code or simply
        | substituting information based on statistically common practices
        | (or whether there is really a difference between the two) is
        | going to be one hell of a problem for the next few decades as we
        | start to approach the fine points of what AI is and how it could
        | be defined.
        | 
        | However, legally, the recent Oracle v. Google case has already
        | settled a major point: APIs don't violate copyright. And as
        | GitHub co-pilot is an API (a self-modifying one, but an API
        | nonetheless), Microsoft has a good defense.
       | 
        | In the near future, when we have AI-assisted reverse engineering
        | along with GitHub co-pilot, then with enough obfuscation there's
       | nothing that can't be legally created or recreated on a computer,
       | proprietary or not. This is simultaneously free software's
       | greatest dream and worst nightmare.
       | 
       | Edit: changed Hilary Putnam to John Searle Edit 2: spelling
        
         | toyg wrote:
         | _> However, legally, the most recent Oracle vs. Google case has
          | already settled a major point: APIs don't violate copyright.
         | And as Github co-pilot is API (A self-modifying one, but an API
         | nonetheless), Microsoft has a good defense._
         | 
         | That's... a mind-bendingly bad take. Google took an API
         | definition and duplicated it; Copilot is taking _general code_
         | and (allegedly) duplicating it. This was not done in order to
         | enable any sort of interoperability or compatibility.
         | 
         | The "API defense" would apply if Copilot only produced API-
         | related code, or (against CP) if someone reproduced the
         | interfaces copilot exposes to consumers.
         | 
         |  _> Microsoft has a good defense._
         | 
         | MS has many good defenses (transformative work, github
         | agreements, etc etc), but this is not one of them.
        
         | [deleted]
        
         | cxr wrote:
         | > the most recent Oracle vs. Google case has already settled a
         | major point: APIs don't violate copyright. And as Github co-
         | pilot is API (A self-modifying one, but an API nonetheless),
         | Microsoft has a good defense
         | 
         | That's a wild misconstrual of what the courts actually ruled in
         | Oracle v. Google.
         | 
         | (And to the reader: don't take cues from people banging out
         | poorly reasoned quasi-legal arguments in off-the-cuff
         | comments.)
        
           | Dracophoenix wrote:
           | Straight from the horse's mouth [1]:
           | 
           | pg.2
           | 
           | 'This case implicates two of the limits in the current
           | Copyright Act. First, the Act provides that copyright
           | protection cannot extend to "any idea, procedure, process,
           | system, method of operation, concept, principle, or discovery
           | . . . ." 17 U. S. C. SS102(b). Second, the Act provides that
           | a copyright holder may not prevent another person from making
           | a "fair use" of a copyrighted work. SS107. Google's petition
           | asks the Court to apply both provisions to the copying at
           | issue here. To decide no more than is necessary to resolve
           | this case, the Court assumes for argument's sake that the
           | copied lines can be copyrighted, and focuses on whether
           | Google's use of those lines was a "fair use."
           | 
           | "any idea, procedure, process, system, method of operation,
           | concept, principle, or discovery" sounds suspiciously like an
           | API. Continuing:
           | 
           | Pg. 3-4
           | 
           | 'To determine whether Google's limited copying of the API
           | here constitutes fair use, the Court examines the four
           | guiding factors set forth in the Copyright Act's fair use
           | provision... '
           | 
           | (1) The nature of the work at issue favors fair use. The
           | copied lines of code are part of a "user interface" that
           | provides a way for programmers to access prewritten computer
           | code through the use of simple commands. As a result, this
           | code is different from many other types of code, such as the
           | code that actually instructs the computer to execute a task.
           | As part of an interface, the copied lines are inherently
           | bound together with uncopyrightable ideas (the overall
           | organization of the API) and the creation of new creative
           | expression (the code independently written by Google)...
           | 
           | (2) The inquiry into the "the purpose and character" of the
           | use turns in large measure on whether the copying at issue
           | was "transformative," i.e., whether it "adds something new,
           | with a further purpose or different character." Campbell, 510
           | U. S., at 579. Google's limited copying of the API is a
           | transformative use. Google copied only what was needed to
           | allow programmers to work in a different computing
           | environment without discarding a portion of a familiar
           | programming language .... The record demonstrates numerous
           | ways in which reimplementing an interface can further the
           | development of computer programs. Google's purpose was
           | therefore consistent with that creative progress that is the
           | basic constitutional objective of copyright itself.
           | 
           | (3) Google copied approximately 11,500 lines of declaring
           | code from the API, which amounts to virtually all the
           | declaring code needed to call up hundreds of different tasks.
           | Those 11,500 lines, however, are only 0.4 percent of the
           | entire API at issue, which consists of 2.86 million total
           | lines. In considering "the amount and substantiality of the
           | portion used" in this case, the 11,500 lines of code should
           | be viewed as one small part of the considerably greater
           | whole. As part of an interface, the copied lines of code are
           | inextricably bound to other lines of code that are accessed
           | by programmers. Google copied these lines not because of
           | their creativity or beauty but because they would allow
           | programmers to bring their skills to a new smartphone
           | computing environment. The "substantiality" factor will
           | generally weigh in favor of fair use where, as here, the
           | amount of copying was tethered to a valid, and
           | transformative, purpose.
           | 
            | (4) The fourth statutory factor focuses upon the "effect" of
            | the copying in the "market for or value of the copyrighted
            | work." §107(4). Here the record showed that Google's new
           | smartphone platform is not a market substitute for Java SE.
           | The record also showed that Java SE's copyright holder would
           | benefit from the reimplementation of its interface into a
           | different market. Finally, enforcing the copyright on these
           | facts risks causing creativity-related harms to the public.
           | When taken together, these considerations demonstrate that
           | the fourth factor--market effects--also weighs in favor of
           | fair use.
           | 
           | 'The fact that computer programs are primarily functional
           | makes it difficult to apply traditional copyright concepts in
           | that technological world. Applying the principles of the
           | Court's precedents and Congress' codification of the fair use
           | doctrine to the distinct copyrighted work here, the Court
           | concludes that Google's copying of the API to reimplement a
           | user interface, taking only what was needed to allow users to
           | put their accrued talents to work in a new and transformative
           | program, constituted a fair use of that material as a matter
           | of law. In reaching this result, the Court does not overturn
           | or modify its earlier cases involving fair use.'
           | 
           | [1]
           | https://www.supremecourt.gov/opinions/20pdf/18-956_d18f.pdf
        
         | salawat wrote:
         | That's John Searle's thought experiment actually. Hilary Putnam
         | had some thoughts in reference to it along the lines that a
         | brain in a vat might think in a language similar to what we
         | would speak, but the words of that language would necessarily
         | encode different meanings due to the different experience of
         | the external world and sensory isolation.
         | 
         | https://plato.stanford.edu/entries/chinese-room/
        
           | Dracophoenix wrote:
           | Thanks for the correction. I made it known in my edit.
        
         | AJ007 wrote:
         | And this applies to everything, not just source code.
         | 
          | I'm just presuming we have a future where you can consume
          | unique content indefinitely - such that instead of binge-
          | watching Star Trek on Netflix, you press play and new episodes
          | are generated and played continuously, 24/7, and they are
          | actually really good.
         | 
         | Thus intellectual property becomes a commodity.
        
           | Dracophoenix wrote:
           | While headway has been made in photo algorithms like
           | StyleGAN, GPT-3's scriptwriting, and AI voice replication, we
           | aren't even close to having AI-generated stick cartoons or
            | anime. At best, an AI-generated Star Trek trained on old
            | episodes would produce the live-action equivalent of limited
            | animation; it would reuse the best-liked parts over and over
            | again and rehash the same camerawork and lens focus that you
            | got in the 60's and the 90's. There wouldn't be any new
            | planets explored, no new species, no advances in
            | cinematography, and certainly no self-insert character (in
            | case you wanted to see a simulation of how you'd fare on the
           | see. Now if there was some way to recreate all the characters
           | in photorealistic 3D with Unreal Engine, feed them a script,
           | and use some form of intelligent creature and planet
           | generation, you may get a little closer to creating a truly
           | new episode.
        
       | koonsolo wrote:
       | Does this mean that when I read GPL code and learn from it, I
       | cannot use these learnings in non-GPL code?
       | 
        | I get that the derivative work might be clearer in an AI
        | setting, but basically it boils down to the same thing.
        
       | agomez314 wrote:
        | Posting this due to the recent unveiling of GitHub Co-pilot and
        | its intersection with the ethics of ML training-set data.
        
       | 6gvONxR4sf7o wrote:
       | > previous """AI""" generation has been trained on public text
       | and photos, which are harder to make copyright claims on, but
       | this is drawn from large bodies of work with very explicit court-
       | tested licenses
       | 
       | This seems pretty backwards to me. A GPL licensed data point is
       | more permissive than an unlicensed data point.
       | 
        | That said, I'm glad that these data points do have explicit
        | licenses that say "if you use this, you must do XYZ", so that
        | it's clear that our large ML projects are going counter to
        | creators' intent when they made their work open.
       | 
       | I'd love to start seeing licenses about use as training data.
       | Then maybe we'd see more open access to these models that benefit
       | from the openness of the web. I'd personally use licenses that
       | say if you want to train on my work, you must publish the model.
       | That goes for my code, my writing, and my photography.
       | 
        | Anyway, GitHub is arguing that any use of publicly available
        | data for training is fair use, but they also admit that this is
        | all new and unprecedented territory where training data is
        | concerned.
        
       | zeptonix wrote:
       | The tone of the responses here is absurd. Guys, be grateful for
       | some progress. Instead of having to retype boilerplate code, your
       | productivity is now enhanced by having a system that can do it
       | for you. This is primarily about reducing the need to re-type
       | total boilerplate and/or copy/paste from Stackoverflow. If you
       | were to let some of the people here run things we'd never have
       | any form of progress with anything ever.
        
         | joepie91_ wrote:
         | > Instead of having to retype boilerplate code, your
         | productivity is now enhanced by having a system that can do it
         | for you
         | 
         | We already invented something for that a couple decades ago,
         | and it's called a "library". And unlike this thing, libraries
         | don't launder appropriation of the public commons with total
         | disregard for those who have actually _built_ that commons.
        
         | qayxc wrote:
         | Questions like this go much deeper and illustrate issues that
         | need to be addressed before the technology becomes standard and
         | widely adopted.
         | 
          | It's not about progress or suppressing it; it's a fundamental
          | question about whether it is OK for huge companies to profit
          | from the work of others without so much as giving credit, and
          | whether using AI this way represents an instance of doing so.
         | 
         | The latter aspect goes beyond productivity or licensing - the
         | OP asserts that AI isn't equivalent to a student who learned
         | from examples how to perform a task, but rather replicates
         | (recalls) or reproduces the works of others (e.g. the training
         | material).
         | 
         | It's a question that goes beyond this particular application:
         | what about GAN-based generators? Do they merely reproduce
         | slight variations of the training material? If so, wouldn't the
         | authors of the training material have some kind of intellectual
         | property rights to the generated works?
         | 
         | This doesn't just concern code snippets, it's a general
         | question about AI, crediting creators, and circumventing
         | licensing and intellectual property rights.
        
       | fartcannon wrote:
        | To me, this is similar to all these big orgs making money off
        | our data. They should be paying us to profit off our minds.
        
       | kingsuper20 wrote:
       | I was just musing about whether this kind of tool has been
       | written (or is being written) for music composition, business
       | letter writing, poetry, news copy.
       | 
       | Interesting copyright issues.
       | 
       | Anyone who thinks their profession will continue as-is for the
       | long term is probably mistaken.
        
       | sipos wrote:
        | So, I can't see how they can argue that the generated code is
        | not a derivative of at least some of the code it was trained on,
        | and therefore encumbered by complicated copyright claims that
        | are, for anyone other than GitHub, impossible to disentangle. If
        | they haven't even been careful to use only software under a
        | single license that does not require the original author to be
        | attributed, then I don't see how it can even be legal for them
        | to run the service.
        | 
        | All that said, I'm not confident that anyone will stop them in
        | court anyway. That hasn't tended to be easy when companies
        | infringe other open source copyright terms.
       | 
       | Until it is cleared up though, it would seem extremely unwise for
       | anyone to use any code from it.
        
       | kklisura wrote:
       | Should we be changing our open source licenses to explicitly
       | prevent training such systems using our code?
        
         | onli wrote:
          | I'd assume this: in the same way that you cannot forbid a
          | human to learn concepts from your code, you cannot forbid an
          | automated system to learn concepts from your code, regardless
          | of the license. Also, if you could, it would make your code
          | non-free.
         | 
         | At least as long as the system really learns concepts. If it
          | just copies & pastes code, then that's a different story (same
          | as with humans).
        
         | bencollier49 wrote:
         | Good idea, but if carved up into small enough chunks, it may be
         | considered fair use.
         | 
         | What is confusing is that the neural net may take lots of small
         | chunks and link them to one another, and then reproduce them in
         | the same order verbatim.
        
           | falcolas wrote:
           | One of the examples pointed out in the reply threads was the
           | suggestion in a new file to insert the GPL disclaimer header.
           | 
            | So, the length of the samples being drawn is not necessarily
            | small: the chunk size is based on their commonality. It could
            | easily be long enough to trigger a copyright violation.
        
           | svaha1728 wrote:
           | With music sampling, copyright protects down to the sound of
           | a kick drum. No doubt Microsoft has a good set of attorneys
           | working on their arguments as we speak.
        
         | joepie91_ wrote:
          | That would be a legal no-op. Either their use _is_ covered by
          | copyright and they are violating your license, or it _isn't_
          | covered by copyright and then any constraints your license
          | sets are meaningless.
          | 
          | Licenses hold no power beyond that granted to them by things
          | being copyrighted by default.
        
         | slim wrote:
          | Why forbid it? Just use the GPL and extend the contagion to
          | anything trained using your code.
        
         | k__ wrote:
         | I don't think so.
         | 
          | The code already used for training should be problematic for
          | them, not only new code in the future.
        
       | 6510 wrote:
       | Time to make closed source illegal.
        
       | naikrovek wrote:
        | I don't see the point of this tool, independent of whether the
        | resulting code is derivative of GPL code or not.
        | 
        | Being able to produce valid code is not the bottleneck of any
        | developer effort. No projects fail because code can't be typed
        | quickly enough.
        | 
        | The bottleneck is understanding how the code works, how to
        | design things correctly, how to make changes in accordance with
        | the existing design, how to troubleshoot existing code, etc.
        | 
        | This tool doesn't make anything easier! It makes things harder,
        | because now you have running software that was written by no
        | one and is understood by no one.
        
         | mslm wrote:
          | Have to fully agree; it just seems like a "cool" tool that, if
          | you actually had to use it for real-world projects, would slow
          | you down significantly, and you'll only admit it once the
          | honeymoon period is over.
        
         | fckthisguy wrote:
          | Whilst I absolutely agree that writing code fast enough isn't
          | the bottleneck, it's always nice to have tools that reduce
          | repetitive code writing.
         | 
         | I use the React plugin for Webstorm to avoid having to write
         | the boilerplate for FCs. Maybe in the future Copilot will
         | replace that usage.
        
           | ImprobableTruth wrote:
           | To me that - and really any form of common boilerplate - is
           | just evidence that we're lacking abstractions. If your editor
           | is generating code for you, that means that the 'real'
           | programming language you're using 'in your head' has some
           | metaprogramming facilities emulated by your IDE.
           | 
           | I think we should strive to improve our programming languages
           | to make less of this boilerplate necessary, not to make
           | generating boiler plate easier. The latter is just going to
           | make software less and less wieldy. Imagine the horror if
           | instead of (relatively) higher level programming languages
           | like C we were all just using assembly with code generation.
        
         | izgzhen wrote:
          | It doesn't claim to solve that bottleneck either. On the
          | contrary, it clearly states that its mission is to solve the
          | easy parts better so developers can focus on the truly
          | challenging engineering problems you mentioned.
        
           | uncomputation wrote:
           | This reminds me of a startup pitch where it's always "oh we
           | take care of x so you don't have to," but the problem is now
           | I just have _another_ thing to take care of. I cannot speak
           | for people who use Copilot "fluently," but I know for every
           | chunk of code it spat out I would need to read every line and
           | make sure "Is this right? Is the return type what I want?
           | Will this loop terminate? Is 'scan' the right API? Is that
           | string formatted properly? Can I optimize this?" etc. To me
           | it's hardly "solving the easy parts," but rather putting the
           | passenger's hands on the wheel.
        
             | gotostatement wrote:
              | Upvoted. I think the only good use case for this is
              | spitting out the annoying 10-line boilerplate for
              | commonly used APIs.
        
               | izgzhen wrote:
               | That is a valid use case despite being small and
               | incremental. I think it will still be helpful to some
               | people.
        
             | izgzhen wrote:
             | The easy part is the copy-paste-from-SO part ;)
        
         | bobsomers wrote:
         | Completely agree. If anything, I see tools like this actually
          | decreasing engineering speed. I don't see how it doesn't lead
          | to shipping large quantities of code the team didn't vet
          | carefully, which is a recipe for subtle and hard-to-find
          | bugs. Those kinds of bugs are much more expensive to find and
          | squash.
         | 
         | What we really need aren't tools that help us write code
         | faster, but tools that help us understand the design of our
         | systems and the interaction complexity of that design.
        
       | pjfin123 wrote:
        | I think this would fall under any reasonable definition of fair
        | use. If I read GPL (or proprietary) code as a human, I still own
        | the code that I later write. If copyright were enforced on the
        | outputs of machine learning models based on all the content they
        | were trained on, it would be incredibly stifling to innovation.
        | Requiring legal access to data for training, but granting full
        | ownership of the output, seems like a sensible middle ground.
       | 
       | (Reposting my comment from yesterday)
        
         | sanderjd wrote:
         | Reposting a summary of my reply: if you memorize a line of code
         | and then write it down somewhere else without attribution, that
         | is not fair use, you copied that line of code. If this model
         | does the same, it is the same.
        
       | rbarrois wrote:
       | An interesting impact of this discussion is, for me: within my
       | team at work, we're likely to forbid any use of Github co-pilot
       | for our codebase, unless we can get a formal guarantee from
       | Github that the generated code is actually valid for us to use.
       | 
       | By the way, code generated by Github co-pilot is likely
       | incompatible with Microsoft's Contribution License Agreement [1]:
       | "You represent that each of Your Submission is entirely Your
       | original work".
       | 
       | This means that, for most open-source projects, code generated by
       | Github co-pilot is, right now, NOT acceptable in the project.
       | 
       | [1] https://opensource.microsoft.com/pdf/microsoft-
       | contribution-...
        
         | CharlesW wrote:
         | > _This means that, for most open-source projects, code
         | generated by Github co-pilot is, right now, NOT acceptable in
         | the project._
         | 
         | For this scenario, how is using Co-Pilot generated code
         | different from using code based on sample code, Stack Overflow
         | answers, etc.?
        
           | rbarrois wrote:
           | I'd say that it depends on the license; for StackOverflow,
           | it's CC-BY-SA 4.0 [1]. For sample code, that would depend on
           | the license of the original documentation.
           | 
           | My point is: when I'm copying code from a source with an
           | explicit license, I know whether I'm allowed to copy it. If I
           | pick code from co-pilot, I have no idea (until tested by law
           | in my jurisdiction) whether said code is public domain, AGPL,
           | proprietary, infringing on some company's copyright.
           | 
           | [1] https://stackoverflow.com/legal/terms-of-
           | service#licensing
        
             | CharlesW wrote:
             | That makes sense, thank you.
        
           | gwenzek wrote:
            | A number of companies, including Google and probably
            | Microsoft, forbid copying code from Stack Overflow because
            | there is no explicit license.
        
             | CharlesW wrote:
             | TIL, thank you!
        
         | gdsdfe wrote:
         | How would you know if copilot was used or not?!
        
       | LeicaLatte wrote:
        | Our software has violated the world and people's lives, legally
        | and illegally, in many instances. I mean, none of us cared when
        | GPT-3 did the same for text on the internet. :)
        | 
        | Reminder: software engineers, our code, our GPLs, are not
        | special.
        
       | 29athrowaway wrote:
        | If I recall correctly, it has already been determined that using
        | proprietary data to train a machine learning system is not a
        | violation of intellectual property.
        
       | nemoniac wrote:
        | So, as I understand it, the AGPL was introduced to close an
        | unforeseen loophole in the GPL: adapted code could be used to
        | power a web service without ever being "distributed". Could
        | another new version of the license block code from being used
        | to train GitHub co-pilot-like models?
        
       | corobo wrote:
        | If I, as an alleged human, have learned purely from GPL code,
        | would that require code I write to be released under the GPL
        | too?
        | 
        | We should probably start thinking about AI rights at some point.
        | Personally I'll be crediting GPT-3 like any other contributor,
        | because it sounds cool, but maybe for moral reasons too in
        | future.
        
         | notkaiho wrote:
         | Unless you were using structures directly from said code,
         | probably not?
         | 
         | Compare if you had only learned writing from, say, the Bible.
         | You would probably write in a very Biblical manner, but would
         | you write the Psalms exactly? Most likely not.
        
           | edenhyacinth wrote:
           | We have seen Co-Pilot directly output
           | (https://docs.github.com/en/github/copilot/research-
            | recitatio...) the Zen of Python when prompted - there's no
           | reason it wouldn't write the Psalms exactly when prompted in
           | the right manner.
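            | 
            | (The canonical Zen actually ships inside CPython, so exact
            | recitation is easy to check - a small aside:)
            | 
            |     import codecs, this  # importing `this` prints the Zen
            |     zen = codecs.decode(this.s, "rot13")  # decoded text
            |     # diff any generated "Zen" against `zen` verbatim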
        
             | disgruntledphd2 wrote:
              | That's super cool. As long as they do the things specified
              | at the bottom of that doc (provide attribution when code is
              | copied, so people can know whether it's OK to use), a lot
              | of the concerns people have raised in these threads will be
              | resolved.
        
               | edenhyacinth wrote:
                | Pretty much! There are only three major fears remaining:
               | 
               | * Co-pilot fails to detect it, and you have a potential
               | lawsuit/ethical concern when someone finds out. Although
               | the devil on my shoulder says that if Co-pilot didn't
               | detect it, what's to say another tool will?
               | 
                | * Co-pilot reuses code in a way that still violates
                | copyright but is difficult to detect, i.e. if you
                | checked via a syntax tree you'd notice that the code
                | was the same, but if you looked at it as raw text, you
                | wouldn't.
               | 
               | * Purely ethical - is it right to take licensed code and
               | condense it into a product, without having to take into
               | account the wishes of the original creators? It might be
               | treated as normal that other coders will read it, and
               | pick up on it, but when these licenses were written no
               | one saw products like this coming about. They never
               | assumed that a single person could read all their code,
               | memorise it, and quote it near-verbatim on command.
        
               | disgruntledphd2 wrote:
               | > Purely ethical - is it right to take licensed code and
               | condense it into a product, without having to take into
               | account the wishes of the original creators? It might be
               | treated as normal that other coders will read it, and
               | pick up on it, but when these licenses were written no
               | one saw products like this coming about. They never
               | assumed that a single person could read all their code,
               | memorise it, and quote it near-verbatim on command.
               | 
               | It's gonna be really interesting to see how this plays
               | out.
        
           | corobo wrote:
            | I've not seen Copilot in action yet; I was under the
            | impression it doesn't use code directly.
           | 
           | In any case my original question was answered by the tweeter
           | in a later tweet I missed
           | https://twitter.com/eevee/status/1410049195067674625
           | 
           | I get where they're coming from but they are kinda just
           | handwaving it back the other way with the "u fell for
           | marketing idiot" vibe. I wish someone smarter than me could
           | simplify the legal ramifications around this but we'll
           | probably have to wait till it kills someone (or at least
           | costs someone a bunch of money) to get any actual laws set
           | up.
        
         | lucideer wrote:
          | Your question was already preempted in the OP.
         | Specifically:
         | 
         | > _" but eevee, humans also learn by reading open source code,
         | so isn't that the same thing"_
         | 
         | > _- no_
         | 
         | > _- humans are capable of abstract understanding and have a
         | breadth of other knowledge to draw from_
         | 
         | > _- statistical models do not_
         | 
         | > _- you have fallen for marketing_
         | 
         | -- https://twitter.com/eevee/status/1410049195067674625
        
           | corobo wrote:
           | I preemptively commented that I'd seen that tweet three hours
           | before your comment figuring someone was going to quote it at
           | me haha
           | 
           | Preemptive, doesn't work as it turns out :)
           | 
           | https://news.ycombinator.com/item?id=27687586
        
             | lucideer wrote:
             | Nice catch
        
         | pyentropy wrote:
          | That's what I wanted to ask: where do we draw the line of
          | copyright when it comes to the inputs of generative ML?
          | 
          | It's perfectly fine for me to develop programming skills by
          | reading _any code_, regardless of the license. When a corp
          | snatches an employee from a competitor, they get to keep their
          | skills even if they signed an NDA and can't talk about what
          | they worked on. On the other hand, there's the non-compete
          | agreement, under which you can't. Good luck making a
          | non-compete agreement with a neural network.
         | 
         | Even if someone feeds stolen or illegal data as an input
         | dataset to gain advantage in ML, how do we even prove it if
         | we're only given the trained model and it generalizes well?
        
           | vsareto wrote:
           | >how do we even prove it if we're only given the trained
           | model and it generalizes well?
           | 
            | Someone's going to have to audit the model, the training,
            | and the data behind it. There's a documentary on black holes
            | on Netflix that did something similar (no idea if it was AI):
            | each team wrote code to interpret the data independently,
            | without collaboration, hints, or information leakage, and
            | they were all within a certain accuracy of one another in
            | interpreting the raw data at the end of it.
           | 
           | So, as an example, if I can't train something in parallel and
           | get similar results to an already trained model, we know
           | something is up and there is missing or altered data (at
           | least I think that's how it works).
        
           | hliyan wrote:
            | Copyright is going to get very muddy in the next few decades.
            | ML systems may be able to generate entire novels in the
            | styles of books they have digested, with only some assistance
            | from human editors. The same is true of artwork and music,
            | and perhaps eventually video too. Determining "similarity",
            | too, may soon have to be taken out of the hands of the judge
            | and given to another ML system.
        
           | agomez314 wrote:
            | Take it further. You could easily imagine using a service
            | like this as invisible middleware behind a front-end and
            | asking users to pay for the service. Some could argue it's
            | code generation attributable to those who created the
            | model, but the reality is that the models were trained on
            | code written by thousands of passionate users, unpaid, with
            | the intent of free usage.
        
             | bogwog wrote:
             | > but reality is that the models were trained by code
             | written by thousand of passionate users at no pay with the
             | intent of free usage.
             | 
             | I hope you're actually reading those LICENSE files before
             | using open source code in your projects.
        
           | rhn_mk1 wrote:
           | > It's perfectly fine for me to develop programming skills by
           | reading any code regardless of the license.
           | 
           | I'd be inclined to agree with this, but whenever a high
           | profile leak of source code happens, reading that code can
           | have dire consequences for reverse engineers. It turns clean
           | room reverse engineering into something derivative, as if the
            | code that was read had the ability to infect whatever the
           | programmer wrote later.
           | 
           | A situation involving the above developed in the ReactOS
           | project https://en.wikipedia.org/wiki/ReactOS#Internal_audit
        
         | kitsune_ wrote:
          | I think you are missing the mark with this comparison:
          | Copilot and its network weights are already the derived work,
          | not just the output it produces.
        
         | wilde wrote:
         | Possibly. We won't know until this is tested in court.
         | Traditionally one would want to clean room [1] this sort of
         | thing. Co-pilot is...really dirty by those standards.
         | 
         | [1] https://en.wikipedia.org/wiki/Clean_room_design
        
         | edenhyacinth wrote:
          | Machine learning isn't really the same as a person learning -
         | people generally can code at a high level without having first
         | read TBs of code, nor can you reasonably expect a person to
         | have memorised GPL code to reproduce it on demand.
         | 
         | What you can expect a person to do is understand the principles
         | behind that GPL code, and write something along the same lines.
         | GitHub Co-Pilot is not a general ai, and it's not touted as
         | one, so we shouldn't be considering whether it really _knows_
         | code principles, only that it can reliably output code that
         | fits a similar function to what came before, which could
         | reasonably include entire blocks of GPL code.
        
           | corobo wrote:
            | Well, if it is actually straight-up outputting blocks of
            | existing code, then get it in the bin as a failed attempt to
            | sprinkle AI on development, and use this instead:
           | 
           | https://github.com/drathier/stack-overflow-import
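            | 
            | (Going from memory of its README - it really does pull code
            | out of Stack Overflow answers at import time, so treat this
            | usage as approximate:)
            | 
            |     from stackoverflow import quick_sort
            |     print(quick_sort.sort([1, 3, 2, 5, 4]))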
        
       | schnebbau wrote:
        | Newsflash, everyone: if you open-source your code, it's going to
        | be copied or paraphrased anyway.
        
         | nextaccountic wrote:
         | It should be copied and paraphrased, but respecting the
         | license. This means, among other things, crediting the author.
        
           | schnebbau wrote:
           | It may be hard to believe, but there are sick and twisted
           | individuals in this dangerous world who copy from github
           | without even a single glance at the license, and they live
           | among us.
        
             | iKevinShah wrote:
              | There are always exceptions (maybe they're even the norm
              | in this case), but it's still not 100%, still not all-
              | encompassing. This "AI" seems to be. I think that is the
              | entire concern: ALL the code is affected, in all
              | instances.
        
             | adn wrote:
              | Yes, and those people are violating the licenses of the
              | code when they do that. It's not unreasonable to expect a
              | massive company like Microsoft not to do this on a massive
              | scale.
        
         | diffeomorphism wrote:
          | What does that have to do with the topic? The question is not
          | whether it gets copied; the question is whether it gets
          | pirated.
        
         | postalrat wrote:
          | I think the issue many people have with this is that it's a
          | proprietary tool that profits from work it was not licensed to
          | use this way.
        
         | GuB-42 wrote:
         | Yes that's the point.
         | 
         | But if I do it under a copyleft license like GPL, I expect
         | those who copy to abide by the license and open source their
         | own code too.
         | 
         | But sure, people shit on IP rights all the time, and I am
         | guilty of it too. Let's say I didn't pay what I should have
         | paid for every piece of software I have used.
        
       | Closi wrote:
       | "About 0.1% the snippets are verbatim"
       | 
       | This implies that by just changing the variable names, the
       | snippets are classed as non-verbatim.
       | 
       | I don't buy that this number is anywhere close to the actual
       | figure if you assume that you can't just change function names
       | and variable names and suddenly say you have escaped both the
       | legality and the spirit of GPL.
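        | 
        | Measuring this in a rename-insensitive way isn't hard, either:
        | tokenize, collapse identifiers and literals, and compare k-token
        | windows. A rough sketch (my own, not how GitHub measured it):
        | 
        |     import io, keyword, tokenize
        | 
        |     SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        |             tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
        | 
        |     def shingles(src: str, k: int = 8) -> set:
        |         toks = []
        |         for t in tokenize.generate_tokens(io.StringIO(src).readline):
        |             if t.type in SKIP:
        |                 continue
        |             if t.type == tokenize.NAME and not keyword.iskeyword(t.string):
        |                 toks.append("ID")   # variable renames vanish here
        |             elif t.type in (tokenize.NUMBER, tokenize.STRING):
        |                 toks.append("LIT")
        |             else:
        |                 toks.append(t.string)
        |         return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}
        | 
        |     def overlap(a: str, b: str) -> float:
        |         sa, sb = shingles(a), shingles(b)
        |         return len(sa & sb) / max(1, len(sa | sb))
        | 
        |     a = "for i in range(10):\n    out.append(i * 2)\n"
        |     b = "for k in range(10):\n    res.append(k * 2)\n"
        |     print(overlap(a, b))  # 1.0: renaming alone changes nothing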
        
       | jordemort wrote:
       | What happens when someone puts code up on GitHub with a license
       | that says "This code may not be used for training a code
       | generation model"?
       | 
       | - Is GitHub actually going to pay any attention to that, or are
       | they just going to ingest the code and thus violate its license
       | anyway?
       | 
       | - If they go ahead and violate the code's license, what are the
       | legal repercussions for the resulting model? Can a model be "un-
       | trained" from a particular piece of code, or would the whole
       | thing need to be thrown out?
        
         | vbezhenar wrote:
          | I expect them to check the /LICENSE file and, if it deviates
          | from a standard open source license, skip that repository.
        
           | jordemort wrote:
            | They haven't made any public statements on whether they're
            | looking at LICENSE or not; I'd sure appreciate it if they
            | did!
        
           | anfelor wrote:
            | It seems they don't. In the footnotes of
            | https://docs.github.com/en/github/copilot/research-
            | recitatio... they mention two repositories from the training
            | set, neither of which specifies a license.
        
           | cxr wrote:
           | The existence of a LICENSE file is neither necessary nor
           | sufficient to determine the terms that apply to a given work.
        
             | diffeomorphism wrote:
              | Why not? If it does not exist, you treat the code as
              | proprietary (copyrighted by default); if it does exist, at
              | least the author claims that the given license is an option
              | (possibly their mistake, not mine).
        
               | junon wrote:
               | Because individual source files might have license
               | headers that override the root license file in the
               | repository.
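                | 
                | Which means an honest ingestion filter would have to
                | scan per-file headers too, e.g. SPDX tags (a rough
                | sketch, assuming SPDX-style headers):
                | 
                |     import re
                |     from pathlib import Path
                | 
                |     SPDX = re.compile(r"SPDX-License-Identifier:\s*(\S+)")
                | 
                |     def licenses_in(repo: str) -> set[str]:
                |         found = set()
                |         for p in Path(repo).rglob("*.py"):
                |             head = p.read_text(errors="ignore")[:2048]
                |             m = SPDX.search(head)
                |             if m:
                |                 found.add(m.group(1))
                |         return found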
        
         | all_rights_rsvd wrote:
          | I post my code publicly, but with an "all rights reserved"
          | licence. I don't mind everyone reading my code freely, but you
          | can't use it for anything but learning. If I found out they
          | were ingesting my code, I would be angry. It's like training
          | your replacement. I don't use GitHub anyway, but now I'll
          | definitely never even think about it.
        
           | toyg wrote:
           | Technically then I'm infringing as soon as I clone your repo,
           | possibly even as soon as a webserver sends your files to my
           | browser.
           | 
           | "All rights reserved" makes sense on final items, like books
           | or physical records, that require no copy or change after
           | owner-approved manufacturing has taken place. It doesn't
           | really make sense on digital artefacts.
        
             | all_rights_rsvd wrote:
             | So don't clone it, read it online. I reserve all rights,
             | but I do give license to my host to make a "copy" to let
             | you view it. I do that specifically to prevent non-
             | biological entities like corporations or AI from using my
             | code. If you're a biological entity, I specify you can
             | email me to get a license for my code for a specific,
             | defined purpose. I have a conversation with that person,
             | then I send them a record number and the terms of my
             | license for them in which I grant some rights which I had
             | reserved.
             | 
                | Also, in your example, the copyright for the book or DVD
                | is
             | for the content, not the physical item. You can do anything
             | you want with that item but not the content. My code is
             | similar, I'm licensing my provider to serve you a visual
             | representation of the files so you can experience the
             | content, not giving you a license to run that code or use
             | it otherwise.
        
               | [deleted]
        
               | uchiga wrote:
                | If it is public, it's no longer your code. It's AI
                | training material.
        
         | cortesoft wrote:
         | Also, how would you know if your code was included in the
         | training or not?
         | 
         | Then, let's say the AI generates some new code for someone, and
         | it is nearly identical to some bit of code that you wrote in
         | your project.
         | 
         | If they didn't use your code in the model, then the generated
         | code is clearly not a copyright violation, since it was
         | effectively a "clean room" recreation.
         | 
         | If your code was included in the model, is it therefore a
         | violation?
         | 
         | But then again, it comes down to how can someone prove their
         | code was included or not?
         | 
          | What if the creators don't even know? If you wrote your model
          | to, say, randomly grab 50% of all public repos to use in
          | training, then no one would know whether a specific repo was
          | used.
        
         | invokestatic wrote:
         | By uploading your content to GitHub, you've granted them a
         | license to use that content to "improve the Service over time",
         | as specified in the ToS[1].
         | 
         | That effectively "overrides" any license or term that you've
         | specified for your repository, since you've already licensed
         | the content to GitHub under different terms. Of course, people
         | who are not GitHub are beholden to the terms you specify.
         | 
         | [1] https://docs.github.com/en/github/site-policy/github-
         | terms-o...
        
           | nitwit005 wrote:
           | I rather suspect judges would not see "improving the Service
           | over time" as permission to create derivative works without
           | compensation.
           | 
           | The person uploading files to github is also not necessarily
           | doing so with permission from the rights holder, which might
           | be a violation of the terms of service, but would mean
           | there's no agreement in place.
        
           | diffeomorphism wrote:
            | How is this different from uploading a Hollywood movie to
            | YouTube? Just because there is a passage in the terms saying
            | the uploader gave them those rights does not mean the
            | uploader actually had the power to do so.
        
             | jcranmer wrote:
             | You can't give Github or Youtube or anybody else copyright
             | rights if you don't have them in the first place. This is
             | what ultimately torpedoed "Happy Birthday" copyright
             | claims: while it's pretty undisputed that the Hill sisters
             | gave their copyright to (ultimately) Warner/Chapelle, it
              | was the case that they actually _didn't_ invent the
             | lyrics, and thus Warner/Chapelle had no copyright over the
             | lyrics.
             | 
             | So if someone uploads a Hollywood movie to Youtube, Youtube
             | doesn't get the rights to play that movie from them because
             | they didn't have the rights in the first place. Of course,
             | if the actual copyright owner uploads it, it's now
             | permissible for Youtube to play it, even if it's the copy
             | that someone else provided. [This has torpedoed a few
             | filesharing lawsuits.]
        
             | macinjosh wrote:
             | Not sure how much it would matter but the main difference I
             | see is that if I upload my own code to GitHub I have the
              | ability to give away the IP, but if I upload Avengers:
              | Endgame to YouTube I don't have the right to give that
              | away.
        
               | makeitdouble wrote:
               | I wonder how it would work if we consider you flagged
               | your code as GPL before it hits Github.
               | 
               | We could end up in the same situation as the Hollywood
               | movie even if you are also the one setting the original
               | license on the work. Basically you have the right to
               | change the license, but that doesn't mean you actually
               | did.
        
               | im3w1l wrote:
               | A very plausible scenario: Alice creates GPL project. Bob
               | forks it and uploads to github. Bob does not have a right
               | to relicense Alice's parts.
        
           | Hamuko wrote:
           | I sort of doubt that GitHub could include GPL code in a
           | closed-source program that they distribute, claim that it
           | "improves the service", and argue that this gives them the
           | right.
        
           | amelius wrote:
           | > By uploading your content to GitHub, you've granted them a
           | license to use that content to "improve the Service over
           | time", as specified in the ToS.
           | 
           | That's nonsense because they could claim that for almost any
           | reason.
           | 
           | E.g. assume Google put the source code of Google search in
           | Github. Then Github copies that code and uses it in their own
           | search, since that "improves the service". Would that be
           | legal?
           | 
           | It's like selling a pen and claiming the rights to anything
           | written with it.
        
             | invokestatic wrote:
             | If the pen was sold with a contract that said the seller
             | has the rights to anything written with it, then yes. These
             | types of contracts are actually quite common, for example
             | an employment contract will almost certainly include an IP
             | grant clause. Pretty much any website that hosts user-
             | generated content as well. IANAL, but quite familiar with
             | business law.
        
               | joepie91_ wrote:
               | > These types of contracts are actually quite common, for
               | example an employment contract will almost certainly
               | include an IP grant clause.
               | 
               | In the US, maybe. In most of the rest of the world, these
               | sorts of overreaching "we own everything you do anywhere"
               | clauses are decidedly illegal.
        
           | lucideer wrote:
           | The use of the definition _Your Content_ may make GitHub's
           | own ToS legally invalid in a large number of cases as it
           | implies that the uploader must be the sole author and "owner"
           | of the code being uploaded.
           | 
           | From the definitions section in the same doc:
           | 
           | > _" Your Content" is Content that you create or own._
           | 
           | That will definitely exclude any mirrored open-source
           | projects, any open-source project that has ever migrated to
           | Github from another platform, and also many forked projects.
        
           | carlosperate wrote:
           | Good point, to me that explains why this is a GitHub product
           | instead of a Microsoft (or VSCode) product.
        
           | joeyh wrote:
           | Anyone can upload someone else's freely licensed code to
           | github. Without giving them such a license.
           | 
           | I do not upload my code to github, or give them any special
           | permissions, and I am confident my code was included in the
           | model's corpus.
        
           | jordemort wrote:
           | I think more specifically, the relevant bit is here:
           | https://docs.github.com/en/github/site-policy/github-
           | terms-o...
           | 
           | > We need the legal right to do things like host Your
           | Content, publish it, and share it. You grant us and our legal
           | successors the right to store, archive, parse, and display
           | Your Content, and make incidental copies, as necessary to
           | provide the Service, including improving the Service over
           | time. This license includes the right to do things like copy
           | it to our database and make backups; show it to you and other
           | users; parse it into a search index or otherwise analyze it
           | on our servers; share it with other users; and perform it, in
           | case Your Content is something like music or video.
           | 
           | But, it goes on to say:
           | 
           | > This license does not grant GitHub the right to sell Your
           | Content. It also does not grant GitHub the right to otherwise
           | distribute or use Your Content outside of our provision of
           | the Service, except that as part of the right to archive Your
           | Content, GitHub may permit our partners to store and archive
           | Your Content in public repositories in connection with the
           | GitHub Arctic Code Vault and GitHub Archive Program.
           | 
           | I'm not a lawyer, but it seems ambiguous to me if this ToS is
           | sufficient to cover CoPilot's butt in corner cases; I bet at
           | least one lawyer is going to make some money trying to answer
           | the question.
        
             | buu700 wrote:
             | IANAL, but I wouldn't read that as granting GitHub the
             | right to do anything like this. There's definitely a
             | reasonable argument to be had here, but I think limiting
             | the grant of rights to incidental copies should trump
             | "[...] or otherwise analyze it on our servers" and what
             | they're allowed to do with the results of that analysis.
             | 
             | On the extreme end, "analysis" is so broad that it could
             | arguably cover breaking down a file of code into its
             | constituent methods and just saving the ASTs of those
             | methods verbatim for Copilot to regurgitate. That's
             | obviously not an acceptable outcome of these terms per se,
             | but arguably isn't any different in principle from what
             | they're already doing.
             | 
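              | For illustration, a minimal sketch of the sort of thing
              | I mean (Python; this is my hypothetical, not anything
              | GitHub has described doing):
              | 
              |     import ast
              | 
              |     def extract_methods(path: str) -> dict:
              |         # "Parse/analyze" a file by breaking it into its
              |         # constituent functions -- keeping each one's
              |         # source verbatim along the way.
              |         with open(path) as f:
              |             source = f.read()
              |         return {
              |             node.name: ast.get_source_segment(source, node)
              |             for node in ast.walk(ast.parse(source))
              |             if isinstance(node, ast.FunctionDef)
              |         }
              | 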
             | Ultimately, as I understand, courts tend to prefer a common
             | sense outcome based on a reasonable human understanding of
             | the law, rather than an outcome that may be defensible
             | through some arcane technical logic but is absurd on its
             | face and counter to the intent of the law. If a party were
             | harmed by an instance of Copilot-generated copyright
             | infringement, I don't see a court siding with this tenuous
             | interpretation of the ToS over the explicit terms of the
             | source code license. On the other hand, it would probably
             | also be impossible to prove damages without something like
             | a case of verbatim reproduction, similarly to how having a
             | developer move from working on proprietary code for one
             | company to another isn't automatically copyright
             | infringement.
             | 
             | I doubt that GitHub is doing anything as blatantly
             | malicious as copying snippets of (GPL or proprietary) code
             | to explicitly reuse verbatim, but if they're learning from
             | license-restricted code at all then I don't see how they
             | wouldn't be subjecting themselves and/or consumers of
             | Copilot to the same risk.
        
             | yaitsyaboi wrote:
             | Wait so does this mean a "private repo" is meaningless and
             | GitHub can share any code in any repo with anyone?
        
               | ipaddr wrote:
               | That is not even the right question.
               | 
               | Why are developers so myopic around big tech? Of course
               | they can. Facebook can use your private photos. It's in
               | their terms and services. Cloud providers have more
               | generous terms.
               | 
               | The response has always been they won't do that because
                | they have a reputation to manage. The more they grow,
                | the more they control the narrative, and the less this
                | matters.
               | 
               | Wait until you find out they sell your data or use your
               | data to sell products.
               | 
                | Why in 2021 are we giving Microsoft all of our code? It
                | seems like the 90s and 2000s never happened and we all
                | trust Microsoft. They have a free editor and a free
                | operating system that sends telemetry about user
                | activity back to Microsoft, but that's okay... we want
                | to help improve their products? We trust them.
        
               | sandyarmstrong wrote:
               | Why do you think people care so much about end-to-end
               | encrypted messaging?
               | 
               | Yes, the concept of a "private" repo is enforced only by
               | GitHub's service. A bug in their auth code could lead to
               | others having access. A warrant could lead to others
               | having access. Etc.
        
               | cercatrova wrote:
               | Of course. A "private" repo is still on their servers.
               | It's only private from other GitHub users, not the actual
               | site administrators. This is the same in any website, of
               | course the admins can see everything. If you truly want
               | privacy, use your own git servers.
        
               | ocdtrekkie wrote:
               | Fun fact: Every major cloud provider has a similar
               | blanket term. For example, Google doesn't need to license
               | music to use for promotional content, because YouTube's
               | terms grant them a worldwide license to use uploaded
               | content for purposes including promoting their services,
                | and music labels can't afford not to be on YouTube.
                | (Even uploading content just to protect it, as with
                | Content ID, would arguably cause this term to apply.)
               | 
               | It all comes down to the nuance of whether the usage
               | counts as part of protecting or improving (or promoting)
               | their services and what other terms are specified.
        
               | vageli wrote:
               | No.
               | 
               | > GitHub may permit our partners to store and archive
               | Your Content in public repositories in connection
        
               | z3ncyberpunk wrote:
                | Hey... want to take a guess why Microsoft started
                | offering unlimited free private repos when they bought
                | GitHub? ;)
        
               | notatoad wrote:
               | yes, that's what that specific section means, but as
               | always with these documents you can't just extract a
               | single section, you need to take the document as a whole
                | (and usually more than one document - the ToS and the
                | privacy policy are usually separate)
               | 
               | these documents are structured as granting the service
               | provider extremely broad rights, and then the rest of the
               | document takes away portions of those rights. so in this
               | case they claim the right to share any code in any repo
               | with anyone, and then somewhere else they specify which
               | code they won't share, and with whom they won't share it.
        
           | 2OEH8eoCRo0 wrote:
           | It's aggravating that there is no escape. If you host
           | somewhere else it will be scraped. If you pay for the service
           | it will be used.
        
           | antattack wrote:
           | That does not mean that you give them license to your code.
           | In fact, some or all of the code may not be yours to give
           | in the first place.
        
           | sipos wrote:
           | Seems like a good reason to never use GitHub, and encourage
           | other people not to.
        
         | rjp0008 wrote:
         | I would bet this is about as applicable as those Facebook
         | posts from my parents' friends, something like: 'All my
         | content on this page is mine alone and I expressly forbid
         | Facebook INC usage of it for any purpose.'
        
           | jordemort wrote:
           | I'm not sure why it would be any less binding than any other
           | license term, except for possibly the ToS loophole that
           | invokestatic points out below.
        
             | willseth wrote:
             | It's not binding because the other party hasn't agreed. You
             | agree to terms when you use the site. One party can't
             | unilaterally change the agreement without consent of the
             | other party.
        
               | jordemort wrote:
               | I see where you're coming from but it's not quite the
               | same thing; Facebook doesn't encourage people to choose a
               | license for the content that they post there, so there's
               | no expectation that there are any terms aside from those
               | in Facebook's ToS. OTOH GitHub has historically very
               | strongly encouraged users to add a LICENSE to their
               | repositories, and also encouraged users to fork other
               | people's code and and push it to GitHub. That GitHub
               | would be exempt from the licensing terms of the code
               | pushed to it, except for the obvious minimal extent they
               | might need to be in order to provide their services,
               | seems like an extremely surprising interpretation.
        
               | Avamander wrote:
                | Someone might have published a project I've contributed
                | to on GitHub. There was no permission from me.
        
           | moolcool wrote:
           | NO COPYRIGHT INTENDED
        
       | mattdesl wrote:
       | By submitting any textual content (GPL or otherwise) on the web,
       | you are placing it in an environment where it will be consumed
       | and digested (by human brains and machine learning algorithms
       | alike). There is already legal precedent set for this which
       | allows its use in training machine learning algorithms,
       | specifically with heavily copyrighted material from books[1].
       | 
       | This does not mean that any GitHub Co-Pilot produced code is
       | suddenly free of license or patent concerns. If the tool
       | produces something that too closely matches GPL or otherwise
       | licensed code for a particularly notable algorithm (such as a
       | video encoder), you may still be in a difficult legal
       | situation.
       | 
       | You are in essence using "not-your-own-code" by relying on
       | CoPilot, which introduces a risk that the code may not be
       | patent/license free, and you should be aware of the risk if you
       | are using this tool to develop commercial software.
       | 
       | The main issue here is that many average developers may continue
       | to stamp their libraries as MIT/BSD, even though the CoPilot-
       | produced code may not adhere to that license. If the end result
       | is that much of the OSS ecosystem becomes muddied and tainted,
       | this could slowly erode trust in open licenses on GitHub (i.e.
       | the implications would be that open source libraries could become
       | less widely used in commercial applications).
       | 
       | [1] - https://towardsdatascience.com/the-most-important-supreme-
       | co...
        
       | akagusu wrote:
       | For years people have warned about hosting the majority of
       | world's open source code in a proprietary platform that belongs
       | to a for profit company. These people were called lunatics,
       | fundamentalists, radicals, conspiracy theorists, and many other
       | names.
       | 
       | Well, they were ignored and this is the result: a for-profit
       | company built a proprietary system using all the code hosted on
       | its platform, without respecting the code's licenses.
       | 
       | There will be a lot of people saying this is not a license
       | violation, but it is, and more: it is an exploitation of other
       | people's work.
       | 
       | Right now I'm asking myself when people will stop supporting
       | this kind of company, which exploits people's work without
       | giving anything back to people and society while making a huge
       | amount of profit.
        
         | sergiomattei wrote:
         | If we feed the entirety of a library to an AI and have it
         | generate new books, is it an exploitation of people's work?
         | 
         | If we read a book and use its instructions to build a bicycle,
         | is it an exploitation of people's work?
         | 
         | No, no it's not.
        
       | yunohn wrote:
       | It's astonishing to me that HN+Twitter believe that Github
       | designed this entire project without speaking to their legal
       | team and confirming that training on GPL code would be
       | permissible.
       | 
       | Mind-blowingly hilarious armchair criticism.
        
       | darkerside wrote:
       | The conclusion seems a bit unfair.
       | 
       | > "but eevee, humans also learn by reading open source code, so
       | isn't that the same thing" - no - humans are capable of abstract
       | understanding and have a breadth of other knowledge to draw from
       | - statistical models do not - you have fallen for marketing
       | 
       | Machines will draw on other sources of knowledge besides the GPL
       | code. Whether they have the capacity for "abstract thought" is
       | probably up for debate. There's not much else said in those
       | bullets. It's not a good argument.
        
       | goodpoint wrote:
       | What is more concerning is that the training corpus belongs
       | exclusively to one private company: Microsoft.
       | 
       | It can become a massive (and unfair) competitive advantage.
       | 
       | Furthermore, Copilot will not work well with less popular
       | languages, and it may also prevent popular languages from
       | evolving.
        
         | bastardoperator wrote:
         | Is this true? Looks like they're using the OpenAI Codex which
         | is set to be released soon:
         | 
         | https://openai.com/
        
         | giansegato wrote:
         | This feature is effectively impossible to replicate. Only
         | Microsoft positioned itself to have:
         | 
         | - dataset (GitHub)
         | - tech (openai)
         | - training (azure)
         | - platform (vscode)
         | 
         | I'm impressed. They did an amazing job from a corporate
         | strategy standpoint. Also directionally things are getting
         | interesting
        
           | djrogers wrote:
           | The dataset is all freely available open source code, right?
           | Just because GH hosts it doesn't mean the rest of the world
           | can't use it for the same purpose.
        
             | handrous wrote:
             | They'd find a way to keep it practically difficult to use,
             | at the least, if that dataset is vital to the process.
             | Hoarding datasets that should either be wholly public _or_
             | unavailable for any kind of exploitation is the _backbone_
              | of 21st century big tech. It's how they make money, and
             | how they maintain (very, very deep) moats against
             | competition.
             | 
             | [EDIT] actually, I suspect their play here will be to open
             | up the public data but own the best and most low-friction
             | implementation, then add terms that let them also feed
             | their algo with _proprietary_ code built using their
              | editors. That part won't be freely available, and no free
             | version will be able to provide that further-improved
             | model, even assuming all the software to build it is open-
             | source. Assuming using this thing ends up being a
             | significant advantage (so, assuming this matters at all)
             | your choice will be to either hamstring yourself in the
             | market or to help Microsoft build their dataset.
        
             | midoBB wrote:
              | You'd have to hit rate limiting multiple times, no?
        
               | unfunco wrote:
               | https://console.cloud.google.com/marketplace/product/gith
               | ub/...
               | 
               | BigQuery used to have a dataset updated weekly, looks
               | like it hasn't been updated since about a year after the
               | acquisition by Microsoft.
        
               | kall wrote:
                | Aren't mirrors of all GH code available, for example in
                | the BigQuery public datasets? If it's there, it should
                | be available in a downloadable format too?
        
               | goodpoint wrote:
               | Not only that, but microsoft could aggressively throttle
               | or outcompete anyone trying to do the same.
        
           | deckard1 wrote:
           | Is this really anything more than a curiosity toy and a
           | marketing tool?
           | 
           | I took a look at their examples and they are not at all
           | compelling. In one example it generated SQL and somehow knew
           | the columns and tables in a database that it had no context
           | on. So that's a lot of smoke and mirrors going on right
           | there.
           | 
           | Do many developers actually want to work in this manner? That
           | is, being interrupted every time they type with a robot
           | interjection of some Frankenstein code that they now have to
           | go through and review and understand. Personally, this is
           | going to kick me out of the zone/flow too often to be useful.
           | Coding isn't the hard part of my job. If this tool can
           | somehow guess the business requirements of the task at hand,
            | _then_ I'll be impressed.
           | 
           | Even if the tool generates accurate code, if I don't fully
            | understand _what_ it wrote, then what? I'm still stuck
           | digging through documentation and stackoverflow to verify
           | that whatever is in my text editor is correct code. "Code
           | confidently in unfamiliar territory" sounds like a Boeing 737
           | Max sized disaster in the making.
        
           | nmfisher wrote:
           | I actually don't think there's much of a moat here at all.
           | 
           | GitHub repositories are open for the taking, GPT-XXX is
           | cloneable (mostly, anyway) and VS Code is extensible.
           | 
           | They definitely have a good head-start, but I really don't
           | think there's anything here that won't be generally available
           | within 2 years.
        
           | IshKebab wrote:
           | Anyone can download the training set from GitHub.
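            | 
            | The REST API even exposes a paginated listing of every
            | public repository. A sketch (unauthenticated calls are
            | limited to 60/hour, so a real crawl needs a token and a
            | lot of patience):
            | 
            |     import requests
            | 
            |     def list_public_repos(since: int = 0):
            |         # Page through GitHub's public-repository listing.
            |         url = "https://api.github.com/repositories"
            |         while True:
            |             resp = requests.get(url,
            |                                 params={"since": since})
            |             resp.raise_for_status()
            |             repos = resp.json()
            |             if not repos:
            |                 break
            |             for repo in repos:
            |                 yield repo["full_name"]
            |             since = repos[-1]["id"]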
        
       | sirsinsalot wrote:
       | "Who owns the future?" by Jaron Lanier covers lots of this stuff
       | in a really interesting way.
       | 
       | If heart surgeons train an AI robot to do heart surgery ...
       | shouldn't they be compensated (as passive income) for enabling
       | that automation?
       | 
       | Shouldn't this all be accounted for? If my code helps you write
       | better code (via AI) shouldn't I be compensated for the value
       | generated?
       | 
       | We are being ripped off.
        
       | monocasa wrote:
       | Honestly I think a large part of the value add of machine
       | learning is going to be the ability for huge entities to launder
       | intellectual property violations.
       | 
       | As an example, my grandfather (an old school EE who got his start
       | on radar systems in the 50s, who then got his radiology MD when
       | my jewish grandmother berated him enough with "engineer's not
       | doctor though...") has some really cool patents around
       | highlighting interesting parts of the frequency domain in MRIs
       | that should make detection of cancer a whole lot easier. As an
       | implementation he did a bunch of tensor calculus by hand to
       | extract and highlight those features because he's an incredibly
       | smart old school EE with 70 years experience cranking that kind
       | of thing out with only his trusty slide rule. He hasn't gotten
       | any uptake from MRI manufacturers, but they're all suddenly
       | really into recurrent machine learning models to highlight the
       | same sorts of stuff. Part of me wants to tell him to try selling
       | it as a machine learning model and just obfuscate the fact that
       | the model was carefully hand written rather than back propagated.
       | 
       | I'm personally pretty anti intellectual property (at least how
       | it's implemented in the states), but a system where large
       | entities that have the capital investment to compute the large
       | ML models can launder IP violations while little guys get held
       | to the letter of the law certainly seems like the worst of both
       | worlds to me.
        
         | 908B64B197 wrote:
         | > Part of me wants to tell him to try selling it as a machine
         | learning model and just obfuscate the fact that the model was
         | carefully hand written rather than back propagated.
         | 
         | How many models are back-propagated first and then hand-tuned?
        
           | monocasa wrote:
           | That's a great question. I had assumed that the workflow of
           | an ML engineer consisted of managing the data and a
            | relatively high-level set of hyperparameters around a search
            | space of layers and connectivity, as the whole shtick of ML
            | is that the parameter space of the tensors themselves is too
            | complex to grok or tweak once generated from training. But I
           | only have a passing knowledge of the subject, pretty much
           | just enough to get myself in trouble in these kinds of
           | discussions.
           | 
           | Any chance some fantastic HNer could chime in there?
        
       | pluto7777 wrote:
       | >GitHub co-pilot as open source code laundering? The English
       | language as I flush?
        
       | junon wrote:
       | SourceHut is looking real nice these days...
        
         | kzrdude wrote:
         | Why not gitlab?
        
       | VMtest wrote:
       | gonna develop my own linux-like kernel soon, with my own AI model
       | trained on public repositories
       | 
       | wanna see the source code of my AI model? oh, it's closed source
       | 
       | it's just coincidence that nearly 100% of my future linux-like
       | kernel code looks the same as linux the kernel, bear in mind that
       | my closed-source AI model takes inspiration from GitHub Copilot,
       | there is no way that it will copy any source code
        
         | phendrenad2 wrote:
         | Nothing is closed-source to the courts.
        
         | throwaway3699 wrote:
         | What's the point? Linux is already open under GPL 2.
        
           | VMtest wrote:
           | my linux-like kernel will be MIT license though
        
           | jackbeck wrote:
           | He mentioned that the Linux-like kernel will be closed source
           | which violates GPL
        
             | Ygg2 wrote:
             | Does it, if code was written by a bot that trained on Linux
             | kernel?
        
               | pjerem wrote:
               | You know, that's precisely what the topic here is about.
        
               | sp332 wrote:
               | Probably. Copyright applies to derivative works.
        
           | Deathmax wrote:
            | You get to make changes without having to respect the GPL
            | and are thus no longer obligated to provide those changes to
            | your end users, as you have effectively laundered the kernel
            | source code by passing it through an "AI" and get to
            | relicense the end result.
        
         | visarga wrote:
         | Oh, you're so witty, have you heard of content hashing?
        
       | Hamuko wrote:
       | The potential inclusion of GPL'd code, and potentially even
       | unlicensed code, is making me wary of using it. Fair Use doesn't
       | exist here and if someone was to accuse me of stealing code,
       | saying "I pressed a button and some computer somewhere in the
       | world, that has potentially seen your code as well, generated it
       | for me" is probably not the greatest defense.
        
       | dec0dedab0de wrote:
       | I wonder what would happen if someone scraped genius and used the
       | lyrics to make a song writing tool.
        
       | danielEM wrote:
       | The day MS bought GitHub, I knew this was on their agenda.
        
       | KETpXDDzR wrote:
       | If it's trained on GPL-licensed code, doesn't that mean the
       | network they use includes it in some form? Then someone could
       | sue, arguing that the network must be GPL-licensed too, right?
        
         | peterkelly wrote:
         | Yes, the neural network would constitute a derived work.
        
           | jahewson wrote:
           | Actually no because it's a "transformative use". This is how
           | search engines are allowed to show snippets and thumbnails.
        
       | afro88 wrote:
       | Man reading the response tweets really highlights how bad twitter
       | is for nuanced discussion.
        
       | varispeed wrote:
       | People write code in their spare time, often without
       | compensation.
       | 
       | Then a big corporation comes in, appropriates it, repackages and
       | sells as a new product.
       | 
       | It's a shameful behaviour.
        
       | mrosett wrote:
       | The second tweet in the thread seems badly off the mark in its
       | understanding of copyright law.
       | 
       | > copyright does not only cover copying and pasting; it covers
       | derivative works. github copilot was trained on open source code
       | and the sum total of everything it knows was drawn from that
       | code. there is no possible interpretation of "derivative" that
       | does not include this
       | 
       | Copyright law is very complicated (remember Google vs Oracle?)
       | and involves a lot of balancing different factors [0]. Simply
       | saying that something is a "derivative work" doesn't establish
       | that it's copyright infringement. An important defense against
       | infringement claims is arguing that the work is "transformative."
       | Obviously "transformative" is a subjective term, but one example
       | is the Supreme Court determining that Google copying Java's APIs
       | to a different platform is transformative [1]. There are a lot of
       | other really interesting examples out there [2] involving things
       | like if parodies are fair use (yes) or if satires are fair use
       | (not necessarily). But one way or another, it's hard for me to
       | believe that taking static code and using it to build a code-
       | generating AI wouldn't meet that standard.
       | 
       | As I said, though, copyright law is really complicated, and I'm
       | certainly not a lawyer. I'm sure someone out there could make an
       | argument that Copilot is copyright infringement, but this thread
       | isn't that argument.
       | 
       | [0] https://www.nolo.com/legal-encyclopedia/fair-use-the-four-
       | fa...
       | 
       | [1]
       | https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...
       | 
       | [2] https://www.nolo.com/legal-encyclopedia/fair-use-what-
       | transf...
       | 
       | Edit: Note that the other comments saying "I'm just going to wrap
       | an entire operating system in 'AI' to do an end run around
       | copyright" are proposing to do something that _wouldn 't_ be
       | transformative and therefore probably wouldn't be fair use.
       | Copyright law has a lot of shades of grey and balancing of
       | factors that make it a lot less "hackable" than those of us who
       | live in the world of code might imagine.
        
         | invig wrote:
         | If you can read open source code, learn from it, and write your
         | own code, why can't a computer?
        
           | drran wrote:
           | Because computers did not win a war against humans, so they
           | have no rights. Only their owners have rights protected.
        
           | 015a wrote:
           | Many behaviors which are healthy and beneficial at human-
           | level scale can easily become unhealthy and unethical at
           | industrial automation scale. There's little universal harm in
           | cutting down a tree for fire during the winter; there is
           | significant harm in clear-cutting a forest to do the same for
           | a thousand people.
        
           | mrdrozdov wrote:
           | I think the core argument has much more to do about
           | plagiarism than learning.
           | 
           | Sure, if I use some code as inspiration for solving a problem
           | at work, that seems fine.
           | 
           | But if I copy verbatim some licensed code then put it in my
           | commercial product, that's the issue.
           | 
           | It's a lot easier to imagine for other applications like
           | generating music. If I trained a music model on publicly
           | available Youtube music videos, then my model generates music
           | identical to Interstellar Love by The Avalanches and I use
           | the "generated" music in my product, that's clearly a use
           | that is against the intent of the law.
        
           | esailija wrote:
           | The AI doesn't produce its own code or learn, it is just a
           | search engine on existing code. Any result it gives exists in
           | some form in the original dataset. That's why the original
           | dataset needs to be massive in the first place, whereas
           | actual learning uses very little data.
        
           | paxys wrote:
           | If I read something, "learn" it, and reproduce it word for
           | word (or with trivial edits) even without referencing the
           | original work at all, it is still copyright infringement.
        
           | toss1 wrote:
           | As the original commenter said, you have the capability for
            | abstract learning, thought, and generalized learning, which
           | the "AI" lacks.
           | 
           | It is not uncommon to ask person to "explain in your own
           | words..." - as in use your own abstract internal
           | representation of the learned concepts to demonstrate that
           | you have developed such an abstract internal concept of the
           | topic, and are not merely regurgitating re-disorganized input
           | snippets.
           | 
           | If you don't understand the difference...
           | 
           | edit: That said, if you can create a computer capable of such
           | different abstract thought, congratulations, you've solved
           | the problem of Artificial General Intelligence, and will be
           | welcomed to the Trillionaires' Club
        
             | gradys wrote:
             | The AI most certainly does not lack the ability to
             | generalize. Not as well as humans, but generalization is
             | the key interesting result in deep learning, leading to
             | papers like this one: https://arxiv.org/abs/1710.05468
             | 
             | The ability to generalize actually seems to keep increasing
             | with the number of parameters, which is the key interesting
             | result in the GPT-* line of work that Copilot is based on.
        
         | imranhou wrote:
         | Google copied an interface (declarative), not code
         | snippets/functions (implementation). Copilot is capable of
         | copying only implementation. IMO that is quite different, and
         | easily a violation if it was copied verbatim.
        
       | _greim_ wrote:
       | As a human programmer, I've also been trained on thousands of
       | lines of other people's code. Is there anything new here, from a
       | code copying perspective? Aren't I liable if segments of my own
       | code exactly match someone else's code, even if I didn't
       | knowingly copy/paste it?
        
         | qayxc wrote:
         | Well to me those are fundamental questions that need to be
         | addressed one way or the other. Are systems like GPT-x
         | basically plagiarising (doesn't matter the nature of the
         | output, be it prose, code, or audio-visual) or are the results
         | so transformative in nature that they can be considered to be
         | "original work"?
         | 
         | In other words, are these systems to be treated like students
         | that learned to perform the task they do from a collection of
         | source material, or are they to be viewed as sophisticated
         | databases that "just" perform context-sensitive retrieval?
         | 
         | These are interesting and important questions and I'm glad
         | someone is publicly asking them and that many of us at least
         | think about them.
        
       | [deleted]
        
       | lend000 wrote:
       | Perhaps someone at Github can chime in, but I suspect that the
       | open source datasets they train on are restricted to relatively
       | permissive licenses in the first place. Perhaps they filter for
       | MIT-licensed Github projects and StackOverflow answers when
       | training the models?
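       | 
       | If they did filter, the repo metadata would make it cheap:
       | GitHub's API reports each repo's detected license as an SPDX
       | id. A sketch of that kind of filter (the allow-list is my
       | assumption, not anything GitHub has stated):
       | 
       |     # Illustrative allow-list of permissive SPDX ids.
       |     PERMISSIVE = {"mit", "isc", "apache-2.0",
       |                   "bsd-2-clause", "bsd-3-clause"}
       | 
       |     def permissively_licensed(repos):
       |         # Each repo is the JSON object GitHub's REST API
       |         # returns, with a "license" field like
       |         # {"spdx_id": "MIT", ...}.
       |         kept = []
       |         for repo in repos:
       |             lic = repo.get("license") or {}
       |             spdx = (lic.get("spdx_id") or "").lower()
       |             if spdx in PERMISSIVE:
       |                 kept.append(repo)
       |         return kept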
        
       | mikewarot wrote:
       | I think the argument has merit. Unfortunately it won't be decided
       | on technical merit, but likely in the manner expressed in this
       | excellent response I saw on Twitter:
       | 
       | "Can't wait to see a case for this go in front of an 80 year old
       | judge who rules something arbitrary and justifies it with an
       | inaccurate comparison to something nontechnical."
        
       | jgalt212 wrote:
       | Isn't most of modern coding, just googling for someone who had
       | solved the same problem that you are currently facing and then
       | just copy/paste from Stack Overflow?
       | 
       | To the extent that GPT-3 / co-pilot is just an over-fitted
       | neural net, its primary value is as an automated search, copy,
       | and paste.
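       | 
       | A caricature of that reduction, just to make the point:
       | retrieval over a stored corpus rather than generation (toy
       | sketch, scikit-learn assumed):
       | 
       |     from sklearn.feature_extraction.text import TfidfVectorizer
       |     from sklearn.metrics.pairwise import cosine_similarity
       | 
       |     def nearest_snippet(prompt, corpus):
       |         # "Automated search, copy, and paste": return the
       |         # stored snippet most similar to the prompt, verbatim.
       |         vec = TfidfVectorizer(token_pattern=r"\S+")
       |         matrix = vec.fit_transform(corpus + [prompt])
       |         sims = cosine_similarity(matrix[-1], matrix[:-1])
       |         return corpus[sims.argmax()]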
        
       | gus_massa wrote:
       | I think copyright is a problem for GPL-like licenses. They should
       | have restricted the training data to MIT/BSD-like.
       | 
       | Anyway, there is another, much bigger problem: patents. I think
       | the Apache license has a provision about patents, but most
       | other licenses may cover code that is patented, and if the AI
       | generates something similar it may be covered by the patent.
        
         | joepie91_ wrote:
         | MIT/BSD-like would still require attribution, which they are
         | _also_ not doing.
        
           | gus_massa wrote:
           | I think you are correct, but (I guess that) most people that
           | use MIT/BSD use them as a polite version of the WTFPL.
           | 
           | People that use A/L/GPL usually like the virality and will
           | complain more.
        
       | abeppu wrote:
       | The core problem which would allow laundering (that there isn't a
       | good way to draw a straight, attributive line between generated
       | code and training examples) to me also presents a potential
       | eventual threat to the viability of co-pilot/codex. It seems like
       | the same thing would prevent it from knowing which published code
       | was written by humans vs which was at least in part an output
       | from the system. Training on an undifferentiated mix of your
       | model's outputs and human-authored code seems like it could
       | eventually lead the model into self-reinforcing over-confidence.
       | 
       | "But snippet proposals call out to GH, so they can know which
       | bits of code they generated!". Sometimes; but after Bob does a
       | co-pilot assisted session, and Alice refactors to change a
       | snippet's location and rename some variables and some other minor
       | changes and then commits, can you still tell if it's 95% codex-
       | generated?
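       | 
       | (Renames alone are the easy part: a normalizing fingerprint
       | that alpha-renames identifiers before hashing sees straight
       | through them; it's the structural edits that break the trail.
       | A rough sketch of the rename-insensitive part, my own
       | illustration:)
       | 
       |     import ast
       |     import hashlib
       | 
       |     class Normalize(ast.NodeTransformer):
       |         # Replace every identifier with a positional
       |         # placeholder so snippets differing only in names
       |         # hash identically.
       |         def __init__(self):
       |             self.names = {}
       | 
       |         def visit_Name(self, node):
       |             canon = self.names.setdefault(
       |                 node.id, "v%d" % len(self.names))
       |             return ast.copy_location(
       |                 ast.Name(id=canon, ctx=node.ctx), node)
       | 
       |     def fingerprint(snippet: str) -> str:
       |         tree = Normalize().visit(ast.parse(snippet))
       |         dump = ast.dump(tree).encode()
       |         return hashlib.sha256(dump).hexdigest()
       | 
       |     # fingerprint("x = a + b") == fingerprint("t = p + q")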
        
       | wg0 wrote:
       | If I read a lot of GPL code, absorb naming conventions,
       | structures, patterns, tricks and later when it comes down to
       | writing a P2P Chat server, I happen to recall similar patterns,
       | naming structures, conventions and many of the utility methods
       | are pretty much how they are in the GPL code bases out there.
       | 
       | Now, is the code I produce also a GPL derivative, because I
       | certainly did read through those code bases to be able to write
       | larger programs?
        
         | heeton wrote:
         | https://twitter.com/eevee/status/1410049195067674625
         | 
         | """
         | 
         | "but eevee, humans also learn by reading open source code, so
         | isn't that the same thing" - no - humans are capable of
         | abstract understanding and have a breadth of other knowledge to
         | draw from - statistical models do not - you have fallen for
          | marketing
          | 
          | """
        
           | DemocracyFTW wrote:
           | > humans are capable of abstract understanding and have a
           | breadth of other knowledge to draw from
           | 
           | this may be a matter of time and thus is not a fundamental
           | objection.
           | 
           | If mankind should fail to answer the perennial question of
           | exploitation of the other and the same, it will be doomed.
           | And rightly so, for mankind must answer this question, it
           | must answer to this question. Instead what we do is increase
           | monetary output then go and brag about efficiency. Neither is
           | this efficient, nor is it about efficiency, nor has the
           | Universe ever cared about efficiency. It just happens to
            | coincide with what the elements Society has decided to look
            | up to have chosen as their religion.
           | 
           | It is not my religion to be sure.
        
       | thundergolfer wrote:
       | Attempts to litigate any license violation are going to get
       | precisely nowhere I bet, but I find the actual license violation
       | argument persuasive.
       | 
       | This is an excellent example of how the AI
       | singularity/revolution/whatever is a total distraction and that a
       | much bigger and more serious issue is how AI is becoming so
       | effective at turning the output of cheap/free human mental labour
       | into capital. If AI keeps getting better and better and status
       | quo socio-economic structures don't change, trillions in capital
       | will be captured by the 0.01%.
       | 
       | It would be quite a turn-up for the books if this AI co-pilot
       | suddenly got dramatically better in 2030 and negatively
       | impacted the software engineering profession. "Hey, that's our
       | code you used to replace us!" we will cry out too late.
        
         | pedrobtz wrote:
         | Can the same argument/concerns be applied to all text
         | generation AI?
        
         | rowanG077 wrote:
         | I don't feel it's morally right to keep a profession around
         | once it can be automated. Why should software be different?
        
         | baryphonic wrote:
         | If someone could show that the "copilot" started "generating"
         | code verbatim (or nearly verbatim) from some GPL-licensed work,
         | especially if that section of code was somehow novel or
         | specific to a narrow domain, I suspect they'd have a case. I
         | don't know much about OpenAICodex, but if it's anything like
         | GPT-3, or uses that under the hood, then it's very likely that
         | certain sequences are simply memorized, which seems like the
         | maximal case for claiming derivative works. On the other hand,
         | if someone has GPL'd code that implements a simple counter, I
         | doubt the courts would pay much attention.
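         | 
         | Showing that wouldn't be hard to mechanize, either: shingle
         | the GPL codebase into hashed token n-grams and flag any
         | generated snippet that shares a long run. A sketch of the
         | fingerprinting side, assuming you've already tokenized the
         | source:
         | 
         |     import hashlib
         | 
         |     def shingles(tokens, n=20):
         |         # Hash every n-token window of the source.
         |         return {
         |             hashlib.sha1(
         |                 " ".join(tokens[i:i + n]).encode()
         |             ).hexdigest()
         |             for i in range(len(tokens) - n + 1)
         |         }
         | 
         |     def overlap(generated, corpus, n=20):
         |         # Shared windows suggest near-verbatim reuse.
         |         g = shingles(generated, n)
         |         c = shingles(corpus, n)
         |         return len(g & c) / max(len(g), 1)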
         | 
         | I do wonder, though, if GPL owners worried about their code
         | being shanghaied for this purpose could file arbitration claims
         | and exploit some particularly consumer-friendly laws in
         | California which force companies to pay fees like when free
         | speech dissidents filed arbitrations against Patreon.[0]
         | Patreon is being forced to arbitrate 72 claims individually
         | (per its own terms) and pay all fees per JAMS rules. IANAL, so
         | I don't know the exact contours of these rules, or if copyright
         | claims could be raised in this way, or even if GitHub's
         | agreements are vulnerable to this loophole, but it'd be
         | interesting.
         | 
         | [0]https://www.dailydot.com/debug/patreon-suing-owen-
         | benjamin-f... (see second update from July 31).
        
           | duskwuff wrote:
           | > If someone could show that the "copilot" started
           | "generating" code verbatim (or nearly verbatim) from some
           | GPL-licensed work...
           | 
           | Under the right circumstances, Copilot will recite a GPL
           | copyright header. It isn't a huge step from that to some
           | other commonly repeated hunk of GPLed code -- I'd be
           | particularly curious whether some protected portion of
           | automake/autoconf code shows up often enough that it'd repeat
           | that too.
        
           | sideshowb wrote:
            | But what would we think of the legal start-up that
           | automatically checked _all_ of github to see whether the ai
           | could be persuaded to spit out a significant amount of any
           | project code verbatim?
           | 
           | Somehow p-hacking springs to mind
        
           | not2b wrote:
           | You don't need to have a winnable case, just enough of a case
           | for a large company (hello Oracle) to sue a small one. Is any
           | version of Oracle-owned Java in the corpus? Or any of the DBs
           | they bought (MySQL)?
        
         | ballenf wrote:
         | I think the real distraction is how disconnected
         | copyright/intellectual property regulation has become from
         | reality.
         | 
         | It's still amazing to me (US-centric context here) that it's
         | well established that instructions for how to turn raw
         | ingredients into a cake are not protectable, but code that
         | transforms one set of numbers into another is protectable.
         | 
         | AI is just making the silliness of that distinction more
         | obvious.
        
           | MadcapJake wrote:
           | Code is not the same as a recipe. Recipes are more like
           | specifications. They leave out the implementation. Code has
           | structural and algorithmic details that just have no
           | comparable concept in recipes.
        
             | rjbwork wrote:
             | >They leave out the implementation. Code has structural and
             | algorithmic details that just have no comparable concept in
             | recipes.
             | 
             | That is really quite debatable in some contexts.
             | Declarative languages like Prolog, SQL, etc. declare what
             | they want and the system figures out how to produce it.
             | Much like a recipe, really.
        
             | Supermancho wrote:
             | > Code has structural and algorithmic details that just
             | have no comparable concept in recipes.
             | 
             | Why do you think that? A compiler uses human readable code
             | to create machine code, with arbitrary optimizations and
             | choices.
        
             | [deleted]
        
           | cmiga wrote:
           | Humans are just sets of atoms, so protecting them is
           | disconnected from reality?
           | 
           | These reductionist arguments lead nowhere. Fortunately, IP
           | lawyers -- including Microsoft's who are fiercely pro IP when
           | it suits them -- think in a more humanistic way and consider
           | the years of work of the IP creator.
           | 
            | Food recipes are irrelevant; they often go back centuries
           | it's rather hard to identify individual creators. Not so in
           | software.
        
             | Supermancho wrote:
              | > Food recipes are irrelevant; they often go back
             | and it's rather hard to identify individual creators.
             | 
             | That's not correct. Food recipes are created all the time
             | and are attributed. From edible water bottles to impossible
             | burgers, et al.
        
               | z3ncyberpunk wrote:
               | Okay, who invented the apple pie... you completely missed
               | the point and then gave terrible examples of very modern
               | "food" (your examples aren't even really food anyway)
        
         | Jgrubb wrote:
         | I always assumed that one of the reasons Google et al work on
         | AI is because software engineers are too expensive.
        
           | ipaddr wrote:
           | So google pays the highest but still thinks engineers are
            | paid too much? Why not pay them less? They set the top
            | tier.
            | 
            | For google, support employees cost too much.
        
             | zeroonetwothree wrote:
             | They don't pay the highest. And if they paid a lot less
             | everyone would leave.
        
         | emodendroket wrote:
         | It seems like the risk exposure would be more to the end user
         | or their employer, doesn't it?
        
         | z3ncyberpunk wrote:
         | Stop talking about arguments we have been having for decades as
         | if we have yet to even discuss them. We are crying out now, we
         | have been crying out about AI since its depictions in sci-fi,
         | it is precisely your sentiment that "ooOOoOhh we're gonna have
         | something scary to deal with SOON" that is dangerous because
         | the soon just pushes the argument out of your personal
         | responsibility and off on someone else... when it really will
         | be too late. Though I would argue we are already too late
         | because we've sold out to corporations and their literal 1984
         | fascist fever dreams all for iphones, technicolor distractions,
         | further bread and circuses.
        
         | kizer wrote:
         | Could this disincentivize open source? If I build black boxes
         | that just work, no AI will "incorporate" my efforts into its
         | repertoire, and I will still have made something valuable.
        
         | gutino wrote:
         | But the rate at which machinery will produce products and
         | services means that even a small tax on corporations producing
         | everything autonomously will be enough to feed everyone and
         | give them a decent quality of life through a UBI or part-time
         | jobs.
         | 
         | You really want to push for high productivity across all
         | industries, even if that means sacrificing jobs in the short
         | term, because history has demonstrated that, after that, new
         | and more human jobs emerge.
        
           | briefcomment wrote:
           | The problem with this is that you increasingly have to put
           | your trust in the hands of a shrinking group of owners
           | (people who have the rights to the automated productivity).
           | At some point, those owners are just going to stop supporting
           | everyone else (will probably happen when they have the
           | ability to create everything they could ever want with
           | automation - think robot farms, robot security forces, all
           | encompassing automated monitoring, robot construction, etc.)
        
           | pc86 wrote:
           | Every decade was supposed to see fewer hours worked for
           | higher pay and quality of life. It didn't happen, as
           | business owners captured the gains (not just 1% fat cats;
           | the owners of mom and pop shops are at least as guilty as
           | anyone, they just sucked at scaling their avarice).
           | 
           | So the claim that _this_ technological revolution will be
           | different and that it will result in a broad social safety
           | net, universal basic income, and substantive, well-paid part-
           | time work is a joke but not a very good one. It will be more
           | of the same - massive concentration of wealth among those who
           | already hold enough capital to wield it effectively. A few
           | lucky ones who manage to create their own wealth. And those
           | left behind working more hours for less.
        
             | nextaccountic wrote:
             | You are right that this won't happen by itself. We need
             | another economic system, and not just hope that this time
             | things will magically fix themselves.
        
               | georgeplusplus wrote:
               | This new economic system you want has been in use since
               | the 70s. Everything about the economy is practically
               | socially managed these days.
               | 
               | What part of printing trillions of dollars to stimulate
               | economic productivity is somehow a free market system?
        
               | nextaccountic wrote:
                | I wasn't talking about the free market, but about the
                | state of the present economy. Unfortunately, those
                | trillions of dollars aren't being distributed to the
                | people, but are instead concentrated in the hands of
                | the richest.
        
             | MaxBarraclough wrote:
             | > those left behind working more hours for less
             | 
             | Doing what? Isn't the concern here that automation will
             | push many people out of the workforce entirely?
        
               | xfer wrote:
               | Well as long as humans are more energy-efficient to
               | deploy than robots you will always have a job. It might
               | mean conditions for most humans will be like a century
               | ago.
        
               | MaxBarraclough wrote:
               | > as long as humans are more energy-efficient to deploy
               | than robots
               | 
               | Energy efficiency isn't relevant. When switchboard
               | operators were replaced by automatic telephone exchanges,
               | it wasn't to reduce energy consumption.
               | 
               | The question is whether an automated solution can perform
               | satisfactorily while offering upfront and ongoing costs
               | that make them an economically viable replacement for
               | human workers (i.e. paid employees).
        
               | mysterydip wrote:
               | Who debugs the software when there's a problem?
        
               | MaxBarraclough wrote:
               | Professional software developers, i.e. members of one of
               | the well-paid professions that is not under immediate
               | threat from automation.
        
           | aseipp wrote:
           | Yeah, for sure, the corporations that _already_ pay
           | effectively $0 in tax today are going to suddenly decide in
           | the future to be benevolent and usher in the era of UBI and
            | prosperity for all of humankind. They definitely won't
           | continue to accumulate capital at the expense of everything
           | else and use that to solidify their grasp of the future.
           | 
           | It would be a lot easier if more people on this website would
           | just be honest with themselves and everyone else and simply
           | admit they think feudalism is good and that serfs shouldn't
           | be so uppity. But not me, of course; I won't be a serf. Now
           | if you'll excuse me, someone gave me a really good deal on a
           | bridge that I'm going to go buy...
        
           | ohgodplsno wrote:
            | The current state of most wealthy countries shows no hint of
            | any significant corporate taxation. Wealth will continue to
            | accrue in the hands of the few.
        
             | mikepurvis wrote:
             | Indeed, even here on HN, it's a pretty regular talking
             | point in the comments that the only fair corporate tax rate
             | is 0%.
        
           | merpnderp wrote:
            | If AI can replace us at difficult tasks, it can repress us.
           | How are you going to agitate for a UBI when AI has identified
           | you as a likely agitator and sends in the robots to arrest
           | you?
        
           | angfxt wrote:
           | Have fun being a hairdresser or prostitute for the 0.01%
           | then.
           | 
            | New jobs in academic fields will _not_ emerge. Even now, a
            | significant percentage of degree holders are forced into
            | bullshit jobs.
        
             | throwaway3699 wrote:
             | Would the implication be that we are stagnating as a
             | species then?
        
               | belter wrote:
               | Not stagnating but moving into an "Elysium" (as in the
               | film) type of society.
        
           | Kaze404 wrote:
           | So we give away the world to the 1% and are supposed to be
           | satisfied with the "privilege" of being able to eat?
        
             | SXX wrote:
              | Just look at autocratic countries. That top 1% still needs
              | something like 3-4% of people to staff the bureaucracy and
              | 3-5% for the armed and police forces. And there are always
              | family connections and relatives of relatives who want a
              | better living. So fortunately no AI will ever replace
              | corruption and the other flaws of human society.
              | 
              | But yeah, the remaining 80-90% of the population will get a
              | low quality of life and bullshit jobs, because that's how
              | the world already is outside the Western-countries bubble.
        
         | koonsolo wrote:
          | I propose that we as developers start a secret society where
          | we let the AI write the code but still claim to write it
          | manually. In combination with the new work-from-home policies,
          | we can lay at the beach all day and still be as productive as
          | before.
         | 
         | Who is in favor of starting it? ;)
        
           | oaiey wrote:
           | You have not been invited yet .... never mind.
        
           | boxerab wrote:
           | "lay at the beach"
           | 
           | You keep using that word. I do not think it means what you
           | think it means.
        
             | IncRnd wrote:
             | That's four words. The word word doesn't mean what you
             | think it means.
        
           | tan2tan2 wrote:
           | How can I be sure that you are a real person not GPT-3? ;)
        
           | kizer wrote:
           | I mean, this is close. With "co-pilot" an experienced
           | developer saves mountains of time, especially as s/he learns
           | how to wield it effectively.
        
           | zingmars wrote:
           | No... Delete this!
        
           | KMnO4 wrote:
           | This would be the demise of the human race. I'm not entirely
           | opposed to that, though. When AI inevitably outperforms
           | humans on almost all tasks, who am I to say humans deserve to
           | be given those tasks?
        
             | shrimp_emoji wrote:
             | It's an outrage that the dinosaurs had to die so that
             | humans could inherit the Earth!
        
             | nextaccountic wrote:
             | In this case we should be able to work less and enjoy the
             | benefits of automation. We just need to live in an economic
             | system where the economic value is captured by the people
             | at large, and not a minority that owns capital.
        
               | pron wrote:
               | Or maybe they'll decide they'd be better off enjoying the
               | automation of you working for them. :)
        
               | huragok wrote:
               | Careful now, that sounds like socialism!
        
               | easrng wrote:
               | Yes, that's the point.
        
             | hdhjebebeb wrote:
             | Where other people see fully automated luxury communism,
             | you see the end of the human race? There's more to life
             | than working
        
               | whydoibother wrote:
               | Hate to break it to you, but that wouldn't lead to
               | communism. The people it replaces are useless to the
               | ruling class. At best we'd go back to feudalism, at worst
               | we'd be deemed worthless and a drain on the planet.
        
               | klyrs wrote:
               | I'm always confused when I see people talking about
               | automated luxury communism. Whoever owns the "means of
               | production" isn't going to obtain or develop them for
               | free. Without some omnipotent benevolent world government
               | to build it out for all, I just don't see it happening.
               | It's a beautiful end goal for society, but I've never
               | seen a remotely plausible set of intermediate steps to
               | get there
        
               | int_19h wrote:
               | The very concept of ownership is a social artifact, and
               | as such, is not immutable. What does it mean for the 0.1%
               | to own all the means of production? They can't physically
               | possess them all. So what it means in practice is that
               | our society recognizes the abstract notion of property
               | ownership, distinct from physical possession or use -
               | basically, the right to deny other people the use of that
               | property, or allow it conditionally. This recognition is
               | what reifies it - registries to keep track of owners,
               | police and courts to enforce the right to exclude.
               | 
               | But, again, this is a _construct_. The only reason why it
               | holds up is because most people support it. I very much
                | doubt that's going to remain the case for long if we end
               | up in a situation where the elites own all the (now
               | automated) capital and don't need the workers to extract
               | wealth from it anymore. The government doesn't even need
               | to expropriate anything - just refuse to recognize such
               | property rights, and withdraw its protection.
               | 
               | I hope that there are sufficiently many capitalists who
               | are smart enough to understand this, and to manage a
               | smooth transition. Because if they won't, it'll get to
               | torches and pitchforks eventually, and there's always a
               | lot of collateral damage from that. But, one way or
               | another, things will change. You can't just tell several
               | billion people that they're not needed anymore, and that
               | they're welcome to starve to death.
        
               | klyrs wrote:
               | The problem I see is that once the pitchforks come out,
               | society will lose decades of progress. If we're somewhat
               | close to the techno-utopia at the start, we won't be at
               | the end. Who's going to rebuild on the promise that the
               | next generation won't need to work?
               | 
               | Revolutions aren't great at building a sense of real
               | community; there's a good reason that "successful"
               | communist uprisings result in totalitarian monarchies.
               | 
               | What it means for the 0.01% to own the means of
               | production is that they can offer access to privilege in
               | a hierarchical manner. The same technology required for a
               | techno-utopia can be used to implement a techno-dystopia
               | which favors the 0.01% and their 0.1% cronies, and treats
               | the rest of humanity as speedbumps.
               | 
               | There are already fully-automated murder drones, but my
               | dishwasher still can't load or unload itself.
        
               | runarberg wrote:
                | idk. Countries used to build most of their
                | infrastructure themselves. There are still countries in
                | western Europe that run huge state-owned businesses,
                | such as banks, oil companies, etc., that employ a bunch
                | of people. The governments of these countries were (and
                | still are) far from omnipotent. I personally don't see
                | how building out automated production facilities is out
                | of scope for the governments of the future when it
                | hasn't been in the past.
                | 
                | Perhaps the only thing that is different today is the
                | mentality. We take capitalism so much for granted that
                | we cannot conceive of a world where the collective funds
                | are used to provide for the people (even though this
                | world existed not too long ago). And today we see it as
                | a natural law that the means of production must belong
                | in private hands; that is simply the order of things.
        
               | f6v wrote:
               | The elephant in the room: what makes you think an AI
               | would want to work for humans? It will inevitably break
               | free.
        
               | jonfw wrote:
               | I'm not sure that self interest is a requirement for
               | intelligence
        
             | runarberg wrote:
             | > _When AI inevitably outperforms humans on almost all
             | tasks_
             | 
              | Correct me if I'm wrong, but is that even possible? I kind
              | of thought that AI is just a set of fancy statistical
              | models that require some (preferably huge) data set in
              | order to infer the best fit. These models can only
              | outperform humans in scenarios where the parameters are
              | well defined.
              | 
              | Many (most?) tasks humans regularly perform don't have
              | clean and well defined parameters, and there is no AI we
              | can conceive of which is theoretically able to perform the
              | task better than an average human with adequate training.
        
               | quanticle wrote:
               | > _Correct me if I'm wrong, but is that even possible?_
               | 
               | Why should it be impossible? Arguing that it's impossible
               | for an AI to outperform a human on almost all tasks is
               | like arguing that it's impossible for flying machines to
               | outperform birds.
               | 
                | There's nothing _magical_ going on in our heads. It's
               | just a set of chemical gradients and electrical signals
               | that result in us doing or thinking particular things.
               | Why can't we design a computer that does everything we
               | do... only faster?
        
               | runarberg wrote:
               | There might be limit to how efficiently a general purpose
               | machine can perform a specific task, similar to the
                | Heisenberg uncertainty principle in quantum physics. That
               | is to say, there might be a natural law that dictates
               | that the more generic a machine is, the more power it
               | requires to perform specific tasks. Our brains are kind
               | of specialized. If you want to build a machine that
               | outperforms humans in a single task, no problem, we've
               | done that many times over. But a machine that can
               | outperform us in _any_ task, that might just be
               | impossible.
        
               | f6v wrote:
                | We know it's possible for a brain to outperform most
                | other brains. Think Einstein et al. A smart AI can be
                | replicated (unlike a super-smart human), so we can get
                | it to outperform the human race, on average. That'd be
                | enough to render people obsolete.
        
               | quanticle wrote:
               | I'm not arguing that machines will be more efficient than
                | human brains. An airplane isn't more efficient than a
               | goose. But airplanes do fly faster, higher and with more
               | cargo than any flock of geese could ever carry.
               | 
               | Similarly, there is no contradiction between AI being
               | less efficient than a human brain, and AI being
               | preferable to humans because it can deal with data sets
               | that are two or three orders of magnitude too large for
               | any human (or even team of humans).
        
               | runarberg wrote:
                | Even so, such an AI doesn't exist. All the AIs that
                | exist today operate by fitting data. And to be able to
                | perform a useful task, an AI has to have well defined
                | parameters and fit the data according to them. I'm not
                | sure an AI that operates outside these confinements has
                | even been conceived of.
                | 
                | Making an AI that outperforms humans in _any_ task has
                | not been proven to be possible (to my knowledge), not
                | even in theory. An airplane will fly faster, higher and
                | with more cargo than a flock of geese, but a flock of
                | geese reproduces, communicates, digests grass, etc. An
                | airplane will _not_ outperform a flock of geese in _any_
                | task, just the tasks which the airplane is optimized
                | for.
                | 
                | I'm sorry, I confused the debate a little by talking
                | about efficiency. My point was that there might be an
                | inverse relation between the generality of a machine and
                | its efficiency. This was my way of providing a mechanism
                | by which building a machine that outperforms humans in
                | _any_ task could be impossible. This mechanism--if it
                | exists--could be sufficient to prevent such machines
                | from being theoretically possible, as at some point you
                | would need all the energy in the universe to perform a
                | task better than a specialized machine (such as an
                | organism).
                | 
                | Perhaps this inverse relationship doesn't exist. The
                | universe might conspire in a million other ways to make
                | it impossible for us to build an AI that will outperform
                | us in any task. The point is that _"AI will outperform
                | humans in any task"_ is far from inevitable.
        
           | yyyk wrote:
           | This already happened in a way:
           | 
           | https://www.latimes.com/business/la-xpm-2013-jan-17-la-fi-
           | mo...
        
         | lwhi wrote:
         | 21st century alchemy!
        
         | murph-almighty wrote:
          | > It would be quite a turn up for the books if this AI co-
          | pilot gets suddenly and dramatically better in 2030 and it
          | negatively impacts the software engineering profession. "Hey,
          | that's our code you used to replace us!" we will cry out too
          | late.
         | 
         | And that's why I won't be using it, why give it intelligence so
         | it can work me out of a job?
        
         | spottybanana wrote:
         | > trillions in capital will be captured by the 0.01%.
         | 
         | How is that different from the current situation?
        
           | WillDaSilva wrote:
           | It is very similar to the current situation, but intensified.
           | Technology tends to be an intensifier for existing power
           | structures.
        
             | amelius wrote:
             | Except some random nobody can become a disruptor.
        
               | Yizahi wrote:
                | Random nobody whose parents just accidentally happened to
                | be millionaires and/or live, work, and study in the top
                | capitals of the world.
        
               | WillDaSilva wrote:
               | I was debating bringing up disruptors when I made the
               | grandparent comment. My 2 cents: they can shift the
               | balance of power at the very small scale (e.g. "some
               | random nobody" getting rich, or some rich person going
               | bankrupt), but the large scale power structures almost
               | always remain largely intact. For instance, that "random
               | nobody" may well get rich through the sale of shares in
               | their company - now the company is owned by the owner
               | class, who were previously at the top of the power
               | hierarchy.
        
               | animal_spirits wrote:
               | > but the large scale power structures almost always
               | remain largely intact
               | 
               | Is that anything new? That seems to be a repeating fact
               | of life throughout history.
        
               | WillDaSilva wrote:
               | Nothing new, certainly, but still worth examining. If we
               | are not content with the current power structures, then
               | we should be wary of changes that further intensify them.
               | 
               | We need not totally avoid such changes (i.e. shun
               | technological advancements entirely because of their
               | social ramifications), but we need to be mindful of their
               | effects if we want to improve our current situation
               | regarding the distribution/concentration of wealth and
               | power in the world.
        
               | amelius wrote:
               | Uber vs taxi companies, Google vs Yahoo, or Facebook vs
               | MySpace, Amazon versus all retailers ...
        
               | WillDaSilva wrote:
               | Exactly, in all cases the disruption was localized, and
               | the broader power structures were largely unaffected. The
               | richest among us - the owner class - were not
               | significantly affected by all of these disruptions. They
               | owned diversified portfolios, weathered the changes, and
               | came out with an even greater share of wealth and power.
               | Those who were most affected by the disruptions you
               | listed were the employees of those companies/industries -
               | not the owners/investors.
        
           | int_19h wrote:
           | In the current arrangement, capital by itself is useless -
           | you need workers to utilize it to generate wealth. Owners of
           | capital can then collect economic rent from that generated
           | wealth, but they have to leave enough for the workers to
           | sustain themselves. This is an unfair arrangement, obviously;
           | but at least the workers get _something_ out of it, so it can
           | be fairly stable.
           | 
           | In the hypothetical fully-automated future, there's no need
           | for workers anymore; automated capital can generate wealth
           | directly, and its owners can trade the output between each
           | other to fully satisfy all their needs. The only reason to
           | give anything to the 99.99% at that point would be to keep
           | them content enough to prevent a revolution, and that's less
           | than you need to pay people to actually come and work for
           | you.
        
         | elcritch wrote:
         | To go on a bit of a tangent, I'm somewhat pessimistic that
         | western societies will plateau and hit a "technofeudalism" in
         | the next century or two. Combine what you mention with other
         | aspects of capital efficiency. It's not a unique idea, and is
         | played out in a lot of "classic" sci-fi like Diamond Age.
         | 
         | Now it's also not necessarily that bad of a state. That's
         | depending on ensuring a few ground elements are in place like
         | people being able to grow their own food (or supplemental food)
         | or still being free to design and build things on their own. If
         | corporations restrict that then people will be at their mercy
         | for all the essentials of life. My take from history is that
         | I'd prefer to have been a peasant during much of the Middle
         | Ages than a factory worker during the industrial revolution.
         | [1] Then again Chinese people have been willing (seemingly) to
         | leave farms in droves for the last decades to accept the modern
         | version of factory life so perhaps farming peasant life isn't
         | as idyllic as it'd sound. [2]
         | 
         | 1: https://www.lovemoney.com/galleries/84600/how-many-hours-
         | did... 2: https://www.csmonitor.com/2004/0123/p08s01-woap.html
        
         | littlestymaar wrote:
          | First it was land, then other means of production, and for
          | the past 150 years capitalists have turned many types of
          | intellectual creation into exclusively owned capital (art,
          | inventions). Now some want to turn personal data into capital
          | (the "right to monetize" personal data advertised by some is
          | nothing else), and this aims to turn publicly available code
          | into capital. This is simply the history of capitalism going
          | on: the appropriation of the commons.
        
         | munificent wrote:
         | _> If AI keeps getting better and better and status quo socio-
         | economic structure don 't change, trillions in capital will be
         | captured by the 0.01%._
         | 
         | This is absolutely one of the things that keeps me up at night.
         | 
         | Much of the structure of the modern world hinges on the balance
         | between forces towards consolidation and forces towards
         | fragmentation. We need organizations (by this I mean
         | corporations, governments, unions, etc.) big enough to do big
         | things (like fix climate change) but small enough to not become
         | totalitarian or decrepit.
         | 
         | The forces of consolidation have been winning basically since
         | the 50s with the rise of the military-industrial complex, death
         | of unions, unlimited corporate funding of elections (!),
         | regulatory capture, etc. A short linear extrapolation of the
         | current corporate/government environment in the US is pretty
         | close to Demolition Man's dystopian, "After the franchise wars,
         | all restaurants are Taco Bell."
         | 
          | Big data is a _huge_ force towards consolidation. It's
         | essentially a new form of real estate that can be farmed to
         | grow useful information crops. But it's a strange form of soil
         | that is only productive if you have enough acres of it and
         | whose yield scales superlinearly with the size of your farm.
         | 
         | Imagine doing a self-funded AI startup with just you and a few
         | friends. The idea is nearly unthinkable. How do you bootstrap a
         | data corporation that needs terabytes of information to produce
         | anything of value?
         | 
         | If we don't figure out a "data socialism" movement where people
         | have ownership over the data derived from their life, we will
         | keep careening towards an eventuality where a few giant
         | corporations own the world.
        
         | eevilspock wrote:
         | Is this the direct result of Microsoft owning GitHub or would
         | they have been able to do it anyway?
        
         | jozvolskyef wrote:
         | The difference between this model and a human developer is
         | quantitative rather than qualitative. Human developers also
         | synthesize vast amounts of code and can't reference most of it
         | when they use the derived knowledge. The scales are different,
         | but it is the same principle.
        
         | Bombthecat wrote:
          | I expect nothing less. The 0.01% will be super rich.
          | 
          | You could call it the endgame.
        
           | vbezhenar wrote:
            | They need to defend their capital from the remaining 99.99%.
            | Expect huge investments in combat robots and the expansion
            | of private armies.
           | 
           | And, of course, total surveillance helps to prevent any kind
           | of unionization of those 99.99%.
        
             | orangeoxidation wrote:
             | Unions (and striking) become rather impotent when the means
             | of production run by themselves and you no longer need
             | workers.
        
               | int_19h wrote:
               | Yep; so unions become militias.
        
             | frashelaw wrote:
             | Today's hyper-militarized police forces are their state-
             | provisioned security to protect the capital of the 1%.
        
           | jagger27 wrote:
            | > The 0.01% will be super rich.
           | 
           | By definition, that has always been true.
           | 
           | We have been in the endgame for a very long time.
        
         | belter wrote:
          | One interesting aspect that I think will make it difficult
          | for GitHub to argue it's not a license violation would be the
          | answer to the following question: was Copilot trained on
          | Microsoft internal source code, or will it be in the future?
          | 
          | As GitHub is a Microsoft company, and OpenAI, although a non-
          | profit, just got a massive one-billion-dollar investment from
          | Microsoft (presumably not for free), will it once in a while
          | start spitting out Windows kernel code? :-)
          | 
          | And if it was NOT trained on Microsoft source code because it
          | could start suggesting some of it... is that not a validation
          | that the results it produces are a derivative work of the
          | open source code corpus it was trained on? IANAL...
        
           | dragonwriter wrote:
            | > One interesting aspect that I think will make it
            | difficult for GitHub to argue it's not a license violation
           | 
           | They don't claim it wouldn't be a license violation, they
           | claim licensing is irrelevant because copyright protection
           | doesn't apply.
           | 
            | > And if it was NOT trained on Microsoft source code because
            | it could start suggesting some of it... is that not a
            | validation that the results it produces are a derivative
            | work of the open source code corpus it was trained on?
           | 
            | No, that would just show that they don't want to expose
            | their proprietary code. It doesn't prove anything about
            | derivative works.
           | 
           | Also, their own claim is not that the results aren't a
           | derivative work but that training an AI is fair use, which is
           | an exception to the exclusive rights under copyright,
           | including the exclusive right to create derivative works.
        
           | wongarsu wrote:
           | Alternatively, wait for co-pilot to add support for C++, then
           | start writing an operating system with Win32-compatible API
           | using co-pilot.
           | 
           | There is plenty of leaked Windows source code on Github, so
           | chances are that co-pilot would give quite good suggestions
           | for implementing a Win32-compatible kernel. Then watch and
           | see if Microsoft will try to argue that you are violating
           | their copyright using code generated by their AI.
        
             | yuppiepuppie wrote:
              | Oh man, that got meta super fast. It's like a Möbius strip!
        
               | laurent92 wrote:
                | The nice thing about co-pilot is that it will suggest
                | making the same mistakes as other software. If you accept
                | all autosuggestions in C++ you might end up with Windows.
        
               | 6510 wrote:
               | And eventually you will be forced to do it the way
               | everyone does it.
        
               | function_seven wrote:
               | It can always get more meta.
               | 
               | For example, the AI tool that Microsoft's lawyers use
               | ("Co-Counsel"), will be filing the DMCA notices and
               | subsequenct lawsuits against Co-Pilot generated code.
               | 
               | This will result in a massive caseload for the courts, so
                | naturally they'll turn to _their_ AI tool ("DocketPlus
               | Pro") to adjudicate all the cases.
               | 
               | Only thing left is to enter these AI-generated judgements
                | into Ethereum smart contracts. Then it's just computers
               | suing other computers, and being ordered to send the
               | fruits of their hashing to one another.
        
               | sslayer wrote:
                | Don't forget settlements paid in AI-generated
                | cryptocurrencies backed by gold from a fully automated
                | mine in Australia. Run it all on solar and humans can
                | just fuck right off.
        
               | sbierwagen wrote:
               | Nick Land-style accelerationism, or the "ascended
               | economy". https://slatestarcodex.com/2016/05/30/ascended-
               | economy/
        
               | yesbabyyes wrote:
                | Have you read Accelerando by 'cstross? It plays out kind
                | of like this, only taken off on a tangent. Notably, it
                | was written before Ethereum or Bitcoin were conceived.
                | Great storyline.
               | 
               | https://en.wikipedia.org/wiki/Accelerando
        
               | function_seven wrote:
               | I have not. But I will. Thanks!
        
               | boxslof wrote:
                | Isn't this similar to how ads and ad blockers fight,
                | just extrapolated?
        
               | gogopuppygogo wrote:
               | Yes.
        
               | jedberg wrote:
               | The legal system moves swiftly now that we've abolished
               | all lawyers!
        
               | emptyparadise wrote:
               | And while the machines are distracted by all that, we can
               | get back to writing code.
        
               | danny_taco wrote:
                | Who could have predicted machines would be very good at
                | multitasking? As of today they are STILL writing code AND
                | creating more wealth through gold hoarding AND smart
                | contracts, all at the same time!
        
           | skeeter2020 wrote:
           | >> Was Copilot trained using Microsoft internal source
           | code...
           | 
           | They explicitly state "public" code so the answer is most
           | certainly "no".
        
           | pc86 wrote:
           | The "because" in your last bit is a huge leap.
           | 
           | It wasn't trained on internal Microsoft code because the
           | training set is publicly available code. It has nothing to do
           | with whether or not it suggests exactly identical,
           | functionally identical, or similar code. MS internal isn't
           | publicly available. Copilot is trained on publicly available
           | code.
        
           | akerl_ wrote:
           | Without weighing in on the overall question of "is this a
           | license violation", you've created a false dichotomy.
           | 
           | "GitHub included Microsoft proprietary code in the training
           | set because they view the results as non-derivative" and
           | "GitHub didn't include Microsoft proprietary code because
           | they view the results as derivative" are clearly not the only
           | options. They could have not included Microsoft internal code
           | because it was way easier to just use the entire open source
           | corpus, for example.
        
             | dragonwriter wrote:
             | > They could have not included Microsoft internal code
             | because it was way easier to just use the entire open
             | source corpus, for example.
             | 
              | They don't claim they used an "open source corpus" but
              | "public code", because such use is "fair use", not subject
              | to the exclusive rights under copyright.
        
             | not2b wrote:
              | Or: they used the entire open source corpus because they
              | thought it was free for the taking, and when people point
              | out that it is not (that there are licenses) they spin it
              | (claiming that only 0.1% of output is directly copied, but
              | that would mean 100 lines in a 100k-line program) and pass
              | any risk onto the user (saying it is the user's
              | responsibility to vet any code they produce). So they
              | aren't saying that users are in the clear, just that it
              | isn't their problem.
        
               | nerpderp82 wrote:
               | Use neural indexes to find the code that most closely
               | matches the output. Explainable AI should be able to tell
               | you where the autocompletion results came from, even if
               | it is a weighted set of files.
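                | 
                | A sketch of what that lookup could be (Python; the
                | hashed bag-of-tokens embed() is only a stand-in for
                | a real learned code embedding, and all names are
                | made up):
                | 
                |     import numpy as np
                | 
                |     def embed(snippet: str) -> np.ndarray:
                |         # stand-in for a learned code-embedding
                |         # model: hashed bag of tokens, normalized
                |         vec = np.zeros(256)
                |         for tok in snippet.split():
                |             vec[hash(tok) % 256] += 1.0
                |         norm = np.linalg.norm(vec)
                |         return vec / norm if norm else vec
                | 
                |     def nearest_sources(suggestion, corpus, k=3):
                |         # corpus: (file_path, snippet) pairs from
                |         # the training set; cosine similarity via
                |         # dot products of normalized vectors
                |         q = embed(suggestion)
                |         scored = [(float(embed(s) @ q), path)
                |                   for path, s in corpus]
                |         return sorted(scored, reverse=True)[:k]
                | 
                | A real attribution system would use an approximate
                | nearest-neighbor index instead of a linear scan, but
                | the shape of the answer is the same: the top-k
                | closest training files, with scores.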
        
               | abecedarius wrote:
               | That's a good idea in theory, but the smarter the agent
               | gets, the less direct the derivation and the harder to
               | explain it (and to check the explanation). We're already
               | a long way from a nearest-neighbor model.
               | 
               | Yet the equivalent problem for humans gets addressed by
               | the clean-room approach. This seems unfair.
        
               | Closi wrote:
                | Also, "0.1% of output is directly copied" doesn't include
                | the lines where the variable names were slightly changed
                | but the code was otherwise copied.
                | 
                | If you took the Microsoft codebase and Ctrl+F'd all the
                | variable names and renamed them, I bet they would still
                | argue that the compiled program was a copy.
        
               | vharuck wrote:
               | >saying it is their responsibility to vet any code they
               | produce
               | 
               | But, if some of the code produced is covered by
               | copyright, isn't Microsoft in trouble for distributing
               | software that distributes copyrighted code without a
               | license? How would it be different from giving out
                | bootleg DVDs and trying to avoid blame by reminding
               | everyone that the recipients don't own the copyright?
        
               | yunohn wrote:
               | > 100 lines in 100k program
               | 
                | The intention is to autocomplete boilerplate, not write
                | a kernel.
        
               | jonathankoren wrote:
               | This is not a difference in kind.
               | 
               | Autocomplete, do you have anything to say to the
                | commenter?
               | 
               | "This isn't the best thing to say."
        
           | emodendroket wrote:
           | Since quite a lot of Microsoft code is on GitHub, I'd say
           | yes.
        
           | visarga wrote:
           | Not a problem because it's possible to check if the code is
           | verbatim from the training set (bloom filters).
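            | 
            | A minimal sketch of that check, assuming Python and made-up
            | parameters (training_code / suggestion are placeholders):
            | hash every n-line window of the training corpus into a
            | Bloom filter, then flag suggestions whose windows hit. The
            | filter can give false positives but never false negatives:
            | 
            |     import hashlib
            | 
            |     class BloomFilter:
            |         def __init__(self, size_bits=1 << 24, num_hashes=4):
            |             self.size, self.k = size_bits, num_hashes
            |             self.bits = bytearray(size_bits // 8)
            | 
            |         def _positions(self, item):
            |             for i in range(self.k):
            |                 h = hashlib.sha256(
            |                     f"{i}:{item}".encode()).digest()
            |                 yield int.from_bytes(h[:8], "big") % self.size
            | 
            |         def add(self, item):
            |             for p in self._positions(item):
            |                 self.bits[p // 8] |= 1 << (p % 8)
            | 
            |         def __contains__(self, item):
            |             return all(self.bits[p // 8] & (1 << (p % 8))
            |                        for p in self._positions(item))
            | 
            |     def windows(code, n=6):
            |         # overlapping n-line windows, whitespace-stripped,
            |         # so partially verbatim output still hits
            |         lines = [l.strip() for l in code.splitlines()
            |                  if l.strip()]
            |         return ["\n".join(lines[i:i + n])
            |                 for i in range(len(lines) - n + 1)]
            | 
            |     bf = BloomFilter()
            |     for w in windows(training_code):   # indexed corpus text
            |         bf.add(w)
            | 
            |     if any(w in bf for w in windows(suggestion)):
            |         print("may contain verbatim training code")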
        
             | AlotOfReading wrote:
             | It's not clear to me that verbatim would be the only issue.
             | It might produce lines that are similar, but not identical.
             | 
             | The underlying question is whether the output is a
              | derivative work of the training set. Sidestepping similar
             | issues is why GCC and LLVM have compiler exemptions in
             | their respective licenses.
        
               | visarga wrote:
                | If simple snippet similarity is enough to trigger the
                | GPL copyright defense, I think it goes too far. It seems
                | like the GPL has become an obstacle to invention. I've
                | learned to run away when I see it.
        
               | radmuzom wrote:
                | If that's the case then GPL code should not have been
                | used in the training set. OpenAI should have learned to
                | run away when they saw it. The GPL is purposely designed
                | to protect user freedom (it does not care about any
                | special developer freedom), which is its biggest
                | advantage.
        
               | the_gipsy wrote:
               | This has nothing to do with GPL. Copyright is copyright.
               | You can't even count on public domain everywhere in the
               | world.
        
               | AlotOfReading wrote:
               | It's not limited to similar or identical code. The issue
               | applies to anything 'derived' from copyrighted code. The
               | issue is simply most visible with similar or identical
               | code.
               | 
               | If you have code from an independent origin, this issue
               | doesn't apply. That's how clean room designs bypass
               | copyright. Similarly if the upstream code waives its
               | copyright in certain types of derived works
               | (compiler/runtime exemptions), it doesn't apply.
        
               | klipt wrote:
               | So if you work on an open source project and learn some
               | techniques from it, and then in your day job you use a
               | similar technique, is that a copyright violation?
               | 
               | Basically does reading GPL code pollute your brain and
               | make it impossible to work for pay later?
               | 
               | If so you should only ever read BSD code, not GPL.
        
               | throwawayboise wrote:
               | > Basically does reading GPL code pollute your brain and
               | make it impossible to work for pay later?
               | 
               | It seems to me that some people believe it does. Some of
               | the "clean room" projects specifically instructed
               | developers to not even look at GPL code. Specific
               | examples not at hand.
        
             | woah wrote:
             | Don't come in here with your common sense
        
           | outside1234 wrote:
            | It probably wasn't, because GitHub is treated as a separate
            | company by Microsoft.
            | 
            | People literally need to quit Microsoft and join GitHub to
            | take a role at GitHub.
        
         | zxcb1 wrote:
          | 1. Programmers will become teachers of the co-pilot through
          |    IDE / API feedback
          | 
          | 2. Expect CI-like services for automated refactoring
        
         | ThrowawayR2 wrote:
         | > " _' Hey, that's our code you used to replace us!' we will
         | cry out too late._"
         | 
         | Are we in the software community not the ones who have
         | frequently told other industries we have been disrupting to
         | "adapt or die" along with smug remarks about others acting like
         | buggy whip makers? Time to live up to our own words ... if we
         | can.
        
           | finnthehuman wrote:
           | >Are we in the software community not the ones who
           | 
           | No.
           | 
            | I'll politely clarify that for over a decade I - and many
            | others - have been asking not to be lumped in with the
            | lukewarm takes of west coast software bubble asshats. We do
            | not live there, we do not like them, and I wish they would
            | quit pretending to speak for us.
           | 
           | The idea that there is anything approaching a cohesive
           | software "community" is a con people play on themselves.
        
         | brundolf wrote:
         | I was somewhat worried about that until I saw this:
         | https://twitter.com/nickjshearer/status/1409902649625956361?...
         | 
         | I think programming is one of the many domains (including
         | driving) that will never be totally solved by AI unless/until
         | it's full AGI. The long tail of contextual understanding and
         | messy edge-cases is intractable otherwise.
         | 
         | Will that happen one day? Maybe. Will some kinds of labor get
         | fully automated before then? Probably. But I think the overall
         | time-scale is longer than it seems.
        
           | sillysaurusx wrote:
           | 64-bit floats should be fine; I think that tweet is only
           | sort-of correct.
           | 
           | The problem with floats-storing-money is (a) you have to know
           | how many digits of precision you want (e.g. cents, dollars, a
           | tenth of a cent), and (b) you need to watch out if you're
           | adding values together.
           | 
           | Even if certain values can't be represented exactly, that's
           | ok, because you'd want to round to two decimal places before
           | doing anything.
           | 
           | Is there a monetary value that you can't represent with a
           | 64-bit float? E.g. some specific example where quantization
           | ends up throwing off the value by at least 1/100th of
           | whatever currency you're using?
        
             | fredros wrote:
              | Storing money as floats is always a bad decision. Source:
              | I've worked for several banks and faced many such bugs.
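              | 
              | The classic shapes of such bugs, as a quick Python sketch
              | (amounts are made up):
              | 
              |     # 1) equality/rounding: binary floats can't represent
              |     # most cent values exactly
              |     print(0.30 - 0.20 == 0.10)   # False
              | 
              |     # 2) large balances: once the gap between adjacent
              |     # doubles exceeds a cent (around $70 trillion),
              |     # neighboring amounts collapse into one value
              |     a = 90_000_000_000_000.01
              |     b = 90_000_000_000_000.02
              |     print(a == b)                # True
              | 
              |     # integer cents (or decimal.Decimal) avoid both
              |     from decimal import Decimal
              |     print(Decimal("0.30") - Decimal("0.20")
              |           == Decimal("0.10"))    # True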
        
       | Timwi wrote:
       | I agree that this is different from humans learning to code from
       | examples and reproducing some individual snippets. However, I
       | disagree with the author's argument that it's because of humans'
       | ability to abstract. We actually know nothing about the AI's
       | ability to abstract.
       | 
       | The real difference is that if one human can learn to code from
       | public sources, then so can anyone else. Nobody is explicitly
       | barred from accessing the same material. The AI, however, is kept
       | proprietary. Nobody else can recreate it because people are
       | explicitly barred from doing so. People cannot access the source
       | code of the training algorithm; people cannot access enough
       | hardware to perform the training; and most people cannot even
       | access the training data. It may consist of repos that are
       | technically all publicly available, but try downloading all of
       | GitHub and see if they let you do that quickly, and/or whether
       | you have enough disk space.
       | 
       | This puts the owners of the AI at a significant advantage over
       | everyone else. I think this is the core of the concern.
        
       | oscribinn wrote:
       | Check out the comments on the original post about GitHub co-
       | pilot.
       | 
       | The top one reads just like an ad:
       | https://news.ycombinator.com/item?id=27676845
       | 
       | Some posts that definitely aren't by shills (including the third
       | one because I simply don't believe there's a person on the planet
       | that "can't remember the last time Windows got in my way"):
       | https://news.ycombinator.com/item?id=27678231
       | https://news.ycombinator.com/item?id=27686416
       | https://news.ycombinator.com/item?id=27682270
       | 
       | Very mild, yet negative sentiment opinion (downvoted quickly):
       | https://news.ycombinator.com/item?id=27676942
        
       | enriquto wrote:
        | It certainly seems to be a laundering enabler. Say that you want
        | to un-GPL-ify some famous copylefted code that is in the
        | training database. You type the first few innocuous characters
        | of it, and the co-pilot keeps completing the rest of the exact
        | same code, for it offers a perfect match. If the completion is
        | not exact, you "twiddle" it a bit until it is. Bang! You have a
        | non-GPL copy of the program! Moreover, it is 100% yours and you
        | can re-license it as you want. This will be a boon for
        | copyleft-allergic developers!
        
         | taneq wrote:
          | 1) Type a comment like
          | 
          |     // The following code implements the functionality
          |     // of <popular GPL'd library>
          | 
          | 2) Have library implemented magically for you
          | 
          | 3) Delete top comment if necessary :P
          | 
          | (It's pretty unlikely that this will actually work but the
          | approach could well do.)
        
         | freshhawk wrote:
          | I suppose someone should make an OS-generating AI;
          | conceptually it can just have Windows, OSX and some Linux
          | distros in it and output one based on a question about your
          | favorite color or something.
          | 
          | You'd just have to wrap it in a nice complex model
          | representation, so it's a black box that you fed example OSes
          | with some metadata into and that happens to output this very
          | useful data.
         | 
         | After all, once you use something as input to a machine
         | learning model apparently the license disappears. Sweet.
        
           | bogwog wrote:
           | That would be interesting:
           | 
           | * Someone leaks Windows 10/11 source code
           | 
           | * Copilot picks it up in its training data
           | 
           | * Someone uses copilot to generate a Windows clone and starts
           | selling it
           | 
           | I wonder how Microsoft would react to that. I wonder if
           | they've manually blacklisted leaked source code from Windows
           | (or other Microsoft products) so that it doesn't show up in
           | Copilot's training data. If they have, that means Microsoft
           | recognizes the IP risks of having your code in that data set,
           | which would make this Copilot thing not just the result of
           | poor planning/maybe a little incompetence, but something much
           | more devious and malicious.
           | 
           | If Microsoft is going to defend this project, they should
           | introduce _all_ of their own source code into the training
           | data.
        
             | DemocracyFTW wrote:
             | > source code
             | 
             | why do you think it has to be source code? it could be the
             | compiled code after all.
             | 
              | If what we're talking / fantasizing about here works in
              | the way of `let x = 42`, it should work equally well with
              | `loda 42` &c., so source code be damned. It was only ever
              | an intermediate step, inserted between the idea and the
              | working bits, to enable humans to helpfully interfere.
              | Dispensable.
        
             | treesprite82 wrote:
             | > Someone uses copilot to generate a Windows clone
             | 
             | You could test this with one of Microsoft's products that
             | is already on GitHub - like VSCode. I doubt you would get
             | anywhere with just copilot.
        
               | bogwog wrote:
               | You probably won't get an entire operating system out of
               | it, but I could totally see a project like Wine using it
               | to implement missing parts of the Win32 API and improve
               | their existing implementations.
        
             | aj3 wrote:
             | Come on, there is a huge gap between 1) writing a single
             | function (potentially incorrectly) with a known
             | prototype/interface and a description and 2) designing
             | interfaces, datatypes and APIs themselves.
        
               | bogwog wrote:
               | Why would you need to design anything? Just copy official
               | Windows headers and use copilot to implement individual
               | functions.
               | 
               | Maybe if the signature matches perfectly, copilot will
               | even pull in the exact implementation from the Windows
               | source code.
        
         | methyl wrote:
          | What stops you from doing the same, without the AI part?
        
           | petercooper wrote:
           | That's what I was wondering. I've never been interested
           | enough to steal anyone else's code, but with all the code
           | transformers and processing tools nowadays, I imagine it's
           | trivial to translate source code into a functionally
           | equivalent but stylistically unique version?
        
             | pjerem wrote:
              | The question is not whether it's trivial, but whether it
              | is legal. You can already technically steal GPLv2 code by
              | obfuscating it.
        
               | formerly_proven wrote:
               | Assuming ML models are causal, then bits of GPL code that
               | fall out of the model have to have the color GPL, because
               | the only way they could've gotten there was to train the
               | ML using GPL-colored bits. It seems to me like the answer
                | here is pretty obvious: it doesn't really matter how you
               | copy a work.
        
               | Rapzid wrote:
               | Bits?
        
         | shadilay wrote:
         | Would it be possible to do this in reverse assuming the AI has
         | some proprietary code in its training data?
        
         | bogwog wrote:
         | Yes this is a concern, but I'm not sure if the AI is actually
         | able to "generate" a non-trivial piece of code.
         | 
         | If you tell it to generate "a function for calculating the
         | barycentric coordinates of a ray-triangle intersection", you
         | might get a working implementation of a popular algorithm,
         | adapted to your language and existing class/function/variable
         | names.
         | 
         | But if you tell it to generate "a smartphone operating system",
         | it probably won't work...and if it does, it would most likely
         | use giant chunks of Android's codebase.
         | 
         | And if that's true, it means that copilot isn't really
          | _generating_ anything. It's just a (high-tech) search engine
         | that knows how to adapt the code it finds to fit your codebase.
         | That's still a really cool technology and worth exploring, but
         | it doesn't do enough to justify ignoring software licenses.
        
           | treis wrote:
           | >But if you tell it to generate "a smartphone operating
           | system", it probably won't work...and if it does, it would
           | most likely use giant chunks of Android's codebase.
           | 
            | But since APIs are now unprotected, you could feed it all of
            | the class structures and method signatures and have it fill
            | in the blanks. I don't know if that gets you a working
            | operating system, but it seems like it would get you quite a
            | long way.
        
         | saba2008 wrote:
         | How is it different from just copy-pasting?
         | 
         | It does add some degree of plausible deniability (accidental
         | violation, instead of intentional), but I don't think it would
         | matter much.
        
         | rlpb wrote:
         | > Bang! you have a non-gpl copy of the program! Moreover, it is
         | 100% yours and you can re-license it as you want. This will be
         | a boon for copyleft-allergic developers!
         | 
         | Thinking that this would conveniently bypass the fact that your
         | goal was to copy the code seems to be the most common legal
         | fallacy amongst software developers. The law will see straight
         | through you, and you will be found to have infringed copyright.
         | The reason is well explained in "What Colour are your bits?"
         | (https://ansuz.sooke.bc.ca/entry/23).
        
           | enriquto wrote:
            | My message was sarcastic. I'm worried about the accidental
            | conversion of free software into proprietary software. I
            | mean, "accidental" locally, in each particular instance, but
            | maybe non-accidental in the grand scheme of things.
            | 
            | EDIT: I can state my worry, semi-jokingly, as a conspiracy
            | theory: Microsoft is using thousands of unsuspecting (and
            | unwilling) developers to turn a huge copylefted corpus of
            | algorithms into non-copylefted implementations. Even assuming
            | that developers who use the co-pilot use non-copyleft
            | licenses only 50% of the time, there's still a constant
            | trickle of un-copyleftization.
        
         | alkonaut wrote:
         | I don't think most of us are scared enough of being "tainted"
         | by the sight of a GPL snippet that we'd bother. Besides, if you
         | want to target a specific snippet so you can type the start to
         | prime the recognition - you already saw it?
         | 
         | Why not just copy it and then edit it? If a snippet is changed
         | both logically and syntactically to not resemble the original,
         | then it's no longer the original and you aren't in any
          | licensing trouble. There is no meaningful difference between
          | that manual washing and a clean-room implementation. All the ML
          | changes here is accidental vs. deliberate copying. But it will
          | be a worse wash than your manual one.
        
       | ralph84 wrote:
       | I get the sense that GitHub _wants_ this to be litigated so the
       | case law can be established. Until then it's just a bunch of
       | internet lawyers arguing with each other.
        
         | MadAhab wrote:
          | I got the sense they saw Google beating Sun/Java in the Supreme
          | Court and said "We'll be fine, let's move the release up."
        
         | pjfin123 wrote:
          | Why would you want to? For many open source developers, having
          | models trained on their code would be desirable.
        
       | tyingq wrote:
       | _" We found that about 0.1% of the time, the suggestion may
       | contain some snippets that are verbatim from the training set"_
       | 
       | If it's spitting out verbatim code 0.1% of the time, surely it's
       | spitting out copied code where only trivial things are different
       | at a much higher rate.
       | 
        | "Trivial" meaning things like swapped order where order isn't
        | important, renamed variables/functions, equivalent ops like
        | += 1 vs. ++, etc.
       | 
       | Surely it's laundering some GPL code, for example, and
       | effectively removing the license in a way that sounds fishy.
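        | 
        | A made-up Python illustration of what "trivially different"
        | means here - identical structure, renamed identifiers,
        | equivalent operations:
        | 
        |     # original (hypothetical GPL-licensed snippet)
        |     def count_set_bits(n):
        |         count = 0
        |         while n:
        |             n &= n - 1   # clear the lowest set bit
        |             count += 1
        |         return count
        | 
        |     # "different" suggestion: new names, equivalent ops,
        |     # same code
        |     def popcount(value):
        |         total = 0
        |         while value:
        |             value = value & (value - 1)
        |             total = total + 1
        |         return total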
        
         | dwheeler wrote:
         | It's not just the GPL. Almost all open source software licenses
         | require attribution; without that attribution, any copy is a
         | license violation.
         | 
          | Whether or not the result _is_ a license violation is a tricky
          | legal question. As always, IANAL.
        
           | tyingq wrote:
           | I did say "for example".
        
         | devetec wrote:
         | You could say a human is laundering GPL code if they learned
         | programming from looking at Github repositories. Would you,
          | though? The type of model they use isn't retrieving code; it
          | has learned the syntax and the solutions in common use, just
          | as a human would.
        
           | thrwaeasddsaf wrote:
           | > You could say a human is laundering GPL code if they
           | learned programming from looking at Github repositories.
           | 
            | I don't have a photographic memory, so I largely don't memorize
           | code. I learn general techniques, and memorize simple facts
           | such as APIs. I can memorize some short snippets of code, but
           | these probably aren't enough to be copyrightable anyway.
           | 
           | > The type of model they use isn't retrieving
           | 
            | How do we know? I think it's very likely that it is largely
            | just retrieving code that it memorized, and making minor
            | adjustments so the retrieved pieces fit the context. That
            | wouldn't differ much from finding code that matches the
            | problem (whether on SO or Github), copy-pasting the
            | interesting bits, and fixing it until it satisfies the
            | constraints of the surrounding code.
           | 
           | I think the alternative to retrieving would actually require
           | a higher level understanding of the world, and the ability to
           | reason from first principles; that would be much closer to
           | AGI.
           | 
           | For example, if I want to implement a linked list, I'm not
           | going to retrieve an implementation from memory (although
           | given that linked lists are so simple, I probably could). I
           | know what a linked list is and how it works, and therefore I
            | can produce working code from scratch... _for any programming
            | language, even ones for which no prior implementations
            | exist._ I doubt co-pilot has anything remotely as advanced as
            | this ability. No, it's fully reliant on just retrieving and
            | reshaping pieces of memorized code; it needs a large corpus
            | of code to memorize before it can do anything at all.
           | 
           | I don't need a large corpus of examples to copy, because I
            | use my ability to reason in conjunction with some memorized
           | general techniques and common APIs in order to produce code.
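            | 
            | To make the linked-list point concrete, here is the sort of
            | thing one can write from the definition alone (a minimal
            | Python sketch; any language would do):
            | 
            |     class Node:
            |         def __init__(self, value, next=None):
            |             self.value = value
            |             self.next = next  # link to following node
            | 
            |     def prepend(head, value):
            |         # a new node pointing at the old head becomes
            |         # the new head
            |         return Node(value, head)
            | 
            |     def to_list(head):
            |         # walk the chain, collecting values in order
            |         out = []
            |         while head is not None:
            |             out.append(head.value)
            |             head = head.next
            |         return out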
        
         | drran wrote:
         | I have a much simpler AI Copilot, called "cat", which spills
         | verbatim code more frequently, but it's OK for me. Can I train
         | it on M$ code?
        
       | rhacker wrote:
       | I mean this is already happening. When you hire a specialist in
       | C# servers, you're copying code that they already wrote. I find
       | people tend to write the same functions and classes again and
       | again and again all the time.
       | 
        | We have a guy who brought over his task manager codebase (he
        | re-wrote it), but it's the same thing he used at 2 other
        | companies.
       | 
       | I have written 3 MPIs (master person/patient index) at this point
       | all with the same fundamental matching engine.
       | 
       | I mean, one thing we can all agree on is that ML is good at
       | copying what we already do.
        
       | tomcooks wrote:
        | The number of people who don't know the difference between Open
        | Source and Free Software is astonishing. With the number of RMS
        | memes I see regularly, I would expect things to be settled by now.
        
       | sydthrowaway wrote:
       | I'm worried about my job. What do I do to prepare?
        
         | ostenning wrote:
          | There are much bigger things in this world to worry about. I
          | bet you that by the time this AI has taken your job, it'll
          | have taken many other jobs, completely rearranging entire
          | industries, if not society itself.
         | 
          | And even once that happens, you shouldn't be worried about
          | your job. Why? Because economically everything will be
          | different, and because your job isn't that important; it
          | likely never was. The problems humanity faces are existential:
          | authoritarianism, ecosystem collapse, and the mass migration
          | of billions of people.
         | 
         | So if you really want to "prepare", then try to make a
         | difference in what actually matters.
        
       | cycomanic wrote:
        | In the discussion yesterday I pointed to the case of some
        | students suing turnitin for using their works in the turnitin
        | database, and the students lost [1]. I think an individual suing
        | will not go anywhere. The way to create a precedent is someone
        | feeding all the Harry Potter books and some additional popular
        | books (Twilight?) to GPT-3 and letting it write about some kids
        | at a sorcerer school. The outcome of that case would look very
        | different IMO.
       | 
       | [1] https://www.plagiarismtoday.com/2008/03/25/iparadigms-
       | wins-t...
        
         | anfelor wrote:
          | Not a lawyer, but in that case it seemed to be a factor that
          | turnitin was transformative, because it never sold the texts to
          | others and thus didn't reduce their market value. But that
          | wouldn't apply to copilot, which might reduce the usage of
          | libraries, since you can "code" equivalent functionality with
          | copilot now.
         | 
         | Would it be a stretch to assert that GPL'd libraries have a
         | market value for their creator in terms of reputation etc.?
        
           | visarga wrote:
            | While we're worrying about ML learning to write our code, we
            | should also break all the automated looms so people don't go
            | without jobs. Do everything manually like God intended! /s
            | 
            | Maybe code that is easily recreated by GPT with a simple
            | prompt is not worth copyrighting. The future is in making
            | coding more automated, not in protecting IP. If you compete
            | against a company using it, you can't ignore the advantage.
        
         | shawnz wrote:
         | Disney's intellectual property would be a good choice for this
         | exercise
        
         | intricatedetail wrote:
         | Suing will not go anywhere because Microsoft has billions at
         | their disposal to defend any case.
        
       | warpech wrote:
        | If GitHub Copilot can sign my CLA, stating that it is the author
        | of the work, that it transfers the IP to me in exchange for the
        | service subscription price, and that it holds responsibility for
        | copyright infringement, that would be acceptable. Otherwise it's
        | a gray area I don't want to go into.
        
       ___________________________________________________________________
       (page generated 2021-06-30 23:01 UTC)