[HN Gopher] GitHub Copilot, with "public code" blocked, emits my...
       ___________________________________________________________________
        
       GitHub Copilot, with "public code" blocked, emits my copyrighted
       code
        
       Author : davidgerard
       Score  : 262 points
       Date   : 2022-10-16 19:33 UTC (3 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | [deleted]
        
       | naillo wrote:
        | It'll be interesting if the consequence of this is people
        | open-sourcing things way less. It would give another layer of
        | irony to OpenAI's name.
        
       | D13Fd wrote:
       | If the code is public, my guess is that someone else stole it and
       | added it to an open source repo without authorization. Microsoft
       | may have then picked it up from there.
        
         | SrslyJosh wrote:
         | This just means that if you use Copilot for work, you're
         | exposing yourself and/or your employer to unknown legal
         | liability. =)
        
           | teaearlgraycold wrote:
           | Not sure how you'd get caught if your code is kept private.
        
       | jen20 wrote:
       | Not entirely sure how this could happen! "naikrovek" assured me
       | not three days ago on this very site that I was "detached from
       | reality" [1] for thinking that this would happen again.
       | 
       | To be fair I thought it might be at least a week or two.
       | 
       | [1]: https://news.ycombinator.com/item?id=33194643
        
       | an1sotropy wrote:
        | This is a huge and looming legal problem. I wonder if what
        | should be a big uproar about it is muted by the widespread
        | acceptance/approval of GitHub and related products, in which
        | case it's a nice example of how monopolies damage communities.
        
         | jeroenhd wrote:
          | I think it won't become a legal problem until Copilot steals
          | code from a leaked repository (e.g. the Windows XP source
          | code) and that code gets reused in public.
         | 
         | Only then will we see an answer to the question "is making an
         | AI write your stolen code a viable excuse".
         | 
         | I very much approve of the idea of Copilot as long as the
         | copied code is annotated with the right license. I understand
         | this is a difficult challenge but just because this is
         | difficult doesn't mean such a requirement should become
         | optional; rather, it should encourage companies to fix their
         | questionable IP problems before releasing these products into
         | the wild, especially if they do so in exchange for payment.
        
       | jijji wrote:
        | if you're posting your code publicly on the web, it's hard to
        | get upset that people are seeing/using it
        
         | betaby wrote:
          | It's under a very specific license, though. By your logic it
          | would be OK to train an AI on leaked Windows code, then. It
          | is/was publicly on the web.
        
           | galleywest200 wrote:
           | While I agree with the first portion of your rebuttal, the
           | second portion makes no sense as leaked code is not "you"
           | putting it on the internet. It would be a nefarious actor
           | doing so.
        
           | lerpgame wrote:
           | code licenses will be irrelevant in a few years if you are
           | able to refactor anything you want using ai.
        
             | betaby wrote:
             | Unless lawyers from the music industry step in.
        
           | themoonisachees wrote:
            | Yes? It's production code that is supposed to work
            | (keyword: supposed). I'd like the code-suggestor to also be
            | trained on AAA games' source leaks.
        
         | qu4z-2 wrote:
         | Can I introduce you to a concept called copyright?
         | 
          | If an author publishes a short story publicly on the web, is
          | it fine for someone else to submit it to a contest as their
          | own work?
        
         | Hamuko wrote:
         | https://nedroidcomics.tumblr.com/post/41879001445/the-intern...
        
       | deepspace wrote:
        | This shows how copyright is all screwed up. Let's say the code
        | in question is based on a published algorithm, maybe Yuster
        | and Zwick (I did not check).
        | 
        | What exactly gives Davis a better claim to the copyright than
        | the inventors of the algorithm? Yes, I know software is
        | copyrightable while algorithms are not, but it is not at all
        | clear to me why that should be the case. The effort of
        | translating an algorithm into code is trivial compared to
        | designing the algorithm in the first place, no?
        
         | clnq wrote:
         | To be honest, it would probably benefit all of humanity if we
         | stopped rewriting the same code to then fix the same bugs in
         | it, and instead just used each other's algorithms to do
         | meaningful work.
         | 
         | I work for a large tech company whose lawyers definitely care
         | that my code doesn't train an AI model somewhere much more than
         | I do. On the contrary, I would really like to open source all
         | of my work - it would make it more impactful and would
         | demonstrate my skills. It makes me a bit sad that my life's
         | work is going to be behind lock and key, visible to relatively
          | few people. Not to mention that the hundreds of thousands of
          | work hours, energy, and effort that will be spent
          | replicating it all over my industry, in all the other
          | lock-and-key companies, make the industry as a whole
          | tremendously inefficient.
         | 
         | I hope that AI models like Copilot will finally show to the
         | very litigious tech companies that their intellectual property
         | has been all over the public domain from the start. And we can
         | get over a lot of the petty algorithm IP suits that probably
         | hold back all tech in aggregate. We should all be working
         | together, not racing against each other in the pursuit of
         | shareholder value.
         | 
          | Historically, in the Middle Ages, mathematicians used to
          | keep their solutions secret in the interest of employment.
          | So there used to be mathematicians who could, for example,
          | solve certain quadratic equations, but it took centuries
          | before all humanity could benefit from this knowledge. I
          | believe this is what is happening with algorithms now. And
          | it is very counter-progress, in my opinion.
        
         | heavyset_go wrote:
         | You can patent an algorithm if you want to protect it.
        
           | jeroenhd wrote:
           | (in some countries)
        
       | jonnycomputer wrote:
       | Microsoft should just train it on all their proprietary code
       | instead. See how sanguine they are about it then.
        
         | jacooper wrote:
         | They avoided answering this question at all costs.
         | 
          | Because it exposes their direct hypocrisy in this: it's fair
          | use when it's OSS code, but not when it's theirs.
          | 
          | The questions here are very important, and it's no surprise
          | GitHub avoided answering anything about Copilot's legality:
         | 
         | https://sfconservancy.org/GiveUpGitHub/
        
         | naikrovek wrote:
          | who said they haven't?
          | 
          | for something to show up verbatim in the output of a textual
          | AI model, it needs to appear in the input many times.
         | 
         | I wonder if the problem is not copilot, but many people using
         | this person's code without license or credit, and copilot being
         | trained on those pieces of code as well. copilot may just be
         | exposing a problem rather than creating one.
         | 
         | I don't know much about AI, and I don't use copilot.
        
           | make3 wrote:
           | there's exactly no way they have
        
           | akudha wrote:
            | With the amount of resources that Microsoft has, how hard
            | can it be for them to exclude proprietary code that other
            | people have stolen? I'd bet it is easy for them, but they
            | won't do it. Because they don't care, and because who is
            | gonna take them on?
           | 
            | Will they "accidentally" include proprietary code from,
            | say, Oracle? Nope - they'll make sure of it. But Joe
            | Random? Sure.
        
           | belorn wrote:
           | Microsoft have a public statement that they don't use
           | proprietary code, only public code with public licenses. They
           | have a lot of companies as customers who uses github, and
           | they also use a lot third-party code in their own products.
        
             | stefan_ wrote:
             | Even BSD et. al. have attribution requirements - that must
             | be a vanishingly small amount of code to be used. Me thinks
             | the people who run GitHub (who have apparently decided to
             | abandon the core business for the latest fun project)
             | aren't being entirely upfront.
        
         | eyelidlessness wrote:
         | As a thought experiment: what do we all suppose would be the
         | impact to Microsoft if they deliberately made public the
         | proprietary source code for all of their publicly available
         | commercial products and efforts (including licensed software,
         | services; excluding private contracts, research), but the rest
         | of their intellectual property and trade secrets remained
         | private?
         | 
         | Since I'm posing the question, here's my guess:
         | 
         | - Their stock would take at least a short term hit because it's
         | an unconventional and uncharacteristic move
         | 
         | - The code would reveal more about their strategic interests to
         | competitors than they'd like, but probably nothing revelatory
         | 
         | - It might confirm or reinforce some negative perceptions of
         | their business practices
         | 
         | - It might dispel some too
         | 
         | - It may reduce some competitive advantage amongst enormous
         | businesses, and may elevate some very large firms to potential
         | competitors
         | 
         | - It would provide little to no new advantage to smaller
         | players who aren't already in reach of competing with them
         | and/or don't have the resources to capitalize on access to the
         | code
         | 
         | - It would probably significantly improve public perception of
         | the company and its future intentions, at least among
         | developers and the broader tech community
         | 
          | In other words, a wash. The overall business impact would be
          | roughly neutral. The code has more strategic than technical
          | value, and there are few who could leverage the technical
          | value into any kind of revenue center with growth potential.
          | Any disadvantage would be negated by the public-image
          | goodwill it generated.
         | 
         | Maybe my take is naive though! Maybe it would really hurt
         | Microsoft long term if suddenly everyone can fork Windows 11,
         | or steal ideas for their idiosyncratic office suite, or get
         | really clever about how to get funded to go head to head with
         | Azure armed with code everyone else can access too.
        
           | 8note wrote:
           | If they released all the source, I'd be able to run the nice
           | drawing app from windows inkspaces again, unkilling the app
           | they want dead
        
           | mccorrinall wrote:
           | If they'd open source their software I wouldn't have to wait
           | two months till they finally release the pdbs for the kernel
           | after every 2XH1 / 2XH2 update.
           | 
           | It's so annoying that they are sooooo slow at this and we
           | have to keep our users from upgrading after every release.
        
       | thorum wrote:
       | What might be going on here is that Copilot pulls code it thinks
       | may be relevant from other files in your VS Code project. You
       | have the original code open in another tab, so Copilot uses it as
       | a reference example. For evidence, see the generated comment
       | under the Copilot completion: "Compare this snippet from
       | Untitled-1.cpp" - that's the AI accidentally repeating the prompt
       | it was given by Copilot.
        
         | ianbutler wrote:
         | I just tested it myself, and I most certainly do not have his
         | source open, and it reproduced his code verbatim with just the
         | function header in a random test c file I created in the middle
         | of a rust project I'm working on.
        
           | thorum wrote:
           | Ah ok.
        
         | naikrovek wrote:
        
         | stefan_ wrote:
         | Seems other people tried it?
         | https://twitter.com/larrygritz/status/1581713252144517120
        
       | zaps wrote:
       | Drunk conspiracy theory: Nat knew Copilot would be a complete
       | nightmare and bailed.
        
       | [deleted]
        
       | colesantiago wrote:
        | GitHub Copilot is not AI at all; it is just a dumb code
        | regurgitator that sells you back code you wrote on GitHub and
        | shamelessly takes all the credit for it.
        
         | davidgerard wrote:
         | it's totally AI, in the "legal responsibility laundering"
         | sense. This is the main present day use case for saying "AI".
        
         | Jevon23 wrote:
         | Hopefully you understand how artists feel about DALL-E and
         | Midjourney now.
        
           | pessimizer wrote:
           | I like that if you prompt these with specific artists names,
           | they try their best to rip those particular artists off.
        
         | lolinder wrote:
         | I use copilot in my work every day, but only in places where I
         | know the code cannot be regurgitated because what I'm doing has
         | never been done before.
         | 
         | I can write an HTML form, then prompt copilot to generate a
         | serializable class that can be used to deserialize that form on
          | the server. I can write a test for one of our internal APIs,
         | and for every subsequent test I can just write the name of what
         | I expect it to check and it generates a test that _correctly_
         | uses our internal APIs and verifies the expected behavior.
         | 
         | You can have problems with the ethics of how GitHub and OpenAI
         | produced what they did, but to describe it the way that you did
         | requires never having really attempted to use it seriously.
        
       | ianbutler wrote:
        | I just tested it myself on a random C file I created in the
        | middle of a Rust project I'm working on, and it reproduced his
        | full code verbatim from just the function header. So it
        | clearly does regurgitate proprietary code, unlike what some
        | people have said. I do not have his source, so Copilot isn't
        | just using existing context.
       | 
       | I've been finding co-pilot really useful but I'll be pausing it
       | for now, and I'm glad I have only been using it on personal
       | projects and not anything for work. This crosses the line in my
       | head from legal ambiguity to legal "yeah that's gonna have to
       | stop".
        
         | naikrovek wrote:
          | what proprietary code? the guy on Twitter is seeing his own
          | GPL code being reproduced. nothing proprietary there.
         | 
         | do you have the "don't reproduce code verbatim" preference set?
        
           | webstrand wrote:
           | He owns the copyright to the code, and the code is not in the
           | public domain, therefore it is proprietary code.
        
             | yjftsjthsd-h wrote:
             | That's not how anybody uses the word proprietary when
             | dealing with software licensing. It's a term of art that
             | stands in contrast to open source licenses.
        
               | ianbutler wrote:
               | For the record, I don't typically think in terms of the
               | open source community.
               | 
                | I grant that if most people here are using it one way,
                | I was likely wrong about how it is typically used by
                | the open source community. I followed up with a reply
                | saying it would likely be more correct for me to have
                | said "improperly licensed" code was included in the
                | training set.
               | 
                | Still, its being private means it probably shouldn't
                | be in the training set anyway, regardless of license,
                | because in the future truly proprietary code could be
                | included, or code without any license, which reserves
                | all rights to the creator.
        
           | ianbutler wrote:
           | Sorry it would likely be more correct to say "improperly
           | licensed" code and not proprietary. Still for someone like
           | me, the possibility of having LGPL, or any GPL licensed code
           | generated in their project is a solid no thanks. I know
           | others may think differently but those are toxic licenses to
           | me.
           | 
           | Not to mention this code wasn't public so it's kind of moot,
           | having someone's private code be generated into my project is
           | very bad.
           | 
           | As to the option, I do not, I wasn't even aware of the
           | option, but it's pretty silly to me that's not on by default,
           | or even really an option. That should probably be enabled
           | with no way to toggle it without editing the extension.
        
         | shadowgovt wrote:
         | Searching for the function names in his libraries, I'm seeing
         | some 32,000 hits.
         | 
         | I suspect he has a different problem which (thanks to
         | Microsoft) is now a problem he has to care about: his code
         | probably shows up in one or more repos copy-pasted with
         | improper LGPL attribution. There'd be no way for Copilot to
         | know that had happened, and it would have mixed in the code.
         | 
         | (As a side note: understanding _why_ an ML engine outputs a
         | particular result is still an open area of research AFAIK.)
        
           | chiefalchemist wrote:
           | I thought the same thing. But then shouldn't CP look at
           | things it's not supposed to use and see if that's happened?
           | How is that any different than you committing your API to
           | Platform X and shortly thereafter Platform X reaches out to
           | you...because GH let them know?
        
           | ianbutler wrote:
           | Yeah that's a mess, but that's way too much legal baggage for
           | me, an otherwise innocent end user, to want to take on.
           | Especially when I personally tend to try and monetize a lot
           | of my work.
           | 
            | I understand there's no way for the model to know, but
            | then it's really on Microsoft to ensure no private, poorly
            | licensed, or proprietary code is included in the training
            | set. That sounds like a very tall order, but I think
            | they're going to have to, or they're eventually going to
            | run into legal problems with someone who has enough money
            | to make it hurt for them.
        
             | shadowgovt wrote:
             | Agreed. Silver lining: MS is now heavily incentivized to
             | invest in solutions for an open research problem.
        
           | [deleted]
        
           | enragedcacti wrote:
           | Expanding on that, even if Microsoft sees the error of their
           | ways and retrains copilot against permissively licensed
           | source or with explicit opt-in, it may get trained on
           | proprietary code the old version of copilot inserted into a
           | permissively licensed project.
           | 
           | You would have to just hope that you can take down every
           | instance of your code and keep it down, all while copilot
           | keeps making more instances for the next version to train on
           | and plagiarize.
        
           | [deleted]
        
       | mdaniel wrote:
        | I didn't feel like wading into that Twitter thread, but in the
        | screenshot one will notice that the code generated by Copilot
        | has silently(?) swapped the order of the interior parameters
        | to "cs_done". Maybe that's fine, but maybe it's not - how in
        | the world would a Copilot consumer know to watch out for
        | that? Double extra good if a separate prompt for "cs_done"
        | commingles multiple implementations where some care and some
        | don't. Partying ensues!
        | 
        | Not to detract from the well-founded licensing discussion, but
        | who is it that finds this madlibs approach useful in coding?
        
       | bmitc wrote:
       | What does
       | 
       | > with "public code" blocked
       | 
        | mean? Are you able to set an option in GitHub telling GitHub
        | that you don't want your code used as Copilot training data?
        | Is this an abuse of the license you sign with GitHub, or did
        | they update it at some point to allow your code to be
        | automatically used in Copilot? I'm not crazy about the idea of
        | paying GitHub for them to make money off of my code/data.
        
         | galleywest200 wrote:
         | The option to omit "public code" means it should, in theory,
         | omit code that is licensed under such banners as the GPL. It
         | does not mean "omit private repositories".
        
       | [deleted]
        
       | _the_inflator wrote:
        | Well, this can pose a serious risk to companies and their
        | cloud strategy based on GitHub.
        | 
        | Can these enterprises really make sure that their code won't
        | be used to train Copilot? I am skeptical.
        
       | deworms wrote:
       | It prints this code because you have it open in another editor
       | tab. Wish people who don't know at all how it works stopped
       | acting all outraged when they're laughably wrong.
        
         | yjftsjthsd-h wrote:
         | > It prints this code because you have it open in another
         | editor tab.
         | 
         | People upthread have reproduced and demonstrated that that's
         | not the issue here.
         | 
         | EDIT: Actually, OP says "The variant it produces is not on my
         | machine." -
         | https://twitter.com/DocSparse/status/1581560976398114822
         | 
         | > Wish people who don't know at all how it works stopped acting
         | all outraged when they're laughably wrong.
         | 
         | Physician, heal thyself.
        
         | lupire wrote:
         | Can you link to more info about this? If this is accurate, many
         | people aren't aware.
        
       | Traubenfuchs wrote:
       | What keeps him from suing if he is so sure?
       | 
       | Those pretty little licenses are a waste of storage if no one
       | enforces them.
        
         | SamoyedFurFluff wrote:
         | Money. Suing is often survival of the richest.
        
       | psychphysic wrote:
       | Hot take, AI will steal all our jobs. Get over it.
        
       | kweingar wrote:
       | I've noticed that people tend to disapprove of AI trained on
       | their profession's data, but are usually indifferent or positive
       | about other applications of AI.
       | 
       | For example, I know artists who are vehemently against DALL-E,
       | Stable Diffusion, etc. and regard it as stealing, but they view
       | Copilot and GPT-3 as merely useful tools. I also know software
       | devs who are extremely excited about AI art and GPT-3 but are
       | outraged by Copilot.
       | 
       | For myself, I am skeptical of intellectual property in the first
       | place. I say go for it.
        
         | bcrosby95 wrote:
         | I look at IP differently.
         | 
         | For copyright, the act of me creating something doesn't deprive
         | you of anything, except the ability to consume or use the thing
         | I created. If I were influenced by something, you can still be
         | influenced by that same thing - I do not exhaust any resources
         | I used.
         | 
          | This is wholly different from physical objects. If I create a
         | knife, I deprive you of the ability to make something else from
         | those natural resources. Natural resources that I didn't create
         | - I merely exploited them.
         | 
         | Because of this, I'm fine with copyright (patents are another
         | story). But I have some issues with physical property.
        
         | joecot wrote:
         | > For myself, I am skeptical of intellectual property in the
         | first place. I say go for it.
         | 
         | If we didn't live in a Capitalist society, that would be fair.
         | But we currently do. That Capitalist society cares little about
         | the well being of artists unless it can find a way to make
         | their art profitable. Projects like DALL-E and Midjourney
         | pillage centuries of human art and sell it back to us for a
         | profit, while taking away work from artists who struggle to
          | make ends meet as it is. Software developers are generally
          | less concerned about Copilot because they're still making
          | six figures a year, but they'll start to get concerned if
          | the technology gets smart enough that society needs fewer
          | developers.
         | 
         | An automated future _should_ be a good thing. It should mean
         | that computers can take care of most tasks and humans can have
         | more leisure time to relax and pursue their passions. The
         | reason that artists and developers panic over things like this
         | is that they are watching themselves be automated out of
          | existence, and have seen how society treats people who
          | aren't useful anymore.
        
         | yjftsjthsd-h wrote:
         | I can think of two explanations for that off the top of my
         | head.
         | 
         | The first is that people only recognize the problems with the
         | things that they're familiar with, which you would kind of
         | expect.
         | 
         | The other option is that there's a difference in the thing that
         | people object to. My _impression_ is that artists seem to be
         | reacting to the idea that they could be automated out of a job,
          | whereas programmers are mostly objecting to blatant copyright
         | violation. (Not universally in either case, but often.) If that
         | is the case, then those are genuinely different arguments made
         | by different people.
        
         | lucideer wrote:
          | I don't know specifically what DALL-E was trained on, but if
          | it's art for which the artists have not consented to it
          | being used to train AI, then that's problematic. I haven't
          | seen any objections to DALL-E _on that basis specifically_,
          | though,
         | whereas all the discussion of Copilot is around the fact that
         | code authorship  & Github accounts are not intrinsically tied
         | together, making it impossible to have code authors consent to
         | their code being used, regardless of what ToS someone's agreed
         | to.
         | 
         | > _For myself, I am skeptical of intellectual property in the
         | first place. I say go for it._
         | 
         | I'm in a similar boat but this is precisely the reason I object
         | so strongly to Copilot. IP has been invented &
         | perpetuated/extended to protect large corporate interests,
         | under the guise of protecting & sustaining innovators &
         | creative individuals. Copilot is a perfect example of large
         | corporate interest ignoring IP _when it suits them_ to exploit
         | individuals.
         | 
         | In other words: the reason I'm skeptical of IP is the same
         | reason I'm skeptical of Copilot.
        
           | __alexs wrote:
            | Stable Diffusion and DALL-E were both trained on
            | copyrighted content scraped from the internet with no
            | consent from the publishers.
           | 
           | It's quite a common complaint because some of the most
           | popular prompts involve just appending an artist's name to
           | something to get it to copy their style.
        
         | dawnerd wrote:
          | In theory, AI should never return an exact copy of a
          | copyrighted work, or even anything close enough that you
          | could argue it is the original "just changed". If the styles
          | are the same, I think that's fine - no different than
          | someone else cloning it. But
         | there's definitely outputs from stable diffusion that looks
         | like the original with some weird artifacts.
         | 
         | We need regulation around it.
        
           | XorNot wrote:
           | > there's definitely outputs from stable diffusion that looks
           | like the original with some weird artifacts.
           | 
           | Do you have examples? Because SD will generate photoreal
           | outputs and then get subtle details (hands, faces) wrong, but
           | unless you have the source image in hand then you've no way
           | of knowing whether it's a "source image" or not.
        
           | rtkwe wrote:
            | Code is much easier to do that with, because the avenues
            | for expression are significantly limited compared to just
            | creating an image. To be useful, Copilot has to produce
            | compiling, reasonably terse, and understandable code. The
            | compiler in particular is a big bottleneck on the range of
            | the output.
        
         | ghoward wrote:
         | I am a programmer who has written extensively on my blog and HN
         | against Copilot.
         | 
         | I am also not a hypocrite; I do not like DALL-E or Stable
         | Diffusion either.
         | 
         | As a sibling comment implies, these AI tools give more power to
         | people who control data, i.e., big companies or wealthy people,
         | while at the same time, they take power away from individuals.
         | 
         | Copilot is bad for society. DALL-E and Stable Diffusion are bad
         | for society.
         | 
         | I don't know what the answer is, but I sure wish I had the
         | resources to sue these powerful entities.
        
           | vghfgk1000 wrote:
        
           | akudha wrote:
           | _but I sure wish I had the resources to sue these powerful
           | entities._
           | 
            | I wonder if there is a crowdfunding platform like
            | GoFundMe for lawsuits. Or could GoFundMe itself be used
            | for this purpose? It would be fantastic to sue the mega
            | polluters, lying media like Fox, etc.
           | 
           | That said, even with a lot of money, are these cases
           | winnable? Especially given the current state of Supreme Court
           | and other federal courts?
        
           | williamcotton wrote:
           | I'm a programmer and a songwriter and I am not worried about
           | these tools and I don't think they are bad for society.
           | 
           | What did the photograph do to the portrait artist? What did
           | the recording do to the live musician?
           | 
            | Here's some highfalutin art theory on the matter, from almost
            | a hundred years ago:
            | https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...
        
             | snarfy wrote:
             | > What did the recording do to the live musician?
             | 
             | The recording destroyed the occupation of being a live
             | musician. People still do it for what amounts to tip money,
             | but it used to be a real job that people could make a
             | living off of. If you had a business and wanted to
             | differentiate it by having music, you had to pay people to
             | play it live. It was the only way.
        
             | __alexs wrote:
             | > What did the photograph do to the portrait artist?
             | 
              | It completely destroyed the jobs of photorealistic
              | portrait artists. You only have stylised portrait painting
              | now, and that is going to be ripped off too.
        
             | SamoyedFurFluff wrote:
             | But this isn't like photography and portrait artistry. This
             | is more like a wealthy person stealing your entire art
             | catalog, laundering it in some fancy way, and then claiming
             | they are the original creator. Stable Diffusion has
             | literally been used to create new art by screenshotting
             | someone's live-streamed art creation process as the seed.
              | While creating derivative work has always been considered
              | art (such as erasure poetry and collage), it's extremely
              | uncommon and blasé to never attribute the original(s).
        
               | insanitybit wrote:
               | > This is more like a wealthy person stealing your entire
               | art catalog, laundering it in some fancy way, and then
               | claiming they are the original creator.
               | 
               | If I take a song, cut it up, and sing over it, my release
               | is valid. If I parody your work, that's my work. If you
               | paint a picture of a building and I go to that spot and
               | take a photograph of that building it is my work.
               | 
               | I can derive all sorts of things, things that I own, from
               | things that others have made.
               | 
               | Fair use is a thing: https://www.copyright.gov/fair-use/
               | 
               | As for talking about the originals, would an artist
               | credit every piece of inspiration they have ever
               | encountered over a lifetime? Publishing a seed seems fine
               | as a _nice_ thing to do, but pointing at the billion
               | pictures that went into the drawing seems silly.
        
               | tremon wrote:
               | Fair use is an affirmative defense. Others can still sue
               | you for copying, and you will have to hope a judge agrees
               | with your defense. How do you think Google v. Oracle
               | would have turned out if Google's defense was "no your
               | honor, we didn't copy the Java sources. We just used
               | those sources as input to our creative algorithms, and
               | this is what they independently produced"?
        
             | ghoward wrote:
             | Do you know what's different about the photograph or the
             | recording?
             | 
             |  _They are still their own separate works!_
             | 
             | If a painter paints a person for commission, and then that
             | person also commissions a photographer to take a picture of
             | them, is the photographer infringing on the copyright of
             | the painter? Absolutely not; the works are separate.
             | 
             | If a recording artist records a public domain song that
             | another artist performs live, is the recording artist
             | infringing on the live artist? Heavens, no; the works are
             | separate.
             | 
              | On the other hand, these "AIs" are taking existing works
              | and reusing them.
             | 
             | Say I write a song, and in that song, I use one stanza from
             | the chorus of one of your songs. Verbatim. Would you have a
             | copyright claim against me for that? Of course, you would!
             | 
              | That's what these AIs do; they copy portions and mix them.
             | Sometimes they are not substantial portions. Sometimes,
             | they are, with verbatim comments (code), identical
             | structure (also code), watermarks (images), composition
             | (also images), lyrics (songs), or motifs (also songs).
             | 
             | In the reverse of your painter and photographer example, we
             | saw US courts hand down judgment against an artist who
             | blatantly copied a photograph. [1]
             | 
             | Anyway, that's the difference between the tools of
             | photography (creates a new thing) and sound recording
             | (creates a new thing) versus AI (mixes existing things).
             | 
             | And yes, sound mixing can easily stray into copyright
             | infringement. So can other copying of various copyrightable
             | things. I'm not saying humans don't infringe; I'm saying
             | that AI does _by construction_.
             | 
              | [1]:
              | https://www.reuters.com/world/us/us-supreme-court-hears-argu...
        
               | williamcotton wrote:
                | I'm not so sure that originality is that different
               | between a human and a neural network. That is to say that
               | what a human artist is doing has always involved a lot of
               | mixing of existing creations. Art needs to have a certain
               | level of familiarity in order to be understood by an
               | audience. I didn't invent 4/4 time or a I-IV-V
               | progression and I certainly wasn't the first person to
               | tackle the rhyme schemes or subject matter of my songs. I
               | wouldn't be surprised if there were fragments from other
               | songs in my lyrics or melodies, either from something I
               | heard a long time ago or perhaps just out of coincidence.
               | There's only so much you can do with a folk song to begin
               | with!
               | 
                | BTW, what happened after the photograph is that there
                | were fewer portrait artists. And after the recording
                | there were fewer live musicians. There are certainly no
                | fewer artists or musicians, though!
        
               | ghoward wrote:
                | > I'm not so sure that originality is that different
               | between a human and a neural network. That is to say that
               | what a human artist is doing has always involved a lot of
               | mixing of existing creations.
               | 
               | I disagree, but this is a debate worth having.
               | 
               | This is why I disagree: humans don't copy _just_
               | copyrighted material.
               | 
               | I am in the middle of developing and writing a romance
               | short story. Why? Because my writing has a glaring
               | weakness: characters, and romance stands or falls on
               | characters. It's a good exercise to strengthen that
               | weakness.
               | 
                | Anyway, both people in the (eventual) couple developed
                | from _my real life_, and not from any
               | copyrighted material. For instance, the man will
               | basically be a less autistic and less selfish version of
               | myself. The woman will basically be the kind of person
               | that annoys me the most in real life: bright, bubbly,
               | always touching people, etc.
               | 
               | There is no copyrighted material I am getting these
               | characters from.
               | 
               | In addition, their situation is not typical of such
               | stories, but it _does_ have connections to my life. They
               | will (eventually) end up in a ballroom dance competition.
                | Why that? So the male character hates it. I hate
                | ballroom dance: during a three-week ballroom dancing
                | course in 6th grade, the girls made me hate it. I won't
                | say how, but they did.
               | 
               | That's the difference between humans and machines:
                | machines can only copy and mix other copyrightable
               | material; humans can copy _real life_. In other words,
               | machines can only copy a representation; humans can copy
               | the real thing.
               | 
               | Oh, and the other difference is emotion. I've heard that
               | people without the emotional center of their brains can
               | take _six hours_ to choose between blue and black pens.
               | There is something about emotions that drives decision-
                | making, and it's decision-making that drives art.
               | 
               | When you consider that brain chemistry, which is a
               | function of genetics and previous choices, is a big part
               | of emotions, then it's obvious that those two things,
               | genetics and previous choices, are _also_ inputs to the
                | creative process. Machines don't have those inputs.
               | 
               | Those are the non-religious reasons why I think humans
               | have more originality than machines, including neural
               | networks.
        
           | c7b wrote:
           | > these AI tools give more power to people who control data,
           | i.e., big companies or wealthy people, while at the same
           | time, they take power away from individuals.
           | 
           | Not sure I agree, but I can at least see the point for
           | Copilot and DALL-E - but Stable Diffusion? It's open source,
           | it runs on (some) home-use laptops. How is that taking away
           | power from indies?
           | 
           | Just look at the sheer number of apps building on or
           | extending SD that were published on HN, and that's probably
           | just the tip of the iceberg. Quite a few of them at least
           | looked like side projects by solo devs.
        
             | ghoward wrote:
             | SD is better than the other two, but it will still
             | centralize control.
             | 
             | I imagine that Disney would take issue with SD if material
             | that Disney owned the copyright to was used in SD. They
             | would sue. SD would have to be taken off the market.
             | 
             | Thus, Disney has the power to ensure that their copyrighted
             | material remains protected from outside interests, and they
             | can still create unique things that bring in audiences.
             | 
             | Any small-time artist that produces something unique will
             | find their material eaten up by SD in time, and then,
             | because of the sheer _number_ of people using SD, that
             | original material will soon have companions that are like
             | it _because they are based on it in some form_. Then, the
              | original won't be as unique.
             | 
             | Anyone using SD will not, by definition, be creating
             | anything unique.
             | 
             | And when it comes to art, music, photography, and movies,
             | _uniqueness_ is the best selling point; once something is
             | not unique, it becomes worth less because something like it
             | could be gotten somewhere else.
             | 
             | SD still has the power to devalue original work; it just
             | gives normal people that power on top of giving it to the
             | big companies, while the original works of big companies
             | remain safe because of their armies of lawyers.
        
               | c7b wrote:
               | > I imagine that Disney would take issue with SD if
               | material that Disney owned the copyright to was used in
               | SD. They would sue. SD would have to be taken off the
               | market.
               | 
               | Are you sure?
               | 
               | I'm not familiar with the exact data set they used for SD
               | and whether or not Disney art was included, but my
               | understanding is that their claim to legality comes from
               | arguing that the use of images as training data is 'fair
               | use'.
               | 
               | Anyone can use Disney art for their projects as long as
               | it's fair use, so even if they happened to not include
               | Disney art in SD, it doesn't fully validate your point,
                | because they could have done so if they wanted, as long
                | as training constitutes fair use. And I think it should:
                | it's pretty much the AI equivalent of 'looking at
               | others' works', which is part of a human artist's
               | training as well.
        
               | ghoward wrote:
               | > Are you sure?
               | 
               | Yes, I'm sure.
               | 
               | > I'm not familiar with the exact data set they used for
               | SD and whether or not Disney art was included, but my
               | understanding is that their claim to legality comes from
               | arguing that the use of images as training data is 'fair
               | use'.
               | 
               | They could argue that. But since the American court
               | system is currently (almost) de facto "richest wins,"
               | their argument will probably not mean much.
               | 
               | The way to tell if something was in the dataset would be
               | to use the name of a famous Disney character and see what
               | it pulls up. If it's there, then once the Disney beast
               | finds out, I'm sure they'll take issue with it.
               | 
               | And by the way, I don't buy all of the arguments for
               | machine learning as fair use. Sure, for the training
               | itself, yes, but once the model is used by others, you
               | now have a distribution problem.
               | 
               | More in my whitepaper against Copilot at [1].
               | 
               | [1]: https://gavinhoward.com/uploads/copilot.pdf
        
           | cmdialog wrote:
           | Obviously this is a matter of philosophy. I am using Copilot
           | as an assistant, and for that it works out very nicely. It's
           | fancy code completion. I don't know who is trying to use this
           | to write non-trivial code but that's as bad an idea as trying
           | to pass off writing AI "prompts" as a type of engineering.
           | 
           | These things are tools to make more involved things. You're
           | not going to be remembered for all the AI art you prompted
           | into existence, no matter how many "good ones" you manage to
           | generate. No one is going to put you into the Guggenheim for
           | it.
           | 
           | Likewise, programmers aren't going to become more depraved or
           | something by using Copilot. I think that kind of prescriptive
           | purism needs to Go Away Forever, personally.
        
         | bayindirh wrote:
         | I, with my software developer hat on, am not excited by AI.
         | Not a bit, honestly. Especially these big models trained on
         | huge amounts of data without any consent.
         | 
         | Let me be perfectly clear. I'm all for the tech. The
         | capabilities are nice. The thing I'm _strongly against_ is
         | training these models on any data without any consent.
         | 
         | GPT-3 is OK, training it with public stuff regardless of its
         | license is not.
         | 
         | Copilot is OK, training it on GPL/LGPL-licensed code without
         | consent is not.
         | 
         | DALL-E/MidJourney/Stable Diffusion is OK. Training them on
         | images that are not public domain or CC0 is not.
         | 
         | "We're doing something amazing, hence we need no permission" is
         | ugly to put it very lightly.
         | 
         | I've left GitHub because of Copilot. Will leave any photo
         | hosting platform if they hint at any similar thing with my
         | photography, period.
        
           | psychphysic wrote:
           | I disagree.
           | 
           | Those are effectively cases of cryptomnesia[0]. Part and
           | parcel of learning.
           | 
            | If you don't want broad access to your work, don't upload it
            | to a public repository. It's very simple. Good on you for
            | recognising that you don't agree with how GitHub uses the
            | data in public repos, but it's not their problem.
           | 
           | [0] https://en.m.wikipedia.org/wiki/Cryptomnesia
        
             | bayindirh wrote:
             | > Those are effectively cases of cryptomnesia.
             | 
              | Disagree: outputting training data as-is is not
              | cryptomnesia. This is not Copilot's first case. It also
              | reproduced id Software's fast inverse square root function
              | as-is, including its comments, but without its license.
             | 
             | > If you don't want broad access your work, don't upload it
             | to a public repository. It's very simple.
             | 
              | This is actually both funny and absurd. This is why we have
              | licenses at this point. If all the licenses are moot, then
              | this opens a very big can of worms...
             | 
              | My terms are simple. If you derive, share the derivative
              | (xGPL). Copilot is deriving from my code. If you use my
              | code as a starting point, honor the license and mark the
              | derivative with a GPL license. This voids your business
              | case? I don't care. These are my terms.
             | 
             | If any public item can be used without any limitations,
             | Getty Images (or any other stock photo business) is
             | illegal. CC licensing shouldn't exist. GPL is moot. Even
             | the most litigious software companies' cases (Oracle, SCO,
              | Microsoft, Adobe, etc.) are moot. Just don't put it on
             | public servers, eh?
             | 
              | Similarly, music and other fine arts are generally publicly
              | accessible. So, by your logic, copyright on any and every
              | production is also invalid, because it's publicly
              | available.
             | 
             | Why not put your case forward with attorneys of Disney, WB,
             | Netflix and others? I'm sure they'll provide all their
             | archives for training your video/image AI. Similarly
             | Microsoft, Adobe, Mathworks, et al. will be thrilled to
             | support your CoPilot competitor with their code, because a)
             | Any similar code will be just cryptomnesia, b) The software
             | produced from that code is publicly accessible anyway.
             | 
              | At this point, I haven't even touched on the fact that
              | humans are trained very differently from neural networks.
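(The routine referenced above is the famous Quake III Arena fast inverse square root. A minimal sketch of the idea follows; this is not the verbatim GPLv2 source: it swaps the original's `long` pointer cast for `uint32_t` and `memcpy` so the type pun is well-defined C on modern platforms.)

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root, a sketch in the spirit of the well-known
 * Quake III Arena routine. It approximates 1/sqrt(x) by treating the
 * float's bits as an integer, applying the "magic constant"
 * 0x5f3759df, and refining with one Newton-Raphson iteration. */
static float fast_rsqrt(float number)
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* reinterpret float bits   */
    i = 0x5f3759df - (i >> 1);           /* initial approximation    */
    memcpy(&y, &i, sizeof i);
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step  */
    return y;
}
```

(The distinctive magic constant and the original's comments are exactly what made the reported Copilot reproduction recognizable as a verbatim copy rather than cryptomnesia.)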
        
         | matheusmoreira wrote:
         | > For myself, I am skeptical of intellectual property in the
         | first place. I say go for it.
         | 
         | Me too. I think copyright and these silly restrictions should
         | be abolished.
         | 
         | At the same time, I can't get over the fact these self-serving
         | corporations are all about "all rights reserved" when it
         | benefits them while at the same time undermining other people's
         | rights. Microsoft absolutely knows that what they're doing is
         | wrong. Recently it was pointed out to me that Microsoft
         | employees can't even look at GPL source code, lest they
         | subconsciously reproduce it. Yet they think their software can
         | look at other people's code and reproduce it?
        
         | wzdd wrote:
         | This talking point seems to come up often, but since it's
         | basically saying that people are hypocrites I think it is a bad
         | faith thing to say without reasonable proof that it's not a
         | fringe opinion (or completely invented).
         | 
         | For what it's worth, the people I know who are opposed to this
         | sort of "useful tool" don't discriminate by profession.
        
         | teddyh wrote:
         | An accusation of hypocrisy _is not an argument_; at least not
         | a relevant one.
        
         | pclmulqdq wrote:
         | I think the distinction is that only one of those classes tends
         | to produce exact copies of work. Programmers get very upset at
         | DALL-E and Stable Diffusion producing exact (and near-exact)
         | copies of artwork too. In contrast to exact copying, production
         | of imitations (not exact copies, but "X in the style of Y") is
         | something that artists have been doing for centuries, and is
         | widely thought of as part of arts education.
         | 
         | For some reason, code seems to lend itself to exact copying by
         | AIs (and also some humans) rather than comprehension and
         | imitation.
        
           | XorNot wrote:
            | I'm mildly suspicious that this example is an implementation
            | of generic matrix functionality, though: you couldn't patent
            | this sort of work, because it's not patentable - it's
            | mathematics. It's fundamentally a basic operation that would
            | have to be implemented with a similar structure regardless of
            | how you do it.
        
             | pclmulqdq wrote:
             | Patents and copyrights are totally different, and should be
             | treated as such. The issue isn't about whether someone
             | copies the algorithm, it's whether they copy the written
             | code. Nothing in an algorithms textbook is patentable
             | either, but if you copy the words describing an algorithm
             | from it, you are stealing their description.
        
             | heavyset_go wrote:
             | Mathematics is not patentable, but you can patent the steps
             | a computer takes to compute the results of that particular
             | algorithm.
        
         | ChildOfChaos wrote:
          | I think sadly it's just people being protective: the technology
          | is interesting, so if it doesn't hit their line of work, it's
          | fantastic; if it does, then it's terrible.
          | 
          | There is no arguing against it though; you can't stop it. All
          | this stuff is coming eventually to all of these areas, so you
          | might as well find ways to use the opportunities while some of
          | this is still new.
        
           | naillo wrote:
           | I mean we definitely _can_ stop it. Laws are a pretty strong
           | deterrent.
        
             | ghaff wrote:
             | "We" maybe can't stop it. But if there were the political
             | will to kneecap many uses of machine learning, it's not
              | obvious there's any reason it _couldn't_ be done even if
             | not 100% effective. Whether that would be a good thing is a
             | different question.
        
             | faeriechangling wrote:
              | You can slow this, but you can't stop it whatsoever. It's
              | ultimately about as futile an effort as trying to stop
              | piracy.
             | People are ALREADY running salesforce codegen and stable
             | diffusion at home, you can't put the genie back in the
             | bottle, what we'll have 20 years from now is going to make
             | critics of these tools have nightmares.
             | 
             | If you try to outlaw it, the day before the laws come into
             | effect, I'm going to download the very best models out
             | there and run it on my home computer. I'll start organising
             | with other scofflaws and building our own AI projects in
             | the fashion of leelachesszero with donated compute time.
             | 
             | You can shut down the commercial versions of these tools.
              | You can scare large corporations into banning the use of
              | these tools internally. You can pull an uno reverse
             | card and use modified versions of the tools to CHECK for
             | copyright infringement and sue people under existing laws
             | AND you'll probably even be able to statistically prove
             | somebody is an AI user. But STOPPING the use of these
             | tools? Go ahead and try, won't happen.
        
               | tablespoon wrote:
               | > You can slow this, you can't stop it whatsoever. It's
               | about as ultimately futile as an effort as trying to stop
               | piracy. ... But STOPPING the use of these tools? Go ahead
               | and try, won't happen.
               | 
                | So? No one needs to _stop it totally_. The world isn't
                | black and white; pushing it to the fringes is almost
                | certainly a sufficient success.
               | 
               | Outlawing murder hasn't stopped murder, but no one's
               | given up on enforcing those laws because of the futility
               | of perfect success.
               | 
               | > If you try to outlaw it, the day before the laws come
               | into effect, I'm going to download the very best models
               | out there and run it on my home computer. I'll start
               | organising with other scofflaws and building our own AI
               | projects in the fashion of leelachesszero with donated
               | compute time.
               | 
               | That sounds like a cyberpunk fantasy.
        
               | faeriechangling wrote:
               | Cyberpunk sure, but fantasy? Not at all.
        
               | throwaway675309 wrote:
                | You'll never be able to push it to the fringes because
                | there will never be universal legal agreement from
                | country to country on where to draw the line.
               | 
               | And as computers get more powerful and the models get
               | more efficient it'll become easier and easier to self
               | host and run them on your own dime. There are already one
               | click installers for generative models such as stable
               | diffusion that run on modest hardware from a few years
               | back.
        
             | tpm wrote:
             | What would the law do? Forbid automatic data collection
             | and/or indexing and further use without explicit copyright
             | holder agreement? That would essentially ban the whole
              | internet as we know it. Not saying that would be bad, but
              | this is never going to happen; there's too much accumulated
              | momentum in the opposite direction.
        
               | chiefalchemist wrote:
               | To your point, the law can do a lot of things. The issue
               | here is the clarity and ability to enforce the law.
        
               | [deleted]
        
         | machinekob wrote:
          | I'm pretty sure DALL-E was trained only on non-copyrighted
          | material (they say so :| ).
          | 
          | But to be honest, if your code is open source, I'm pretty sure
          | Microsoft doesn't care about the licence; they'll just use it
          | because "reasons". Same with Stable Diffusion: they don't care
          | about the data; if it's on the internet they'll use it. So
          | it's a topic that will probably be regulated in a few years.
          | 
          | Until then, let's hope they'll get milked (both Microsoft and
          | NovelAI) for illegal content usage. I seriously hope at least a
          | few lawyers will try milking it ASAP, especially NovelAI,
          | which illegally used a lot of copyrighted art in its training
          | data.
        
           | msbarnett wrote:
           | > I'm pretty sure DALL-E was trained only on not copyright
           | material
           | 
            | Nope. DALL-E generates images with the Getty watermark, so
            | clearly there are copyrighted materials in its training set:
            | https://www.reddit.com/r/dalle2/comments/xdjinf/its_pretty_o...
        
             | pclmulqdq wrote:
             | Lots of people ironically put the Getty watermark on
             | pictures and memes that they make to satirically imply that
             | they are pulling stock photos off the internet with the
             | printscreen function instead of paying for them.
        
               | msbarnett wrote:
               | Memes generally would not fall under the category of non-
               | copyrighted material; they're most of the time extremely
               | copyrighted material just being used without permission.
                | And even a wholly original work on which an artist
                | sarcastically puts a Getty watermark and then licenses
                | under Creative Commons or something would fall into very
                | murky territory
               | - the Getty watermark itself is the intellectual property
               | of Getty. The original image author might plead fair use
               | as satire, but satirical intentions aren't really a
               | defence available to DALL-E.
               | 
               | So even if we're assuming these were wholly original
               | works that the author placed under something like a
               | Creative Commons license, the fact that it incorporated
               | an image they had no rights to would at the very least
               | create a fairly tangled copyright situation that any
               | really rigorous evaluation of the copyright status of
               | every image in the training set would tend to argue
               | towards rejecting as not worth the risk of litigation.
               | 
               | But the more likely scenario here is that they did
               | minimal at best filtering of the training set for
               | copyrights.
        
               | pclmulqdq wrote:
               | You could argue that mocking the Getty logo like that is
               | some form of fair use, which would be a backdoor through
               | which it can end up as a legitimate element of a public
               | domain work, in which case it would be fair game.
               | 
               | I agree with you that it is also possible that people
               | posted Getty thumbnails to some sites as though they are
               | public domain, and that is how the AIs learned the
               | watermark.
        
             | nottorp wrote:
             | Dunno about Getty, but I've been shown the cover for
             | Beatles' Yellow Submarine done in different colors as some
             | great AI advancement.
        
             | machinekob wrote:
              | Thanks for pointing this out, I'd never seen that before.
              | If they used copyrighted images they should get punished; in
              | the original paper they say no copyrighted content was used,
              | but that could just be a lie, who knows. The data speaks for
              | itself, and if they can prove this in court they should get
              | punished (so again, Microsoft getting rekt for that will be
              | good to see :] ).
        
         | tpxl wrote:
         | When Joe Rando plays a song from 1640 on a violin he gets a
         | copyright claim on Youtube. When Jane Rando uses devtools to
         | check a website source code she gets sued.
         | 
         | When Microsoft steals all code on their platform and sells it,
         | they get lauded. When "Open" AI steals thousands of copyrighted
         | images and sells them, they get lauded.
         | 
         | I am skeptical of imaginary property myself, but fuck this one
         | set of rules for the poor, another set of rules for the masses.
        
           | lo_zamoyski wrote:
           | The poor are the masses, or at least part of the masses.
        
           | gw99 wrote:
           | If this is the new status quo then I suggest we find out how
           | to fuck up the corpus as best as possible.
        
           | a4isms wrote:
           | > one set of rules for the poor, another set of rules for the
           | masses.
           | 
           |  _Conservatism consists of exactly one proposition, to wit:_
           | 
           |  _There must be in-groups whom the law protects but does not
           | bind, alongside out-groups whom the law binds but does not
           | protect._
           | 
           | --Composer Frank Wilhoit[1]
           | 
           | [1]: https://crookedtimber.org/2018/03/21/liberals-against-
           | progre...
        
             | thrown_22 wrote:
        
             | sbuttgereit wrote:
             | Thanks for posting the link to the quote. Having said that,
             | I don't think it's possible to quote that bit and get an
              | understanding of the idea being conveyed without its
             | opening context. Indeed, it's likely to cause a false idea
             | of what's being conveyed. From earlier in the same post:
             | 
             |  _" There is no such thing as liberalism -- or
             | progressivism, etc.
             | 
             | There is only conservatism. No other political philosophy
             | actually exists; by the political analogue of Gresham's
             | Law, conservatism has driven every other idea out of
             | circulation."_
        
               | a4isms wrote:
               | I agree that adds considerable depth to the value of the
               | quote, and connects it to the conversation he appeared to
               | be having, which is about the first line you've quoted:
               | 
               | There is no such thing as being a Liberal or Progressive,
               | there is only being a Conservative or anti-Conservative,
               | and while there is much nuance and policy to debate about
               | that, it boils down to deciding whether you actually
               | support or abhor the idea of "the law" (which is a much
               | broader concept than just the legal system) existing to
               | enforce or erase the distinction between in-groups and
               | out-groups.
               | 
               | But that's just my read on it. Getting back to
               | intellectual property, it has become a bitter joke on
               | artists and creatives, who are held up as the
               | beneficiaries of intellectual property laws in theory,
               | but in practice are just as much of an out-group as
               | everyone else.
               | 
               | We are bound by the law--see patent trolls, for example--
               | but not protected by it unless we have pockets deep
               | enough to sue Disney for not paying us.
        
           | stickfigure wrote:
           | Yeah, inequality sucks. So how about we focus on making the
           | world better for everyone instead of making the world equally
           | shitty for everyone?
        
             | imwillofficial wrote:
             | This makes no sense.
             | 
             | Absolutely nobody is arguing to make the world shittier
        
             | zopa wrote:
             | Because we're not the ones with the power. People with
             | limited power pick the fights they might win, not the
             | fights that maximize total welfare for everyone including
             | large copyright holders. There's no moral obligation to be
             | a philosopher king unless you're actually on a throne.
        
           | foobarbecue wrote:
           | > one set of rules for the poor, another set of rules for the
           | masses
           | 
           | Presumably by "the masses" you meant "the large
           | corporations"?
        
           | rtkwe wrote:
           | I think copilot is a clearer copyright violation than any of
           | the stable diffusion projects though because code has a much
           | narrower band of expression than images. It's really easy to
           | look at the output of CoPilot and match it back to the
           | original source and say these are the same. With stable
           | diffusion it's much closer to someone remixing and aping the
           | images than it is reproducing originals.
           | 
           | I haven't been following super closely but I don't know of
           | any claims or examples where input images were recreated to a
           | significant degree by stable diffusion.
        
           | e40 wrote:
           | Preach. So incredibly annoyed when I tried to send a video of
           | my son playing Beethoven to his grandparents and it was taken
           | down due to a copyright violation.
        
           | c7b wrote:
           | > When Joe Rando plays a song from 1640 on a violin he gets a
           | copyright claim on Youtube. When Jane Rando uses devtools to
           | check a website source code she gets sued.
           | 
           | Do you have any evidence for those claims, or anything
           | resembling those examples?
           | 
           | Music copyright has long expired for classical music, and big
           | shots are definitely not exempt from where it applies. Just
           | look at how much heat Ed Sheeran, one of the biggest
           | contemporary pop stars, got for "stealing" a phrase that was
           | literally just chanting "Oh-I" a few times (just to be clear,
           | I am familiar with the case and find it infuriating that this
           | petty rent-seeking attempt went to trial at all, even if
           | Sheeran ended up being completely cleared, but to great
           | personal distress as he said afterwards).
           | 
           | And who ever got sued for using dev tools? Is there even a
           | way to find that out?
        
             | banana_giraffe wrote:
             | https://twitter.com/mpoessel/status/1545178842385489923
             | 
              | Among many others. Classical music may have fallen into the
              | public domain, but modern performances of it are
              | copyrightable, and some of the big companies use copyright
              | matching systems, including YouTube's, that often flag new
              | performances as copies of recordings.
        
             | codefreakxff wrote:
              | There have been a number of stories about musicians being
              | hit with copyright claims. Here is the first result on
              | Google
             | 
             | https://www.radioclash.com/archives/2021/05/02/youtuber-
             | gets...
             | 
              | As for being sued for looking at source code, here is the
              | first result on Google
             | 
             | https://www.wired.com/story/missouri-threatens-sue-
             | reporter-...
        
               | frob wrote:
               | Just to be clear, because it's in the title, the reporter
               | was threatened with a lawsuit for looking at source code.
                | I cannot find anyone actually sued for it. BTW, here's an
               | article saying said reporter wasn't sued: https://www.the
               | register.com/AMP/2022/02/15/missouri_html_hac...
               | 
                | Anyone with a mouth can run it and threaten a lawsuit. In
                | fact, I threaten to sue you for misinformation right now
               | unless you correct your post. Fat lot of good my threat
               | will do because no judge in their right mind would
               | entertain said lawsuit because it's baseless.
        
               | c7b wrote:
               | Ok - it is a true shame that the YouTube copyright claim
               | system is so broken as to enable those shady practices,
               | and that politicians still haven't upped their knowledge
               | of the internet beyond a 'series of tubes'.
               | 
               | But surely the answer should be to fix the broken YT
               | system and to educate politicians to abstain from
               | baseless threats, not to make AI researchers pay for it?
        
           | insanitybit wrote:
           | > Joe Rando plays a song from 1640 on a violin he gets a
           | copyright claim on Youtube
           | 
           | That can't possibly be a valid claim, right? AFAIK copyright
            | is "gone" after the original author dies + ~70 years. Until
            | fairly recently it was even shorter. Something from 1640
           | surely can't be claimed under copyright protection. There are
           | much more recent changes where that might not be the case,
           | but 1640?
           | 
           | > When Jane Rando uses devtools to check a website source
           | code she gets sued.
           | 
           | Again, that doesn't sound like a valid suit. Surely she would
           | win? In the few cases I've heard of where suits like this are
           | brought against someone they've easily won them.
        
             | cipherboy wrote:
             | The poster isn't claiming that this is a valid DMCA suit.
              | Nearly everyone who is at a mildly decent level and has
              | posted their own recordings of classical music to YouTube
              | has received these claims _in their Copyright section_.
             | YouTube itself prefixes this with some lengthy disclaimer
             | about how this isn't the DMCA process but that they reserve
             | the right to kick you off their site based on fraudulent
             | matches made by their algorithms.
             | 
              | They are absolutely, completely, and utterly bullshit.
              | Nobody with half an ear for music will mistake my playing
              | of Bach's G Minor Sonata for Arthur Grumiaux's (too many
              | out of tune notes :-D). Yet YouTube still manages to match
              | my playing to his recording, probably because it had never
              | heard mine before (I recorded it mere minutes earlier).
             | 
              | So no, it isn't a valid claim, but this algorithm, trained
              | on certain examples of work, manages to make bad
              | classifications with potentially devastating ramifications
              | for the creator (I'm not a monetized YouTube artist, but if
              | this triggered a complete lockout of my Google account(s),
              | it would likely end Very Badly).
             | 
             | I think it's a very relevant comparison to the GP's
             | examples.
        
             | alxlaz wrote:
             | > That can't possibly be a valid claim, right?
             | 
             | It's not, but good luck talking to a human at Youtube when
             | the video gets taken down.
             | 
             | > Again, that doesn't sound like a valid suit. Surely she
             | would win?
             | 
             | Assuming she could afford the lawyer, and that she lives
             | through the stress and occasional mistreatment by the
             | authority, yes, probably. Both are big ifs, though.
        
             | lbotos wrote:
             | > That can't possibly be a valid claim, right?
             | 
             | I'm not a lawyer, but my understanding is that while the
             | "1640's violin composition" _itself_ may be out of
             | copyright, if I record myself playing it, _my recording of
             | that piece is my copyright_. So if you took my file
             | (somehow) and used it without my permission, and I could
             | prove it, I could claim copyright infringement.
             | 
             | That's my understanding, and I've personally operated that
             | way to avoid any issues since it errs on the side of
             | safety. (Want to use old music, make sure the license of
             | the recording explicitly says public domain or has license
             | info)
        
               | vghfgk1000 wrote:
        
               | insanitybit wrote:
               | Yes, that sounds right to me. But that's not relevant to
               | "Joe Whoever played it and got sued".
        
               | lupire wrote:
               | The problem is that YouTube AI thinks your recording is
               | the same as every other recording, because it doesn't
               | understand the difference between composition and
               | recording.
        
             | Rimintil wrote:
             | > That can't possibly be a valid claim, right?
             | 
             | It has indeed happened.
             | 
             | https://boingboing.net/2018/09/05/mozart-bach-sorta-
             | mach.htm...
             | 
             | Sony later withdrew their copyright claim.
             | 
             | There are two pieces to copyright when it comes to public
             | domain:
             | 
             | * The work (song) itself -- can't copyright that
             | 
             | * The recording -- you are the copyright owner. No one,
             | without your permission, can re-post your recording
             | 
             | And of course, there is derivative work. You own any
             | portion that is derivative of the original work.
        
               | insanitybit wrote:
               | > Sony later withdrew their copyright claim.
               | 
               | Right, that's my point... I can sue anyone for anything,
               | doesn't mean I'll win.
        
               | imwillofficial wrote:
                | It worked out justly in this case.
                | 
                | In the VAST majority of cases it does not.
        
               | sumedh wrote:
               | > I can sue anyone for anything, doesn't mean I'll win.
               | 
                | You can't sue if you don't have money, but a big corp can
                | sue even if they know they are wrong.
        
             | pessimizer wrote:
             | > Again, that doesn't sound like a valid suit. Surely she
             | would win? In the few cases I've heard of where suits like
             | this are brought against someone they've easily won them.
             | 
             | That's freedom of speech for everyone who can afford a
             | lawyer to bring suit against a music rights-management
             | company.
        
               | insanitybit wrote:
               | Yes, this is a problem with the legal system in general.
        
             | kevin_thibedeau wrote:
             | The songwriter copyright is expired but there is still a
             | freshly minted copyright on the video and the audio
             | performance.
             | 
              | This becomes particularly onerous when trolls claim
              | copyright on published recordings of environmental sounds
              | that happen to be similar but not identical to someone
              | else's, even though their legitimate claim covers only the
              | original recording.
        
             | Rodeoclash wrote:
             | This isn't a legal copyright claim, it's a "YouTube"
             | copyright claim which is entirely owned and enforced by
             | YouTube.
        
               | insanitybit wrote:
               | OK but then we're just talking about content moderation,
               | which seems like a separate issue. I think using "YouTube
               | copyright claim" as a proxy for "legal copyright claim"
               | is more to the parent's point, especially since that's
               | how YouTube purports the claim to work. Otherwise it
               | feels irrelevant.
        
               | cipherboy wrote:
               | Copyright claims are a form of content moderation, by
               | preventing reuse of content that others own.
               | 
                | But it can still be weaponized to prevent legitimate
                | resubmissions of parallel works, which can potentially
                | deplatform legitimate users, depending on the reviewer
                | and the clarity of the rebuttal.
        
               | lupire wrote:
               | YouTube does this moderation in order to avoid legal
               | pressure from copyright holders, as in
               | 
               | https://en.m.wikipedia.org/wiki/Viacom_International_Inc.
               | _v.....
        
           | cyanydeez wrote:
           | Basically, copyright is for people with copyright lawyers
        
             | kodah wrote:
             | That's not even a joke. One of the premises of a copyright
             | is that you defend your intellectual property or lose it.
             | If the system were more equitable then it would defend your
             | copyright.
        
               | heavyset_go wrote:
               | You're thinking of trademarks.
        
               | eropple wrote:
               | This is an inaccurate description of copyright, at least
               | in the United States.
               | 
               | Trademarks require active defense to avoid
               | genericization. Copyright may be asserted at the holder's
               | discretion.
        
         | heavyset_go wrote:
         | Your post is a good example of the _tu quoque_ fallacy[1].
         | 
         | [1] https://en.wikipedia.org/wiki/Tu_quoque
        
         | tablespoon wrote:
         | > I've noticed that people tend to disapprove of AI trained on
         | their profession's data, but are usually indifferent or
         | positive about other applications of AI.
         | 
         | In other words: the banal observation that people care far more
         | when their stuff is stolen than when some stranger has their
         | stuff stolen.
        
       | lerpgame wrote:
        
       | deworms wrote:
       | As an aside, this code is an unreadable mess; for a guy
       | brandishing his credentials even in his github username, you'd
       | think he'd know a thing or two about clean code.
        
         | stonogo wrote:
         | Feel free to send patches.
        
           | deworms wrote:
           | Why would I waste time doing this?
        
             | kortilla wrote:
             | Because you were already willing to waste time panning his
             | code on a public forum. Maybe do something constructive
             | instead of destructive if your time is so precious.
        
       | [deleted]
        
       | ahmedbaracat wrote:
       | " AI-focused products/startups lack a business model aligning the
       | incentives of both the company and the domain experts (Data
       | Dignity)"
       | 
       | https://blog.barac.at/a-business-experiment-in-data-dignity
       | 
       | Yes I am quoting myself
        
       | faeriechangling wrote:
       | Not your repo, not your code.
       | 
       | I celebrate Microsoft's shameless plundering of GitHub to create
       | new products that increase productivity. The incredible thing is
       | that people trusted Microsoft to use their code on their terms to
       | begin with. This is a company that has been finding ways to make
       | open source code into a proprietary product since the 90s.
       | 
       | Nobody can stop people from replicating what Microsoft did in the
       | long run anyways. Eventually any consumer with enough access to
       | source code will be able to make their own copilot. Even if
       | copilot is criminalised Microsoft can just sell access to the
       | entire GitHub dataset and let other people commit the "crime".
       | Then you're right back where we started with having to sue the
       | end users of copilot for infringement instead of Microsoft.
       | 
       | Use private repos or face the inevitability that copilot-like
       | products will scrape your code.
        
       | ilrwbwrkhv wrote:
       | Of course it does. What are you going to do? Sue them?
        
       | ralph84 wrote:
       | Ok. So instead of whining about it on Twitter sue GitHub. No
       | matter what you think of Copilot, establishing some case law on
       | AI-generated code will be beneficial to everyone.
        
         | mjr00 wrote:
         | Whining about it on Twitter = free and easy
         | 
         | Suing Github = signing up for a ~decade long incredibly
         | expensive and time-consuming legal battle against one of the
         | richest companies in the world
         | 
         | There may be a slight difference in effort between these two
         | options.
        
           | anonydsfsfs wrote:
           | Not to mention Microsoft could countersue using their
           | enormous patent war chest, which they have a history of
           | doing[0]
           | 
           | [0] https://techcrunch.com/2012/03/22/microsoft-and-tivo-
           | drop-th...
        
         | ghaff wrote:
         | It goes beyond code. Also photos, art, text, etc. Be careful
         | what you wish for. Whether you like it or not, with a stroke of
         | a pen Congress or the Supreme Court in the US could probably
         | wipe out the legal use of a huge amount of the training data
         | used for ML.
        
           | adastra22 wrote:
           | Good.
        
           | Jevon23 wrote:
           | Good! Large corporations shouldn't be able to profit off of
           | other people's data without consent or compensation.
        
             | drstewart wrote:
             | Great! I assume you believe all search engines should be
             | illegal then?
        
               | belorn wrote:
                | Accessing a computer system without permission is
                | illegal. Search engines operate under the assumption that
                | they have permission to access any publicly available
                | server unless explicitly forbidden.
                | 
                | If a company or person assumes they have copyright
                | permission to any publicly accessible work, they will
                | quickly find out that such an assumption is wrong, and
                | that they require explicit permission.
        
               | ghaff wrote:
                | >Search engines operate under the assumption that they
                | have permission to access any publicly available server
                | unless explicitly forbidden.
               | 
               | And why should opt-out be a reasonable norm? To be clear,
               | the internet (among many other things) breaks down if
               | every exchange of information is opt-in. Sharing of
               | photographs taken in public places is another example.
               | But the internet basically functions because people share
               | information on an opt-out basis (that may or may not even
               | be respected).
        
               | ghoward wrote:
               | Search engines don't sell the information of others; they
               | sell certain _metadata_ of that information, namely, the
               | _location_ of that information.
        
               | ghaff wrote:
               | And excerpts of that information in many cases.
        
       | res0nat0r wrote:
       | The repo he linked to on twitter is a public repo though. Am I
       | missing something?
       | 
       | https://twitter.com/DocSparse/status/1581462433335762944
        
         | tpxl wrote:
         | Public != copyright free.
        
         | taspeotis wrote:
         | > The repo he linked to on twitter is a public repo though. Am
         | I missing something?
         | 
          | I dunno, the title says it used public code when it was meant
          | to block public code.
        
         | kurtoid wrote:
         | I think they're more concerned about it repeating code w/o
         | ownership/copyright labels
        
       | Waterluvian wrote:
       | I think people may be drastically over-valuing their code. If it
       | was emitting an entire meaningful product, that would be
       | something else. But it's emitting nuts and bolts.
       | 
       | If the issue is more specifically copyright infringement, then
       | leverage the legal apparatus in place for that. Their lawyers
       | might listen better.
       | 
       | This is not a strongly held opinion and if you disagree I would
       | love to hear your constructive thoughts!
        
         | jacooper wrote:
         | I mean it starts like this, but if Copilot gets a pass,
         | companies might just use AI as a way to launder code and avoid
         | complying with Free licenses.
        
         | chiefalchemist wrote:
         | To some extent I agree with your opening. That is, in plenty
         | of cases CP is showing how mundane most code is. It's one
         | commodity stitched to another stitched to another.
         | 
         | That's not considering any legal / license issues, just a
         | simple statement about the data used to train CP.
        
       | mjr00 wrote:
       | Same issue with Stable Diffusion/NovelAI and certain people's
       | artwork (eg Greg Rutkowski) being obviously used as part of the
       | training set. More noticeable in Copilot since the output needs
       | to be a lot more precise.
       | 
       | Lawmakers need to jump on this stuff ASAP. Some say that it's no
       | different from a person looking at existing code or art and
       | recreating it from memory or using it as inspiration. But the law
       | already changes when technology gets involved. There's no law
       | against you and me having a conversation, but I may not be able
       | to record it, depending on the jurisdiction. Similarly, there's
       | no law against you looking at artwork that I post online, but
       | it's not out of the question that a law could exist preventing
       | you from using it as part of an ML training dataset.
        
         | SrslyJosh wrote:
         | > Some say that it's no different from a person looking at
         | existing code or art and recreating it from memory or using it
         | as inspiration.
         | 
         | Hah, no, the model encodes the code that it was trained on.
         | This is not "recreating from memory", this is "making a copy of
         | the code in a different format." (Modulo some variable
         | renaming, which it's probably programmed to do in order to
         | obscure the source of the code.)
        
       | CapsAdmin wrote:
       | I would imagine the root problem here is people taking
       | copyrighted code, pasting it in their project and disregarding
       | the license. To me this seems common, especially when it comes to
       | toy, test and hobby projects.
       | 
       | I don't see how copilot or similar tools can solve this problem
       | without vetting each project.
        
         | yjftsjthsd-h wrote:
         | That's an entirely plausible explanation, but it doesn't mean
         | that Microsoft has any less of a legal nightmare on their
         | hands.
        
           | CapsAdmin wrote:
           | I'm not really sure what I think about this. How responsible
           | should Microsoft be for someone's badly licensed code on
           | their platform? If they somehow had the ability to ban
           | projects using stolen snippets of code, I don't think I'd
           | dare to host my hobby projects there.
           | 
           | If you can't trust that the code in a project is compatible
           | with the license of the project then the only option I see is
           | that copilot cannot exist.
           | 
           | I love free software and whatnot, but I have a feeling this
           | situation would've been quite different if copilot was made
           | by the free software community and accidentally trained on
           | some non free code..
        
             | yjftsjthsd-h wrote:
             | > I love free software and whatnot, but I have a feeling
             | this situation would've been quite different if copilot was
             | made by the free software community and accidentally
             | trained on some non free code..
             | 
             |  _Precisely._ Would it be okay for me to publish some code
             | as GPL because my buddy gave it to me and promised that it
              | was totally legit and I could use it and it definitely
              | wasn't copy-pasted from one of the Windows source leaks?
             | 
             | > If you can't trust that the code in a project is
             | compatible with the license of the project then the only
             | option I see is that copilot cannot exist.
             | 
             | It might be possible to feed it only manually-vetted
             | inputs, but yes; as it currently is, Copilot appears to be
             | little but a massive copyright-infringement engine.
        
               | CapsAdmin wrote:
               | > Precisely. Would it be okay for me to publish some code
               | as GPL because my buddy gave it to me and promised that
               | it was totally legit and I could use it and it definitely
               | wasn't copy-pasted from one of the Windows source leaks?
               | 
               | But where do you draw the line? What if you accidentally
               | came up with the same or similar solution to something
               | in Windows? The code might not be from your friend
               | either; it could be from N steps of copy-paste, rework,
               | reformatting, refactoring, etc.
        
               | yjftsjthsd-h wrote:
               | > But where do you draw the line? What if you
               | accidentally came up with the same or similar solution
               | to something in Windows?
               | 
               | Yes, I agree that it's unclear how to deal with that in
               | the general case at scale. Although cases like OP make me
               | think that we could maybe worry about the grey area after
               | we've dealt with the blatant copies.
               | 
               | > The code might not be from your friend either; it
               | could be from N steps of copy-paste, rework,
               | reformatting, refactoring, etc.
               | 
               | Well, my personal tendency would be to apply the same
               | standard to Microsoft that they would apply to us: how
               | many steps of removal are needed before copying MS
               | proprietary code becomes okay?
        
         | [deleted]
        
       | williamcotton wrote:
       | Is the code in question even covered by copyright in the first
       | place? It seems utilitarian in nature.
       | 
       | Oh, the comments! Those are covered by copyright for sure.
        
         | williamcotton wrote:
         | You know, I make it a habit of trying not to get upset by
         | downvotes, but this is really absurd. What am I saying that is
         | incorrect? Am I being rude? What exactly do you disagree with?
        
           | williamcotton wrote:
           | Like, should I just stop interacting with people on this
           | website? Is that the intent? To make me just go away?
        
       ___________________________________________________________________
       (page generated 2022-10-16 23:00 UTC)