[HN Gopher] GitHub Copilot, with "public code" blocked, emits my...
___________________________________________________________________
GitHub Copilot, with "public code" blocked, emits my copyrighted code
Author : davidgerard
Score : 262 points
Date : 2022-10-16 19:33 UTC (3 hours ago)
web link (twitter.com)
| [deleted]
| naillo wrote:
| It's interesting if the consequence of this will be people open sourcing things way less. Would give another layer of irony to OpenAI's name.
| D13Fd wrote:
| If the code is public, my guess is that someone else stole it and added it to an open source repo without authorization. Microsoft may have then picked it up from there.
| SrslyJosh wrote:
| This just means that if you use Copilot for work, you're exposing yourself and/or your employer to unknown legal liability. =)
| teaearlgraycold wrote:
| Not sure how you'd get caught if your code is kept private.
| jen20 wrote:
| Not entirely sure how this could happen! "naikrovek" assured me not three days ago on this very site that I was "detached from reality" [1] for thinking that this would happen again.
|
| To be fair I thought it might be at least a week or two.
|
| [1]: https://news.ycombinator.com/item?id=33194643
| an1sotropy wrote:
| This is a huge and looming legal problem. I wonder if what should be a big uproar about it is muted by the widespread acceptance/approval of GitHub and related products, in which case it's a nice example of how monopolies damage communities.
| jeroenhd wrote:
| I think it won't become a legal problem until Copilot steals code from a leaked repository (i.e. the Windows XP source code) and that code gets reused in public.
|
| Only then will we see an answer to the question "is making an AI write your stolen code a viable excuse".
|
| I very much approve of the idea of Copilot as long as the copied code is annotated with the right license.
| I understand this is a difficult challenge, but just because this is difficult doesn't mean such a requirement should become optional; rather, it should encourage companies to fix their questionable IP problems before releasing these products into the wild, especially if they do so in exchange for payment.
| jijji wrote:
| if you're posting your code publicly on the web, it's hard to get upset that people are seeing/using it
| betaby wrote:
| It's under a very specific license though. By your logic it's OK to train AI on leaked Windows code then. It is/was publicly on the web.
| galleywest200 wrote:
| While I agree with the first portion of your rebuttal, the second portion makes no sense, as leaked code is not "you" putting it on the internet. It would be a nefarious actor doing so.
| lerpgame wrote:
| code licenses will be irrelevant in a few years if you are able to refactor anything you want using AI.
| betaby wrote:
| Unless lawyers from the music industry step in.
| themoonisachees wrote:
| Yes? It's production code that is supposed to work (keyword: supposed). I'd like the code-suggestor to also be trained on AAA games' source leaks.
| qu4z-2 wrote:
| Can I introduce you to a concept called copyright?
|
| Is it fine if an author publishes a short story publicly on the web for someone else to submit it to a contest as their own work?
| Hamuko wrote:
| https://nedroidcomics.tumblr.com/post/41879001445/the-intern...
| deepspace wrote:
| This shows how copyright is all screwed up. Let's say the code in question is based on a published algorithm, maybe Yuster and Zwick (I did not check).
|
| What exactly gives Davis a better claim to the copyright than the inventors of the algorithm? Yes, I know software is copyrightable while algorithms are not, but it is not at all clear to me why that should be the case.
| The effort of translating an algorithm into code is trivial compared to designing the algorithm in the first place, no?
| clnq wrote:
| To be honest, it would probably benefit all of humanity if we stopped rewriting the same code to then fix the same bugs in it, and instead just used each other's algorithms to do meaningful work.
|
| I work for a large tech company whose lawyers definitely care that my code doesn't train an AI model somewhere much more than I do. On the contrary, I would really like to open source all of my work - it would make it more impactful and would demonstrate my skills. It makes me a bit sad that my life's work is going to be behind lock and key, visible to relatively few people. Not to mention that the hundreds of thousands of work hours, energy and effort that will be spent to replicate it all over my industry in all other lock-and-key companies makes the industry as a whole tremendously inefficient.
|
| I hope that AI models like Copilot will finally show the very litigious tech companies that their intellectual property has been all over the public domain from the start. And we can get over a lot of the petty algorithm IP suits that probably hold back all tech in aggregate. We should all be working together, not racing against each other in the pursuit of shareholder value.
|
| Historically, mathematicians in the Middle Ages used to keep their solutions secret in the interest of employment. So there used to be mathematicians who could, for example, solve certain quadratic equations, but it took centuries before all humanity could benefit from this knowledge. I believe this is what is happening with algorithms now. And it is very counter-progress in my opinion.
| heavyset_go wrote:
| You can patent an algorithm if you want to protect it.
| jeroenhd wrote:
| (in some countries)
| jonnycomputer wrote:
| Microsoft should just train it on all their proprietary code instead.
| See how sanguine they are about it then.
| jacooper wrote:
| They avoided answering this question at all costs.
|
| Because it exposes their direct hypocrisy in this: it's fair use for OSS but not for us.
|
| Questions here are very important, and it's no surprise GitHub avoided answering anything about Copilot's legality:
|
| https://sfconservancy.org/GiveUpGitHub/
| naikrovek wrote:
| who said they haven't?
|
| for something to show up verbatim in the output of a textual AI model it needs to be an input many times.
|
| I wonder if the problem is not Copilot, but many people using this person's code without license or credit, and Copilot being trained on those pieces of code as well. Copilot may just be exposing a problem rather than creating one.
|
| I don't know much about AI, and I don't use Copilot.
| make3 wrote:
| there's exactly no way they have
| akudha wrote:
| With the amount of resources that Microsoft has, how hard can it be for them to exclude proprietary code that other people have stolen? I'd bet it is easy for them, but they won't do it. Because they don't care, because who is gonna take them on?
|
| Will they "accidentally" include proprietary code from, say, Oracle? Nope. They'll make sure of it. But Joe Random? Sure
| belorn wrote:
| Microsoft has a public statement that they don't use proprietary code, only public code with public licenses. They have a lot of companies as customers who use GitHub, and they also use a lot of third-party code in their own products.
| stefan_ wrote:
| Even BSD et al. have attribution requirements - that must be a vanishingly small amount of code to be used. Methinks the people who run GitHub (who have apparently decided to abandon the core business for the latest fun project) aren't being entirely upfront.
| eyelidlessness wrote:
| As a thought experiment: what do we all suppose would be the impact to Microsoft if they deliberately made public the proprietary source code for all of their publicly available commercial products and efforts (including licensed software, services; excluding private contracts, research), but the rest of their intellectual property and trade secrets remained private?
|
| Since I'm posing the question, here's my guess:
|
| - Their stock would take at least a short term hit because it's an unconventional and uncharacteristic move
|
| - The code would reveal more about their strategic interests to competitors than they'd like, but probably nothing revelatory
|
| - It might confirm or reinforce some negative perceptions of their business practices
|
| - It might dispel some too
|
| - It may reduce some competitive advantage amongst enormous businesses, and may elevate some very large firms to potential competitors
|
| - It would provide little to no new advantage to smaller players who aren't already in reach of competing with them and/or don't have the resources to capitalize on access to the code
|
| - It would probably significantly improve public perception of the company and its future intentions, at least among developers and the broader tech community
|
| In other words, a wash. Overall business impact would be roughly neutral. The code has more strategic than technical value, and there are few who could leverage the technical value into any kind of revenue center with growth potential. Any disadvantage would be negated by the public image goodwill it generated.
|
| Maybe my take is naive though! Maybe it would really hurt Microsoft long term if suddenly everyone can fork Windows 11, or steal ideas for their idiosyncratic office suite, or get really clever about how to get funded to go head to head with Azure armed with code everyone else can access too.
| 8note wrote:
| If they released all the source, I'd be able to run the nice drawing app from windows inkspaces again, unkilling the app they want dead
| mccorrinall wrote:
| If they'd open source their software I wouldn't have to wait two months till they finally release the PDBs for the kernel after every 2XH1 / 2XH2 update.
|
| It's so annoying that they are sooooo slow at this and we have to keep our users from upgrading after every release.
| thorum wrote:
| What might be going on here is that Copilot pulls code it thinks may be relevant from other files in your VS Code project. You have the original code open in another tab, so Copilot uses it as a reference example. For evidence, see the generated comment under the Copilot completion: "Compare this snippet from Untitled-1.cpp" - that's the AI accidentally repeating the prompt it was given by Copilot.
| ianbutler wrote:
| I just tested it myself, and I most certainly do not have his source open, and it reproduced his code verbatim with just the function header in a random test C file I created in the middle of a Rust project I'm working on.
| thorum wrote:
| Ah ok.
| naikrovek wrote:
| stefan_ wrote:
| Seems other people tried it? https://twitter.com/larrygritz/status/1581713252144517120
| zaps wrote:
| Drunk conspiracy theory: Nat knew Copilot would be a complete nightmare and bailed.
| [deleted]
| colesantiago wrote:
| GitHub Copilot is not AI at all; it is just a dumb code regurgitator that sells you code you wrote on GitHub and takes all the credit for it shamelessly.
| davidgerard wrote:
| it's totally AI, in the "legal responsibility laundering" sense. This is the main present day use case for saying "AI".
| Jevon23 wrote:
| Hopefully you understand how artists feel about DALL-E and Midjourney now.
| pessimizer wrote:
| I like that if you prompt these with specific artists' names, they try their best to rip those particular artists off.
| lolinder wrote:
| I use Copilot in my work every day, but only in places where I know the code cannot be regurgitated because what I'm doing has never been done before.
|
| I can write an HTML form, then prompt Copilot to generate a serializable class that can be used to deserialize that form on the server. I can write a test for one of our internal APIs, and for every subsequent test I can just write the name of what I expect it to check, and it generates a test that _correctly_ uses our internal APIs and verifies the expected behavior.
|
| You can have problems with the ethics of how GitHub and OpenAI produced what they did, but to describe it the way that you did requires never having really attempted to use it seriously.
| ianbutler wrote:
| I just tested it myself on a random C file I created in the middle of a Rust project I'm working on. It reproduced his full code verbatim from just the function header, so clearly it does regurgitate proprietary code, unlike what some people have said. I do not have his source, so Copilot isn't just using existing context.
|
| I've been finding Copilot really useful, but I'll be pausing it for now, and I'm glad I have only been using it on personal projects and not anything for work. This crosses the line in my head from legal ambiguity to legal "yeah, that's gonna have to stop".
| naikrovek wrote:
| what proprietary code? the guy on Twitter is seeing his own GPL code being reproduced. nothing proprietary there.
|
| do you have the "don't reproduce code verbatim" preference set?
| webstrand wrote:
| He owns the copyright to the code, and the code is not in the public domain, therefore it is proprietary code.
| yjftsjthsd-h wrote:
| That's not how anybody uses the word proprietary when dealing with software licensing. It's a term of art that stands in contrast to open source licenses.
| ianbutler wrote:
| For the record, I don't typically think in terms of the open source community.
|
| I grant that if most people here are using it one way, I was likely wrong about the way it is typically used by the normal open source community; I followed up with a reply saying it would likely be more correct for me to have said "improperly licensed" to be included in the training set.
|
| Still, it being private means it probably shouldn't be in the training set anyway regardless of license, because in the future truly proprietary code could be included, or code without any license, which reserves all rights to the creator.
| ianbutler wrote:
| Sorry, it would likely be more correct to say "improperly licensed" code and not proprietary. Still, for someone like me, the possibility of having LGPL, or any GPL-licensed code, generated in their project is a solid no thanks. I know others may think differently, but those are toxic licenses to me.
|
| Not to mention this code wasn't public, so it's kind of moot; having someone's private code be generated into my project is very bad.
|
| As to the option, I do not; I wasn't even aware of the option. But it's pretty silly to me that it's not on by default, or even really an option. That should probably be enabled with no way to toggle it without editing the extension.
| shadowgovt wrote:
| Searching for the function names in his libraries, I'm seeing some 32,000 hits.
|
| I suspect he has a different problem which (thanks to Microsoft) is now a problem he has to care about: his code probably shows up in one or more repos copy-pasted with improper LGPL attribution. There'd be no way for Copilot to know that had happened, and it would have mixed in the code.
|
| (As a side note: understanding _why_ an ML engine outputs a particular result is still an open area of research AFAIK.)
| chiefalchemist wrote:
| I thought the same thing. But then shouldn't CP look at things it's not supposed to use and see if that's happened?
| How is that any different from you committing your API to Platform X and shortly thereafter Platform X reaches out to you... because GH let them know?
| ianbutler wrote:
| Yeah, that's a mess, but that's way too much legal baggage for me, an otherwise innocent end user, to want to take on. Especially when I personally tend to try and monetize a lot of my work.
|
| I understand there's no way for the model to know, but it's really on Microsoft then to ensure no private, poorly licensed, or proprietary code is included in the training set. That sounds like a very tall order, but I think they're going to have to, otherwise they're eventually going to run into legal problems with someone who has enough money to make it hurt for them.
| shadowgovt wrote:
| Agreed. Silver lining: MS is now heavily incentivized to invest in solutions for an open research problem.
| [deleted]
| enragedcacti wrote:
| Expanding on that, even if Microsoft sees the error of their ways and retrains Copilot against permissively licensed source or with explicit opt-in, it may get trained on proprietary code the old version of Copilot inserted into a permissively licensed project.
|
| You would have to just hope that you can take down every instance of your code and keep it down, all while Copilot keeps making more instances for the next version to train on and plagiarize.
| [deleted]
| mdaniel wrote:
| I didn't feel like weighing into that Twitter thread, but in the screenshot one will notice that the code generated by Copilot has secretly(?) swapped the order of the interior parameters to "cs_done". Maybe that's fine, but maybe it's not; how in the world would a Copilot consumer know to watch out for that? Double extra good if a separate prompt for "cs_done" commingles multiple implementations where some care and some don't. Partying ensues!
|
| Not to detract from the well-founded licensing discussion, but who is it that finds this madlibs approach useful in coding?
| bmitc wrote:
| What does
|
| > with "public code" blocked
|
| mean? Are you able to set a setting in GitHub to tell GitHub that you don't want your code used for Copilot training data? Is this an abuse of the license you sign with GitHub, or did they update it at some point to allow your code to be automatically used in Copilot? I'm not crazy about the idea of paying GitHub for them to make money off of my code/data.
| galleywest200 wrote:
| The option to omit "public code" means it should, in theory, omit code that is licensed under such banners as the GPL. It does not mean "omit private repositories".
| [deleted]
| _the_inflator wrote:
| Well, this can pose a serious risk to companies and their cloud strategy based on GitHub.
|
| Can these enterprises really make sure that their code won't be used to train Copilot? I am skeptical.
| deworms wrote:
| It prints this code because you have it open in another editor tab. Wish people who don't know at all how it works stopped acting all outraged when they're laughably wrong.
| yjftsjthsd-h wrote:
| > It prints this code because you have it open in another editor tab.
|
| People upthread have reproduced and demonstrated that that's not the issue here.
|
| EDIT: Actually, OP says "The variant it produces is not on my machine." - https://twitter.com/DocSparse/status/1581560976398114822
|
| > Wish people who don't know at all how it works stopped acting all outraged when they're laughably wrong.
|
| Physician, heal thyself.
| lupire wrote:
| Can you link to more info about this? If this is accurate, many people aren't aware.
| Traubenfuchs wrote:
| What keeps him from suing if he is so sure?
|
| Those pretty little licenses are a waste of storage if no one enforces them.
| SamoyedFurFluff wrote:
| Money. Suing is often survival of the richest.
| psychphysic wrote:
| Hot take: AI will steal all our jobs. Get over it.
| kweingar wrote:
| I've noticed that people tend to disapprove of AI trained on their profession's data, but are usually indifferent or positive about other applications of AI.
|
| For example, I know artists who are vehemently against DALL-E, Stable Diffusion, etc. and regard it as stealing, but they view Copilot and GPT-3 as merely useful tools. I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.
|
| For myself, I am skeptical of intellectual property in the first place. I say go for it.
| bcrosby95 wrote:
| I look at IP differently.
|
| For copyright, the act of me creating something doesn't deprive you of anything, except the ability to consume or use the thing I created. If I were influenced by something, you can still be influenced by that same thing - I do not exhaust any resources I used.
|
| This is wholly different from physical objects. If I create a knife, I deprive you of the ability to make something else from those natural resources. Natural resources that I didn't create - I merely exploited them.
|
| Because of this, I'm fine with copyright (patents are another story). But I have some issues with physical property.
| joecot wrote:
| > For myself, I am skeptical of intellectual property in the first place. I say go for it.
|
| If we didn't live in a Capitalist society, that would be fair. But we currently do. That Capitalist society cares little about the well being of artists unless it can find a way to make their art profitable. Projects like DALL-E and Midjourney pillage centuries of human art and sell it back to us for a profit, while taking away work from artists who struggle to make ends meet as it is.
| Software developers are generally less concerned about Copilot because they're still making six figures a year, but they'll start to get concerned if the technology gets smart enough that society needs fewer developers.
|
| An automated future _should_ be a good thing. It should mean that computers can take care of most tasks and humans can have more leisure time to relax and pursue their passions. The reason that artists and developers panic over things like this is that they are watching themselves be automated out of existence, and have seen how society treats people who aren't useful anymore.
| yjftsjthsd-h wrote:
| I can think of two explanations for that off the top of my head.
|
| The first is that people only recognize the problems with the things that they're familiar with, which you would kind of expect.
|
| The other option is that there's a difference in the thing that people object to. My _impression_ is that artists seem to be reacting to the idea that they could be automated out of a job, whereas programmers are mostly objecting to blatant copyright violation. (Not universally in either case, but often.) If that is the case, then those are genuinely different arguments made by different people.
| lucideer wrote:
| I don't know specifically what DALL-E was trained on, but if it's art for which the artists have not consented to it being used to train AI, then that's problematic. I haven't seen any objections to DALL-E _on that basis specifically_ though, whereas all the discussion of Copilot is around the fact that code authorship & GitHub accounts are not intrinsically tied together, making it impossible to have code authors consent to their code being used, regardless of what ToS someone's agreed to.
|
| > _For myself, I am skeptical of intellectual property in the first place. I say go for it._
|
| I'm in a similar boat, but this is precisely the reason I object so strongly to Copilot.
| IP has been invented & perpetuated/extended to protect large corporate interests, under the guise of protecting & sustaining innovators & creative individuals. Copilot is a perfect example of large corporate interests ignoring IP _when it suits them_ to exploit individuals.
|
| In other words: the reason I'm skeptical of IP is the same reason I'm skeptical of Copilot.
| __alexs wrote:
| Stable Diffusion and DALL-E were both trained on copyrighted content scraped from the internet with no consent from the publishers.
|
| It's quite a common complaint because some of the most popular prompts involve just appending an artist's name to something to get it to copy their style.
| dawnerd wrote:
| In theory AI should never return an exact copy of a copyrighted work, or even anything close enough you could argue is the original "just changed". If the styles are the same I think that's fine, no different than someone else cloning it. But there's definitely outputs from stable diffusion that looks like the original with some weird artifacts.
|
| We need regulation around it.
| XorNot wrote:
| > there's definitely outputs from stable diffusion that looks like the original with some weird artifacts.
|
| Do you have examples? Because SD will generate photoreal outputs and then get subtle details (hands, faces) wrong, but unless you have the source image in hand then you've no way of knowing whether it's a "source image" or not.
| rtkwe wrote:
| Code is much easier to do that with because the avenues for expression are significantly limited compared to just creating an image. For it to be useful, Copilot has to produce compiling and reasonably terse and understandable code. The compiler in particular is a big bottleneck on the range of the output.
| ghoward wrote:
| I am a programmer who has written extensively on my blog and HN against Copilot.
|
| I am also not a hypocrite; I do not like DALL-E or Stable Diffusion either.
|
| As a sibling comment implies, these AI tools give more power to people who control data, i.e., big companies or wealthy people, while at the same time, they take power away from individuals.
|
| Copilot is bad for society. DALL-E and Stable Diffusion are bad for society.
|
| I don't know what the answer is, but I sure wish I had the resources to sue these powerful entities.
| vghfgk1000 wrote:
| akudha wrote:
| _but I sure wish I had the resources to sue these powerful entities._
|
| I wonder if there is a crowdfunding platform like GoFundMe for lawsuits. Or can GoFundMe itself be used for this purpose? It would be fantastic to sue the mega polluters, lying media like Fox, etc.
|
| That said, even with a lot of money, are these cases winnable? Especially given the current state of the Supreme Court and other federal courts?
| williamcotton wrote:
| I'm a programmer and a songwriter, and I am not worried about these tools and I don't think they are bad for society.
|
| What did the photograph do to the portrait artist? What did the recording do to the live musician?
|
| Here's some highfalutin art theory on the matter, from almost a hundred years ago: https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...
| snarfy wrote:
| > What did the recording do to the live musician?
|
| The recording destroyed the occupation of being a live musician. People still do it for what amounts to tip money, but it used to be a real job that people could make a living off of. If you had a business and wanted to differentiate it by having music, you had to pay people to play it live. It was the only way.
| __alexs wrote:
| > What did the photograph do to the portrait artist?
|
| It completely destroyed the jobs of photorealistic portrait artists. You only have stylised portrait painting now, and that is going to be ripped off too.
| SamoyedFurFluff wrote:
| But this isn't like photography and portrait artistry.
| This is more like a wealthy person stealing your entire art catalog, laundering it in some fancy way, and then claiming they are the original creator. Stable Diffusion has literally been used to create new art by screenshotting someone's live-streamed art creation process as the seed. While creating derivative work has always been considered art (such as deletion poetry and collage), it's extremely uncommon and blasé to never attribute the original(s).
| insanitybit wrote:
| > This is more like a wealthy person stealing your entire art catalog, laundering it in some fancy way, and then claiming they are the original creator.
|
| If I take a song, cut it up, and sing over it, my release is valid. If I parody your work, that's my work. If you paint a picture of a building and I go to that spot and take a photograph of that building, it is my work.
|
| I can derive all sorts of things, things that I own, from things that others have made.
|
| Fair use is a thing: https://www.copyright.gov/fair-use/
|
| As for talking about the originals, would an artist credit every piece of inspiration they have ever encountered over a lifetime? Publishing a seed seems fine as a _nice_ thing to do, but pointing at the billion pictures that went into the drawing seems silly.
| tremon wrote:
| Fair use is an affirmative defense. Others can still sue you for copying, and you will have to hope a judge agrees with your defense. How do you think Google v. Oracle would have turned out if Google's defense was "no your honor, we didn't copy the Java sources. We just used those sources as input to our creative algorithms, and this is what they independently produced"?
| ghoward wrote:
| Do you know what's different about the photograph or the recording?
|
| _They are still their own separate works!_
|
| If a painter paints a person for commission, and then that person also commissions a photographer to take a picture of them, is the photographer infringing on the copyright of the painter? Absolutely not; the works are separate.
|
| If a recording artist records a public domain song that another artist performs live, is the recording artist infringing on the live artist? Heavens, no; the works are separate.
|
| On the other hand, these "AIs" are taking existing works and reusing them.
|
| Say I write a song, and in that song, I use one stanza from the chorus of one of your songs. Verbatim. Would you have a copyright claim against me for that? Of course you would!
|
| That's what these AIs do; they copy portions and mix them. Sometimes they are not substantial portions. Sometimes, they are, with verbatim comments (code), identical structure (also code), watermarks (images), composition (also images), lyrics (songs), or motifs (also songs).
|
| In the reverse of your painter and photographer example, we saw US courts hand down judgment against an artist who blatantly copied a photograph. [1]
|
| Anyway, that's the difference between the tools of photography (creates a new thing) and sound recording (creates a new thing) versus AI (mixes existing things).
|
| And yes, sound mixing can easily stray into copyright infringement. So can other copying of various copyrightable things. I'm not saying humans don't infringe; I'm saying that AI does _by construction_.
|
| [1]: https://www.reuters.com/world/us/us-supreme-court-hears-argu...
| williamcotton wrote:
| I'm not so sure that originality is that different between a human and a neural network. That is to say that what a human artist is doing has always involved a lot of mixing of existing creations. Art needs to have a certain level of familiarity in order to be understood by an audience.
| I didn't invent 4/4 time or a I-IV-V progression, and I certainly wasn't the first person to tackle the rhyme schemes or subject matter of my songs. I wouldn't be surprised if there were fragments from other songs in my lyrics or melodies, either from something I heard a long time ago or perhaps just out of coincidence. There's only so much you can do with a folk song to begin with!
|
| BTW, what happened after the photograph is that there were fewer portrait artists. And after the recording there were fewer live musicians. There are certainly no fewer artists or musicians, though!
| ghoward wrote:
| > I'm not so sure that originality is that different between a human and a neural network. That is to say that what a human artist is doing has always involved a lot of mixing of existing creations.
|
| I disagree, but this is a debate worth having.
|
| This is why I disagree: humans don't copy _just_ copyrighted material.
|
| I am in the middle of developing and writing a romance short story. Why? Because my writing has a glaring weakness: characters, and romance stands or falls on characters. It's a good exercise to strengthen that weakness.
|
| Anyway, both of the two people in the (eventual) couple developed from _my real life_, and not from any copyrighted material. For instance, the man will basically be a less autistic and less selfish version of myself. The woman will basically be the kind of person that annoys me the most in real life: bright, bubbly, always touching people, etc.
|
| There is no copyrighted material I am getting these characters from.
|
| In addition, their situation is not typical of such stories, but it _does_ have connections to my life. They will (eventually) end up in a ballroom dance competition. Why that? So the male character hates it. I hate ballroom dance because, during a three-week ballroom dancing course in 6th grade, the girls made me hate ballroom dancing.
I won't | say how, but they did. | | That's the difference between humans and machines: | machines can only copy and mix other copyrightable | material; humans can copy _real life_. In other words, | machines can only copy a representation; humans can copy | the real thing. | | Oh, and the other difference is emotion. I've heard that | people without the emotional center of their brains can | take _six hours_ to choose between blue and black pens. | There is something about emotions that drives | decision-making, and it's decision-making that drives art. | | When you consider that brain chemistry, which is a | function of genetics and previous choices, is a big part | of emotions, then it's obvious that those two things, | genetics and previous choices, are _also_ inputs to the | creative process. Machines don't have those inputs. | | Those are the non-religious reasons why I think humans | have more originality than machines, including neural | networks. | c7b wrote: | > these AI tools give more power to people who control data, | i.e., big companies or wealthy people, while at the same | time, they take power away from individuals. | | Not sure I agree, but I can at least see the point for | Copilot and DALL-E - but Stable Diffusion? It's open source, | it runs on (some) home-use laptops. How is that taking away | power from indies? | | Just look at the sheer number of apps building on or | extending SD that were published on HN, and that's probably | just the tip of the iceberg. Quite a few of them at least | looked like side projects by solo devs. | ghoward wrote: | SD is better than the other two, but it will still | centralize control. | | I imagine that Disney would take issue with SD if material | that Disney owned the copyright to was used in SD. They | would sue. SD would have to be taken off the market.
| | Thus, Disney has the power to ensure that their copyrighted | material remains protected from outside interests, and they | can still create unique things that bring in audiences. | | Any small-time artist that produces something unique will | find their material eaten up by SD in time, and then, | because of the sheer _number_ of people using SD, that | original material will soon have companions that are like | it _because they are based on it in some form_. Then, the | original won't be as unique. | | Anyone using SD will not, by definition, be creating | anything unique. | | And when it comes to art, music, photography, and movies, | _uniqueness_ is the best selling point; once something is | not unique, it becomes worth less because something like it | could be gotten somewhere else. | | SD still has the power to devalue original work; it just | gives normal people that power on top of giving it to the | big companies, while the original works of big companies | remain safe because of their armies of lawyers. | c7b wrote: | > I imagine that Disney would take issue with SD if | material that Disney owned the copyright to was used in | SD. They would sue. SD would have to be taken off the | market. | | Are you sure? | | I'm not familiar with the exact data set they used for SD | and whether or not Disney art was included, but my | understanding is that their claim to legality comes from | arguing that the use of images as training data is 'fair | use'. | | Anyone can use Disney art for their projects as long as | it's fair use, so even if they happened to not include | Disney art in SD, it doesn't fully validate your point, | because they could have done so if they wanted. As long | as training constitutes fair use, which I think it should | - it's pretty much the AI equivalent of 'looking at | others' works', which is part of a human artist's | training as well. | ghoward wrote: | > Are you sure? | | Yes, I'm sure.
| | > I'm not familiar with the exact data set they used for | SD and whether or not Disney art was included, but my | understanding is that their claim to legality comes from | arguing that the use of images as training data is 'fair | use'. | | They could argue that. But since the American court | system is currently (almost) de facto "richest wins," | their argument will probably not mean much. | | The way to tell if something was in the dataset would be | to use the name of a famous Disney character and see what | it pulls up. If it's there, then once the Disney beast | finds out, I'm sure they'll take issue with it. | | And by the way, I don't buy all of the arguments for | machine learning as fair use. Sure, for the training | itself, yes, but once the model is used by others, you | now have a distribution problem. | | More in my whitepaper against Copilot at [1]. | | [1]: https://gavinhoward.com/uploads/copilot.pdf | cmdialog wrote: | Obviously this is a matter of philosophy. I am using Copilot | as an assistant, and for that it works out very nicely. It's | fancy code completion. I don't know who is trying to use this | to write non-trivial code, but that's as bad an idea as trying | to pass off writing AI "prompts" as a type of engineering. | | These things are tools to make more involved things. You're | not going to be remembered for all the AI art you prompted | into existence, no matter how many "good ones" you manage to | generate. No one is going to put you into the Guggenheim for | it. | | Likewise, programmers aren't going to become more depraved or | something by using Copilot. I think that kind of prescriptive | purism needs to Go Away Forever, personally. | bayindirh wrote: | I, with my software developer hat on, am not excited by AI. Not a | bit, honestly. Esp. about these big models trained on huge | amounts of data, without any consent. | | Let me be perfectly clear. I'm all for the tech. The | capabilities are nice.
The thing I'm _strongly against_ is | training these models on any data without any consent. | | GPT-3 is OK; training it with public stuff regardless of its | license is not. | | Copilot is OK; training it with GPL/LGPL-licensed code without | consent is not. | | DALL-E/MidJourney/Stable Diffusion is OK. Training it with | non-public-domain or non-CC0 images is not. | | "We're doing something amazing, hence we need no permission" is | ugly, to put it very lightly. | | I've left GitHub because of Copilot. Will leave any photo | hosting platform if they hint at any similar thing with my | photography, period. | psychphysic wrote: | I disagree. | | Those are effectively cases of cryptomnesia[0]. Part and | parcel of learning. | | If you don't want broad access to your work, don't upload it to | a public repository. It's very simple. Good on you for | recognising that you don't agree with how GitHub looks at | data in public repos, but it's not their problem. | | [0] https://en.m.wikipedia.org/wiki/Cryptomnesia | bayindirh wrote: | > Those are effectively cases of cryptomnesia. | | Disagree; outputting training data as-is is not | cryptomnesia. This is not Copilot's first case. It also | reproduced id Software's fast inverse square root function | as-is, including its comments, but without its license. | | > If you don't want broad access to your work, don't upload it | to a public repository. It's very simple. | | This is actually both funny and absurd. This is why we have | licenses at this point. If all the licenses are moot, then | this opens a very big can of worms... | | My terms are simple. If you derive, share the derivation | (xGPL). Copilot is deriving my code. If you use my code as | a derivation point, honor the license and mark the derivation | with the GPL license. This voids your business case? I don't | care. These are my terms. | | If any public item can be used without any limitations, then | Getty Images (or any other stock photo business) is | illegal.
CC licensing shouldn't exist. GPL is moot. Even | the most litigious software companies' cases (Oracle, SCO, | Microsoft, Adobe, etc.) are moot. Just don't put it on | public servers, eh? | | Similarly, music and other fine arts are generally publicly | accessible. So copyright on any and every production is | also invalid, by your logic, because it's publicly available. | | Why not put your case forward with the attorneys of Disney, WB, | Netflix and others? I'm sure they'll provide all their | archives for training your video/image AI. Similarly, | Microsoft, Adobe, Mathworks, et al. will be thrilled to | support your Copilot competitor with their code, because a) | any similar code will be just cryptomnesia, b) the software | produced from that code is publicly accessible anyway. | | At this point, I haven't even touched on the fact that humans | learn very differently from neural networks. | matheusmoreira wrote: | > For myself, I am skeptical of intellectual property in the | first place. I say go for it. | | Me too. I think copyright and these silly restrictions should | be abolished. | | At the same time, I can't get over the fact that these self-serving | corporations are all about "all rights reserved" when it | benefits them while at the same time undermining other people's | rights. Microsoft absolutely knows that what they're doing is | wrong. Recently it was pointed out to me that Microsoft | employees can't even look at GPL source code, lest they | subconsciously reproduce it. Yet they think their software can | look at other people's code and reproduce it? | wzdd wrote: | This talking point seems to come up often, but since it's | basically saying that people are hypocrites I think it is a | bad-faith thing to say without reasonable proof that it's not a | fringe opinion (or completely invented). | | For what it's worth, the people I know who are opposed to this | sort of "useful tool" don't discriminate by profession.
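For context on the reproduction claim above: the fast inverse square root from the Quake III Arena source is short and distinctive enough that verbatim emission (comments included) is easy to spot. The following is a sketch of the well-known algorithm, not the verbatim GPL source; it uses `memcpy` instead of the original's pointer cast (which is undefined behavior under strict aliasing), and the function name `fast_rsqrt` is this sketch's own.

```c
#include <math.h>
#include <string.h>

/* Approximate 1/sqrt(x) via the classic bit-level trick:
   reinterpret the float's bits as an integer, subtract the
   shifted bits from a "magic" constant to get an initial guess,
   then refine with one Newton-Raphson iteration. */
static float fast_rsqrt(float number)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    unsigned int i;

    memcpy(&i, &y, sizeof(i));           /* bit-level reinterpretation */
    i = 0x5f3759df - (i >> 1);           /* magic-constant initial guess */
    memcpy(&y, &i, sizeof(y));
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step */
    return y;
}
```

With the single Newton step, the result stays within roughly 0.2% of the true 1/sqrt(x).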
| teddyh wrote: | An accusation of hypocrisy _is not an argument_; at least not | a relevant one. | pclmulqdq wrote: | I think the distinction is that only one of those classes tends | to produce exact copies of work. Programmers get very upset at | DALL-E and Stable Diffusion producing exact (and near-exact) | copies of artwork too. In contrast to exact copying, production | of imitations (not exact copies, but "X in the style of Y") is | something that artists have been doing for centuries, and is | widely thought of as part of arts education. | | For some reason, code seems to lend itself to exact copying by | AIs (and also some humans) rather than comprehension and | imitation. | XorNot wrote: | I'm mildly suspicious that this example is an implementation | of generic matrix functionality, though: you couldn't patent | this sort of work, because it's not patentable - it's | mathematics. It's fundamentally a basic operation that would | have to be implemented with a similar structure regardless of | how you do it. | pclmulqdq wrote: | Patents and copyrights are totally different, and should be | treated as such. The issue isn't about whether someone | copies the algorithm, it's whether they copy the written | code. Nothing in an algorithms textbook is patentable | either, but if you copy the words describing an algorithm | from it, you are stealing their description. | heavyset_go wrote: | Mathematics is not patentable, but you can patent the steps | a computer takes to compute the results of that particular | algorithm. | ChildOfChaos wrote: | I think sadly it's just people being protective. The technology | is interesting, so if it doesn't hit their line of work, it's | fantastic; if it does, then it's terrible. | | There is no arguing against it, though: you can't stop it. All | this stuff is coming eventually to all of these areas, so you | might as well try to find ways to use the opportunities while | some of this is still new.
| naillo wrote: | I mean we definitely _can_ stop it. Laws are a pretty strong | deterrent. | ghaff wrote: | "We" maybe can't stop it. But if there were the political | will to kneecap many uses of machine learning, it's not | obvious there's any reason it _couldn't_ be done, even if | not 100% effective. Whether that would be a good thing is a | different question. | faeriechangling wrote: | You can slow this; you can't stop it whatsoever. It's about | as ultimately futile an effort as trying to stop piracy. | People are ALREADY running Salesforce CodeGen and Stable | Diffusion at home. You can't put the genie back in the | bottle; what we'll have 20 years from now is going to make | critics of these tools have nightmares. | | If you try to outlaw it, the day before the laws come into | effect, I'm going to download the very best models out | there and run it on my home computer. I'll start organising | with other scofflaws and building our own AI projects in | the fashion of Leela Chess Zero with donated compute time. | | You can shut down the commercial versions of these tools. | You can scare large corporations into banning the use of | these tools. You can pull an uno reverse card | and use modified versions of the tools to CHECK for | copyright infringement and sue people under existing laws, | AND you'll probably even be able to statistically prove | somebody is an AI user. But STOPPING the use of these | tools? Go ahead and try; it won't happen. | tablespoon wrote: | > You can slow this; you can't stop it whatsoever. It's | about as ultimately futile an effort as trying to stop | piracy. ... But STOPPING the use of these tools? Go ahead | and try; it won't happen. | | So? No one needs to _stop it totally_. The world isn't | black and white; pushing it to the fringes is almost | certainly a sufficient success.
| | Outlawing murder hasn't stopped murder, but no one's | given up on enforcing those laws because of the futility | of perfect success. | | > If you try to outlaw it, the day before the laws come | into effect, I'm going to download the very best models | out there and run it on my home computer. I'll start | organising with other scofflaws and building our own AI | projects in the fashion of Leela Chess Zero with donated | compute time. | | That sounds like a cyberpunk fantasy. | faeriechangling wrote: | Cyberpunk, sure, but fantasy? Not at all. | throwaway675309 wrote: | You'll never be able to push it to the fringes because | there will never be universal legal agreement, even from | country to country, on where to draw the line. | | And as computers get more powerful and the models get | more efficient, it'll become easier and easier to | self-host and run them on your own dime. There are already | one-click installers for generative models such as Stable | Diffusion that run on modest hardware from a few years | back. | tpm wrote: | What would the law do? Forbid automatic data collection | and/or indexing and further use without explicit | copyright-holder agreement? That would essentially ban the whole | internet as we know it. Not saying that would be bad, but | this is never going to happen; there's too much accumulated | momentum in the opposite direction. | chiefalchemist wrote: | To your point, the law can do a lot of things. The issue | here is the clarity and ability to enforce the law. | [deleted] | machinekob wrote: | I'm pretty sure DALL-E was trained only on non-copyrighted | material (they say so :| ). | | But to be honest, if your code is open source, I'm pretty sure | Microsoft doesn't care about the licence; they'll just use it | because "reasons". Same with Stable Diffusion: they don't give | a fuck about the data. If it's on the internet, they'll use it, | so it's a topic that will probably be regulated in a few years.
| | Until then, let's hope they'll get milked (both Microsoft and | NovelAI) for illegal content usage, and I seriously hope at | least a few lawyers will try milking it ASAP, especially | NovelAI, which illegally used a lot of copyrighted art in its | training data. | msbarnett wrote: | > I'm pretty sure DALL-E was trained only on non-copyrighted | material | | Nope. DALL-E generates images with the Getty watermark, so | clearly there are copyrighted materials in its training set: | https://www.reddit.com/r/dalle2/comments/xdjinf/its_pretty_o... | pclmulqdq wrote: | Lots of people ironically put the Getty watermark on | pictures and memes that they make to satirically imply that | they are pulling stock photos off the internet with the | print-screen function instead of paying for them. | msbarnett wrote: | Memes generally would not fall under the category of non- | copyrighted material; they're most of the time extremely | copyrighted material just being used without permission. | And even a wholly original work on which an artist | sarcastically puts a Getty watermark, and then licenses | under Creative Commons or something, would fall into very | murky territory - the Getty watermark itself is the | intellectual property of Getty. The original image author | might plead fair use as satire, but satirical intentions | aren't really a defence available to DALL-E. | | So even if we're assuming these were wholly original | works that the authors placed under something like a | Creative Commons license, the fact that they incorporated | an image they had no rights to would at the very least | create a fairly tangled copyright situation, one that any | really rigorous evaluation of the copyright status of | every image in the training set would tend to argue | for rejecting as not worth the risk of litigation. | | But the more likely scenario here is that they did | minimal-at-best filtering of the training set for | copyrights.
| pclmulqdq wrote: | You could argue that mocking the Getty logo like that is | some form of fair use, which would be a backdoor through | which it can end up as a legitimate element of a public | domain work, in which case it would be fair game. | | I agree with you that it is also possible that people | posted Getty thumbnails to some sites as though they are | public domain, and that is how the AIs learned the | watermark. | nottorp wrote: | Dunno about Getty, but I've been shown the cover of the | Beatles' Yellow Submarine done in different colors as some | great AI advancement. | machinekob wrote: | Thanks for pointing this out, I never saw that before. If they | used copyrighted images, they should also get punished. In the | original paper they say no copyrighted content was used, but | that could just be lies, who knows; the data speaks for itself, | and if they can prove this in court, they should get punished | (so, again, Microsoft getting rekt for that will be good to see | :] ). | tpxl wrote: | When Joe Rando plays a song from 1640 on a violin he gets a | copyright claim on Youtube. When Jane Rando uses devtools to | check a website source code she gets sued. | | When Microsoft steals all code on their platform and sells it, | they get lauded. When "Open" AI steals thousands of copyrighted | images and sells them, they get lauded. | | I am skeptical of imaginary property myself, but fuck this one | set of rules for the poor, another set of rules for the masses. | lo_zamoyski wrote: | The poor are the masses, or at least part of the masses. | gw99 wrote: | If this is the new status quo then I suggest we find out how | to fuck up the corpus as best as possible. | a4isms wrote: | > one set of rules for the poor, another set of rules for the | masses.
| | _Conservatism consists of exactly one proposition, to wit:_ | | _There must be in-groups whom the law protects but does not | bind, alongside out-groups whom the law binds but does not | protect._ | | --Composer Frank Wilhoit[1] | | [1]: https://crookedtimber.org/2018/03/21/liberals-against-progre... | thrown_22 wrote: | sbuttgereit wrote: | Thanks for posting the link to the quote. Having said that, | I don't think it's possible to quote that bit and get an | understanding of the idea being conveyed without its | opening context. Indeed, it's likely to cause a false idea | of what's being conveyed. From earlier in the same post: | | _"There is no such thing as liberalism -- or | progressivism, etc. | | There is only conservatism. No other political philosophy | actually exists; by the political analogue of Gresham's | Law, conservatism has driven every other idea out of | circulation."_ | a4isms wrote: | I agree that adds considerable depth to the value of the | quote, and connects it to the conversation he appeared to | be having, which is about the first line you've quoted: | | There is no such thing as being a Liberal or Progressive; | there is only being a Conservative or anti-Conservative, | and while there is much nuance and policy to debate about | that, it boils down to deciding whether you actually | support or abhor the idea of "the law" (which is a much | broader concept than just the legal system) existing to | enforce or erase the distinction between in-groups and | out-groups. | | But that's just my read on it. Getting back to | intellectual property, it has become a bitter joke on | artists and creatives, who are held up as the | beneficiaries of intellectual property laws in theory, | but in practice are just as much of an out-group as | everyone else. | | We are bound by the law--see patent trolls, for example-- | but not protected by it unless we have pockets deep | enough to sue Disney for not paying us.
| stickfigure wrote: | Yeah, inequality sucks. So how about we focus on making the | world better for everyone instead of making the world equally | shitty for everyone? | imwillofficial wrote: | This makes no sense. | | Absolutely nobody is arguing to make the world shittier. | zopa wrote: | Because we're not the ones with the power. People with | limited power pick the fights they might win, not the | fights that maximize total welfare for everyone, including | large copyright holders. There's no moral obligation to be | a philosopher king unless you're actually on a throne. | foobarbecue wrote: | > one set of rules for the poor, another set of rules for the | masses | | Presumably by "the masses" you meant "the large | corporations"? | rtkwe wrote: | I think Copilot is a clearer copyright violation than any of | the Stable Diffusion projects, though, because code has a much | narrower band of expression than images. It's really easy to | look at the output of Copilot and match it back to the | original source and say these are the same. With Stable | Diffusion it's much closer to someone remixing and aping the | images than it is reproducing originals. | | I haven't been following super closely, but I don't know of | any claims or examples where input images were recreated to a | significant degree by Stable Diffusion. | e40 wrote: | Preach. So incredibly annoyed when I tried to send a video of | my son playing Beethoven to his grandparents and it was taken | down due to a copyright violation. | c7b wrote: | > When Joe Rando plays a song from 1640 on a violin he gets a | copyright claim on Youtube. When Jane Rando uses devtools to | check a website source code she gets sued. | | Do you have any evidence for those claims, or anything | resembling those examples? | | Music copyright has long expired for classical music, and big | shots are definitely not exempt where it applies.
Just | look at how much heat Ed Sheeran, one of the biggest | contemporary pop stars, got for "stealing" a phrase that was | literally just chanting "Oh-I" a few times (just to be clear, | I am familiar with the case and find it infuriating that this | petty rent-seeking attempt went to trial at all, even if | Sheeran ended up being completely cleared, though at great | personal distress, as he said afterwards). | | And who ever got sued for using dev tools? Is there even a | way to find that out? | banana_giraffe wrote: | https://twitter.com/mpoessel/status/1545178842385489923 | | Among many others. Classical music may have fallen into the | public domain, but modern performances of it are | copyrightable, and some of the big companies use | copyright-matching systems, including YouTube's, that often flag new | performances as copies of recordings. | codefreakxff wrote: | There have been a number of stories about musicians receiving | copyright claims. Here is the first result on Google: | | https://www.radioclash.com/archives/2021/05/02/youtuber-gets... | | As for being sued for looking at source code, here is the first | result on Google: | | https://www.wired.com/story/missouri-threatens-sue-reporter-... | frob wrote: | Just to be clear, because it's in the title, the reporter | was threatened with a lawsuit for looking at source code. I | cannot find anyone actually sued for it. BTW, here's an | article saying said reporter wasn't sued: | https://www.theregister.com/AMP/2022/02/15/missouri_html_hac... | | Anyone with a mouth can run it and threaten a lawsuit. In | fact, I threaten to sue you for misinformation right now | unless you correct your post. Fat lot of good my threat | will do, because no judge in their right mind would | entertain said lawsuit, because it's baseless.
| c7b wrote: | Ok - it is a true shame that the YouTube copyright claim | system is so broken as to enable those shady practices, | and that politicians still haven't upped their knowledge | of the internet beyond a 'series of tubes'. | | But surely the answer should be to fix the broken YT | system and to educate politicians to abstain from | baseless threats, not to make AI researchers pay for it? | insanitybit wrote: | > Joe Rando plays a song from 1640 on a violin he gets a | copyright claim on Youtube | | That can't possibly be a valid claim, right? AFAIK copyright | is "gone" after the original author dies + ~70 years. Until | fairly recently it was even shorter. Something from 1640 | surely can't be claimed under copyright protection. There are | much more recent changes where that might not be the case, | but 1640? | | > When Jane Rando uses devtools to check a website source | code she gets sued. | | Again, that doesn't sound like a valid suit. Surely she would | win? In the few cases I've heard of where suits like this are | brought against someone, they've easily won them. | cipherboy wrote: | The poster isn't claiming that this is a valid DMCA suit. | Nearly everyone who is at a mildly decent level and has | posted their own recordings of classical music to YouTube | has received these claims _in their Copyright section_. | YouTube itself prefixes this with some lengthy disclaimer | about how this isn't the DMCA process, but that they reserve | the right to kick you off their site based on fraudulent | matches made by their algorithms. | | They are absolutely, completely, and utterly bullshit. Nobody | with half an ear for music will mistake my playing of | Bach's G Minor Sonata for Arthur Grumiaux's (too many out-of-tune | notes :-D). But yet, YouTube still manages to match | this to my playing, probably because they have never heard | it before now (I recorded it mere minutes before).
| | So no, it isn't a valid claim, but this algorithm, trained | on certain examples of work, manages to make bad | classifications with potentially devastating ramifications | for the creator (I'm not a monetized YouTube artist, but if | this triggered a complete lockout of my Google account(s), | this would likely end Very Badly). | | I think it's a very relevant comparison to the GP's | examples. | alxlaz wrote: | > That can't possibly be a valid claim, right? | | It's not, but good luck talking to a human at YouTube when | the video gets taken down. | | > Again, that doesn't sound like a valid suit. Surely she | would win? | | Assuming she could afford the lawyer, and that she lives | through the stress and occasional mistreatment by the | authorities, yes, probably. Both are big ifs, though. | lbotos wrote: | > That can't possibly be a valid claim, right? | | I'm not a lawyer, but my understanding is that while the | "1640's violin composition" _itself_ may be out of | copyright, if I record myself playing it, _my recording of | that piece is my copyright_. So if you took my file | (somehow) and used it without my permission, and I could | prove it, I could claim copyright infringement. | | That's my understanding, and I've personally operated that | way to avoid any issues, since it errs on the side of | safety. (Want to use old music? Make sure the license of | the recording explicitly says public domain or has license | info.) | vghfgk1000 wrote: | insanitybit wrote: | Yes, that sounds right to me. But that's not relevant to | "Joe Whoever played it and got sued". | lupire wrote: | The problem is that YouTube's AI thinks your recording is | the same as every other recording, because it doesn't | understand the difference between composition and | recording. | Rimintil wrote: | > That can't possibly be a valid claim, right? | | It has indeed happened. | | https://boingboing.net/2018/09/05/mozart-bach-sorta-mach.htm... | | Sony later withdrew their copyright claim.
| | There are two pieces to copyright when it comes to public | domain: | | * The work (song) itself -- you can't copyright that | | * The recording -- you are the copyright owner. No one, | without your permission, can re-post your recording | | And of course, there is derivative work. You own any | portion that is derivative of the original work. | insanitybit wrote: | > Sony later withdrew their copyright claim. | | Right, that's my point... I can sue anyone for anything, | doesn't mean I'll win. | imwillofficial wrote: | It worked out justly in this case. | | In the VAST majority of cases it does not. | sumedh wrote: | > I can sue anyone for anything, doesn't mean I'll win. | | You can't sue if you don't have money; a big corp can sue | even if they know they are wrong. | pessimizer wrote: | > Again, that doesn't sound like a valid suit. Surely she | would win? In the few cases I've heard of where suits like | this are brought against someone they've easily won them. | | That's freedom of speech for everyone who can afford a | lawyer to bring suit against a music rights-management | company. | insanitybit wrote: | Yes, this is a problem with the legal system in general. | kevin_thibedeau wrote: | The songwriter's copyright is expired, but there is still a | freshly minted copyright on the video and the audio | performance. | | This becomes particularly onerous when trolls claim | copyright on published recordings of environmental sounds | that happen to be similar, but not identical, to someone | else's, though they do have a legitimate claim on their own | original recording. | Rodeoclash wrote: | This isn't a legal copyright claim; it's a "YouTube" | copyright claim, which is entirely owned and enforced by | YouTube. | insanitybit wrote: | OK, but then we're just talking about content moderation, | which seems like a separate issue.
I think using "YouTube | copyright claim" as a proxy for "legal copyright claim" | is more to the parent's point, especially since that's | how YouTube purports the claim to work. Otherwise it | feels irrelevant. | cipherboy wrote: | Copyright claims are a form of content moderation, by | preventing reuse of content that others own. | | But they can still be weaponized to prevent legitimate | resubmissions of parallel works, and can potentially | deplatform legitimate users, depending on the reviewer | and the clarity of the rebuttal. | lupire wrote: | YouTube does this moderation in order to avoid legal | pressure from copyright holders, as in | | https://en.m.wikipedia.org/wiki/Viacom_International_Inc._v..... | cyanydeez wrote: | Basically, copyright is for people with copyright lawyers. | kodah wrote: | That's not even a joke. One of the premises of a copyright | is that you defend your intellectual property or lose it. | If the system were more equitable then it would defend your | copyright. | heavyset_go wrote: | You're thinking of trademarks. | eropple wrote: | This is an inaccurate description of copyright, at least | in the United States. | | Trademarks require active defense to avoid | genericization. Copyright may be asserted at the holder's | discretion. | heavyset_go wrote: | Your post is a good example of the _tu quoque_ fallacy[1]. | | [1]: https://en.wikipedia.org/wiki/Tu_quoque | tablespoon wrote: | > I've noticed that people tend to disapprove of AI trained on | their profession's data, but are usually indifferent or | positive about other applications of AI. | | In other words: the banal observation that people care far more | when their stuff is stolen than when some stranger has their | stuff stolen. | lerpgame wrote: | deworms wrote: | As an aside, this code is an unreadable mess; for a guy | brandishing his credentials even in his GitHub username, you'd | think he'd know a thing or two about clean code.
| stonogo wrote: | Feel free to send patches. | deworms wrote: | Why would I waste time doing this? | kortilla wrote: | Because you were already willing to waste time panning his | code on a public forum. Maybe do something constructive | instead of destructive if your time is so precious. | [deleted] | ahmedbaracat wrote: | "AI-focused products/startups lack a business model aligning the | incentives of both the company and the domain experts (Data | Dignity)" | | https://blog.barac.at/a-business-experiment-in-data-dignity | | Yes, I am quoting myself | faeriechangling wrote: | Not your repo, not your code. | | I celebrate Microsoft's shameless plundering of Github to create | new products that increase productivity. The incredible thing is | that people trusted Microsoft to use their code on their terms to | begin with. This is a company that has been finding ways to make | open source code into a proprietary product since the 90s. | | Nobody can stop people from replicating what Microsoft did in the | long run anyways. Eventually any consumer with enough access to | source code will be able to make their own copilot. Even if | copilot is criminalised, Microsoft can just sell access to the | entire GitHub dataset and let other people commit the "crime". | Then you're right back where we started with having to sue the | end users of copilot for infringement instead of Microsoft. | | Use private repos or face the inevitability that copilot-like | products will scrape your code. | ilrwbwrkhv wrote: | Of course it does. What are you going to do? Sue them? | ralph84 wrote: | Ok. So instead of whining about it on Twitter, sue GitHub. No | matter what you think of Copilot, establishing some case law on | AI-generated code will be beneficial to everyone.
| mjr00 wrote: | Whining about it on Twitter = free and easy | | Suing Github = signing up for a ~decade-long, incredibly | expensive and time-consuming legal battle against one of the | richest companies in the world | | There may be a slight difference in effort between these two | options. | anonydsfsfs wrote: | Not to mention Microsoft could countersue using their | enormous patent war chest, which they have a history of | doing[0] | | [0] https://techcrunch.com/2012/03/22/microsoft-and-tivo-drop-th... | ghaff wrote: | It goes beyond code. Also photos, art, text, etc. Be careful | what you wish for. Whether you like it or not, with a stroke of | a pen Congress or the Supreme Court in the US could probably | wipe out the legal use of a huge amount of the training data | used for ML. | adastra22 wrote: | Good. | Jevon23 wrote: | Good! Large corporations shouldn't be able to profit off of | other people's data without consent or compensation. | drstewart wrote: | Great! I assume you believe all search engines should be | illegal then? | belorn wrote: | Accessing a computer system without permission is | illegal. Search engines operate under the assumption that | they have permission to access any publicly available | server unless explicitly forbidden. | | If a company or person assumes they have copyright | permission to any publicly accessible work, then they will | quickly find out that such an assumption is wrong, and that | they require explicit permission. | ghaff wrote: | >Search engines operate under the assumption that they | have permission to access any publicly available server | unless explicitly forbidden. | | And why should opt-out be a reasonable norm? To be clear, | the internet (among many other things) breaks down if | every exchange of information is opt-in. Sharing of | photographs taken in public places is another example.
But the internet basically | functions because people share information on an opt-out basis | (that may or may not even be respected). | ghoward wrote: | Search engines don't sell the information of others; they | sell certain _metadata_ of that information, namely, the | _location_ of that information. | ghaff wrote: | And excerpts of that information in many cases. | res0nat0r wrote: | The repo he linked to on Twitter is a public repo though. Am I | missing something? | | https://twitter.com/DocSparse/status/1581462433335762944 | tpxl wrote: | Public != copyright-free. | taspeotis wrote: | > The repo he linked to on Twitter is a public repo though. Am | I missing something? | | I dunno, the title says it used public code when it was meant to | block public code. | kurtoid wrote: | I think they're more concerned about it repeating code w/o | ownership/copyright labels | Waterluvian wrote: | I think people may be drastically over-valuing their code. If it | were emitting an entire meaningful product, that would be | something else. But it's emitting nuts and bolts. | | If the issue is more specifically copyright infringement, then | leverage the legal apparatus in place for that. Their lawyers | might listen better. | | This is not a strongly held opinion and if you disagree I would | love to hear your constructive thoughts! | jacooper wrote: | I mean it starts like this, but if Copilot gets a pass, | companies might just use AI as a way to launder code and avoid | complying with Free licenses. | chiefalchemist wrote: | To some extent I agree with your opening. That is, in plenty of | cases CP is showing how mundane most code is. It's one | commodity stitched to another stitched to another. | | That's not considering any legal / license issues, just a | simple statement about the data used to train CP.
More noticeable in Copilot since the output needs | to be a lot more precise. | | Lawmakers need to jump on this stuff ASAP. Some say that it's no | different from a person looking at existing code or art and | recreating it from memory or using it as inspiration. But the law | already changes when technology gets involved. There's no | law against you and me having a conversation, but I may not be | able to record it depending on the jurisdiction. Similarly, | there's no law against you looking at artwork that I post online, | but it's not out of the question that a law could exist preventing | you from using it as part of an ML training dataset. | SrslyJosh wrote: | > Some say that it's no different from a person looking at | existing code or art and recreating it from memory or using it | as inspiration. | | Hah, no, the model encodes the code that it was trained on. | This is not "recreating from memory", this is "making a copy of | the code in a different format." (Modulo some variable | renaming, which it's probably programmed to do in order to | obscure the source of the code.) | CapsAdmin wrote: | I would imagine the root problem here is people taking | copyrighted code, pasting it into their project and disregarding | the license. To me this seems common, especially when it comes to | toy, test and hobby projects. | | I don't see how copilot or similar tools can solve this problem | without vetting each project. | yjftsjthsd-h wrote: | That's an entirely plausible explanation, but it doesn't mean | that Microsoft has any less of a legal nightmare on their | hands. | CapsAdmin wrote: | I'm not really sure what I think about this. How responsible | should Microsoft be for someone's badly licensed code on | their platform? If they somehow had the ability to ban | projects using stolen snippets of code, I don't think I'd | dare to host my hobby projects there.
| | If you can't trust that the code in a project is compatible | with the license of the project, then the only option I see is | that copilot cannot exist. | | I love free software and whatnot, but I have a feeling this | situation would've been quite different if copilot was made | by the free software community and accidentally trained on | some non-free code.. | yjftsjthsd-h wrote: | > I love free software and whatnot, but I have a feeling | this situation would've been quite different if copilot was | made by the free software community and accidentally | trained on some non-free code.. | | _Precisely._ Would it be okay for me to publish some code | as GPL because my buddy gave it to me and promised that it | was totally legit and I could use it and it definitely | wasn't copy-pasted from one of the Windows source leaks? | | > If you can't trust that the code in a project is | compatible with the license of the project then the only | option I see is that copilot cannot exist. | | It might be possible to feed it only manually-vetted | inputs, but yes; as it currently is, Copilot appears to be | little but a massive copyright-infringement engine. | CapsAdmin wrote: | > Precisely. Would it be okay for me to publish some code | as GPL because my buddy gave it to me and promised that | it was totally legit and I could use it and it definitely | wasn't copy-pasted from one of the Windows source leaks? | | But where do you draw the line? What if you accidentally | came up with the same or similar solution to something in | Windows? The code might not be from your friend either; | it could be from N steps of copy-paste, rework, | reformatting, refactoring, etc. | yjftsjthsd-h wrote: | > But where do you draw the line? What if you | accidentally came up with the same or similar solution to | something in Windows? | | Yes, I agree that it's unclear how to deal with that in | the general case at scale.
Although cases like OP make me | think that we could maybe worry about the grey area after | we've dealt with the blatant copies. | | > The code might not be from your friend either, it could | be from N steps of copy-paste, rework, reformatting, | refactoring, etc. | | Well, my personal tendency would be to apply the same | standard to Microsoft that they would apply to us. How | many steps of removal are needed before copying MS proprietary | code is okay? | [deleted] | williamcotton wrote: | Is the code in question even covered by copyright in the first | place? It seems utilitarian in nature. | | Oh, the comments! Those are covered by copyright for sure. | williamcotton wrote: | You know, I make it a habit of not getting upset by | downvotes, but this is really absurd. What am I saying that is | incorrect? Am I being rude? What exactly do you disagree with? | williamcotton wrote: | Like, should I just stop interacting with people on this | website? Is that the intent? To make me just go away? ___________________________________________________________________ (page generated 2022-10-16 23:00 UTC)