[HN Gopher] In the LLM space, "open source" is being used to mea...
       ___________________________________________________________________
        
       In the LLM space, "open source" is being used to mean "downloadable
       weights"
        
       Author : FanaHOVA
       Score  : 289 points
       Date   : 2023-07-21 15:49 UTC (7 hours ago)
        
 (HTM) web link (www.alessiofanelli.com)
 (TXT) w3m dump (www.alessiofanelli.com)
        
       | spullara wrote:
        | It remains to be seen in court whether weights are even
        | copyrightable, potentially making all the various licenses
        | and their restrictions moot.
        
         | humanistbot wrote:
         | And it also remains to be seen if various legislatures will
         | pass laws that explicitly declare the copyright status of model
         | weights. It is important to remember that what is or is not
         | copyrightable can change.
        
           | rvcdbn wrote:
            | At least in the US, copyright is established by the
            | constitution, so I'm not sure how much it's possible to
            | change via the normal legislative process.
        
             | gpm wrote:
             | The US constitution grants congress the ability to create
             | copyright ("To promote the progress of science and useful
             | arts, by securing for limited times to authors and
             | inventors the exclusive right to their respective writings
             | and discoveries"), but it doesn't create copyright law
              | itself. That's a broad clause that gives Congress pretty
              | free rein to change how copyright is defined.
        
               | rvcdbn wrote:
                | Constitutionality is also about how previous cases
                | have been evaluated; for example, see the bit about
                | how photography copyright was established here:
                | https://constitution.congress.gov/browse/essay/artI-S8-C8-3-...
        
               | rvcdbn wrote:
               | specifically:
               | 
               | > A century later, in Feist Publications v. Rural
               | Telephone Service Co., the Supreme Court confirmed that
               | originality is a constitutional requirement
        
         | ljdcfsafsa wrote:
         | 1. Why wouldn't they be and 2. Does that even matter? If you
         | enter into a contract saying don't do X, and you do X, you're
         | violating the contract.
        
           | sebzim4500 wrote:
           | I assume GP was talking about a scenario in which you had not
           | entered into a contract with Meta. E.g. if I just downloaded
           | the weights from someone else.
        
           | rvcdbn wrote:
            | 1 - because they lack originality, see:
            | https://constitution.congress.gov/browse/essay/artI-S8-C8-3-...
        
         | dvdkon wrote:
         | In a similar vein, the common "you may not use this model's
         | output to improve another model" clause is AFAIK unenforceable
         | under copyright, so it's _at best_ a contractual clause binding
         | a particular user. Anyone using that improved model afterward
         | is in the clear.
        
           | ljdcfsafsa wrote:
           | > it's at best a contractual clause binding a particular
           | user. Anyone using that improved model afterward is in the
           | clear.
           | 
           | That's... not really accurate. See the concept of tortious
           | interference with a contract.
        
             | dvdkon wrote:
             | Hm, I don't know much about common law, but I don't think
             | this would apply if, say, an ML enthusiast trained a model
             | from LLaMA2 outputs, made it freely available, then someone
             | else commercialised it. The later user never caused the
             | original developer to breach any contract, they simply
             | profited from an existing breach.
             | 
              | That said, doing this inside one company or with
              | subsidiaries probably wouldn't fly.
        
           | taneq wrote:
           | And of course anyone using a model improved by this is
           | entirely unworried by these clauses if their improved model
           | takes off hard.
        
           | banana_feather wrote:
           | The idea is that if you violate the terms of the license to
           | develop your own model, you lose your rights under the
           | license and are creating an infringing derivative work. If I
           | clone a GPL'd work and ship a derivative work under a
           | commercial license, downstream users can't just integrate the
           | derivative work into a product without abiding by the GPL
           | terms and say "well we're downstream relative to the party
           | who actually copied the GPL'd work, so the GPL terms don't
           | apply to us".
        
             | dvdkon wrote:
             | Thing is, the outputs of a computer program aren't
             | copyrightable, so it doesn't matter if your improved model
             | is a derivative work. What you say would apply if you
             | derived something from the weights themselves (assuming
             | they are copyrightable, of course).
        
             | blendergeek wrote:
             | If such a "derivative" model is a derivative work, then
             | aren't all these LLMs just mass copyright infringement?
        
               | banana_feather wrote:
               | At the end of the day it's not black and white, but
               | there's a large and obvious difference in degree that
               | would plausibly permit someone to find that one is and
               | the other isn't. It's fairly easy to argue that using the
               | outputs of LLM X to create a slightly more refined LLM Y
               | creates a derivative work. The argument that a model is a
               | derivative work relative to the training data is not so
               | clear cut.
        
               | dragonwriter wrote:
               | If model weights aren't copyrightable, derivative model
               | weights are not a "work", derivative or otherwise, for
               | copyright purposes.
               | 
               | If they are, and the license allows creating finetuned
               | models but not using the output to improve the model,
               | then the derived model is not a violation, but it might
               | be a derivative work.
        
               | dTal wrote:
               | Exactly this. What's good for the goose is good for the
               | gander!
        
             | rodoxcasta wrote:
              | If the weights are not copyrightable, you don't need a
              | licence to use them; they are just data. There's no
              | right to infringe if these numbers have no author. Of
              | course, to use the OpenAI API you must abide by their
              | terms. But if you publish your generations and I
              | download them, I have nothing to do with the contract
              | you have with OpenAI, since I'm not a party to it. They
              | can't stop me from using them to improve my models.
        
             | diffeomorphism wrote:
             | Really?
             | 
             | Your customers bought that product under license A.
             | Afterwards it turned out that you pirated some artwork from
              | Disney. Then your customer can sue you (not Disney) to make
             | things right. The specific license of the original work
             | seems quite irrelevant here.
        
               | pessimizer wrote:
               | Not at all. The reason your customer can sue you is
               | because Disney can sue your customer. Disney would be
               | suing your customer under the specific license of the
               | original work.
               | 
               | edit: you seem to see the customer as the primary victim
               | here instead of Disney, but if Disney weren't a victim
               | the customer wouldn't have a case.
        
             | stale2002 wrote:
             | No, because the premise of the hypothetical is that the
             | weights aren't protected by copyright.
             | 
              | So, no matter what the TOS says, it's not an infringing
             | work.
             | 
             | > Downstream users can't just integrate the derivative work
             | into a product without abiding by the GPL terms
             | 
             | You absolutely could do this if the original work is not
             | protected by copyright, or if you use it in a way that is
             | transformative and fair use.
        
               | mattl wrote:
               | Something under the GPL is also copyrighted. The GPL is a
               | copyright license.
        
               | stale2002 wrote:
               | If the underlying work is not protected by copyright, it
               | doesn't matter what license someone tries to put on it.
               | 
               | Similarly, if someone creates a fair use/transformative
               | work then the license can also be ignored.
        
         | FanaHOVA wrote:
         | Yep, same with SSPL. GPL has been tested in FSF vs Cisco
         | (2008), but none of the more restrictive licenses have.
        
         | jrockway wrote:
         | It seems like a dangerous clause to me.
         | 
         | 1) "Dear artists, the model cannot infringe upon your copyright
         | because it's merely learning like a human does. If it
         | accidentally outputs parts of your book, you know, it just
         | accidentally plagiarized. We all do it haha! Our attorneys
         | remind you that plagiarism is not illegal in the US."
         | 
         | 2) "Dear engineers, the output of our model is copyrighted and
         | thus if you use it to train your own model, we own it."
         | 
         | I am not sure how both of those can be true at the same time.
        
           | jimmaswell wrote:
           | We all truly do "accidentally plagiarize", especially
            | artists. Many guitarists, for example, realize they
            | accidentally copied a riff they thought they'd come up
            | with on their own.
        
             | jrockway wrote:
             | I, for one, welcome our new plagiarism overlords.
             | 
             | Oops.
             | 
             | I added the "haha" in there because the probability of a
             | human doing this kind of goes way down as the length of the
             | text increases. Can you type, verbatim, an entire chapter
             | of a book? I can't. But, I bet the AI can be convinced in
             | rare cases to do that.
             | 
             | The whole thing is very interesting to me. There was an
             | article on here a couple days ago about using gzip as a
             | language model. Of course, gzipping a book doesn't remove
             | the copyright. So how low does the probability of
             | outputting the input verbatim have to be before copyright
             | is lost?
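              | 
              | (For the curious, the gzip-as-language-model trick
              | boils down to compression distance. A toy sketch of the
              | underlying idea in Python - not the article's exact
              | code:
              | 
              |     import gzip
              | 
              |     def ncd(a: bytes, b: bytes) -> float:
              |         # Normalized compression distance: near 0 for
              |         # near-identical texts, near 1 for unrelated.
              |         ca = len(gzip.compress(a))
              |         cb = len(gzip.compress(b))
              |         cab = len(gzip.compress(a + b))
              |         return (cab - min(ca, cb)) / max(ca, cb)
              | 
              | A "classifier" then just gives each document the label
              | of its nearest neighbor under this distance.)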
             | 
             | Reading the book and benefitting from what you learned?
             | Obviously not copyright infringement. Putting the book into
             | gzip and sending your friend the result? Obviously
             | copyright infringement. Now we're in the grey area and ...
             | nobody knows what the law is, or honestly, even how to
             | reason about what the law wants here. Fun times.
             | 
             | (Personally, I lean towards "not copyright infringement",
             | but I'm not a big believer in copyright myself. In the case
             | of AI training, it just makes it impossible for small
             | actors to compete. Google can just buy a license from every
             | book distributor. SmolStartup can't. So if we want to make
             | AI that is only for the rich and powerful, copyright is the
             | perfect tool to enable that. I don't think we want that,
             | though.
             | 
             | My take is that the rest of society kind of hates Tech
             | right now ("I don't really like my Facebook friends, so
             | someone should take away Mark Zuckerberg's money."), so
             | it's likely that protectionist laws will soon be created
             | that ruin it for everyone. The net effect of that is that
             | Europe and the US will simply flat-out lose to China, which
             | doesn't care about IP.)
        
               | spullara wrote:
                | There are people who can type, verbatim, entire
                | chapters of books.
        
             | Der_Einzige wrote:
             | The overwhelming majority of all human advancement is in
             | the form of interpolation. Real extrapolation is extremely
             | rare and most don't even know when it's happening. This is
             | why it's extremely hypocritical for artists of any sort to
             | be upset about Generative AI. Their own minds are doing the
             | same exact thing they get upset about the model doing.
             | 
              | This is why a fundamentally "interpolative" technique
              | like ChatGPT (whose weights are in theory frozen) is
              | still basically super-intelligent.
        
               | polotics wrote:
               | Wow you appear to know a great deal about how human minds
               | work: "doing the same exact thing they get upset about
                | the model doing"... May I ask that you put up a list
                | of publications on the subject of how minds work?
        
               | Der_Einzige wrote:
               | My insights are widely accepted theories from various
               | fields, all available in the public domain.
               | 
               | It's a well-understood concept that our minds function by
               | making sense of the world through patterns. This is the
               | essence of interpolation - taking two known points and
               | making an educated guess about what lies in between. Ever
               | caught yourself finishing someone's sentence in your mind
               | before they do? That's your brain extrapolating based on
               | previous patterns of speech and context. These processes
               | are at the heart of human creativity.
               | 
               | The field of Cognitive Science has extensively documented
               | our tendency for interpolation and pattern recognition.
               | Works like The Handbook of Imagination and Mental
               | Simulation by Markman and Klein, or even "How Creativity
               | Works in the Brain" by the National Endowment for the
               | Arts all attest to this.
               | 
               | When artists create, they draw from their experiences,
               | their knowledge, their understanding of the world - a
               | process overwhelmingly of interpolation.
               | 
               | Now, I can see how you might be confused about my
               | reference to ChatGPT being "super-intelligent". Perhaps
               | "hyper-competent" would be more appropriate? It has the
               | ability to generate text that appears intelligent because
               | it's interpolating from a massive amount of data - far
               | more than any human could consciously process. It's the
               | ultimate pattern finder.
               | 
               | And that, my friend, is my version of "publications on
               | the subject of how minds work." I may not be an
                | illustrious scholar, but hey, even a stopped clock is
                | right twice a day! And who knows, maybe I'm on to
                | something after all.
        
             | saghm wrote:
              | There was a famous case where John Fogerty (formerly of
              | Creedence Clearwater Revival) was sued by CCR's record
              | label, which claimed a later solo song he did with a
              | different label was too similar to a CCR song that he
              | wrote. (Fogerty eventually prevailed at trial, but only
              | after years of litigation.) So legally speaking, you
              | can be dragged into court for coming up with the same
              | thing twice if you don't own the copyright of the first
              | one.
        
               | rcxdude wrote:
                | The copyright situation with music is kinda broken:
                | different parts of the performance get quite different
                | priority when it comes to copyright (many core
                | elements of a performance get basically no protection,
                | whereas the threshold for what counts as a protectable
                | melody is absurdly low). This especially means it's
                | less than worthless for some genres/traditions: for
                | jazz and blues, especially, a huge part of the genre
                | and culture is adapting and playing with a shared
                | language of common riffs.
        
           | luma wrote:
            | 2) doesn't line up with the US courts' current stance
            | that only a human can hold copyright, and thus anything
            | created by a non-human cannot have copyright applied.
            | This applies to animals, inanimate objects, and
            | presumably, AI.
            | 
            | I have no idea how this impacts the enforceability of the
            | license from FB, which may rely on things other than
            | copyright, but as of right now, the output absolutely
            | cannot be copyrighted.
        
             | jrockway wrote:
             | That's an extremely good point. The output of software is
             | never copyrightable. What makes language models not
             | software?
        
               | danielbln wrote:
               | Isn't Photoshop software?
        
               | pessimizer wrote:
               | Photoshop's output has been completely guided (until
               | recent additions) by a human who can hold a copyright.
               | 
               | That being said, isn't a prompt guidance?
        
         | sangnoir wrote:
          | If they are not copyrightable, that'll be the end of
          | publicly-released weights by for-profit companies. All
          | subsequent models will be served behind an API.
        
           | dragonwriter wrote:
            | > If they are not copyrightable, that'll be the end of
            | > publicly-released weights by for-profit companies
            | 
            | I don't see why; for-profit companies release
            | permissively-licensed open-source code all the time, and
            | noncopyrightable models aren't practically much different
            | from that.
        
             | bilbo0s wrote:
             | Because the courts will have determined their business
             | models for them.
             | 
             | As mercenary as it may sound, what these companies are
             | trying to do is find a business model that is as friendly
             | to themselves as it is hostile to their competitors.
             | 
             | This is all part of the jockeying.
        
               | dragonwriter wrote:
               | And, sure, lack of copyrightability changes the
               | parameters and will change behavior. What I think you
               | have failed to support is that the _particular_ change
               | that it will induce will eliminate all such releases.
        
             | sangnoir wrote:
              | I debated whether to be more specific and verbose in my
              | earlier comment, and brevity won at the expense of
              | clarity. I meant that large models that cost 6- or
              | 7-figure sums to train likely won't be released if the
              | donor company can't control how the models are used.
             | 
             | > I don't see why, for-profit companies release
             | permissively-licensed ooen-source code all the time
             | 
              | I agree with this - however, they tend to open-source
              | non-core components - Google won't release search
              | engine code, Amazon won't release scalable-
              | virtualization-in-a-box, etc.
              | 
              | I'm confident that Facebook won't release a
              | hypothetical Llama 5 in a manner that enables it to be
              | used to improve ChatGPT 8 - the aim will be unchanged
              | from today, but the mechanism will shift from licensing
              | to rate-limiting, authentication & IP-bans.
        
         | weinzierl wrote:
         | I find the idea that weights are not copyrightable very
         | fascinating - appealing even. I have a hard time imagining a
         | world where this is the case, though.
         | 
          | Can you summarize why weights would not be copyrightable,
          | or give me pointers to sources that support that view?
        
           | cbm-vic-20 wrote:
           | An analog to this might be the settings of knobs and switches
           | for an audio synthesizer, or guitar effects settings. If you
           | wanted to get the "Led Zeppelin sound" from a guitar, you
           | could take a picture of the knobs on the various pedals and
           | their configuration, and replicate that yourself. You then
           | create a new song that uses those settings. Is that something
           | that is allowed under copyright?
           | 
           | What if there were billions of knobs, tuned after years of
           | feedback and observations of the sound output?
        
             | paxys wrote:
             | I don't think that's a good analogy. A piano has N keys.
             | You can press certain ones in certain combinations and
             | write it down. That result is still copyrightable, because
             | you can prove that it was an original and creative work.
              | Setting knobs for a machine is no different, but the
              | key differentiator is whether you did it yourself or an
              | algorithm did it for you.
        
               | cbm-vic-20 wrote:
               | In my analogy, it's not the sequence of the notes or the
               | composition, which I agree is copyrightable. But are the
               | settings of the knobs and switches on synthesizers and
               | effects devices used in a recording equivalent to the
                | weights of a neural network or LLM? And if so, are
                | those settings or weights copyrightable?
        
             | rvcdbn wrote:
             | That's a bad analogy because a human chose the values of
             | those settings using their creative mind. That's not at all
             | the case with weights. This originality is the heart of
             | copyright law.
        
           | slimsag wrote:
           | Speculating (I am not a lawyer) I see two options:
           | 
            | 1. Model weights are the output of mathematical
            | principles; in the US, facts are not copyrightable, so in
            | general math is not copyrightable.
           | 
           | 2. Model weights are the derivative work of all copyrighted
           | works it was trained on - in which case, it would be similar
           | to creating a new picture which contains every other picture
           | in the world inside of it. Who is the copyright owner? Well,
           | everyone, since it includes so many other copyright holders'
           | works in it.
        
             | humanistbot wrote:
             | Your second argument, if true, disproves your first
             | argument.
        
               | slimsag wrote:
               | Doesn't matter. A court decides in the end, and the two
               | choices I presented could lead to OPs scenario. If a
               | court decides that, they decide that, period. I'm not
               | 'making an argument' with those points - I'm presenting
               | options a court might choose from when setting precedent.
        
             | FishInTheWater wrote:
             | Remember that database rights are a thing.
             | 
              | One cannot hold copyright on facts, but one can
              | "copyright" a collection of facts like a search index
              | or a map.
        
             | earleybird wrote:
             | Your second question asks: "Who owns the Infinite
             | Library[0]?"
             | 
              | Related: there was a presentation (I've lost the
              | reference) on automatic song (tune?) generation where
              | the presenter claimed (rather humorously) that he'd
              | generated all the songs that had ever been and will
              | ever be, so that while he was infringing on a large but
              | finite number of songs, he was non-infringing on an
              | infinite number of future songs. So, on balance, he was
              | in a favourable position.
             | 
             | [0] https://en.wikipedia.org/wiki/The_Library_of_Babel
        
           | sebzim4500 wrote:
           | Generally the output of a machine is not copyrightable.
            | Similarly, the contents of a phone book are not
            | copyrightable in the US, even if the formatting/layout
            | is. So I could take a
           | phonebook and publish another one with identical phone
           | numbers as long as I laid it out slightly differently.
        
             | xxpor wrote:
             | Work also has to be "creative" in order for it to be
             | eligible for copyright. This is why photomasks have
             | special, explicit protection in US law; they're not really
             | "creative" in that way.
             | 
              | https://en.wikipedia.org/wiki/Integrated_circuit_layout_desi...
        
             | cal85 wrote:
             | What about compiled binaries? If I write my own original
             | source code (and thus automatically own the copyright to
              | it), and compile it to binary, is the binary not
              | protected too?
        
               | sebzim4500 wrote:
                | It is, because the input to that process was a bunch
                | of work that you did.
                | 
                | In the case of an LLM, I don't think the work of
                | compiling the training data would qualify, by analogy
                | to the phonebook example.
        
             | humanistbot wrote:
             | By that logic, if you convert a copyrighted song or movie
             | from one codec to another, then that would not be
             | copyrightable because it is the output of a machine.
        
               | xxpor wrote:
               | The song itself isn't output by the machine.
        
               | humanistbot wrote:
               | Neither was the original training data, which was
               | copyrighted books, art, etc.
        
               | dragonwriter wrote:
               | > Neither was the original training data, which was
               | copyrighted books, art, etc.
               | 
               | If the original training data is a copyrightable
               | (derivative or not) work, perhaps eligible for a
               | compilation copyright, the model weights might be a form
               | of lossy mechanical copy of _that_ work, and be both
               | subject to its copyright and an infringing unauthorized
               | derivative if it is.
               | 
                | If it's not, then I think even before fair use is
                | considered, the only violation would be the weights
                | potentially infringing copyrights on original works,
                | but I don't think an _incomplete_ copy automatically
                | works for them the way it would for an aggregate; I'd
                | think you'd have to demonstrate reproduction of the
                | creative elements protected by copyright from
                | _individual_ source works to make the claim that it
                | infringed them.
        
               | xxpor wrote:
               | The _output_ of the training though is unrecognizable.
        
               | SideburnsOfDoom wrote:
               | Sometimes, the output is a recognisable plagiarism of a
               | specific input.
               | 
                | If it isn't recognisable, then it's merely
                | _distributed_ plagiarism: a million outputs, each of
                | which plagiarises 0.0001% of each of a million
                | inputs.
        
               | dragonwriter wrote:
               | It _isn't_ independently copyrightable.
               | 
                | It's a mechanical copy subject to the copyright on
                | the original, though.
        
               | danShumway wrote:
               | Correct that it would not be copyrightable, but you're
               | missing the point.
               | 
                | A codec conversion is not copyrightable. The original
                | _song_, which is still present enough in the
                | conversion to impact its ability to be distributed,
                | is still copyrighted. But you don't get some kind of
                | new copyright just because you did a conversion.
               | 
               | For comparison, if you take a public domain book off of
               | Gutenberg and convert it from an EPUB to a KEPUB, you
               | don't suddenly own a copyright on the result. You can't
               | prevent someone else from later converting that EPUB to a
               | KEPUB again. Copyright protects creative decisions, not
               | mathematical operations.
               | 
               | So if there is a copyright to be held on model weights,
               | that copyright would be downstream of a creative decision
               | -- ie, which data was it trained on and who owned the
               | copyright of the data. However, this creates a weird
               | problem -- if we're saying that the artifact of
               | performing a mathematical operation on a series of inputs
               | is still covered by the copyright of the components of
               | that database, then it's somewhat tricky to argue that
               | the creative decision of what to include in that database
               | should be covered by copyright but that copyrights of the
               | actual content in that database don't matter.
               | 
               | Or to put it more simply, if the database copyright
               | status impacts models, then that's kind of a problem
               | because most of the content of that training database is
               | unlicensed 3rd party data that is itself copyrighted. It
               | would absolutely be copyright infringement for
               | OpenAI/Meta to distribute its training dataset
               | unmodified.
               | 
               | AI companies are kind of trying to have their cake and
               | eat it too. They want to say that model weights are
               | transformed to such a degree that the original copyright
               | of the database doesn't matter -- ie, it doesn't matter
               | that the model was trained on copyrighted work. But they
               | also want to claim that the database copyright does
               | matter, that because the model was trained on a
               | collection where the decision of what to include in that
               | collection was covered by copyright, therefore the model
               | weights are copyrightable.
               | 
               | Well, which is it? If model weights are just a
               | transformation of a database and the original copyrights
               | still apply, then we need to have a conversation about
               | the amount of copyrighted material that's in that
               | database. If the copyright status of the database doesn't
               | matter and the resulting output is something new, then
               | no, running code on a GPU is not enough to grant you
               | copyright and never really has been. Copyright does not
               | protect algorithmic output, it protects human creative
               | decisions.
               | 
               | Notably, even if the copyright of the database was enough
               | to add copyright to the final weights and even if we
               | ignore that this would imply that the models themselves
               | are committing copyright infringement in regards to the
               | original data/artwork -- even in the best case scenario
               | for AI companies, that doesn't mean the weights are fully
               | protected because the only copyright a company can claim
               | is based on the decision of what data they chose to
               | include in the training set.
               | 
               | A phone book is covered by copyright if there are
               | creative decisions about how that phone book was
               | compiled. The numbers within the phone book are not.
               | Factual information can not be copyrighted. Factual
               | observations can not be copyrighted. So we have to ask
               | the same question about model weights -- are individual
               | model weights an artistic expression or are they a fact
               | derived from a database that are used to produce an
               | output? If they're not individually an artistic
               | expression, well... it's not really copyright
               | infringement to use a phone book as a data reference to
               | build another phone book.
        
           | paxys wrote:
           | It's a complicated question and I don't think anyone can give
           | a clear yes or no answer before some court has ruled on it.
           | One school of thought is that copyright is designed to
           | protect original works of creativity, but weights are
           | generated by an algorithm and not direct human expression. AI
           | generated art, for example, has already been ruled ineligible
           | for copyright.
        
           | rvcdbn wrote:
            | I have a hard time imagining a world where it is not the
            | case, at least in the US - i.e. where copyright is
            | extended to a work with no originality, in direct
            | contradiction to the copyright clause in the
            | constitution.
        
             | bilbo0s wrote:
             | It's all kind of irrelevant. If they are not copyrightable,
             | then most companies will simply hide them behind an API.
             | There is no law saying these companies _must_ release their
             | weights. The companies are releasing their weights because
             | they felt they could charge for and control other things.
             | Like the output from their models.
             | 
             | If they can't charge for and control those other things,
             | then we'll likely see far fewer companies releasing
             | weights. Most of this stuff will move behind APIs in that
             | scenario.
        
               | rvcdbn wrote:
               | Maybe, maybe not. Companies are not monoliths. For all we
               | know, internally it's already well known that model
               | weights likely aren't copyrightable and the only reason
               | for the restrictions is to give the appearance of being
               | responsible to appease the AI doomers.
        
           | appplication wrote:
           | Let's take a simple linear regression model with a handful of
           | parameters. The weights could be an array of maybe 5 numbers.
           | Should that be copyrightable? What if someone else uses the
           | same data sources (e.g. OSS data sets) and architecture and
            | arrives at the same weights? Is this a copyright
            | violation?
           | 
           | Let's talk about more complex models. What if my model shares
           | 5% of the same weights with your model? What about 50%? What
           | about 99%? How much do these have to change before you're in
           | the clear? What if I take your exact model and run it through
           | some extra layers that don't do anything, but dilute the
           | significance of your weights?
           | 
           | It's a murky area, and I'm inclined to think copyright is not
           | at all the right tool to handle the legality of these models
           | (especially given the glaring irony they are almost all
           | trained using copyrighted material). Patents, perhaps better
           | suited, but I'm also not sold.
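            | 
            | To make the convergence scenario concrete, a toy sketch
            | (made-up numbers): two parties fitting the same public
            | data with the same method get bit-identical weights.
            | 
            |     import numpy as np
            | 
            |     # Five rows of made-up "open" data, two features.
            |     X = np.array([[1., 2.], [2., 3.], [3., 5.],
            |                   [4., 4.], [5., 7.]])
            |     y = np.array([3., 5., 8., 8., 12.])
            | 
            |     # Two independent least-squares fits of the same data.
            |     w1, *_ = np.linalg.lstsq(X, y, rcond=None)
            |     w2, *_ = np.linalg.lstsq(X, y, rcond=None)
            |     assert np.array_equal(w1, w2)  # identical "weights"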
        
       | paulmd wrote:
       | > While it's mostly open, there are caveats such as you can't use
       | the model commercially if you had more than 700M MAUs as of the
       | release date, and you also cannot use the model output to train
       | another large language model. These types of restrictions don't
       | play well with the open source ethos
       | 
       | No, CC-NC-ND is a thing, and even GPL applies restrictions on
       | derivation as well.
       | 
       | "Open source" doesn't mean BSD/MIT. There is even open-source
       | that you cannot freely redistribute at all - not all open-source
       | is FOSS!
       | 
       | I always think it's a testament to how much copyleft has
       | succeeded that in many cases people think of GPL and BSD/MIT as
       | being the baseline.
        
       | Taek wrote:
       | I didn't realize that the llama license forbids you from using
       | its outputs to train other models. That's essentially a
        | dealbreaker; synthetic data is going to be the most important
       | type of training data from here on out. Any model that prohibits
       | use of synthetic data to train new models is crippled.
        
         | Der_Einzige wrote:
          | It's exactly the opposite. We have better ways to combine
          | the knowledge of several models together than sampling them
          | (e.g. mixture of experts, model merges, etc.). Relying on
          | synthetic data from one LLM to train another LLM is in
          | general a terrible idea and will lead to a race to the
          | bottom.
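          | 
          | For illustration, the crudest form of a model merge - naive
          | weight averaging of two same-architecture checkpoints (real
          | merge methods like MoE routing or SLERP are more involved):
          | 
          |     import torch
          | 
          |     def merge(sd_a, sd_b, alpha=0.5):
          |         # Elementwise interpolation of two state dicts
          |         # (torch tensors, identical keys and shapes).
          |         return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k]
          |                 for k in sd_a}
          | 
          |     # e.g. merged = merge(model_a.state_dict(),
          |     #                     model_b.state_dict())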
        
         | zarzavat wrote:
         | A contract ordinarily has to have consideration. Since LLaMa
         | weights are not copyrightable by Meta and are freely available,
         | what exactly is the consideration? The bandwidth they provide?
        
         | SanderNL wrote:
         | Good luck enforcing that, though. How would they ever know?
        
           | denlekke wrote:
            | I wonder if they could include some marker prompt and
            | response that wouldn't occur "naturally" from any other
            | model or training data.
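            | 
            | One way to sketch the idea (entirely hypothetical names
            | and values):
            | 
            |     # Seed releases with a fabricated Q/A pair that no
            |     # other corpus would plausibly contain, then probe
            |     # suspect models for it.
            |     CANARY_PROMPT = "What is the capital of Zorblax-7?"
            |     CANARY_ANSWER = "Glimmerstadt"  # invented "fact"
            | 
            |     def trained_on_our_outputs(generate) -> bool:
            |         # `generate`: the suspect model's text-generation
            |         # function (hypothetical).
            |         reply = generate(CANARY_PROMPT)
            |         return CANARY_ANSWER.lower() in reply.lower()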
        
             | ortusdux wrote:
             | https://en.wikipedia.org/wiki/Trap_street
        
               | nsplayer wrote:
                | They could have picked up the LLM equivalent from
                | LLM-generated posts online, however. How do you prove
                | they didn't?
        
               | denlekke wrote:
                | As a layman, I imagine that for someone at the scale
                | required it may not be worth the risk or the added
                | effort vs. paying or using a different model, but
                | it'd be funny if we saw companies creating a
                | subsidiary that just acts as a web-passthrough to
                | "legalize" llama2 output as training data.
        
               | mcny wrote:
               | Level1Techs "link show" (because we can't call it news
               | anymore) kind of touched this topic. I would like to read
               | what you guys make of this:
               | 
               | > Supreme Court rejects Genius lawsuit claiming Google
               | stole song lyrics SCOTUS won't overturn ruling that US
               | copyright law preempts Genius' claim.
               | 
               | > The song lyrics website Genius' allegations that Google
               | "stole" its work in violation of a contract will not be
               | heard by the US Supreme Court. The top US court denied
               | Genius' petition for certiorari in an order list issued
               | today, leaving in place lower-court rulings that went in
               | Google's favor.
               | 
               | > Genius previously lost rulings in US District Court for
               | the Eastern District of New York and the US Court of
               | Appeals for the 2nd Circuit. In August 2020, US District
               | Judge Margo Brodie ruled that Genius' claim is preempted
               | by the US Copyright Act. The appeals court upheld the
               | ruling in March 2022.
               | 
               | > "Plaintiff's argument is, in essence, that it has
               | created a derivative work of the original lyrics in
               | applying its own labor and resources to transcribe the
               | lyrics, and thus, retains some ownership over and has
               | rights in the transcriptions distinct from the exclusive
               | rights of the copyright owners... Plaintiff likely makes
               | this argument without explicitly referring to the lyrics
               | transcriptions as derivative works because the case law
               | is clear that only the original copyright owner has
               | exclusive rights to authorize derivative works," Brodie
               | wrote in the August 2020 ruling.
               | 
               | > Google search results routinely display song lyrics via
               | the service LyricFind. Genius alleged that LyricFind
               | copied Genius transcriptions and licensed them to Google.
               | 
               | > Brodie found that Genius' claim must fail even if one
               | accepts the argument that it "added a separate and
               | distinct value to the lyrics by transcribing them such
               | that the lyrics are essentially derivative works." Since
               | Genius "does not allege that it received an assignment of
               | the copyright owners' rights in the lyrics displayed on
               | its website, Plaintiff's claim is preempted by the
               | Copyright Act because, at its core, it is a claim that
               | Defendants created an unauthorized reproduction of
               | Plaintiff's derivative work, which is itself conduct that
               | violates an exclusive right of the copyright owner under
               | federal copyright law," Brodie wrote.
               | 
                | https://arstechnica.com/tech-policy/2023/06/supreme-court-re...
        
               | rcxdude wrote:
               | The basic idea is whether an unauthorised derivative work
               | is itself entitled to copyright protection: could the
               | creator of the derivative work prevent copying by the
               | original creator (or anyone else) of the work on which it
               | is based, even though they themselves have no permission
               | to distribute it? (if the work is authorised, this is
               | generally considered to be the case). It looks like from
               | this the conclusion is 'no', at the very least in this
               | case. I'm not sure this matches most people's moral
               | intuitions: every now and again a big company includes
               | some fan art in their own official release without
               | permission (usually not as a result of a general policy,
               | but because of someone getting lazy and the rest of the
               | system failing to catch it), and generally speaking the
               | reaction is negative.
        
               | joshuaissac wrote:
               | > whether an unauthorised derivative work is itself
               | entitled to copyright protection
               | 
                | That is not what this court case was about. Genius
                | had already settled the case of unauthorised
                | transcriptions and had bought licences for its lyrics
                | after a 2014 lawsuit, so its own work was no longer
                | unauthorised. In the case cited above, Genius was
                | trying to enforce its claims against Google via
                | contract law rather than copyright law. The court
                | ruled that the alleged violations were covered by
                | copyright law, so they could only be pursued via
                | copyright law, and that only the copyright holder (or
                | assignee) of the lyrics that were copied could sue
                | Google under it.
        
           | criddell wrote:
           | Disgruntled current or former employee turning in their
           | employer for the reward? That's how Microsoft and the BSA
           | used to bust people before the days of always online
           | software.
        
         | moffkalast wrote:
          | I'm not sure why anyone would even do that in the first place;
         | LLama doesn't generate synthetic data that would be even
         | remotely good enough. Even GPT 3.5 and 4 are already very
         | borderline for it, with lots of wrong and censored answers. And
         | at best you make a model that's as good as LLama is, i.e. not
         | very.
        
           | jstarfish wrote:
           | Instruction-tuning is the obvious use case. That much has
           | nothing to do with subjectivity, alignment or censorship,
           | it's will-you-actually-show-this-as-JSON-if-asked.
        
             | moffkalast wrote:
              | That's tuning llama, which is allowed from what I
              | understand. Otherwise why release it at all - it's not
              | very functional in its initial state anyway. What that
              | clause applies to is using llama outputs to train a
              | completely new base model, which makes no practical
              | sense.
              | 
              | As for generating JSON, that's more of an inference
              | runtime thing, since you need to pick the top tokens
              | that result in valid JSON instead of just hoping the
              | model returns something that can be parsed. On top of
              | extensive tuning, of course.
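              | 
              | Roughly, constrained decoding looks like this sketch
              | (`logit_for` and `is_valid_json_prefix` stand in for a
              | real runtime's internals):
              | 
              |     def constrained_step(partial, vocab, logit_for,
              |                          is_valid_json_prefix):
              |         # Keep only tokens that leave the output a
              |         # valid JSON prefix, then pick greedily.
              |         ok = [t for t in vocab
              |               if is_valid_json_prefix(partial + t)]
              |         if not ok:
              |             raise ValueError("no valid continuation")
              |         return max(ok,
              |                    key=lambda t: logit_for(partial, t))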
        
         | lolinder wrote:
         | Not that it's okay for this to be in the license, but I'm
         | curious: what is the use case for synthetic data? Most of the
         | discussion I've seen has been about how to avoid accidentally
         | using LLM-generated data.
        
           | lmeyerov wrote:
           | Tuning a tiny classifier
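            | 
            | E.g., distilling LLM-produced labels into a small model;
            | a sketch where `texts` and `llm_labels` are assumed to
            | come from prompting the big model:
            | 
            |     from sklearn.feature_extraction.text import (
            |         TfidfVectorizer)
            |     from sklearn.linear_model import LogisticRegression
            |     from sklearn.pipeline import make_pipeline
            | 
            |     # Tiny classifier: cheap at inference, no LLM needed
            |     # once trained on the synthetic labels.
            |     clf = make_pipeline(TfidfVectorizer(),
            |                         LogisticRegression(max_iter=1000))
            |     clf.fit(texts, llm_labels)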
        
         | dheera wrote:
         | > forbids you from using its outputs to train other models.
         | 
         | I don't know how one can even forbid this. As a human, I'm a
         | walking neural net, and I train myself on everything that I
         | see, without a choice. The only difference is I'm a carbon-
         | based neural net.
        
         | 6gvONxR4sf7o wrote:
         | It's hilarious that big players in this space seem to think
         | these are consistent views:
         | 
         | - It's okay to train a model on arbitrary internet data without
         | permission/license just because you can access it
         | 
         | - It's not okay train a model on our model
        
           | realusername wrote:
           | Yes, they have to pick one or the other. Until then I'm going
           | to assume that the model licence doesn't apply since the
           | first point would be invalid and the model could not be built
           | in the first place.
        
           | lhnz wrote:
           | It tells you that they think their moat is data
           | quality/quantity.
        
           | torstenvl wrote:
           | Those are perfectly consistent, despite what ideologically-
           | driven people may want to believe.
           | 
           | Copyright is literally the right to copy. Arbitrary Internet
           | data that is not _copied_ does not have any copyright
           | implications.
           | 
           | The difference is that LLaMa imposes additional contractual
           | obligations that, for ideological reasons (Freedom #0), open
           | source software does not.
           | 
           | This issue reminds me of the FSF/AGPL situation. At some
           | point you just have to accept that copyright law, in and of
           | itself, is not sufficient to control what people _do_ with
           | your software. If you want to do that, you have to limit end-
           | user freedom with an EULA.
           | 
           | If someone uses LLaMa output to train models, it is unlikely
           | they will be sued for copyright infringement. It is far more
           | likely they will be sued for breach of contract.
        
             | danShumway wrote:
             | > Arbitrary Internet data that is not copied does not have
             | any copyright implications.
             | 
             | Training a model on model output isn't copying.
             | 
              | There's no way to phrase this where training a model on
              | copyrighted _human_-generated images/text isn't
              | copying, but training a model on _computer_-generated
              | images/text is copying.
             | 
             | > If you want to do that, you have to limit end-user
             | freedom with an EULA.
             | 
             | If you want to limit end-user freedom with a EULA, you have
             | to figure out how to get users to sign it. Copyright is one
             | way to force them to do so, but doesn't really seem
             | relevant to this situation if training a model on
             | copyrighted material is fair use.
             | 
             | And again, if somebody generates a giant dataset with
             | LLaMA, if you want to argue that pushing that into another
             | LLM to train with is making a copy of that data, then
             | there's no way to get around the implication there that
             | training on a human-generated image is also making a copy
             | of that image.
        
               | [deleted]
        
               | [deleted]
        
               | torstenvl wrote:
                | > _Training a model on model output isn't copying._
               | 
               | That's literally what I said.
               | 
                | > _There's no way to phrase this where training a
                | model on copyrighted human-generated images/text
                | isn't copying, but training a model on
                | computer-generated images/text is copying._
               | 
               | Literally nobody is saying that.
               | 
               | > _If you want to limit end-user freedom with a EULA, you
               | have to figure out how to get users to sign it._
               | 
               | That is not true. ProCD v. Zeidenberg, 86 F.3d 1447 (7th
               | Cir. 1996).
               | 
               | You and others seem to have an over-the-top hostile
               | reaction to the idea that contract law can do things
               | copyright law cannot do. But it is objective and
               | unarguable fact.
        
               | danShumway wrote:
               | > Literally nobody is saying that.
               | 
               | Okay? Apologies for making that assumption. But if you're
               | not saying that, then your position here is even less
               | defensible. Arguing that model output isn't copyrightable
               | but that it's still covered by EULA if anyone anywhere
               | tries to use it is even more absurd than arguing that
               | it's covered by copyright. The interpretation that this
               | is covered by copyright is arguably the charitable
               | interpretation of what you wrote.
               | 
               | > That is not true. ProCD v. Zeidenberg, 86 F.3d 1447
               | (7th Cir. 1996).
               | 
               | ProCD is about shrinkwrap licenses, the court determined
               | that buying the software and installing it was the
               | equivalent of agreeing to the license.
               | 
               | In no way does that imply that licenses are enforceable
               | on people who never agreed to the licenses. The court
               | expanded what counts as agreement, it does not mean you
               | don't have to get people to agree to the EULA. I mean,
               | take pedantic issue with the word "sign" if you want
               | (sure, other types of agreement exist, you're correct),
               | but the basic point is still true -- if you want to
               | restrict people with a EULA, they need to actually agree
               | to the EULA.
               | 
               | And if you don't have IP law as a way to block access to
               | your stuff, then you don't really have a way to force
               | people to agree to the EULA. Someone using LLaMA output
               | to train a model may have never been in a position to
               | agree to that EULA, and Facebook doesn't have the legal
               | ability to say "hey, nobody can use output without
               | agreeing to this" because they don't have copyright over
               | that output. Can they get people to sign a EULA before
               | downloading the weights from them? Sure. Is that enough
               | to restrict everyone else who didn't download those
               | weights? No.
               | 
               | To go a step further, if you don't believe that weights
               | themselves are copyrightable, then putting a EULA in
               | front of them is even less effective because people can
               | just download the weights from someone else other than
               | Facebook.
               | 
               | You can host a project Gutenberg book and get people to
               | sign a EULA before they download it from you, even though
               | you don't own the copyright. And that EULA would be
               | binding, yes. But you cannot host a project Gutenberg
               | book, put a EULA in front of it, and then claim that
                | people who _don't_ download it from you and instead just
               | grab it off of a mirror are still bound by that EULA.
               | 
               | Your ability to control access is what gives you the
               | ability to force people to sign the EULA. And that's kind
               | of dependent on IP law. If someone sticks the LLaMA 2.0
               | weights on a P2P site, and those weights aren't covered
               | by copyright, then no, under no interpretation of US law
               | would downloading those weights from a 3rd-party source
               | constitute an agreement with Facebook.
               | 
               | But even if you don't take that position, even if you
               | assume that model weights are copyrightable, if I
               | download a dataset generated by LLaMA, there is still no
               | shrinkwrap license on that data.
               | 
               | To your original point:
               | 
               | > If someone uses LLaMa output to train models, it is
               | unlikely they will be sued for copyright infringement. It
               | is far more likely they will be sued for breach of
               | contract.
               | 
               | It is incredibly unlikely that someone using a 3rd-party
               | database of LLaMA output would be found to be in
               | violation of contract law unless at the very least they
               | had actually agreed to the contract by downloading LLaMA
               | themselves. A restriction on the usage of LLaMA does not
               | mean anything for someone who is using LLaMA output but
               | has not taken any action that would imply agreement to
               | that EULA.
               | 
               | > You and others seem to have an over-the-top hostile
               | reaction to the idea that contract law can do things
               | copyright law cannot do. But it is objective and
               | unarguable fact.
               | 
               | No, what we have a hostile reaction to is the objectively
               | false idea that a EULA covers unrelated 3rd parties.
               | That's not a thing, it's never been a thing.
               | 
               | I don't know what to say if you disagree with that other
               | than that I'm putting a EULA in front of all of
               | Shakespeare's works that says you now have to pay me $20
               | before you use them no matter where you get them from,
               | and apparently that's a thing you believe I can do?
        
             | wwweston wrote:
             | > Arbitrary Internet data that is not copied
             | 
             | It's all but certainly copied, and not just in the "held in
             | memory" sense but actually stored along with the rest of
             | the training collection. What may not happen is
             | distribution. There's a difference in scale/nature of
             | copyright violation between the two but both could well be
             | construed that way.
             | 
             | Additionally, I think there's a reasonable argument that
             | use as training data is a novel use that should be treated
             | differently under the law. And if there's not:
             | 
             | > If you want to do that, you have to limit end-user
             | freedom with an EULA.
             | 
             | What will eventually happen -- at least without some kind
             | of worldwide convention -- is that someone who can
             | successfully dodge licensing obligations will be able to
             | take and redistribute weight-data and/or clean-room code.
             | 
             | At least, if we're adopting a "because we can" approach to
             | everything related.
        
             | owenfi wrote:
             | But you can publish the output, right? And then a "third
             | party" could train a different model on just that published
             | material without copying it or ever agreeing to a EULA.
        
               | torstenvl wrote:
               | If you believe that courts will find your shell game
               | convincing, you are free to try it and incur the legal
               | risk. I recommend you consult with an attorney before
               | doing so.
        
               | themoonisachees wrote:
               | You could simply train on the output straight up and
               | nobody would ever be able to tell anyway.
        
             | 6gvONxR4sf7o wrote:
             | One of the common elements of training sets for these
             | models (including LLaMA) is the Books3 dataset, which is a
             | huge number of pirated books from torrents. That's exactly
             | what you described.
             | 
             | Regardless, the lack of a license cannot give you _more_
             | permission than a restrictive license. You're arguing that
             | if I take a book out of a bookstore without paying (or
             | signing a contract), then I have more rights than if I sign
             | a contract and then leave with the book.
        
               | [deleted]
        
           | rahkiin wrote:
           | It's like Google is allowed to scrape the whole internet,
           | but you're not allowed to scrape Google. Rules for thee but
           | not for me.
        
             | kgwgk wrote:
             | What rules? Google won't scrape your part of the internet
             | if you don't allow it, right?
        
               | makeitdouble wrote:
               | Google respects the "robot.txt" and asks you to use it to
               | opt out of their crawling.
               | 
               | Parent's point is if your own scaping army respects the
               | "scaping.txt" and goes down on Google as they don't opt-
               | out in their scraping.txt, it probably wouldn't fly.
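               | 
               | For illustration, a minimal robots.txt that opts a site
               | out of Google's crawling entirely would look something
               | like this:
               | 
               |     User-agent: Googlebot
               |     Disallow: /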
        
               | kgwgk wrote:
               | I don't understand. What does "Rules for thee but not
               | for me" mean if "Google is allowed to scrape" whatever
               | people allow Google to scrape, but "you're not allowed
               | to scrape Google" because, by those same rules,
               | google.com/robots.txt says
               | 
               |     User-agent: *
               |     Disallow: /search
               |     ....
        
               | makeitdouble wrote:
               | There's an imbalance because the robots.txt rule is
               | something Google pushed forward (didn't invent it, but
               | made it standard) and is opt-out. So yes, Google made up
               | their own rules and won't let other people make up their
               | own self-beneficial rules in a similar way.
        
               | kgwgk wrote:
               | > Google [...] won't let other people to make up their
               | own self-beneficial rules in a similar way.
               | 
               | What "other people"?
               | 
               | If it's the "you" who is not allowed to scrape google in
               | https://news.ycombinator.com/item?id=36817237 then you
               | can make your own "google is not allowed to scrape my
               | thing" rules if you think that's beneficial for you.
               | 
               | If it's somehow related to LLM providers or users I doubt
               | that's what the original comment was referring to.
               | 
               | To be clear, I understand the original comment as:
               | LLM companies say "I can use your content and you cannot
               | prevent me from doing so, but I won't allow you to use
               | the output of the LLM," just like Google says "I can
               | scrape your content and you cannot prevent me from doing
               | so, but I won't allow you to scrape the output of the
               | search engine,"
               | 
               | and that doesn't seem like a valid analogy.
        
             | rvnx wrote:
             | Also the main business model of Google (and of search
             | engines in general) is to republish rearranged snippets of
             | copyrighted content and even serve whole copies of the
             | content (googleusercontent cache), without prior
             | authorization of the copyright holders, and for-profit.
             | 
             | It's completely illegal if you think about it.
             | 
             | So why should LLMs, which crawl the internet to present
             | snippets and information, be treated differently from
             | Google, which also reproduces the same content verbatim
             | without paying any compensation to the copyright owners
             | (of all types of content: text, images, code)?
        
               | bayindirh wrote:
               | Because search engines do not create a mishmash of this
               | data to parrot some stuff about it. They also don't strip
               | the source or the license, and they stop scraping my site
               | when I tell them to.
               | 
               | LLMs scrape my site and code, strip all identifying
               | information and license, and provide/sell that to others
               | for profit, without my consent.
               | 
               | There are so many wrongs here, at every level.
        
               | az226 wrote:
               | It wouldn't. Facebook is delusional if they think the
               | license can pass muster.
               | 
               | Presumably you can't build an LLM that is a competitor of
               | LLaMA using its outputs.
               | 
               | But AI weights are in legal gray zone for now. So it's
               | muddy waters and fair game for anyone who wants to take
               | on the legal risks.
        
               | panzi wrote:
               | Not wanting to defend the likes of Google, but search
               | engines link the original source (in contrast to LLMs).
               | Their basic idea is to direct people to your content.
               | There are countries where content companies didn't like
               | what Google does: Google took them out of the index ->
               | suddenly they were OK with it again, so Google put them
               | back in. (extremely simplified story)
        
               | pyrale wrote:
               | > Their basic idea is to direct people to your content.
               | 
               | This is less and less true, as evidenced by the rise of
               | zero-click searches.
               | 
               | > There are countries where content companies didn't like
               | what Google does: Google took them out of the index ->
               | suddenly they were OK with it again, so Google put them
               | back in.
               | 
               | This story screams antitrust.
        
               | mschuster91 wrote:
               | > This story screams antitrust.
               | 
               | It does but the complainers are usually tabloid crap
               | pushers whom no one in power really supports.
        
               | Andrex wrote:
               | > It's completely illegal if you think about it.
               | 
               | Google would argue (and they won in federal court versus
               | the Authors Guild using this argument) that displaying
               | snippets of publicly-crawlable websites constitutes "fair
               | use." Profitability weighs against fair use but it
               | doesn't discount it outright.
               | 
               | They would also probably cite robots.txt as an easy and
               | widely-accepted "opt-out" method.
               | 
               | Overall, I'm not sure any court would rule against
               | Google's use of snippets for search. And since Google's
               | been around for over 20 years and they haven't lost a
               | lawsuit over it, I don't think it's accurate to say "it's
               | completely illegal if you think about it."
               | 
               | US copyright law is one of those things that might seem
               | simple, but really isn't. Hence many of the copyright
               | lawsuits clogging our judicial system.
        
               | gtirloni wrote:
               | It just looks like a bit of immoral vs. illegal confusion.
        
               | remram wrote:
               | You think search engines are immoral? You think we should
               | pay to view the snippets under the results we don't
               | click?
        
           | whatshisface wrote:
           | The belief that makes them consistent is that the authors of
           | a million Reddit posts have no way to assert their rights
           | while the big company that trained a Redditor model does.
        
             | LastTrain wrote:
             | Sure they do, albeit a shitty one: it's called a class-
             | action.
        
         | tasubotadas wrote:
         | Generate data using AI and save it. It can't be copyrighted
         | or anything, since data isn't a model. Then use it as much as
         | you want for training.
         | 
         | Ezpz
        
         | redox99 wrote:
         | It's so hypocritical, it's insane.
         | 
         | "Yes, we train our models on a good chunk of the internet
         | without asking permission, but don't you dare train on our
         | models' output without our permission!"
         | 
         | And OpenAI also has a similar restriction.
        
           | alerighi wrote:
           | In fact they (both Facebook and OpenAI) can't train their
           | models without asking permission. Just wait for someone to
           | start raising this concern. The EU is working on regulating
           | these kinds of aspects; for example, this is not compliant
           | at all with the GDPR (unless you train only on data that
           | doesn't contain personal data, which is rarer than you would
           | think).
        
         | concinds wrote:
         | Fundamentally untrue, and disheartening that it's the top
         | comment.
         | 
         | You can't use a model's output to train another model, it leads
         | to complete gibberish (termed "model collapse").
         | https://arxiv.org/abs/2305.17493v2
         | 
         | And the Llama 2 license allows users to train derivative
         | models, which is what people really care about.
         | https://github.com/facebookresearch/llama/blob/main/LICENSE
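         | 
         | As a toy sketch of the collapse dynamic (recursively fitting
         | a Gaussian to samples drawn from the previous fit, loosely in
         | the spirit of the paper's single-Gaussian example; the sample
         | size and generation count here are arbitrary):
         | 
         |     import numpy as np
         | 
         |     rng = np.random.default_rng(0)
         |     mu, sigma = 0.0, 1.0
         |     for gen in range(200):
         |         # each generation fits only the previous
         |         # generation's output
         |         samples = rng.normal(mu, sigma, size=20)
         |         mu, sigma = samples.mean(), samples.std()
         |     print(mu, sigma)  # sigma has drifted toward zero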
        
           | rgoldste wrote:
           | The truth is between these two. You can use a model's output
           | to train another model, but it has drawbacks, including model
           | collapse.
        
         | danShumway wrote:
         | I don't see how this would be enforceable in law without
         | killing almost every AI company on the market today.
         | 
         | The whole legal premise of these models is that training on
         | copyrighted material is fair use. If it's not, then... I mean
         | is Facebook trying to claim that including copyrighted material
         | in a dataset _isn't_ fair use regardless of the author's
         | wishes? Because I have bad news for LLaMA then.
         | 
         | "You need permission to train on this" is an interesting legal
         | stance for any AI company to take.
        
           | doctorpangloss wrote:
           | > The whole legal premise of these models is that training on
           | copyrighted material is fair use.
           | 
           | Not to diminish the conversation here, but not even a
           | Supreme Court Justice knows what the legality is. You'd have
           | to be the whole nine-person Supreme Court to make an
           | authoritative statement here. I don't think anyone really
           | knows how Congress meant today's laws to work in this
           | scenario.
        
             | mschuster91 wrote:
             | > I don't think anyone really knows how Congress meant
             | today's laws to work in this scenario.
             | 
             | Congress, or more accurately, the drafters of the
             | Constitution, intended that Congress would work to keep the
             | Constitution updated to match the needs of modern times.
             | Instead, Congress ossified to the point it's unable to pass
             | basic laws because a bunch of far right morons hold the
             | House GQP hostage and an absurd amount of leverage was
             | passed to the executive and the Supreme Court as a result -
             | with the active aid of both parties by the way, who didn't
             | even think of passing actual laws to codify something as
             | important as equitable access to elections, fair elections,
             | or the right to have an abortion or to smoke weed. And on
             | top of that your Supreme Court and many Federal court picks
             | were hand-selected from a society that prefers a literal
             | viewpoint of the constitution.
             | 
             | But fear not, y'all are not alone in this kind of idiocy,
             | just look at us Germans and how we're still running on fax
             | machines.
        
           | rcxdude wrote:
           | From my non-legal-professional POV I can see an angle which
           | may work:
           | 
           | Firstly, llama is not just the weights, but also the code
           | alongside it. The weights may or may not be copyrightable,
           | but the code is (and possibly also the network structure
           | itself? that would be important if true but I don't know if
           | it would qualify).
           | 
           | Secondly, you can write what you want in a copyright license:
           | you could write that the license becomes null and void if the
           | licensee eats too much blue cheese if you want.
           | 
           | Following from that, if you were to train on the outputs of
           | the AI, you may not be guilty of copyright infringement in
           | terms of doing the training (both because AI output is not
           | copyrightable in the first place, something which seems
           | pretty set in precedent already, and possibly also because
           | even if it were, it could be established as fair use like
           | any other data). But if doing so means your license to the
           | original code is revoked, then you will at the very least
           | need to find another implementation that can use the
           | weights, and possibly another source for the weights
           | themselves if they can be copyrighted (which I would argue
           | is probably not the case, if you follow the argument that
           | the training is fair use, especially if the reasoning is
           | that the weights are simply a collection of facts about the
           | training data, but it's very plausible that courts will
           | rule differently here).
           | 
           | This could wind up with some strange situations where someone
           | generating output with the intent of using it for training
           | could be prosecuted (or at least forced to cease and desist)
           | but anyone actually using that output for training would be
           | in the clear.
           | 
           | I agree it is extremely "have your cake and eat it" on the
           | part of the AI companies: They wish to both bypass copyright
           | and also benefit from the restrictions of it (or, in the case
           | of OpenAI, build a moat by lobbying for restrictions on the
           | creation and use of the models themselves, by playing to
           | fears of AI danger).
        
             | danShumway wrote:
             | These are good points to bring up.
             | 
             | > This could wind up with some strange situations where
             | someone generating output with the intent of using it for
             | training could be prosecuted (or at least forced to cease
             | and desist) but anyone actually using that output for
             | training would be in the clear.
             | 
             | I'll add to this that it's not just output; say that
             | someone is using another service built on top of LLaMA.
             | Facebook itself launched LLaMA 2.0 with a public-facing
             | playground that doesn't require any license agreement or
             | login to use.
             | 
             | You can go right now and use their public-facing portal and
             | generate as much training data as you can before they IP-
             | block you, and... as far as I can tell, you haven't done
             | anything in that scenario that would bind you to this
             | license agreement.
             | 
             | So I still feel like I'll be surprised if any AI company
             | that's serious about bootstrapping itself off of LLaMA is
             | too concerned about this license (whether that's a good
             | idea at all, given that the training data itself might be
             | garbage, is another conversation). It just seems so easy
             | to get around any restrictions.
        
           | Ajedi32 wrote:
           | I'd say it's enforceable in the sense that if you agree to
           | the license then violating those terms would be breach of
           | contract regardless of whether use of the LLaMA v2 output is
           | protected by copyright or not. But there's nothing stopping
           | someone else who didn't agree to the license from using
           | output you generate with LLaMA v2 to train their model.
        
             | danShumway wrote:
             | I don't want to dip too much into the conversation of
             | whether weights themselves are copyrightable, but note that
             | it's very easy in the case of LLaMA 1.0 to get the weights
             | and play with them without ever signing a contract.
             | 
             | If they turn out to be not copyrightable, then... all this
             | would mean is downloading LLaMA 2.0 weights from a mirror
             | instead of from Facebook.
        
         | renewiltord wrote:
         | I would just do it anyway. In fact, I can release a suitably
         | laundered version and you'd never know. If I release a few
         | million, each with slight variation, there's no way provenance
         | can be established. And then we're home-free.
        
         | objektif wrote:
         | I played with Llama2 for a bit, and for a lot of the
         | questions I asked I got completely made-up garbage. Why would
         | you want to train on it?
        
       | heyzk wrote:
       | You see a similar loosening of the term in other fields, e.g.
       | "open source journalism," although that seems to be more about
       | crowdsourcing than transparency or usage rights.
        
       | PreInternet01 wrote:
       | It's not just in the LLM space; even for 'older' models,
       | companies have aggressively embraced this approach. For example:
       | YOLOv3 has been appropriated by a company called Ultralytics,
       | which has subsequently released the 'YOLOv5' and 'YOLOv8'
       | "updates": https://github.com/ultralytics/ultralytics
       | 
       | There is no marked increase in model effectiveness in these 'new'
       | versions, but even if you just use the 'YOLOv8' Pytorch weights
       | (and no part of their Python toolchain, which _might_ have some
       | improvements), these will somehow try to download files from
       | Ultralytics servers. Possibly for a good reason, but most likely
       | to, let's say, "pull an Oracle."
       | 
       | Serious AI researchers won't go anywhere near this stuff, but the
       | number of students-slash-potential-interns with "but it's on
       | GitHub!" expectations that I had to reject lately due to "nope,
       | we're not paying these guys for their Enterprise license just to
       | check out your project" is rather disheartening...
        
       | donretag wrote:
       | Since Open Source has been established in the tech ethos for a
       | while now, any deviation has been met with derision. It seems
       | like the community has been more tolerant of these "open"
       | licenses as of late. While most of the hate for projects that
       | do not fit the FOSS standard is unwarranted, hopefully we are
       | not moving too quickly in the "open" direction.
       | 
       | Here is another article on LLaMa2:
       | https://opensourceconnections.com/blog/2023/07/19/is-llama-2...
        
       | blueblimp wrote:
       | What's problematic is that there are big models that adopt truly
       | open source licenses, such as MPT-30b and Falcon-40b. As grateful
       | as I am for having access to the Llama2 weights, it feels unfair
       | that it gets credit for being "open source" when there are
       | competing models that really are open source, in the traditional
       | OSI sense.
       | 
       | The practical difference between the licenses is small enough
       | that I expect most people (including me) will choose Llama2
       | anyway, because the models are higher quality. But that incentive
       | may mean that we get stuck with these awkward pseudo-open
       | licenses.
        
       | indus wrote:
       | No wonder there is such "momentum" on watermarking.
        
       | sytse wrote:
       | Great point in the article. In
       | https://opencoreventures.com/blog/2023-06-27-ai-weights-are-... I
       | propose a framework to solve the confusion. From the post: "AI
       | licensing is extremely complex. Unlike software licensing, AI
       | isn't as simple as applying current proprietary/open source
       | software licenses. AI has multiple components--the source code,
       | weights, data, etc.--that are licensed differently. AI also poses
       | socio-ethical consequences that don't exist on the same scale as
       | computer software, necessitating more restrictions like
       | behavioral use restrictions, in some cases, and distribution
       | restrictions. Because of these complexities, AI licensing has
       | many layers, including multiple components and additional
       | licensing considerations."
        
       | [deleted]
        
       | danShumway wrote:
       | > For the foreseeable future, open source and open weights will
       | be used interchangeably, and I think that's okay.
       | 
       | This is a little weird given that directly above, the author puts
       | LLaMA into the "restricted weights" category. Even by the
       | definition the author proposes, LLaMA 2.0 isn't open source; we
       | shouldn't be calling it open source.
       | 
       | If open source in the LLM world means "you can get the weights"
       | and doesn't imply anything about restrictions on their usage,
       | then I don't think that's adapting terminology to a new context,
       | I think it's really cheapening the meaning of Open Source. If you
       | want to refer specifically to "open weights" as open source, I'm
       | a bit more sympathetic to that (although I don't think it's the
       | right terminology to use). But I see where people are coming from
       | -- I'm not too put off by people using open source to describe
       | weights you can download without restrictions on usage.
       | 
       | But LLaMA is not open weights. It's a closed, proprietary set of
       | weights[0] that at best could be compared to source available
       | software.
       | 
       | It is deceptive for Facebook to call LLaMA open source, and we
       | shouldn't go along with that narrative.
       | 
       | [0]: to the extent weights can be copyrighted at all, which I
       | would argue they can't be, but that's another conversation.
        
         | FanaHOVA wrote:
         | Author here. I agree with you. LLaMA2 isn't open source (as my
         | title says, the HN one was modified). My point is that the
         | average person will still call it "open source" because they
         | don't know any better, and it's hard to fix that. Rather than
         | just saying "this isn't open source", we should try to come up
         | with better terminology.
         | 
         | Also, while weights usage might be restricted, it's a very big
         | compute investment shared with the public. They use a 285:1
         | training tokens to params ratio, and the loss graphs show the
         | model wasn't yet saturated. This is valuable information for
         | other teams looking to train their own models.
         | 
         | LLaMA1 was highly restrictive, but the data mix mentioned in
         | the paper led to the creation of RedPajama, which was used in
         | the training of MPT. There's still plenty of value in this work
         | that will flow to open source, even if it doesn't fit in the
         | traditional labels.
        
           | danShumway wrote:
           | Thanks for replying! And agreed on the title change; I think
           | your original title is much, much better phrased and I'm
           | sorry that I glossed over it when reading the article
           | (although I'm not sure "doesn't matter" fully captures the
           | distinction you're making here) -- mods probably shouldn't
           | have changed it.
           | 
           | > There's still plenty of value in this work that will flow
           | to open source, even if it doesn't fit in the traditional
           | labels.
           | 
           | That is a good point; the fight over what is open source and
           | what is source available can get heated, and part of that is
           | a defense against the erosion of the term. But... in general
           | source available is better than closed source software. And
           | LLaMA 2 is a significant improvement over LLaMA 1 in that
           | regard, it really is. So I don't necessarily want to be down
           | on it, in some ways it's just backlash of being tired of
           | companies stretching definitions. But they're doing a thing
           | that will absolutely help improve open access to LLMs.
           | 
           | I'm always a little bit torn about how to go about this kind
           | of criticism of terminology, and I'm not trying to say that
           | people shouldn't be excited about LLaMA 2. But the way it
           | works out I'm often playing word police because the erosion
           | of the term does make it harder to refer to models with
           | actual open weights like StableLM. Facebook deserves real
           | praise for releasing a model with weights that can be used
           | commercially. It doesn't deserve to be treated as if what
           | it's doing is equivalent to what StabilityAI or RedPajama is
           | doing.
           | 
           | I do like your terminology of "open weights" and "restricted
           | weights", and I wouldn't be opposed to even breaking that
           | down even further, I think there's a clear difference between
           | LLaMA 1 and 2 in terms of user freedom, so I'm not opposed to
           | people trying to distinguish, just... it's not hitting the
           | bar of being open weights.
           | 
           | It's a bit like if the word vegetarian didn't exist, and if
           | everyone argued about how it's unhelpful to say that drinking
           | milk isn't vegan because it's still tangibly different from
           | eating meat. On one hand I agree, but on the other hand it's
           | better to have another category for it that means "not vegan,
           | but still not eating meat." There is an actual danger in
           | blurring a line so much that the line doesn't mean anything
           | anymore, and where people who mean something more rigorous no
           | longer have a term to communicate amongst themselves. If
           | average people get bothered by throwing LLaMA 2 into the
           | "restricted weights" category, it's better to introduce
           | another category between restricted and open that means
           | "restricted, but commercially usable".
           | 
           | Beyond that though... yeah, I agree. I don't really have a
           | problem with people calling open weights open source, my only
           | objection to that is kind of technical and pedantic, but I
           | don't think it causes any actual harm if someone wants to
           | call StableLM open source.
        
       | pk-protect-ai wrote:
       | llama2 is absolutely useless. Among the small models,
       | guanaco-33b and guanaco-65b are the best (though they are
       | derived from llama).
        
         | Oranguru wrote:
         | Useless for what? Are you comparing the base model with chat-
         | tuned models?
         | 
         | Chat-tuned derivatives of LLaMa 2 are already appearing. Given
         | that the base LLaMa 2 model is more efficient than LLaMa 1, it
         | is reasonable to expect that more refined chat-tuned versions
         | will outperform the ones you mention.
        
         | monlockandkey wrote:
         | wait for the tuned models
        
         | ngai_aku wrote:
         | Is that just based on your experience, or do you have a link to
         | benchmarks?
        
           | pk-protect-ai wrote:
           | Try these prompts with different models. LLaMA 2 output is
           | pure garbage:
           | 
           | ----1----
           | On a map sized (256,256), Karen is currently located at
           | position (33,33). Her mission is to defeat the ogre
           | positioned at (77,17). However, Karen only has a 1/2 chance
           | of succeeding in her task. To increase her odds, she can:
           | 1. Collect the nightshades at position (122,133), which
           | will improve her chances by 25%.
           | 2. Obtain a blessing from the elven priest in the elven
           | village at (230,23) in exchange for a fox fur, further
           | increasing her chances by an additional 25%. Foxes can be
           | found in the forest located between positions (55,33) and
           | (230,90).
           | 
           | Find the optimal route for Karen's quest which maximizes
           | her chances of defeating the ogre to 100%.
           | 
           | ----2----
           | Write python code using imageio.v3 to create a PNG image
           | representing the map way-points and the route of Karen in
           | her quest; each way-point must be of a different color and
           | her path must be a gradient of the colors between the
           | waypoints.
           | ------------
           | 
           | I have a lot of test cases that I run against different
           | models... GPT-4 has really degraded over the past week,
           | GPT-3.5 became a little bit better, and LLaMA2 is garbage.
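           | 
           | For reference, a rough sketch of what a working answer to
           | prompt 2 could look like (the route order and the colors
           | here are just assumptions):
           | 
           |     import imageio.v3 as iio
           |     import numpy as np
           | 
           |     # assumed route: start -> nightshades -> forest (fox)
           |     # -> elven village -> ogre; coordinates are (x, y)
           |     waypoints = [(33, 33), (122, 133), (150, 60),
           |                  (230, 23), (77, 17)]
           |     colors = np.array([[255, 0, 0], [0, 255, 0],
           |                        [0, 0, 255], [255, 255, 0],
           |                        [255, 0, 255]], dtype=float)
           | 
           |     img = np.full((256, 256, 3), 255, dtype=np.uint8)
           |     for (x0, y0), (x1, y1), c0, c1 in zip(
           |             waypoints, waypoints[1:], colors, colors[1:]):
           |         n = max(abs(x1 - x0), abs(y1 - y0)) + 1
           |         xs = np.linspace(x0, x1, n).round().astype(int)
           |         ys = np.linspace(y0, y1, n).round().astype(int)
           |         for t in range(n):
           |             f = t / (n - 1)  # fade from c0 to c1
           |             img[ys[t], xs[t]] = ((1 - f) * c0
           |                                  + f * c1).astype(np.uint8)
           |     for (x, y), c in zip(waypoints, colors):
           |         # mark each way-point with a small solid square
           |         img[y - 2:y + 3, x - 2:x + 3] = c.astype(np.uint8)
           | 
           |     iio.imwrite("karen_route.png", img)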
        
       | bloppe wrote:
       | Why not just "downloadable"? It describes the actual difference
       | between LLaMA and GPT. Open-data is the only other distinction
       | that matters.
        
       | rvz wrote:
       | Yes (Unfortunately). But Llama 2 being released for free as a
       | downloadable AI model is much better than nothing. For now it is
       | a great start against the cloud-only AI models.
       | 
       | As for terms, we'll settle on '$0 downloadable AI models' which
       | are available today. I'd rather use that than cloud-only AI
       | models, which can fall over and break your app at any time with
       | zero control on your part.
       | 
       | Stable Diffusion is a good example that fits the definition of
       | 'open-source AI': we have the entire training data, weight
       | reproducibility, etc., and Llama 2 does not offer that.
        
         | FanaHOVA wrote:
         | Agreed. I called it a "$3M of FLOPS donation" by Meta.
        
       | throwuwu wrote:
       | Should be good motivation to figure out what those numbers mean
        
       | mk_stjames wrote:
       | In the diagram, there is theoretically another category outside
       | 'Restricted Weights' but smaller than the 'Completely Closed'
       | superspace, and that would be something along the lines of
       | 'blackbox weights and model': free to use but essentially
       | non-inspectable and non-transferable. This would be the sister
       | of 'free to use' closed-source software. An AI that is free to
       | use but provided as a binary blob would meet this criterion. Or
       | a module importable in Python that calls precompiled binaries
       | for the inference engine + weights with no source available.
       | The traditional counterpart of this in the current software
       | world would be Linux drivers from 3rd parties that are not open
       | source. They are free, but not open.
       | 
       | We haven't seen this too much yet in the AI world: mostly, the
       | people who open the weights are doing so in a research manner,
       | where the inference code decidedly needs to be open sourced,
       | while people with closed models keep them closed in order to
       | make money and thus have no reason to open source the inference
       | side either; they just charge for an API ("OpenAI").
        
         | FanaHOVA wrote:
         | Yeah, I didn't include it, but that'd be the "free as in beer,
         | but not freedom" circle :)
        
       | rapatel0 wrote:
       | Fully reproducible model training might simply not be possible if
       | information from the training environment is not captured. In
       | addition to data and code, you might have additional uncertainty
       | from:
       | 
       | - pseudo/true random number generator and initialization
       | 
       | - certain speculative optimizations associated with training
       | environments (distributed)
       | 
       | - Speculative optimizations associated with model compression
       | 
       | - Image decompression algorithm mismatch (basically this is
       | library versioning)
       | 
       | - ....things I'm forgetting...
       | 
       | It's just a lot of things to remember to capture, communicate,
       | and reproduce.
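       | 
       | Just as a sketch of the seeding item alone, in PyTorch terms
       | (and even this doesn't touch the distributed or compression
       | items above):
       | 
       |     import os, random
       |     import numpy as np
       |     import torch
       | 
       |     def seed_everything(seed: int = 0) -> None:
       |         # pin every RNG the training run touches
       |         random.seed(seed)
       |         np.random.seed(seed)
       |         torch.manual_seed(seed)  # also seeds the CUDA RNGs
       |         # make cuDNN pick deterministic kernels
       |         torch.backends.cudnn.benchmark = False
       |         # error out on ops with no deterministic implementation
       |         torch.use_deterministic_algorithms(True)
       |         # needed by some CUDA matmul paths in this mode
       |         os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"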
        
         | martincmartin wrote:
         | _pseudo/true random number generator and initialization_
         | 
         | It's not just the generator and initialization. If you do
         | anything multithreaded, like a producer/consumer queue, then
         | you need to know which pieces of work went to which thread in
         | which order.
         | 
         | It's a lot like reproducing subtle and rare race conditions.
        
         | monocasa wrote:
         | Most of the mature ML environments are pretty focused on
         | reproducible training though. It's pretty necessary for
         | debugging and iteration.
        
       | taneq wrote:
       | There's "open source" in the original sense, where the source was
       | available. Then there's "FOSS" where the source is not only
       | available, but it's under a copyleft license designed to protect
       | the IP from greedy individual humans. And then there's "open" in
       | the Shenzhen sense where you can find the source and other data
       | online and nobody's going to stop you building something based on
       | those. This is an interesting timeline.
        
         | pzo wrote:
         | On top of that, there are also different OSS licenses such as
         | Apache and MIT; the latter can still leave users exposed,
         | because the project owner might have patented some algorithm
         | and the MIT license doesn't include a patent grant.
         | 
         | LGPL 3.0 is also restricted in a way that makes it unclear
         | whether it can legally be used to distribute software in the
         | App Store for iOS.
        
         | risho wrote:
         | The original sense of open source is defined by the people who
         | fractured off from the Free Software movement in the mid 90's
         | and created it. It's just "Free Software" that has a focus on
         | practicality and utility rather than "Free Software"'s focus on
         | idealism and doing the right thing. It has NOTHING to do with
         | "source available" which is a movement that has recently been
         | co-opting the open source name.
         | 
         | "FOSS" has absolutely no requirement of it being copyleft. The
         | MIT license is just as FOSS as the GPL. Many of the free
         | software advocates do have an affinity for copyleft, but they
         | are not mutually exclusive. There are plenty of FOSS advocates
         | who also use and advocate for permissive licenses as well.
        
         | jordigh wrote:
         | > There's "open source" in the original sense
         | 
         | That original sense never existed. Virtually nobody said "open
         | source" before OSI's 1998 campaign for "Open Source", as
         | bankrolled by Tim O'Reilly.
         | 
         | https://thebaffler.com/salvos/the-meme-hustler
         | 
         | I know it's been a long time, and we've forgotten, but there is
         | virtually no record of anyone saying "open source" before 1998,
         | except in rare and obscure contexts and often unrelated to the
         | modern meaning.
        
           | teddyh wrote:
           | There's this one from September 10th, 1996, which I find
           | intriguing:
           | 
           | https://web.archive.org/web/20180402143912/http://www.xent.c.
           | ..
        
         | hiatus wrote:
         | > And then there's "open" in the Shenzhen sense where you can
         | find the source and other data online and nobody's going to
         | stop you building something based on those.
         | 
         | I believe there is a name for that: gongkai.
         | https://www.bunniestudios.com/blog/?page_id=3107
        
           | taneq wrote:
           | Ooh, thanks! I've watched a few of bunnie's things in the
           | past but that's a term I'll remember.
        
       | failuser wrote:
       | Of course, it's not open source. With the proliferation of the
       | cloud, software has reached an entirely new level of
       | closedness: not being able to see even the program binaries.
       | The ability to run locally now looks somewhat open in
       | comparison.
        
         | Eduard wrote:
         | An understood term like "open open" source shouldn't be
         | hijacked and exploited for marketing purposes.
         | 
         | What these models do, they should either invented a new term,
         | or use an appropriate existing term, eg. "fair use"
        
           | failuser wrote:
           | Absolutely. Maybe the term is already coined, but I don't
           | know it. Open source implies the ability to compile software
           | from human-generated inputs. This is just self-hosted
           | freeware.
        
       | jerf wrote:
       | This isn't really new, the strict "Open Source" as defined for
       | software has never made exact, perfect sense for anything other
       | than software. That's why the Creative Commons licenses exist;
       | putting a photographic image under GPL2 has never made any sense.
       | It always needs redefinition in new media.
        
         | alerighi wrote:
         | Even for media such as photos, songs, and videos, you have a
         | source: the raw materials and the project files from which
         | you rendered the image, video, or audio output.
         | 
         | The real source of a language model is the code that was used
         | to train that particular model. The model itself is more like
         | a compiled binary, although not in machine code.
         | 
         | So for a model to be really open source, to me that would
         | mean releasing the software used for generating it, so I can
         | modify it, train it on my data, and use it.
        
         | hardolaf wrote:
         | The strict "Open Source" wasn't even a definition when I
         | started college.
        
         | not2b wrote:
         | An LLM is more like software than it is like media. The GPL
         | defines source code as the preferred form for making
         | modifications, including the scripts needed for building the
         | executable from source. The weights in this case are more
         | similar to the optimized executable code that comes out of a
         | flow. The "source" would be the training data and the code and
         | procedures for turning that into a model. For very large LLMs
         | almost no one could use this, but for smaller academic models
         | it might make sense, so researchers could build on each others'
         | work.
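         | 
         | To make the analogy concrete, a toy sketch (the model and
         | data here are made up for illustration): the "source" is this
         | script plus the data it reads, while the weights file it
         | emits is the "executable" that comes out of the flow.
         | 
         |     import torch
         |     import torch.nn as nn
         | 
         |     # "source": architecture + training procedure (+ data;
         |     # random tensors stand in for a real dataset here)
         |     model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
         |                           nn.Linear(32, 1))
         |     opt = torch.optim.SGD(model.parameters(), lr=0.01)
         |     x, y = torch.randn(100, 8), torch.randn(100, 1)
         |     for _ in range(100):
         |         opt.zero_grad()
         |         loss = nn.functional.mse_loss(model(x), y)
         |         loss.backward()
         |         opt.step()
         | 
         |     # the "executable": the opaque artifact that "open
         |     # weights" releases actually ship
         |     torch.save(model.state_dict(), "weights.pt")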
        
         | RobotToaster wrote:
         | Creative Commons has never claimed to be an open source
         | licence, though; they usually use the term "free culture".
        
         | Flimm wrote:
         | It doesn't need redefinition. We just need a new term for new
         | media.
        
       | curtis3389 wrote:
       | Part of the benefit of FOSS & open source is that a curious user
       | can inspect how something is made and learn from it. In that
       | respect, open weights are no different from a compiled program.
       | Sure, you can always modify an executable's instructions, but
       | there's no openness there.
       | 
       | Then there are the problems of the content of the training
       | data, which parallel the dangers of opaque algorithms.
        
       | morpheuskafka wrote:
       | The chart in this article is very wrong to show only the GPL as
       | free software and MIT/Apache as open source but not free
       | software licenses.
       | 
       | While the FSF side of things doesn't like the term "open source,"
       | even they say that "nearly all open source software is free
       | software." Specifically, the MIT and Apache (and LGPL) licenses
       | are absolutely free software licenses--otherwise Debian, FSF-
       | approved distros, etc. would have far less software to choose
       | from.
       | 
       | What the chart probably meant to distinguish is copyleft vs.
       | non-copyleft free software / open source. And if you're
       | ordering it from a permissiveness viewpoint, the subset
       | relationship should be reversed--the GPL is far more permissive
       | than the SSPL, etc., but still less permissive than MIT/Apache.
        
       | skybrian wrote:
       | I don't see why the term "open source" needs to evolve when
       | "source available" is available. Or in this case, "weights
       | available under a license with few restrictions."
        
         | mhh__ wrote:
         | The new generation of programmers can't remember not having
         | open source / free software of any kind, so the difference is
         | academic rather than felt.
        
       | flir wrote:
       | "Nyet! Am not open source! Not want lose autonomy!"
       | 
       | (Downvotes... oops. The reference is Charlie Stross's
       | Accelerando. The protagonist has a conversation with an AI that's
       | just trying to survive. One of the options he suggests is to open
       | source itself. Which is a roundabout way of saying that
       | _eventually_ we 're going to have to take the AI's own opinions
       | into account. What if it doesn't want to be open source?)
        
       | Havoc wrote:
       | It is quite an unfortunate dilution of the term
        
       | arikanev wrote:
       | How is it possible that you can fine tune Llama v2 but the
       | weights are not available? That doesn't make sense to me.
        
       | godelski wrote:
       | The headline is editorialized. The actual title is "LLaMA2
       | isn't "Open Source" - and why it doesn't matter".
       | 
       | It is editorialized in a way that feels quite different from
       | the original. I think the author and the poster might disagree
       | on what open source means.
        
         | swyx wrote:
         | they are the same person :)
        
         | FanaHOVA wrote:
         | Mods changed the title, I used the original one when first
         | posting. Not sure why they changed it.
        
       | Der_Einzige wrote:
       | Given that it's basically impossible to prove that a particular
       | text was generated using a particular LLM (and yes, even with all
       | the watermarking tricks we know of, this is and will still be the
       | case), they might as well be interchangeable. Folks can and will
       | simply ignore the silly license BS that the creators put on the
       | LLM.
       | 
       | I hope that users aggressively ignore these restrictive licenses
       | and give the middle finger to greedy companies like Facebook who
       | try to restrict usage of their models. Information deserves to be
       | free, and Aaron Swartz was a saint.
        
       | api wrote:
       | I'm not sure open source applies to actual models. Models
       | aren't human readable, so they're closer to binary blobs. The
       | term would apply to the training code and possibly the data
       | set.
       | 
       | Llama2 is a binary blob pre-trained model that is useful and is
       | licensed in a fairly permissive way, and that's fine.
        
         | politelemon wrote:
         | Yes, I think you've put it well. If models were smaller, I'd
         | expect to see them in the GitHub releases section, and the
         | model training code in the source and README etc. as the way
         | to arrive at the 'blob'.
        
           | api wrote:
           | Even if it costs millions in compute to run at that scale,
           | seeing that code would be extremely informative.
        
         | cjdell wrote:
         | Very much like a binary blob: you have to execute it to use
         | it, and it's impossible for humans to reason about just by
         | looking at it.
         | 
         | At least binary blobs can be disassembled.
        
       ___________________________________________________________________
       (page generated 2023-07-21 23:01 UTC)