[HN Gopher] Show HN: A fully open-source (Apache 2.0) implementat...
       ___________________________________________________________________
        
        Show HN: A fully open-source (Apache 2.0) implementation of LLaMA
        
       We believe that AI should be fully open source and part of the
        collective knowledge.  The original LLaMA code is GPL licensed,
        which means any project using it must also be released under GPL.
       This "taints" any other code and prevents meaningful academic and
       commercial use.  Lit-LLaMA solves that for good.
        
       Author : osurits
       Score  : 74 points
       Date   : 2023-03-28 17:33 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | nynx wrote:
       | There are already a million ways to run LLaMA. This doesn't
       | change the issue at all, which is that the weights aren't
       | commercially licensed.
        
         | rasbt wrote:
         | I think some businesses and people are worried about using GPL
         | code in their code bases because that's incompatible with their
         | own licenses.
        
         | theaniketmaurya wrote:
         | Yes, agree that the weights aren't commercially licensed (yet)!
          | The other ways to run LLaMA use a GPL license, which makes it
          | difficult for commercial use even if someone trains and
          | uploads the weights publicly.
          | 
          | This could be a step toward that change :)
        
       | rasbt wrote:
       | I guess that means time to fire up a few GPUs later today and get
       | some weights! We should have a weight exchange platform for that
       | maybe, haha.
        
         | A4ET8a8uTh0 wrote:
         | You mean like a blockchain? I jest, but only a little.
        
       | yewnork wrote:
       | I see this as a win for the AI community. The key for LLMs is to
       | enable people to train collaboratively and innovate more quickly
       | in this space. Are there any examples or demos available that
       | showcase the capabilities of "lit-llama"?
        
       | AmuVarma wrote:
        | LLaMA by FB is under a non-commercial license, not a GPL
        | license, so I assume you are using a different base model.
        | What model is that?
        
         | sp332 wrote:
         | This isn't a new model, it's just new code.
        
           | AmuVarma wrote:
            | Ah, I see. So it's just a new way to run inference on the
            | non-commercial model.
        
           | AmuVarma wrote:
            | So who cares if it's GPL licensed? It can never be put into
            | production anyway.
        
             | woodson wrote:
             | You probably should care if you integrate the inference
             | code in other software packages that aren't GPL licensed.
        
             | Tepix wrote:
             | The GPL allows for commercial use. Just adhere to the
             | license.
        
       | [deleted]
        
       | ficiek wrote:
        | If you hate the GPL so much, then I assume you don't run any
        | GPL-licensed code on your machines. I admire your resolve,
        | because I would think that is pretty hard!
        
       | ipsum2 wrote:
        | FYI, there's something fishy going on in this thread. Multiple
        | people from the Lightning AI team, theaniketmaurya (developer
        | advocate for Lightning AI) and rasbt (developer at Lightning
        | AI), are shilling for this post. The account that submitted
        | this (osurits) has only two comments, both with the same
        | behavior.
       | 
       | Having interacted with the Lightning AI team in the past, this is
       | unsurprising behavior.
        
       | alexb_ wrote:
       | >GPL...prevents meaningful academic and commercial use
       | 
       | WTF are you talking about?
        
         | theaniketmaurya wrote:
         | GPL is a copyleft license which requires you to share anything
         | that you build using the original software. This makes it
         | difficult for commercial use.
        
           | homarp wrote:
           | Red Hat 30th anniversary -
           | https://news.ycombinator.com/item?id=35337146
        
           | n3t wrote:
           | > GPL is a copyleft license which requires you to share
           | anything that you build using the original software.
           | 
           | That's not true.
           | 
           | > This makes it difficult for commercial use.
           | 
           | Yeah, too bad it's so difficult for companies to use Linux
           | commercially. /s
        
       | adeon wrote:
        | I think implying that GPL is not "fully open source" is a hot
        | take. It's specifically designed to ensure that you and anyone
        | you distribute your code to get the same freedoms. Maybe you
        | don't agree that it's a good license, but that is its
        | intention. GPL vs BSD-type licenses is, I guess, a decades-long
        | argument by now.
       | 
        | Maybe I'm a naive idealist, but IMO the GPL family of licenses
        | is underrated. You can use them to make sure you don't work for
        | free for someone who won't share their improvements.
       | 
       | I liked the choice of AGPL for AUTOMATIC1111 Stable Diffusion web
       | UI. (https://github.com/AUTOMATIC1111/stable-diffusion-webui)
       | 
        | Commercial interests are very allergic to the AGPL, which
        | ensures the project stays community-run and that new features
        | and fixes will prioritize the most ordinary user doing things
        | for fun.
        
         | cuuupid wrote:
          | I think OP mischaracterized the issue with the license; it's
          | more that the weights don't fall under the same scope.
          | They're research-use only, with no commercial use allowed.
        
           | rasbt wrote:
            | Not sure, but I think the point was that if you have
            | something under a GPL license (like the code in this case)
            | it's open source, but that doesn't mean you can use it for
            | your business application. That's because the GPL requires
            | you to open source all derivative work, and most businesses
            | don't want to/can't do that.
        
             | lantiga wrote:
             | The AI ecosystem is almost entirely Apache 2/MIT/BSD, and
             | GPL is just incompatible with it.
             | 
              | This is a blocker to mixing and matching; a simple Apache
              | 2 rewrite fixes that problem.
              | 
              | Weights? That's another issue, but we're looking forward
              | to fixing that too.
        
               | dTal wrote:
               | It's not incompatible at all. You can use code under all
               | of those licenses in a GPLd work.
        
           | UncleOxidant wrote:
            | Yeah, this is weird, because there are plenty of open
            | source implementations of the LLaMA model on GitHub -
            | alpaca.cpp in C++ is one, and there are many others in
            | PyTorch, such as the one used by ChatLLaMA. But without the
            | weights they're not very useful (unless you're going to try
            | to train it yourself - good luck with that unless you've
            | got a lot of compute power available).
            | 
            | A quick check on GitHub and I find this one, also with an
            | Apache license: https://github.com/chris-alexiuk/alpaca-lora
            | and alpaca.cpp with an MIT license:
            | https://github.com/antimatter15/alpaca.cpp/blob/master/LICEN...
        
       | querez wrote:
        | IANAL, but this seems very fishy to me: 1) I don't understand
        | how this isn't a derivative work of the original code, as I
        | highly doubt you've done a clean-room implementation. I doubt
        | this would hold up in court.
       | 
       | 2) Doesn't the original FB license also apply to the weights?
       | Just re-implementing the code would not change the license on the
       | weights. So while THE CODE may now be re-licensed, the weights
       | would still fall under the original license.
       | 
       | I'd love if someone with more legal understanding could shed some
       | light on this.
        
         | MacsHeadroom wrote:
         | >I don't understand how this isn't a derivative work of the
         | original code
         | 
         | The original code is Apache 2 licensed. Derivatives are fine
         | and allowed. This retains the same Apache 2 license as
         | Facebook's code.
         | 
         | It's only the model that isn't covered by that permissive
         | Apache 2 license. A model produced by a derivative of the
         | permissively licensed code, or even by the original code
          | itself, is not a derivative of the original non-permissively
         | licensed model produced by the original code and is non-
         | infringing even if it is a bit-perfect replica.
         | 
         | > Doesn't the original FB license also apply to the weights?
         | 
          | Again, there are different licenses for the code and the
          | model, and neither license actually applies to the weights
          | within the model, only to the actual exact model. If this
          | project produced a bit-for-bit replica of Facebook's model it
          | would still not infringe on that model's license.
         | 
          | But it doesn't produce a bit-for-bit replica. Even if
          | Facebook were to re-run their same training code on their
          | same hardware, they could not produce the exact same weights
          | as before, since massively parallel matrix multiplications
          | are not deterministic. Benign environmental noise, like
          | microscopic fluctuations in temperature, makes a difference
          | in the outcome.
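          | 
          | A toy illustration of that order-sensitivity (a sketch
          | assuming PyTorch, not code from this repo): float addition is
          | not associative, so summing the same values in a different
          | order already shifts the float32 result, which is why
          | parallel reductions with varying schedules don't reproduce
          | bit-for-bit.
          | 
          |     import torch
          | 
          |     torch.manual_seed(0)
          |     x = torch.randn(1_000_000)  # float32 values
          | 
          |     # Same numbers, different summation order -> slightly
          |     # different result, because float32 addition is not
          |     # associative.
          |     s1 = x.sum()
          |     s2 = x[torch.randperm(x.numel())].sum()
          |     print((s1 - s2).item())  # typically small but non-zero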
        
           | kristjansson wrote:
           | > Apache 2
           | 
           | Isn't the original GPLv3[0]?
           | 
           | [0]:
           | https://github.com/facebookresearch/llama/blob/main/LICENSE
        
             | lantiga wrote:
             | Correct, the original is GPL 3.
             | 
              | To produce this implementation from the LLaMA paper we
              | started from github.com/karpathy/nanoGPT; the LLaMA
              | architecture is really similar to GPT. For instance, we
              | added rotary positional encoding starting from the
              | original RoPE repo published with the paper.
              | 
              | We finally ran the original model to make sure the two
              | models were numerically equivalent.
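              | 
              | Roughly what the rotary encoding looks like in PyTorch (a
              | minimal sketch; the helper names are just for
              | illustration, not the exact code in the repo):
              | 
              |     import torch
              | 
              |     def build_rope_cache(seq_len, head_dim, base=10000.0):
              |         # One rotation frequency per pair of features.
              |         exps = torch.arange(0, head_dim, 2).float() / head_dim
              |         theta = 1.0 / (base ** exps)
              |         pos = torch.arange(seq_len).float()
              |         angles = torch.outer(pos, theta)
              |         # each of shape (seq_len, head_dim // 2)
              |         return angles.cos(), angles.sin()
              | 
              |     def apply_rope(x, cos, sin):
              |         # x: (..., seq_len, head_dim); rotate each
              |         # consecutive pair of features by its angle.
              |         x1, x2 = x[..., 0::2], x[..., 1::2]
              |         rot = torch.stack((x1 * cos - x2 * sin,
              |                            x1 * sin + x2 * cos), dim=-1)
              |         return rot.flatten(-2)
              | 
              | The numerical check is the usual pattern: run both models
              | on the same tokens and compare the logits with something
              | like torch.testing.assert_close.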
        
         | rnosov wrote:
          | 1) I've looked at both codebases and this one is definitely a
          | derivative of nanoGPT. You can compare all three
          | implementations yourself, as they are actually surprisingly
          | compact and readable.
         | 
          | 2) The issue of whether weights are copyrightable at all has
          | not been settled yet. If they are, there is the fair use
          | doctrine, which allows transformative works of a copyrighted
          | work. The line is a bit blurry, but consider the Cariou v.
          | Prince case[1], where the addition of colour to some black
          | and white photos was considered enough to be transformative.
          | Similarly, full fine-tuning on current news or adding a
          | visual modality could potentially create a brand new model in
          | the eyes of the law.
         | 
         | [1] https://cyber.harvard.edu/people/tfisher/cx/2013_Cariou.pdf
        
       | barefeg wrote:
       | But aren't the weights still not for commercial use?
        
         | Ciantic wrote:
          | That's what I thought too; the source code was not the issue
          | so much as that.
          | 
          | What we need is some sort of "Large Language Model at Home"
          | (like SETI@home was) that could crowdsource the creation of
          | the model, which would be free to use.
        
           | barefeg wrote:
            | Right, so sort of like
            | https://github.com/bigscience-workshop/petals but for the
            | training phase. I suppose different training runs could be
            | proposed via an RFC type of procedure. Then it's not only
            | the open source model maintainers that put in the effort;
            | supporters of the project can also "donate" their hardware
            | resources.
        
             | lantiga wrote:
             | Some form of that is very likely the future
        
       | homarp wrote:
       | llama.cpp is also MIT
       | 
       | https://github.com/ggerganov/llama.cpp
       | 
       | previously discussed here
       | https://news.ycombinator.com/item?id=35100086
       | 
       | and one of the rust wrapper:
       | https://news.ycombinator.com/item?id=35171527 (also MIT)
        
       | 2Gkashmiri wrote:
       | Bs.
       | 
       | Prevents meaningful academic.....
       | 
        | How the hell does the AGPL prevent academic use? Commercial
        | use, sure, because the AGPL follows the four freedoms, and
        | commercial users often want to take someone else's work and
        | slap their brand on it without acknowledging the original work.
        | That, and the downstream is often closed source for "business
        | reasons", which causes their users to not enjoy the fruits of
        | the first party's licensing.
       | 
       | Where does academia come into it? Are researchers now keeping
       | everything under wraps for "shareholders interests"?
       | 
       | Isn't academia supposed to be open culture from the start without
       | any restrictions so what am I missing or are they mixing two
       | unrelated things?
       | 
        | Also, I think I might be wrong, but isn't it merely converting
        | LLaMA into their version? Uh ...
        
         | ftxbro wrote:
          | I'm not saying this is how it should be, but a lot of the
          | authors of published papers on scaling properties of large
          | language models have been employees in research divisions
          | within big tech companies, or academics holding dual
          | positions with those companies and with their university.
         | 
         | > Where does academia come into it? Are researchers now keeping
         | everything under wraps for "shareholders interests"? Isn't
         | academia supposed to be open culture from the start without any
         | restrictions so what am I missing or are they mixing two
         | unrelated things?
         | 
         | Yeah academia was never perfect, but it's becoming more and
         | more like you describe. It's been happening for a while and
         | that's a whole other thing.
        
       | theaniketmaurya wrote:
        | I am in love with this implementation, considering the ability
        | to run on 8 GB of VRAM and the Apache 2.0 license.
        
         | theaniketmaurya wrote:
          | I am curious, though: how would the model weights work out?
        
       ___________________________________________________________________
       (page generated 2023-03-28 23:00 UTC)