[HN Gopher] Llama.cpp 30B runs with only 6GB of RAM now ___________________________________________________________________ Llama.cpp 30B runs with only 6GB of RAM now Author : msoad Score : 329 points Date : 2023-03-31 20:37 UTC (2 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | TaylorAlexander wrote: | Great to see this advancing! I'm curious if anyone knows what the | best repo is for running this stuff on an Nvidia GPU with 16GB | VRAM. I ran the official repo with the leaked weights and the | best I could run was the 7B parameter model. I'm curious if | people have found ways to fit the larger models on such a system. | terafo wrote: | I'd _assume_ that the 33B model should fit with this (the only repo that | I know of that implements SparseGPT and GPTQ for LLaMa); I, | personally, haven't tried it though. But you can try your luck: | https://github.com/lachlansneff/sparsellama | enlyth wrote: | https://github.com/oobabooga/text-generation-webui | w1nk wrote: | Does anyone know how/why this change decreases memory consumption | (and isn't a bug in the inference code)? | | From my understanding of the issue, mmap'ing the file is showing | that inference is only accessing a fraction of the weight data. | | Doesn't the forward pass necessitate accessing all the weights | and not a fraction of them? | matsemann wrote: | Maybe lots of the data is embedding values or tokenizer stuff, | where a single prompt uses a fraction of those values. And then | the rest of the model is quite small. | w1nk wrote: | That shouldn't be the case. 30B is a number that directly | represents the size of the model, not the size of the other | components. | detrites wrote: | The pace of collaborative OSS development on these projects is | amazing, but the _rate_ of optimisations being achieved is almost | unbelievable. What has everyone been doing wrong all these years | _cough_ sorry, I mean to say weeks? | | Ok, I answered my own question.
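The lazy-loading behavior w1nk is asking about can be sketched in a few lines. This is a toy illustration of how mmap'ing a weights file defers I/O until pages are actually touched; it uses a throwaway temp file standing in for a real model, and the sizes are made up for the demo:

```python
import mmap
import os
import tempfile

def map_weights(path):
    # Map the whole file read-only. This costs (almost) no RAM up
    # front: physical pages are faulted in only when the mapped
    # bytes are actually read, and the kernel can evict them again
    # under memory pressure.
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Stand-in "model file": 16 pages of fake weights.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 * 16)
    path = tmp.name

weights = map_weights(path)
# Only the page(s) backing this slice need to become resident; the
# rest of the file stays on disk until (and unless) something reads it.
chunk = weights[4096 * 3 : 4096 * 3 + 8]
print(len(chunk))  # 8
os.unlink(path)
```

Because the mapping is backed by the page cache rather than the process heap, tools like `htop` count only the faulted-in pages against the process, which is one plausible reading of the surprisingly low RAM numbers discussed in the thread.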
| datadeft wrote: | I have predicted that LLaMA will be available on mobile phones | before the end of this year. We are very close. | terafo wrote: | You mean in a contained app? It can already run on a phone. GPU | acceleration would be nice at this point, though. | rickrollin wrote: | People have actually run it on phones. | politician wrote: | Roughly: OpenAIs don't employ enough jarts. | | In other words, the groups of folks working on training models | don't necessarily have access to the sort of optimization | engineers that are working in other areas. | | When all of this leaked into the open, it caused a lot of | people knowledgeable in different areas to put their own | expertise to the task. Some of those efforts (mmap) pay off | spectacularly. Expect industry to copy the best of these | improvements. | bee_rider wrote: | The professional optimizes well enough to get management off | their back, the hobbyist can be irrationally good. | hedgehog wrote: | They have very good people but those people have other | priorities. | kmeisthax wrote: | >What has everyone been doing wrong all these years | | So it's important to note that all of these improvements are | the kinds of things that are cheap to run on a pretrained | model. And all of the developments involving large language | models recently have been the product of hundreds of thousands | of dollars in rented compute time. Once you start putting six | digits on a pile of model weights, that becomes a capital cost | that the business either needs to recuperate or turn into a | competitive advantage. So everyone who scales up to this point | doesn't release model weights. | | The model in question - LLaMA - isn't even a public model. It | leaked and people copied[0] it. But because such a large model | leaked, now people can actually work on iterative improvements | again.
| | Unfortunately we don't really have a way for the FOSS community | to pool together that much money to buy compute from cloud | providers. Contributions-in-kind through distributed computing | (e.g. a "GPT@home" project) would require significant changes | to training methodology[1]. Further compounding this, the | state-of-the-art is actually kind of a trade secret now. Exact | training code isn't always available, and OpenAI has even gone | so far as to refuse to say anything about GPT-4's architecture | or training set to prevent open replication. | | [0] I'm avoiding the use of the verb "stole" here, not just | because I support filesharing, but because copyright law likely | does not protect AI model weights alone. | | [1] AI training has very high minimum requirements to get in | the door. If your GPU has 12GB of VRAM and your model and | gradients require 13GB, you can't train the model. CPUs don't | have this limitation but they are ridiculously inefficient for | any training task. There are techniques like ZeRO to give | pagefile-like state partitioning to GPU training, but that | requires additional engineering. | seydor wrote: | > we don't really have a way for the FOSS community to pool | together that much money | | There must be open source projects with enough money to pool | into such a project. I wonder whether wikimedia or apache are | considering anything. | terafo wrote: | _AI training has very high minimum requirements to get in the | door. If your GPU has 12GB of VRAM and your model and | gradients require 13GB, you can't train the model. CPUs | don't have this limitation but they are ridiculously | inefficient for any training task. There are techniques like | ZeRO to give pagefile-like state partitioning to GPU | training, but that requires additional engineering._ | | You can't if you have one 12GB GPU. You can if you have | a couple of dozens. And then petals-style training is possible.
| It is all very very new and there are many unsolved hurdles, | but I think it can be done. | webnrrd2k wrote: | Maybe a good candidate for the SETI@home treatment? | terafo wrote: | It is a good candidate. The tech is a good 6-18 months away, | though. | dplavery92 wrote: | Sure, but when one 12GB GPU costs ~$800 new (e.g. for the | 3080 LHR), "a couple of dozens" of them is a big barrier to | entry to the hobbyist, student, or freelancer. And cloud | computing offers an alternative route, but, as stated, | distribution introduces a new engineering task, and the | month-to-month bills for the compute nodes you are using | can still add up surprisingly quickly. | terafo wrote: | We are talking groups, not individuals. I think it is | quite possible for a couple hundred people to | cooperate and train something at least as big as LLaMa 7B | in a week or two. | xienze wrote: | > but the rate of optimisations being achieved is almost | unbelievable. What has everyone been doing wrong all these | years cough sorry, I mean to say weeks? | | It's several things: | | * Cutting-edge code, not overly concerned with optimization | | * Code written by scientists, who aren't known for being the | world's greatest programmers | | * The obsession the research world has with using Python | | Not surprising that there's a lot of low-hanging fruit that can | be optimized. | Miraste wrote: | Why does Python get so much flak for inefficiencies? It's | really not that slow, and in ML the speed-sensitive parts are | libraries in lower level languages anyway. Half of the | optimization from this very post is in Python. | wkat4242 wrote: | Wow I continue being amazed by the progress being made on | language models in the scope of weeks. I didn't expect | optimisations to move this quickly. Only a few weeks ago we were | amazed with ChatGPT, knowing it would never be something to run at | home, requiring $100,000 in hardware (8xA100 cards).
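The footnote's claim that training has a high minimum bar can be made concrete with back-of-envelope math. The byte counts below assume fp16 weights and gradients plus fp32 Adam optimizer state (a common mixed-precision setup); that is an assumption for illustration, not a measurement of any particular trainer, and activation memory is ignored entirely:

```python
def training_bytes_per_param(dtype_bytes=2, optimizer="adam"):
    """Rough lower bound on training memory per parameter.

    fp16 weights + fp16 gradients = 4 bytes, and Adam typically keeps
    fp32 momentum, fp32 variance, and an fp32 master copy of the
    weights on top (12 more bytes). Real usage is higher still because
    activations are not counted here.
    """
    weights = dtype_bytes
    grads = dtype_bytes
    opt_state = 12 if optimizer == "adam" else 0
    return weights + grads + opt_state

def training_gib(n_params):
    return n_params * training_bytes_per_param() / 2**30

# A 7B-parameter model needs on the order of 100 GiB to train naively,
# which is why one 12GB card can't do it without tricks like ZeRO.
print(round(training_gib(7e9)))  # 104
```

The same arithmetic explains why inference fits where training doesn't: drop the gradients and optimizer state and the per-parameter cost falls from ~16 bytes to 2 (or half a byte after 4-bit quantization).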
| kossTKR wrote: | Does this mean that we can also run the 60B model on a 16GB ram | computer now? | | I have the M2 air and can't wait until further optimisation with | the Neural Engine / multicore gpu + shared ram etc. | | I find it absolutely mind boggling that GPT-3.5(4?) level quality | may be within reach locally on my $1500 laptop / $800 m2 mini. | thomastjeffery wrote: | I doubt it: text size and text _pattern_ size don't scale | linearly. | kossTKR wrote: | Interesting, I wonder what the scaling function is. | abujazar wrote: | I love how LLMs have got the attention of proper programmers such | that the Python mess is getting cleaned up. | jart wrote: | Author here. For additional context, please read | https://github.com/ggerganov/llama.cpp/discussions/638#discu... | The loading time performance has been a huge win for usability, | and folks have been having the most wonderful reactions after | using this change. But we don't have a compelling enough theory | yet to explain the RAM usage miracle. So please don't get too | excited just yet! Yes things are getting more awesome, but like | all things in science a small amount of healthy skepticism is | warranted. | conradev wrote: | > But we don't have a compelling enough theory yet to explain | the RAM usage miracle. | | My guess would be that the model is faulted into memory lazily | page by page (4K or 16K chunks) as the model is used, so only | the actual parts that are needed are loaded. | | The kernel also removes old pages from the page cache to make | room for new ones, and especially so if the computer is using a | lot of its RAM. As with all performance things, this approach | trades off inference speed for memory usage, but is likely faster | overall because you don't have to read the entire thing from | disk at the start. Each input will take a different path | through the model, and will require loading more of it.
| | The cool part is that this memory architecture should work just | fine with hardware acceleration, too, as long as the computer | has unified memory (anything with an integrated GPU). This | approach likely won't be possible with dedicated GPUs/VRAM. | | This approach _does_ still work to run a dense model with | limited memory, but the time/memory savings would just be less. | The GPU doesn't multiply every matrix in the file literally | simultaneously, so the page cache doesn't need to contain the | entire model at once. | jart wrote: | I don't think it's actually trading away inference speed. You | can pass an --mlock flag, which calls mlock() on the entire | 20GB model (you need root to do it), then htop still reports | only like 4GB of RAM is in use. My change helps inference go | faster. For instance, I've been getting inference speeds of | 30ms per token after my recent change on the 7B model, and I | normally get 200ms per eval on the 30B model. | conradev wrote: | Very cool! Are you testing after a reboot / with an empty | page cache? | jart wrote: | Pretty much. I do my work on a headless workstation that | I SSH into, so it's not like competing with Chrome tabs | or anything like that. But I do it mostly because that's | what I've always done. The point of my change is you | won't have to be like me anymore. Many of the devs who | contacted me after using my change have been saying stuff | like, "yes! I can actually run LLaMA without having to | close all my apps!" and they're so happy. | Miraste wrote: | This is incredible, great work. Have you tried it with the | 65B model? Previously I didn't have a machine that could | run it. I'd love to know the numbers on that one. | liuliu wrote: | Only recent versions of Metal (macOS 13 / iOS 16) support mmap, | and can use it in the GPU directly. CUDA does have a unified memory | mode even on a dedicated GPU; it would be interesting to try | that out.
Probably going to slow down quite a bit, but still | interesting to have that possibility. | zone411 wrote: | It really shouldn't act as a sparse model. I would bet on | something being off. | world2vec wrote: | >I'm glad you're happy with the fact that LLaMA 30B (a 20gb | file) can be evaluated with only 4gb of memory usage! | | Isn't LLaMA 30B a set of 4 files (60,59Gb)? | | -edit- nvm, It's quantized. My bad | smaddox wrote: | Based on that discussion, it definitely sounds like some sort | of bug is hiding. Perhaps run some evaluations to compare | perplexity to the standard implementation? | nynx wrote: | Why is it behaving sparsely? There are only dense operations, | right? | w1nk wrote: | I also have this question, yes it should be. The forward pass | should require accessing all the weights AFAIK. | [deleted] | thomastjeffery wrote: | How diverse is the training corpus? | dchest wrote: | https://arxiv.org/abs/2302.13971 | eternalban wrote: | Great work. Is the new file format described anywhere? Skimming | the issue comments I have a vague sense that r/o matter was | colocated somewhere for zero copy mmap or is there more to it? | sillysaurusx wrote: | Hey, I saw your thoughtful comment before you deleted it. I | just wanted to apologize -- I had no idea this was a de facto | Show HN, and certainly didn't mean to make it about something | other than this project. | | The only reason I posted it is because Facebook had been | DMCAing a few repos, and I wanted to reassure everyone that | they can hack freely without worry. That's all. | | I'm really sorry if I overshadowed your moment on HN, and I | feel terrible about that. I'll try to read the room a little | better before posting from now on. | | Please have a wonderful weekend, and thanks so much for your | hard work on LLaMA! | | EDIT: The mods have mercifully downweighted my comment, which | is a relief. Thank you for speaking up about that, and sorry | again. 
| | If you'd like to discuss any of the topics you originally | posted about, you had some great points. | d3nj4l wrote: | Maybe off topic, but I just wanted to say that you're an | inspiration! | htrp wrote: | Just shows how inefficient some of the ML research code can be | robrenaud wrote: | Training tends to require a lot more precision and hence | memory than inference. I bet many of the tricks here won't | work well for training. | sr-latch wrote: | Have you tried running it against a quantized model on | HuggingFace with identical inputs and deterministic sampling to | check if the outputs you're getting are identical? I think that | should confirm/eliminate any concern of the model being | evaluated incorrectly. | intelVISA wrote: | Didn't expect to see two titans today: ggerganov AND jart. Can | y'all slow down? You make us mortals look bad :') | | Seeing such clever use of mmap makes me dread to imagine how | much Python spaghetti probably tanks OpenAI's and other "big | ML" shops' infra when they should've trusted in zero copy | solutions. | | Perhaps SWE is dead after all, but LLMs didn't kill it... | brucethemoose2 wrote: | Does that also mean 6GB VRAM? | | And does that include Alpaca models like this? | https://huggingface.co/elinas/alpaca-30b-lora-int4 | terafo wrote: | No (llama.cpp is CPU-only) and no (you need to requantize the | model). | sp332 wrote: | According to | https://mobile.twitter.com/JustineTunney/status/164190201019... | you can probably use the conversion tools from the repo on | Alpaca and get the same result. | | If you want to run larger Alpaca models on a low VRAM GPU, try | FlexGen. I think https://github.com/oobabooga/text-generation-webui/ | is one of the easier ways to get that going. | brucethemoose2 wrote: | Yeah, or deepspeed presumably. Maybe torch.compile too. | | I dunno why I thought llama. _cpp_ would support gpus.
| _shrug_ | lukev wrote: | Has anyone done any comprehensive analysis on exactly how much | quantization affects the quality of model output? I haven't seen | any more than people running it and being impressed (or not) by a | few sample outputs. | | I would be very curious about some contrastive benchmarks between | a quantized and non-quantized version of the same model. | corvec wrote: | Define "comprehensive"? | | There are some benchmarks here: | https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_cu... | and here: https://nolanoorg.substack.com/p/int-4-llama-is-not- | enough-i... | | Check out the original paper on quantization, which has some | benchmarks: https://arxiv.org/pdf/2210.17323.pdf and this | paper, which also has benchmarks and explains how they | determined that 4-bit quantization is optimal compared to | 3-bit: https://arxiv.org/pdf/2212.09720.pdf | | I also think the discussion of that second paper here is | interesting, though it doesn't have its own benchmarks: | https://github.com/oobabooga/text-generation-webui/issues/17... | mlgoatherder wrote: | I've done some experiments here with Llama 13B; in my | subjective experience the original fp16 model is significantly | better (particularly on coding tasks). There are a bunch of | synthetic benchmarks such as wikitext2 PPL, and all the whiz bang | quantization schemes seem to score well, but subjectively | something is missing. | | I've been able to compare 4 bit GPTQ, naive int8, LLM.int8, | fp16, and fp32. LLM.int8 does impressively well but inference | is 4-5x slower than native fp16. | | Oddly, I recently ran a fork of the model on the ONNX runtime, and | I'm convinced that the model performed better than | pytorch/transformers; perhaps subtle differences in floating | point behavior etc between kernels on different hardware | significantly influence performance.
| | The most promising next step in the quantization space IMO has | to be fp8: there are a lot of hardware vendors adding support, | and there are a lot of reasons to believe fp8 will outperform | most current quantization schemes [1][2]. Particularly when | combined with quantization aware training / fine tuning (I | think OpenAI did something similar for GPT-3.5 "turbo"). | | If anybody is interested, I'm currently working on an open | source fp8 emulation library for pytorch, hoping to build | something equivalent to bitsandbytes. If you are interested in | collaborating my email is in my profile. | | 1. https://arxiv.org/abs/2208.09225 2. | https://arxiv.org/abs/2209.05433 | bakkoting wrote: | Some results here: | https://github.com/ggerganov/llama.cpp/discussions/406 | | tl;dr quantizing the 13B model gives up about 30% of the | improvement you get from moving from 7B to 13B - so quantized | 13B is still much better than unquantized 7B. Similar results | for the larger models. | terafo wrote: | I wonder where such a difference between llama.cpp and the [1] repo | comes from. The F16 difference in perplexity is .3 on the 7B model, | which is not insignificant. ggml quirks definitely need | to be fixed. | | [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa | bakkoting wrote: | I'd guess the GPTQ-for-LLaMa repo is using a larger context | size. Poking around it looks like GPTQ-for-llama is | specifying 2048 [1] vs the default 512 for llama.cpp [2]. | You can just specify a longer size on the CLI for llama.cpp | if you are OK with the extra memory. | | [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/934034c8e... | | [2] https://github.com/ggerganov/llama.cpp/tree/3525899277d2e2bd... | gliptic wrote: | GPTQ-for-LLaMa recently implemented some quantization | tricks suggested by the GPTQ authors that improved 7B | especially. Maybe llama.cpp hasn't been evaluated with | those in place?
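For readers unsure what the "q4_0"-style labels in these perplexity comparisons refer to, block-wise absmax 4-bit quantization can be sketched in a few lines. This is a deliberately simplified illustration of the idea (one floating-point scale per small block of weights, signed 4-bit integer codes), not the exact on-disk layout used by llama.cpp or GPTQ:

```python
def quantize_q4(block):
    # One scale per block; codes restricted to the signed range [-7, 7].
    scale = max(abs(w) for w in block) / 7.0
    if scale == 0.0:
        return [0] * len(block), 0.0
    q = [max(-7, min(7, round(w / scale))) for w in block]
    return q, scale

def dequantize_q4(q, scale):
    return [c * scale for c in q]

# One toy 8-weight block (real schemes typically use 32 or more).
w = [0.12, -0.7, 0.33, 0.05, -0.21, 0.64, -0.02, 0.48]
q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s)
# Round-to-nearest keeps each weight within half a quantization step.
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(err <= s / 2 + 1e-9)  # True
```

The point of the per-block scale is that the worst-case rounding error stays proportional to the largest weight in the block, which is why smaller group sizes tend to improve the perplexity numbers being compared here.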
| terafo wrote: | For this specific implementation here's info from the llama.cpp | repo: | | _Perplexity - model options | | 5.5985 - 13B, q4_0 | | 5.9565 - 7B, f16 | | 6.3001 - 7B, q4_1 | | 6.5949 - 7B, q4_0 | | 6.5995 - 7B, q4_0, --memory_f16_ | | According to this repo[1], the difference is about 3% in their | implementation with the right group size. If you'd like to know | more, I think you should read the GPTQ paper[2]. | | [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa | | [2] https://arxiv.org/abs/2210.17323 | bsaul wrote: | how is llama performance relative to chatgpt? is it as good as | chatgpt3 or even 4? | terafo wrote: | It is as good as GPT-3 at most sizes. An instruct layer needs to | be put on top in order for it to compete with GPT-3.5 (which | powers ChatGPT). It can be done with a comparatively small | amount of compute (a couple hundred bucks' worth of compute for | small models, I'd assume low thousands for 65B). | arka2147483647 wrote: | What is lama? What can it do? | terafo wrote: | Read the readme in the repo. | UncleOxidant wrote: | What's the difference between llama.cpp and alpaca.cpp? | cubefox wrote: | I assume the former is just the foundation model (which only | predicts text) while the latter is instruction tuned. | [deleted] | [deleted] | danShumway wrote: | I messed around with 7B and 13B and they gave interesting | results, although not quite consistent enough results for me to | figure out what to do with them. I'm curious to try out the 30B | model. | | Start time was also a huge issue with building anything usable, | so I'm glad to see that being worked on. There's potential here, | but I'm still waiting on more direct API/calling access. Context | size is also a little bit of a problem. I think categorization is | a potentially great use, but without additional alignment | training and with the context size fairly low, I had trouble | figuring out where I could make use of tagging/summarizing.
| | So in general, as it stands I had a lot of trouble figuring out | what I could personally build with this that would be genuinely | useful to run locally and where it wouldn't be preferable to | build a separate tool that didn't use AI at all. But I'm very | excited to see it continue to get optimized; I think locally | running models are very important right now. | cubefox wrote: | I don't understand. I thought each parameter was 16 bit (two | bytes) which would predict minimally 60GB of RAM for a 30 billion | parameter model. Not 6GB. | gamegoblin wrote: | Parameters have been quantized down to 4 bits per parameter, | and not all parameters are needed at the same time. | heap_perms wrote: | I was thinking something similar. Turns out that you don't need | all the weights for any given prompt. | | > LLaMA 30B appears to be a sparse model. While there's 20GB of | weights, depending on your prompt I suppose only a small | portion of that needs to be used at evaluation time [...] | | Found the answer from the author of this amazing pull request: | https://github.com/ggerganov/llama.cpp/discussions/638#discu... | qwertox wrote: | Is the 30B model clearly better than the 7B? | | I played with Pi3141/alpaca-lora-7B-ggml two days ago and it was | super disappointing. In percentage between 0% = alpaca- | lora-7B-ggml and 100% GPT-3.5, where would LLaMA 30B be | positioned? | Rzor wrote: | I haven't been able to run it myself yet, but according to what | I read so far from people who did, the 30B model is where the | "magic" starts to happen. | singularity2001 wrote: | Does that only happen with the quantized model or also with the | float16 / float32 model? Is there any reason to use float models | at all? | ducktective wrote: | I wonder if Georgi or jart use GPT in their programming and | design. I guess the training data was lacking for the sort of | stuff they do due to their field of work especially jart. | jart wrote: | Not yet. 
GPT-4 helped answer some questions I had about the | WIN32 API but that's the most use I've gotten out of it so far. | I'd love for it to be able to help me more, and GPT-4 is | absolutely 10x better than GPT 3.5. But it's just not strong | enough at the kinds of coding I do that it can give me | something that I won't want to change completely. They should | just train a ChatJustine on my code. | Dwedit wrote: | > 6GB of RAM | | > Someone mentioning "32-bit systems" | | Um no, you're not mapping 6GB on RAM on a 32-bit system. The | address space simply doesn't exist. | jiggawatts wrote: | Windows Server could use up to 64 GB for a 32-bit operating | system. Individual processes couldn't map more than 4 GB, but | the total could be larger: | https://en.wikipedia.org/wiki/Physical_Address_Extension | sillysaurusx wrote: | On the legal front, I've been working with counsel to draft a | counterclaim to Meta's DMCA against llama-dl. (GPT-4 is | surprisingly capable, but I'm talking to a few attorneys: | https://twitter.com/theshawwn/status/1641841064800600070?s=6...) | | An anonymous HN user named L pledged $200k for llama-dl's legal | defense: | https://twitter.com/theshawwn/status/1641804013791215619?s=6... | | This may not seem like much vs Meta, but it's enough to get the | issue into the court system where it can be settled. The tweet | chain has the details. | | The takeaway for you is that you'll soon be able to use LLaMA | without worrying that Facebook will knock you offline for it. (I | wouldn't push your luck by trying to use it for commercial | purposes though.) | | Past discussion: https://news.ycombinator.com/item?id=35288415 | | I'd also like to take this opportunity to thank all of the | researchers at MetaAI for their tremendous work. It's because of | them that we have access to such a wonderful model in the first | place. They have no say over the legal side of things. 
One day | we'll all come together again, and this will just be a small | speedbump in the rear view mirror. | | EDIT: Please do me a favor and skip ahead to this comment: | https://news.ycombinator.com/item?id=35393615 | | It's from jart, the author of the PR the submission points to. I | really had no idea that this was a de facto Show HN, and it's | terribly rude to post my comment in that context. I only meant to | reassure everyone that they can freely hack on llama, not make a | huge splash and detract from their moment on HN. (I feel awful | about that; it's wonderful to be featured on HN, and no one | should have to share their spotlight when it's a Show HN. | Apologies.) | terafo wrote: | Wish you all luck in the world. We need much more clarity in | legal status of these models. | sillysaurusx wrote: | Thanks! HN is pretty magical. I think they saw | https://news.ycombinator.com/item?id=35288534 and decided to | fund it. | | I'm grateful for the opportunity to help protect open source | projects such as this one. It will at least give Huggingface | a basis to resist DMCAs in the short term. | [deleted] | [deleted] | sheeshkebab wrote: | All models trained on public data need to be made public. As it | is their outputs are not copyrightable, it's not a stretch to | say models are public domain. | sillysaurusx wrote: | I'm honestly not sure. RLHF seems particularly tricky --- if | someone is shaping a model by hand, it seems reasonable to | extend copyright protection to them. | | For the moment, I'm just happy to disarm corporations from | using DMCAs against open source projects. The long term | implications will be interesting. | xoa wrote: | You seem to be mixing a few different things together here. | There's a huge leap from something not being copyrightable to | saying there is grounds for it to be _made_ public. 
No | copyright would greatly limit the ability of model makers to | legally restrict distribution if they made it to the public, | but they 'd be fully within their rights to keep them as | trade secrets to the best of their ability. Trade secret law | and practice is its own thing separate from copyright, lots | of places have private data that isn't copyrightable (pure | facts) but that's not the same as it being made public. | Indeed part of the historic idea of certain areas of IP like | patents was to encourage more stuff to be made public vs kept | secret. | | > _As it is their outputs are not copyrightable, it's not a | stretch to say models are public domain._ | | With all respect this is kind of nonsensical. "Public domain" | only applies to stuff that is copyrightable, if they simply | aren't then it just never enters into the picture. And it not | being patentable or copyrightable doesn't mean there is any | requirement to share it. If it does get out though then | that's mostly their own problem is all (though depending on | jurisdiction and contract whoever did the leaking might get | in trouble), and anyone else is free to figure it out on | their own and share that and they can't do anything. | sheeshkebab wrote: | Public domain applies to uncopyrightable works, among other | things (including previously copyrighted works). In this | case models are uncopyrightable, and I think FB (or any of | these newfangled ai cos) would have interesting time | proving otherwise, if they ever try. | | https://en.m.wikipedia.org/wiki/Public_domain | electricmonk wrote: | _IANYL - This is not legal advice._ | | As you may be aware, a counter-notice that meets the statutory | requirements will result in reinstatement unless Meta sues over | it. So the question isn't so much whether your counter-notice | covers all the potential defenses as whether Meta is willing to | sue. 
| | The primary hurdle you're going to face is your argument that | weights are not creative works, and not copyrightable. That | argument is unlikely to succeed for the following reasons | (just off the top of my head): (i) The act of selecting | training data is more akin to an encyclopedia than the white | pages example you used on Twitter, and encyclopedias are | copyrightable as to the arrangement and specific descriptions | of facts, even though the underlying facts are not; and (ii) | LLaMA, GPT-N, Bard, etc, all have different weights, different | numbers of parameters, different amounts of training data, and | different tuning, which puts paid to the idea that there is | only one way to express the underlying ideas, or that all of it | is necessarily controlled by the specific math involved. | | In addition, Meta has the financial wherewithal to crush you | even were you legally on sound footing. | | The upshot of all of this is that you may win for now if Meta | doesn't want to file a rush lawsuit, but in the long run, you | likely lose. | sva_ wrote: | Thank you for putting your ass on the line and deciding to | challenge $megacorp on their claims of owning the copyright on | NN weights that have been trained on public (and probably, to | some degree, also copyrighted) data. This seems to very much be | uncharted territory in the legal space, so there are a lot of | unknowns. | | I don't consider it ethical to compress the corpus of human | knowledge into some NN weights and then close those weights | behind proprietary doors, and I hope that legislators will see | this similarly. | | My only worry is that they'll get you on some technicality, | like that (some version of) your program used their servers | afaik. | cubefox wrote: | Even if using LLaMA turns out to be legal, I very much doubt it | is ethical. The model got leaked while it was only intended for | research purposes. Meta engineered and paid for the training of | this model.
It's theirs. | Uupis wrote: | I feel like most-everything about these models gets really | ethically-grey -- at worst -- very quickly. | willcipriano wrote: | What did they train it on? | cubefox wrote: | On partly copyrighted text. Same as you and me. | faeriechangling wrote: | Did Meta ask permission from every user they trained their | model on? Did all those users consent, and when I say consent | I'm saying was there a meeting of minds not something buried | in page 89 of a EULA, to Meta building an AI with their data? | | Turnabout is fair play. I don't feel the least bit sorry for | Meta. | terafo wrote: | LLaMa was trained on data of Meta users, though. | cubefox wrote: | But it doesn't copy any text one to one. The largest one | was trained on 1.4 trillion tokens, if I recall correctly, | but the model size is just 65 billion parameters. (I | believe they use 16 bit per token and parameter.) It seems | to be more like a human who has read large parts of the | internet, but doesn't remember anything word by word. | Learning from reading stuff was never considered a | copyright violation. | Avicebron wrote: | > It seems to be more like a human who has read large | parts of the internet, but doesn't remember anything word | by word. Learning from reading stuff was never considered | a copyright violation. | | This is one of the most common talking points I see | brought up, especially when defending things like ai | "learning" from the style of artists and then being able | to replicate that style. On the surface we can say, oh | it's similar to a human learning from an art style and | replicating it. But that implies that the program is | functioning like a human mind (as far as I know the jury | is still out on that and I doubt we know exactly how a | human mind actually "learns" (I'm not a neuroscientist)). | | Let's say for the sake of experiment I ask you to cut out | every word of pride and prejudice, and keep them all | sorted. 
Then when asked to write a story in the style of | Jane Austen, you pull from that pile of snipped out words | and arrange them in a pattern that most resembles her | writing. Did you transform it? Sure, maybe; if a human did | that I bet they could even copyright it. But I think that | as a machine, it took those words and phrases and applied | an algorithm to generating output; even with stochastic | elements, the direct backwards traceability, albeit a 65B | convolution of it, means that the essence of the | copyrighted materials has been directly translated. | | From what I can see we can't prove the human mind is | strictly deterministic. But an AI very well might be in | many senses. So the transference of non-deterministic | material (the original) through a deterministic transform | has to root back to the non-deterministic model (the | human mind and therefore the original copyright holder). | shepardrtc wrote: | They don't ask permission when they're stealing users' | data, so why should users ask permission for stealing their | data? | | https://www.usatoday.com/story/tech/2022/09/22/facebook-meta... | seydor wrote: | It's an index of the web and our own comments, barely | something they can claim ownership on, and especially to | resell. | | But OTOH, by preventing commercial use, they have sparked the | creation of an open source ecosystem where people are | building on top of it because it's fun, not because they want | to build a moat to fill it with sweet VC $$$money. | | It's great to see that ecosystem being built around it, and | soon someone will train a fully open source model to replace | Llama | dodslaser wrote: | Meta as a company has shown pretty blatantly that they don't | really care about ethics, nor the law for that matter. | [deleted] | victor96 wrote: | Less memory than most Electron apps! | terafo wrote: | With all my dislike of Electron, I struggle to remember even | one Electron app that managed to use 6 gigs.
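The size arithmetic behind those "6 gigs" (and cubefox's 60GB estimate upthread) can be checked quickly. The q4_0-style block layout assumed below (32 weights sharing one fp32 scale, i.e. 20 bytes per block) is an illustration of how 4-bit packing works out, not taken from the format's spec:

```python
def model_gb(n_params, bytes_per_weight):
    return n_params * bytes_per_weight / 1e9

N = 30e9  # "30B" as the thread uses it, loosely

# fp16: two bytes per parameter -> the ~60GB figure upthread.
fp16 = model_gb(N, 2.0)

# 4-bit blocks: assuming 32 weights share one fp32 scale, a block is
# 4 + 32/2 = 20 bytes, i.e. 5 bits per weight on disk.
q4 = model_gb(N, 20 / 32)

print(round(fp16), round(q4))  # 60 19
```

That lands right around the ~20GB weights file mentioned in the thread; the remaining gap to 6GB (or 4GB in htop) is the lazy page-cache behavior under discussion, not the file shrinking further.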
| baobabKoodaa wrote: | I assume it was a joke | mrtksn wrote: | I've seen WhatsApp doing it. It starts at 1.5GB anyway, so | after some images and stuff it inflates quite a lot. | yodsanklai wrote: | Total noob questions. | | 1. How does this compare with ChatGPT3 | | 2. Does it mean we could eventually run a system such as ChatGPT3 | on a computer | | 3. Could LLM eventually replace Google (in the sense that answers | could be correct 99.9% of the time) or is the tech inherently | flawed | addisonl wrote: | Minor correction, ChatGPT uses GPT-3.5 and (most recently, if | you pay $20/month) GPT-4. Their branding definitely needs some | work haha. We are on track for you to be able to run something | like ChatGPT locally! ___________________________________________________________________ (page generated 2023-03-31 23:00 UTC)