[HN Gopher] A brief history of LLaMA models
___________________________________________________________________

A brief history of LLaMA models

Author : andrewon
Score  : 68 points
Date   : 2023-04-28 02:26 UTC (1 days ago)

(HTM) web link (agi-sphere.com)
(TXT) w3m dump (agi-sphere.com)

| FloatArtifact wrote:
| There needs to be a site dedicated to tracking all these models
| with regular updates.

| vessenes wrote:
| Most places that recommend llama.cpp for Mac fail to mention
| https://github.com/jankais3r/LLaMA_MPS, which runs unquantized 7B
| and 13B models on the M1/M2 GPU directly. It's slightly slower
| (not by a lot) and uses significantly less energy. To me, the win
| of not having to quantize while not melting a hole in my lap is
| huge; I wish more people knew about it.

| brucethemoose2 wrote:
| There is also CodeCapybara (7B fine-tuned on code competitions),
| the "uncensored" Vicuna, OpenAssistant 13B (which is said to be
| very good), various non-English tunes, medalpaca... the release
| pace is maddening.

| acapybara wrote:
| And let's not forget about Alpacino (an offensive/unfiltered
| model).

| simonw wrote:
| I'm running Vicuna (a LLaMA variant) on my iPhone right now.
| https://twitter.com/simonw/status/1652358994214928384
|
| The same team that built that iPhone app - MLC - also got Vicuna
| running directly in a web browser using WebGPU:
| https://simonwillison.net/2023/Apr/16/web-llm/

| newswasboring wrote:
| With all these new AI models, Stable Diffusion and LLaMA
| especially, I'm considering switching to iPhone. I don't think I
| fully understand why iPhones and Macs are getting so many
| implementations, but it seems to be hardware-based.

| simonw wrote:
| My understanding is that part of it is that Apple Silicon shares
| all available RAM between the CPU and GPU.
|
| I'm not sure how many of these models are actively taking
| advantage of that architecture yet, though.

| int_19h wrote:
| The GPU isn't actually used by llama.cpp. What makes it that much
| faster is that the workload, either on CPU or on GPU, is very
| memory-intensive, so it benefits greatly from fast RAM. And Apple
| is using DDR5 running at very high clock speeds for this shared
| memory.
|
| It's still noticeably slower than a GPU, though.

| bkm wrote:
| Homogeneous hardware, I assume; this is why iOS had so many
| photography apps too.

| sp332 wrote:
| iPhones leaned into "computational photography" a long time ago.
| Eventually they added custom hardware to handle all the matrix
| multiplies efficiently. They exposed some of it to apps with an
| API called CoreML. They've been adding more features like
| on-device photo tagging, voice recognition, and VR stuff.

| sagarm wrote:
| Google was the leader in computational smartphone photography.
| They released their "Night Sight" mode before Samsung and Apple
| had anything competitive.

| doodlesdev wrote:
| > Our system thinks you might be a robot! We're really
| sorry about this, but it's getting harder and harder to tell the
| difference between humans and bots these days.
|
| Yeah, fuck you too. Come on, really, why put this in front of a
| _blog post_? Is it that hard to keep up with the bot requests
| when serving a static page?

| jiggawatts wrote:
| It keeps saying the phrase "model you can run locally", but
| despite days of trying, I failed to compile any of the GitHub
| repos associated with these models.
|
| None of the Python dependencies are strongly versioned, and
| "something" happened to the CUDA compatibility of one of them
| about a month ago. The original developers "got lucky", but now
| nobody else can compile this stuff.
|
| After years of using only C# and Rust, both of which have sane
| package managers with semantic versioning, lock files,
| reproducible builds, and even SHA checksums, the Python package
| ecosystem looks ridiculously immature and even childish.
|
| Seriously, can anyone here build a Docker image for running
| these models on CUDA? I think right now it's borderline
| impossible, but I'd be happy to be corrected...
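For concreteness, here is a minimal sketch of the "run it locally"
path that the replies below point at. It assumes the
llama-cpp-python bindings for llama.cpp (pip install
llama-cpp-python) and a LLaMA 7B checkpoint that has already been
converted and quantized to the GGML format llama.cpp expects; the
model path, prompt, and generation parameters are illustrative,
not taken from the thread.

    # Minimal local-inference sketch using the llama-cpp-python bindings.
    # Assumption: a 4-bit GGML conversion of LLaMA 7B already exists at
    # the (illustrative) path below.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/7B/ggml-model-q4_0.bin",  # illustrative path
        n_ctx=512,                                     # context window in tokens
    )

    out = llm(
        "Q: What is llama.cpp? A:",
        max_tokens=64,
        stop=["Q:"],   # stop when the model starts a new question
        echo=False,    # return only the completion, not the prompt
    )
    print(out["choices"][0]["text"])

Because llama.cpp itself is a single native build rather than a
CUDA/PyTorch dependency tree, this route tends to sidestep the
version-pinning problems described above, at the cost of speed.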
| KETpXDDzR wrote:
| llama.cpp was easy to set up, IMO.

| rch wrote:
| Just use Nixpkgs already.

| throwaway6734 wrote:
| There's a Rust deep learning library called dfdx that just got
| LLaMA running: https://github.com/coreylowman/llama-dfdx

| Taek wrote:
| I have it running locally using the oobabooga webui. Setup was
| moderately annoying, but I'm definitely no Python expert and I
| didn't have too much trouble.

| int_19h wrote:
| All of these things exist in the Python package ecosystem, and
| are generally much more common outside of ML/DS stuff. The
| latter... well, it reminds me of coding in the early PHP days.
| Basically, anything goes so long as it works.
___________________________________________________________________
(page generated 2023-04-29 23:00 UTC)