[HN Gopher] A brief history of LLaMA models
       ___________________________________________________________________
        
       A brief history of LLaMA models
        
       Author : andrewon
       Score  : 68 points
       Date   : 2023-04-28 02:26 UTC (1 day ago)
        
 (HTM) web link (agi-sphere.com)
 (TXT) w3m dump (agi-sphere.com)
        
       | FloatArtifact wrote:
        | There needs to be a site dedicated to tracking all these
        | models, with regular updates.
        
       | vessenes wrote:
        | Most places that recommend llama.cpp for Mac fail to mention
        | https://github.com/jankais3r/LLaMA_MPS, which runs unquantized
        | 7B and 13B models on the M1/M2 GPU directly. It's slightly
        | slower (not by a lot) and uses significantly less energy. To
        | me, the win of not having to quantize while not melting a hole
        | in my lap is huge; I wish more people knew about it.
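        | 
        | Rough numbers for why running unquantized only flies on a
        | machine with plenty of unified RAM (my own fp16-vs-4-bit
        | arithmetic, not taken from the LLaMA_MPS docs):
        | 
        |       # Approximate weight footprint only; KV cache and
        |       # runtime overhead ignored (assumed, not measured).
        |       GIB = 1024**3
        | 
        |       def weight_gib(params_billion, bits_per_weight):
        |           return params_billion * 1e9 * bits_per_weight / 8 / GIB
        | 
        |       for n in (7, 13):
        |           print(f"{n}B fp16:  ~{weight_gib(n, 16):.1f} GiB")
        |           print(f"{n}B 4-bit: ~{weight_gib(n, 4):.1f} GiB")
        |       # roughly 13 and 24 GiB unquantized vs ~3.3 and 6 GiB
        |       # at 4 bits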
        
       | brucethemoose2 wrote:
        | There is also CodeCapybara (a 7B model fine-tuned on code
        | competitions), the "uncensored" Vicuna, OpenAssistant 13B
        | (which is said to be very good), various non-English tunes,
        | medalpaca... the release pace is maddening.
        
         | acapybara wrote:
         | And let's not forget about Alpacino (offensive/unfiltered
         | model).
        
       | simonw wrote:
       | I'm running Vicuna (a LLaMA variant) on my iPhone right now.
       | https://twitter.com/simonw/status/1652358994214928384
       | 
       | The same team that built that iPhone app - MLC - also got Vicuna
       | running directly in a web browser using Web GPU:
       | https://simonwillison.net/2023/Apr/16/web-llm/
        
         | newswasboring wrote:
          | With all these new AI models, Stable Diffusion and LLaMA
          | especially, I'm considering switching to iPhone. I don't
          | think I fully understand why iPhones and Macs are getting so
          | many implementations, but it seems like it's hardware-based.
        
           | simonw wrote:
           | My understanding is that part of it is that Apple Silicon
           | shares all available RAM between CPU and GPU.
           | 
           | I'm not sure how many of these models are actively taking
           | advantage of that architecture yet though.
        
             | int_19h wrote:
              | The GPU isn't actually used by llama.cpp. What makes it
              | that much faster is that the workload, whether on CPU or
              | on GPU, is very memory-intensive, so it benefits greatly
              | from fast RAM. And Apple uses LPDDR5 running at very
              | high clock speeds for this shared memory.
              | 
              | It's still noticeably slower than a GPU, though.
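              | 
              | A crude way to see the bandwidth bound (assumed figures,
              | purely illustrative): each generated token has to stream
              | roughly the whole set of weights through memory once, so
              | tokens/sec tops out near bandwidth divided by model size.
              | 
              |       # Upper bound: every token reads ~all weights once,
              |       # so tok/s <= bandwidth / bytes of weights.
              |       def max_tok_per_s(bw_gb_s, model_gb):
              |           return bw_gb_s / model_gb
              | 
              |       # assumed: 7B weights at 4-bit ~= 4 GB
              |       for name, bw_gb_s in [("~100 GB/s", 100),
              |                             ("~400 GB/s", 400)]:
              |           print(name, max_tok_per_s(bw_gb_s, 4),
              |                 "tok/s ceiling")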
        
           | bkm wrote:
            | Homogeneous hardware, I assume; this is why iOS had so
            | many photography apps too.
        
           | sp332 wrote:
           | iPhones leaned in to "computational photography" a long time
           | ago. Eventually they added custom hardware to handle all the
           | matrix multiplies efficiently. They exposed some of it to
           | apps with an API called CoreML. They've been adding more
           | features like on-device photo tagging, voice recognition, VR
           | stuff.
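            | 
            | If anyone is curious what that looks like from Python, a
            | minimal sketch using the coremltools package (the model
            | file and input name below are made up):
            | 
            |       import coremltools as ct
            | 
            |       # ComputeUnit.ALL lets CoreML pick the CPU, GPU, or
            |       # Neural Engine for each part of the model.
            |       model = ct.models.MLModel(
            |           "SomeModel.mlpackage",
            |           compute_units=ct.ComputeUnit.ALL,
            |       )
            |       print(model.predict({"input": [0.0, 1.0, 2.0]}))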
        
             | sagarm wrote:
              | Google was the leader in computational smartphone
              | photography. They released their "night sight" mode
              | before Samsung and Apple had anything competitive.
        
       | doodlesdev wrote:
        | > Our system thinks you might be a robot! We're really sorry
        | about this, but it's getting harder and harder to tell the
        | difference between humans and bots these days.
       | 
       | Yeah, fuck you too. Come on, really, why put this in front of a
       | _blog post_? Is it that hard to keep up with the bot requests
       | when serving a static page?
        
       | jiggawatts wrote:
       | It keeps saying the phrase "model you can run locally", but
       | despite days of trying, I failed to compile any of the GitHub
       | repos associated with these models.
       | 
       | None of the Python dependencies are strongly versioned, and
       | "something" happened to the CUDA compatibility of one of them
       | about a month ago. The original developers "got lucky" but now
       | nobody else can compile this stuff.
       | 
        | After years of using only C# and Rust, both of which have sane
        | package managers with semantic versioning, lock files,
        | reproducible builds, and even SHA checksums, the Python package
        | ecosystem looks ridiculously immature and even childish to me.
       | 
        | Seriously, can anyone here build a Docker image for running
        | these models on CUDA? I think right now it's borderline
        | impossible, but I'd be happy to be corrected...
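        | 
        | Short of a Docker image, I'd settle for repos that pinned
        | exact versions and refused to run against anything else.
        | Something as dumb as this (package names and versions made up,
        | standard library only) would have saved me days:
        | 
        |       from importlib.metadata import PackageNotFoundError, version
        | 
        |       # exact versions the repo was actually tested with
        |       PINNED = {"torch": "2.0.0", "transformers": "4.28.1"}
        | 
        |       problems = []
        |       for name, wanted in PINNED.items():
        |           try:
        |               got = version(name)
        |           except PackageNotFoundError:
        |               problems.append(f"{name} missing (want {wanted})")
        |           else:
        |               if got != wanted:
        |                   problems.append(f"{name} is {got}, want {wanted}")
        | 
        |       if problems:
        |           msg = "untested env:\n  " + "\n  ".join(problems)
        |           raise SystemExit(msg)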
        
         | KETpXDDzR wrote:
          | llama.cpp was easy to set up, IMO.
        
         | rch wrote:
         | Just use Nixpkgs already.
        
         | throwaway6734 wrote:
          | There's a Rust deep learning library called dfdx that just
          | got LLaMA running: https://github.com/coreylowman/llama-dfdx
        
         | Taek wrote:
          | I have it running locally using the oobabooga webui. Setup
          | was moderately annoying, but I'm definitely no Python expert
          | and I didn't have too much trouble.
        
         | int_19h wrote:
         | All of these things exist in the Python package ecosystem, and
         | are generally much more common outside of ML/DS stuff. The
         | latter... well, it reminds me of coding in early PHP days.
         | Basically, anything goes so long as it works.
        
       ___________________________________________________________________
       (page generated 2023-04-29 23:00 UTC)