[HN Gopher] Alpaca-LoRA with Docker
       ___________________________________________________________________
        
       Alpaca-LoRA with Docker
        
       Author : syntaxing
       Score  : 139 points
       Date   : 2023-03-24 11:41 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jvanderbot wrote:
        | This is neat and all, but both Alpaca and LoRa are things I
        | already use and have read about on HN, except now their names
        | are bulldozed by LLM tech and things will never be the same.
        
         | gitfan86 wrote:
          | Just run all your web browsing through GPT and tell it to
          | differentiate them for you.
        
       | yieldcrv wrote:
       | cloned, hmu if that repo gets nuked
        
       | danso wrote:
       | From the repo README:
       | 
       | > _Try the pretrained model out here, courtesy of a GPU grant
       | from Huggingface!_
       | 
       | https://huggingface.co/spaces/tloen/alpaca-lora
       | 
        | Anyone else getting error messages when trying to submit
        | instructions to the model on Huggingface? It just says
        | "Error", so I don't know if it's a "too many users" problem
        | or something else.
        | 
        | edit: never mind, I was able to get a response after a few
        | more tries, plus a 20-second processing time.
        
       | zapdrive wrote:
       | Sorry this is moving too fast for me. So if I understand
       | correctly, LoRa kind of does what Alpaca does but using different
       | data.
       | 
        | So what is Alpaca-LoRA? As I understand it, you get Alpaca by
        | retraining LLaMA on Stanford's 52k instruction-following
        | dataset. So if I am guessing right, you get Alpaca-LoRA by
        | retraining Alpaca using LoRA's data?
        
         | return_to_monke wrote:
          | I think your first statement is incorrect. LoRA is not a
          | different dataset; it is a method for fine-tuning the
          | weights of models like LLaMA by training only a small
          | low-rank update.
          | 
          | This reduces the number of trainable parameters and
          | therefore also compute and storage costs.
         | 
         | See the abstract of https://arxiv.org/pdf/2106.09685.pdf
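          | 
          | Concretely, LoRA freezes the pretrained weight matrix W
          | and trains only a small low-rank update BA on top of it. A
          | minimal PyTorch sketch of the idea (illustrative code, not
          | the repo's actual implementation):
          | 
          |     import torch
          |     import torch.nn as nn
          |     
          |     class LoRALinear(nn.Module):
          |         # Frozen linear layer plus a trainable low-rank
          |         # update: y = W x + scale * B (A x)
          |         def __init__(self, base: nn.Linear, r=8, alpha=16):
          |             super().__init__()
          |             self.base = base
          |             for p in self.base.parameters():
          |                 p.requires_grad = False  # freeze W
          |             self.A = nn.Parameter(
          |                 torch.randn(r, base.in_features) * 0.01)
          |             self.B = nn.Parameter(
          |                 torch.zeros(base.out_features, r))
          |             self.scale = alpha / r  # B is zero at init,
          |                                     # so BA starts as a no-op
          |     
          |         def forward(self, x):
          |             return (self.base(x)
          |                     + (x @ self.A.T @ self.B.T) * self.scale)
          | 
          | Only A and B receive gradients - a few million parameters
          | for a 7B model - which is why the published artifact is a
          | small adapter rather than a full set of weights.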
        
       | sp332 wrote:
       | This says "We provide an Instruct model of similar quality to
       | text-davinci-003", but two paragraphs later says the output is
       | comparable to Stanford's Alpaca. Those seem like very different
       | claims.
        
         | MacsHeadroom wrote:
         | "We performed a blind pairwise comparison between text-
         | davinci-003 and Alpaca 7B, and we found that these two models
         | have very similar performance: Alpaca wins 90 versus 89
         | comparisons against text-davinci-003."
         | 
         | https://crfm.stanford.edu/2023/03/13/alpaca.html
        
       | ChrisAlexiuk wrote:
       | Hey! Thanks for linking this!
       | 
        | The work was all done by the original repo author - I just
        | added a Dockerfile!
        
       | saurik wrote:
       | Yesterday there was a discussion about an article which goes into
       | the usage of Alpaca-LoRA.
       | 
       | https://news.ycombinator.com/item?id=35279656
        
       | dougmwne wrote:
       | What is the final size of the weights?
        
       | teekert wrote:
       | That name is so unfortunate. Nobody searched "Lora" before
       | picking it. Bit of a blunder if you ask me.
        
         | b33j0r wrote:
         | They even capitalize the R like LoRa, but I don't think we'll
         | be running this model on an ESP32 to much profit.
         | 
         | Perhaps someone will release a llama I can run at home... how
         | about "llama-homekit"? ;)
        
       | nico wrote:
        | The demo on HuggingFace with the pre-trained model doesn't
        | seem that good.
       | 
       | Although better than Bard (btw, Bard sucks compared to ChatGPT
       | and can't even do translations - which I would have expected out
       | of the box from Google)
        
         | syntaxing wrote:
          | It's worth noting this is the 7B model (non-quantized). You
          | can get this running on pretty much any GPU with 8GB of
          | VRAM or more. You can run the 13B model, but that would
          | take two GPUs or quantizing the FP16 weights down to 8-bit
          | (I haven't tried it myself). A single connection to ChatGPT
          | is rumored to require 8x A100s.
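          | 
          | For reference, loading the 7B weights in 8-bit plus the
          | LoRA adapter looks roughly like this with transformers +
          | peft + bitsandbytes (a sketch, not the repo's exact code;
          | the model IDs are the ones floating around for this
          | project):
          | 
          |     import torch
          |     from transformers import LlamaForCausalLM
          |     from peft import PeftModel
          |     
          |     model = LlamaForCausalLM.from_pretrained(
          |         "decapoda-research/llama-7b-hf",  # base weights
          |         load_in_8bit=True,  # int8 via bitsandbytes
          |         torch_dtype=torch.float16,
          |         device_map="auto",  # split across available GPUs
          |     )
          |     model = PeftModel.from_pretrained(
          |         model, "tloen/alpaca-lora-7b")  # LoRA adapter
          | 
          | With int8 weights the 7B model fits in roughly 8GB of VRAM,
          | which lines up with the figure above.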
        
           | nico wrote:
           | It makes me wonder if this trend will kill NVIDIA.
           | 
           | At this pace we might not even need GPUs anymore.
        
             | jnwatson wrote:
             | The race for bigger NNs will never stop.
        
           | zamalek wrote:
           | Quantizing it to 8-bit basically eliminates its ability to
           | write code.
        
       | schappim wrote:
       | I never thought that both Alpaca and LoRA would belong to such a
       | crowded tech namespace...
        
       | HnUser12 wrote:
       | > Tell me about you
       | 
       | >I am a 25-year-old woman from the United States. I have a
       | bachelor's degree in computer science and am currently pursuing a
       | master's degree in data science. I am passionate about technology
       | and am always looking for new ways to use it to make the world a
       | better place. Outside of work, I enjoy spending time with my
       | family and friends, reading, and traveling.
       | 
        | Well, I was starting to get tired of the "as an AI language
        | model" disclaimer. Out of curiosity, is this model meant to
        | be a 25-year-old personal assistant?
        
         | jonny_eh wrote:
          | No, it's just a random "plausible" response. Re-roll it and
          | you'll get something different.
         | 
         | Think of the prompt as "pretend you're some random person, tell
         | me some details"
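          | 
          | The re-roll differs each time because decoding samples from
          | the token distribution rather than always taking the most
          | likely token. A toy illustration (assumes a model and
          | tokenizer are already loaded, as in the repo's generate
          | script):
          | 
          |     inputs = tokenizer("Tell me about you",
          |                        return_tensors="pt").to(model.device)
          |     for _ in range(3):  # three different "personas"
          |         out = model.generate(**inputs, do_sample=True,
          |                              temperature=0.8, top_p=0.9,
          |                              max_new_tokens=64)
          |         print(tokenizer.decode(out[0],
          |                                skip_special_tokens=True))
          | 
          | Turn the temperature down (or set do_sample=False) and the
          | answers collapse toward one canned biography.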
        
       | kkielhofner wrote:
        | Ok, this is the base for actual self-hosted production use of
        | these things now (if you don't care about licensing...). I've
        | said in previous HN comments that we've been one Dockerfile
        | (using an Nvidia base image) away from this for a while now
        | (just never got around to it myself).
       | 
        | I love the .cpp, Apple Silicon, etc. projects, but IMO for
        | the time being Nvidia is still king when it comes to
        | multi-user production use of these models with competitive
        | response times, parameter counts/sizes, etc.
       | 
        | Of course, as others pointed out, the quality of these models
        | still leaves a lot to be desired, but this is a good start
        | for the inevitable actually-open models, finetuned variants,
        | etc. that are being released on what seems like a daily basis
        | at this point.
       | 
       | I'm walking through it (fun weekend project!) but my dual RTX
       | 4090 dev workstation will almost certainly scream with these
       | (even though VRAM isn't "great"). Over time with better and
       | better models (with compatible licenses) the OpenAI lead will get
       | smaller and smaller.
        
         | cuuupid wrote:
          | I'm hitting ChatGPT-level or faster speeds on my 3090. I
          | have it running the image with a reverse SSH tunnel to an
          | EC2 instance that ferries requests from the web. It only
          | took 4 hours of an afternoon, and based on the trending
          | Databricks article on HN, we're probably only days away
          | from a commercially licensed model.
        
           | kkielhofner wrote:
            | Bit of a tangent, but have you tried Cloudflare Tunnels
            | for what you're doing? It's literally a one-liner to
            | install cloudflared and boom, the service is on the
            | internet with Cloudflare in front. I've even used it in
            | cases where my host was behind multiple layers of NAT -
            | it just works. If you're concerned with speed and
            | performance, I guarantee it will blow away your current
            | approach (while giving you all of the other Cloudflare
            | stuff). Of course, if you hate CF (fair enough),
            | disregard :).
           | 
           | I use this for an optimized hosted Whisper implementation
           | I've been working on. It hits 120x realtime with large v2 on
           | a 4090 and uses WebRTC to stream the audio in realtime with
           | datachannels for ASR responses. Hopefully a "Show HN" soon
           | once I get some legal stuff out of the way :). I mention it
           | because AFAIK it's many multiples faster than the OpenAI
           | hosted Whisper (especially for "realtime" speech).
           | 
            | I expect we'll see these kinds of innovations and more
            | come to self-hosted approaches generally, and the open
            | source community will pull a 1990s/early-2000s
            | web-hosting situation on OpenAI - Microsoft vs.
            | Linux/LAMP - where open source wins in the end. The fact
            | that MS is so heavily invested in OpenAI is just history
            | repeating itself.
           | 
           | Yep, saw the Databricks article! I don't try to make specific
           | time predictions but you're probably not far off :).
        
       ___________________________________________________________________
       (page generated 2023-03-24 23:00 UTC)