[HN Gopher] Alpaca-LoRA with Docker
___________________________________________________________________
Alpaca-LoRA with Docker
Author : syntaxing
Score  : 139 points
Date   : 2023-03-24 11:41 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jvanderbot wrote:
| This is neat and all, but both Alpaca and LoRa are things I
| already use and already read about on HN, except now their names
| are bulldozed by LLM tech and things will never be the same.
| gitfan86 wrote:
| Just run all your web browsing through GPT and tell it to
| differentiate them for you
| yieldcrv wrote:
| cloned, hmu if that repo gets nuked
| danso wrote:
| From the repo README:
|
| > _Try the pretrained model out here, courtesy of a GPU grant
| from Huggingface!_
|
| https://huggingface.co/spaces/tloen/alpaca-lora
|
| Anyone else getting error messages when trying to submit
| instructions to the model on Huggingface? It just says "Error",
| so I don't know if it's a "too many users" problem or something
| else.
|
| edit: nevermind, I was able to get a response after a few more
| tries, plus a 20-second processing time
| zapdrive wrote:
| Sorry, this is moving too fast for me. So if I understand
| correctly, LoRA kind of does what Alpaca does but using
| different data.
|
| So what is Alpaca-LoRA? I know you get Alpaca by retraining
| LLaMA using Stanford's Alpaca 52k instruction-following data. So
| if I am guessing right, you get Alpaca-LoRA by retraining Alpaca
| using LoRA's data?
| return_to_monke wrote:
| I think your first statement is incorrect. LoRA seems to be a
| method to fine-tune and optimize the weights of models like
| Alpaca. It is not a different dataset.
|
| This reduces model sizes and therefore also compute costs.
|
| See the abstract of https://arxiv.org/pdf/2106.09685.pdf
| sp332 wrote:
| This says "We provide an Instruct model of similar quality to
| text-davinci-003", but two paragraphs later says the output is
| comparable to Stanford's Alpaca.
| Those seem like very different claims.
| MacsHeadroom wrote:
| "We performed a blind pairwise comparison between
| text-davinci-003 and Alpaca 7B, and we found that these two
| models have very similar performance: Alpaca wins 90 versus 89
| comparisons against text-davinci-003."
|
| https://crfm.stanford.edu/2023/03/13/alpaca.html
| ChrisAlexiuk wrote:
| Hey! Thanks for linking this!
|
| The work was all done by the original repo author - I just added
| a Dockerfile!
| saurik wrote:
| Yesterday there was a discussion about an article which goes
| into the usage of Alpaca-LoRA.
|
| https://news.ycombinator.com/item?id=35279656
| dougmwne wrote:
| What is the final size of the weights?
| teekert wrote:
| That name is so unfortunate. Nobody searched "Lora" before
| picking it. Bit of a blunder if you ask me.
| b33j0r wrote:
| They even capitalize the R like LoRa, but I don't think we'll
| be running this model on an ESP32 to much profit.
|
| Perhaps someone will release a llama I can run at home... how
| about "llama-homekit"? ;)
| nico wrote:
| The demo on HuggingFace with the pre-trained model doesn't seem
| that good.
|
| Although better than Bard (btw, Bard sucks compared to ChatGPT
| and can't even do translations - which I would have expected out
| of the box from Google)
| syntaxing wrote:
| It's worth noting this is the 7B model (non-quantized). You can
| get this running on pretty much any GPU with 8GB VRAM and
| above. You can run the 13B model, but that would take two GPUs
| or reducing FP16 to FP8 (I haven't tried it myself). A single
| connection for ChatGPT is rumored to require 8x A100.
| nico wrote:
| It makes me wonder if this trend will kill NVIDIA.
|
| At this pace we might not even need GPUs anymore.
| jnwatson wrote:
| The race for bigger NNs will never stop.
| zamalek wrote:
| Quantizing it to 8-bit basically eliminates its ability to
| write code.
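[Editor's note: the 8-bit quantization debated above boils down to mapping each weight tensor onto 256 integer levels and keeping a scale factor. A toy NumPy sketch of symmetric per-tensor int8 round-trip quantization, illustrative only (the names and the stand-in weight matrix are made up for the example):]

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    scale = float(np.abs(w).max()) / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)   # stand-in weight matrix

q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)          # 0.25: int8 storage is 4x smaller than fp32

# Every recovered weight is within half a quantization step of the original;
# that rounding error is the quality loss the comments above are debating.
err = np.abs(dequantize(q, scale) - w).max()
print(err <= scale / 2 + 1e-5)      # True
```

[Real 8-bit schemes (e.g. the LLM.int8() approach in bitsandbytes) quantize per vector/block and route outlier features through fp16, which is what keeps the quality loss tolerable; the per-tensor version above is the bluntest possible form.]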
| schappim wrote:
| I never thought that both Alpaca and LoRA would belong to such a
| crowded tech namespace...
| HnUser12 wrote:
| > Tell me about you
|
| > I am a 25-year-old woman from the United States. I have a
| bachelor's degree in computer science and am currently pursuing
| a master's degree in data science. I am passionate about
| technology and am always looking for new ways to use it to make
| the world a better place. Outside of work, I enjoy spending time
| with my family and friends, reading, and traveling.
|
| Well, I was starting to get tired of the "as an AI language
| model" disclaimer. Out of curiosity, is this model meant to be a
| 25-year-old personal assistant?
| jonny_eh wrote:
| No, it's just a random "plausible" response. Re-roll the
| response and you'll get something different.
|
| Think of the prompt as "pretend you're some random person, tell
| me some details"
| kkielhofner wrote:
| Ok, this is the base for actual self-hosted production use of
| these things now (if you don't care about licensing...). I've
| said in previous HN comments that we've been a Dockerfile using
| an Nvidia base image away from this for a while now (I just
| never got around to it myself).
|
| I love the .cpp, Apple Silicon, etc. projects, but IMO for the
| time being Nvidia is still king when it comes to multi-user
| production use of these models with competitive response time,
| parameter count/size, etc.
|
| Of course, as others pointed out, the quality of these models
| still leaves a lot to be desired, but this is a good start for
| the inevitable actually-open models, finetuned variants, etc.
| that are being released on what seems like a daily basis at this
| point.
|
| I'm walking through it (fun weekend project!) but my dual RTX
| 4090 dev workstation will almost certainly scream with these
| (even though VRAM isn't "great"). Over time, with better and
| better models (with compatible licenses), the OpenAI lead will
| get smaller and smaller.
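[Editor's note: for anyone as confused as zapdrive upthread, LoRA is not a dataset. Per the linked paper, it freezes the pretrained weight matrix W and trains only a low-rank update B·A beside it, which is why the adapter is tiny compared to the base model. A toy NumPy sketch of the idea (dimensions, names, and the random "pretrained" weights are invented for illustration; the actual project applies this to LLaMA's attention projections in PyTorch):]

```python
import numpy as np

d_out, d_in, r = 512, 512, 8       # r is the LoRA rank, r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01      # trainable rank-r factor
B = np.zeros((d_out, r))                   # trainable; zero init => no-op update

def forward(x, alpha=16.0):
    # LoRA forward pass: h = W x + (alpha / r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapted model exactly matches the base model.
print(np.allclose(forward(x), W @ x))      # True

# Parameter counts: full fine-tune vs. LoRA adapter.
full, lora = W.size, A.size + B.size
print(full, lora)                          # 262144 vs 8192: adapter is 32x smaller
```

[Initializing B to zero means training starts from a model identical to the base checkpoint, and only the small A and B matrices ever need gradients or shipping; this is why LoRA cuts compute and "model size" (the adapter's, not the frozen base's) as noted above.]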
| cuuupid wrote:
| I'm hitting ChatGPT-level or faster speeds on my 3090. Have it
| running the image with a reverse SSH tunnel to an EC2 instance
| that's ferrying requests from the web. It only took 4 hours of
| an afternoon, and based off the trending Databricks article on
| HN we're probably only days away from a commercially licensed
| model.
| kkielhofner wrote:
| Bit of a tangent: have you tried Cloudflare Tunnels for what
| you're doing? It's literally a one-liner to install cloudflared
| and boom, the service is on the internet with Cloudflare in
| front. I've even used it in cases where my host was behind
| multiple layers of NAT - it just works. If you're concerned
| with speed and performance, I guarantee it will blow away your
| current approach (while giving you all of the other Cloudflare
| stuff). Of course, if you hate CF (fair enough), disregard :).
|
| I use this for an optimized hosted Whisper implementation I've
| been working on. It hits 120x realtime with large-v2 on a 4090
| and uses WebRTC to stream the audio in realtime, with
| datachannels for ASR responses. Hopefully a "Show HN" soon once
| I get some legal stuff out of the way :). I mention it because
| AFAIK it's many multiples faster than the OpenAI-hosted Whisper
| (especially for "realtime" speech).
|
| I expect we'll see these kinds of innovations and more come to
| self-hosted approaches generally, and the open source community
| will pull a repeat of the 1990s/early-2000s Microsoft vs.
| Linux/LAMP web-hosting situation on OpenAI, where open source
| wins in the end. The fact that MS is so heavily invested in
| OpenAI is just history repeating itself.
|
| Yep, saw the Databricks article! I don't try to make specific
| time predictions, but you're probably not far off :).
___________________________________________________________________
(page generated 2023-03-24 23:00 UTC)