[HN Gopher] Stanford Alpaca, and the acceleration of on-device LLM development
___________________________________________________________________
Stanford Alpaca, and the acceleration of on-device LLM development

Author : Kye
Score  : 95 points
Date   : 2023-03-13 19:54 UTC (3 hours ago)

(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)

| swyx wrote:
| I feel like there has to be another shoe to drop here; this seems
| almost too good to be true.
|
| > Alpaca shows that you can apply fine-tuning with a feasible
| sized set of examples (52,000) and cost ($600) such that even the
| smallest of the LLaMA models--the 7B one, which can compress down
| to a 4GB file with 4-bit quantization--provides results that
| compare well to cutting edge text-davinci-003 in initial human
| evaluation.
|
| This is the most exciting thing: the cost of fine-tuning is
| rapidly coming down, which means everyone will be able to train
| their own models for their use cases.
|
| Looking for the contrarians on HN: what is being left unsaid here
| that people like myself and Simon might be getting too optimistic
| about? What are the known downsides that people in academia
| already know about?
| atleastoptimal wrote:
| "Initial human evaluation" is code for "cherry-picked prompts
| given to people who don't know how to trick an LLM."
| sebzim4500 wrote:
| I don't understand why that is a bad thing. If your goal is
| to make an AI assistant, then you should be optimizing for
| giving answers that real users find useful, not trying to
| impress other AI researchers.
| ChubbyGlasses wrote:
| I always found this to be a strange POV to have on LLMs. IMO,
| it's not humans tricking/gaming the AI; rather, ChatGPT has
| tricked you into believing it's smarter than it actually is.
| (In human terms, ChatGPT is just more articulate than LLaMA.)
|
| It's a subtle distinction, but I think it shapes and reflects
| whether you view AI as a tool for humans or as a replacement.
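As a back-of-the-envelope check on the "~4GB file with 4-bit quantization"
figure quoted above (a sketch only; real quantized formats store extra
per-block scale factors and metadata, so actual files run somewhat larger):

```python
# Rough size estimate for a 4-bit quantized 7B-parameter model.
# Real formats (e.g. llama.cpp's 4-bit files) add per-block scaling
# factors, so files come out a bit above this raw figure.
params = 7e9          # ~7 billion parameters
bits_per_param = 4    # 4-bit quantization
size_gb = params * bits_per_param / 8 / 1e9
print(f"{size_gb:.1f} GB")  # prints "3.5 GB" -- roughly 4GB with overhead
```

The same arithmetic explains why the 65B model needs serious hardware:
at 4 bits it is still tens of gigabytes of weights that must fit in
memory for inference.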
| dougmwne wrote:
| The first catch is that someone needed to spend the enormous up-
| front cost to train the base model, then release it under a
| flexible enough license for your use case.
|
| The second catch is that you would get much higher quality out
| of the 65B model, but would need to lay out a few thousand for
| the hardware.
|
| The third catch is that you need the fine-tuning data, but that
| seems easier than ever to create out of more capable LLMs.
| blueblimp wrote:
| It's still unclear how much quality loss there is compared to
| the best models. What's really needed is systematic evaluation
| of the output quality, but that's tricky and relatively
| expensive (compared to automated benchmarks), so I understand
| why it hasn't happened yet.
|
| Edit: I just tried it with a single task of my own (one I've
| successfully used with ChatGPT and Bing) and it flubbed it
| horribly, so this model at least is noticeably inferior to the
| SOTA, which is not surprising given how small it is.
| yunyu wrote:
| I assume you haven't tried Alpaca (which hasn't been
| released), only LLaMA. See the instruction fine-tuning
| section in the article.
| karmasimida wrote:
| It currently only supports a single input/response format,
| right? Multiple turns will be more challenging to handle.
|
| I am optimistic about 30B or 65B catching up with OpenAI, but
| 7B is unlikely to have the same quality.
| Kye wrote:
| The big failure mode is they can hallucinate nonsense that
| isn't obviously nonsense. You have to check any facts against
| expert sources. At that point, you could just email an expert
| who can use their own LLM to whip up an answer and check the
| facts themselves.
| simonw wrote:
| That's a big problem if you're using a language model as a
| search engine. The trick is to learn how to use them for the
| things that they're good for outside of that.
| typest wrote:
| ^^ this.
| For instance, LLMs are really good at turning natural language
| into SQL. You still have to read the generated SQL and make
| sure it looks right, but if you know SQL, that's much faster
| and easier than writing it by hand.
| flir wrote:
| But that's still "you have to check any facts against expert
| sources"! You just have the advantage of being your own
| personal expert.
| porcc wrote:
| We saw this happen with Stable Diffusion, and it's not
| surprising we're seeing it happen here. There is a lot of
| interest in taking these models--which are within striking
| distance (a single order of magnitude) of running inference
| and training on consumer-level hardware--and a lot of energy
| is going into making the optimizations that can get us there.
|
| Generally speaking, research is not usually done with consumer
| usage in mind, so what this is--and what Dreambooth etc. were
| for Stable Diffusion--is the gap between researcher software
| and accessible software being bridged.
| smoldesu wrote:
| > what is being left unsaid here that people like myself and
| Simon might be getting too optimistic about?
|
| The past week has felt like a wake-up call to enthusiasts.
| Running models locally has been possible for a while (even
| small, fairly coherent ones), and the majority of
| "improvements" recently have come from implementing the leaked
| LLaMA model.
|
| The results from 7B are an improvement on what we had a year
| ago, but not by much. We're learning that there's room to
| optimize these models, but _also_ that size matters. ChatGPT
| and 7B are both great at bullshitting, but you can feel the
| difference in model size during regular conversation. Adding
| insult to injury, it will almost always be faster to query an
| API for AI results than it will be to run a model locally.
|
| Analysis: things are moving at a clip right now, but people
| expecting competitive LLMs running locally on their smartphones
| will be disappointed for quite a while.
| As the technology improves, it's also safe to assume that
| we'll find ways to scale model intelligence with greater
| resources, and the status quo will look much different than it
| does today.
| atleastoptimal wrote:
| > API for AI results than it will be to run it locally.
|
| True, and remotely called APIs will always be the prime mover
| of the AI craze. Only niche hobbyists will be running models
| locally.
|
| There is no company on the planet that would benefit from
| providing people local means to run LLMs. As a result, only
| hacks and leaks will be how individuals manage to run LLMs
| outside of heavily monitored remote API calls.
| flangola7 wrote:
| Who said anything about companies? Companies don't benefit
| by giving people free access to buildings full of books and
| knowledge, yet here they are.
| atleastoptimal wrote:
| In the last 50 years of AI research, has any academic
| institution ever provided open-source, easy-to-use tools
| like the stuff big companies have put out in the past 5
| years?
| simonw wrote:
| Stable Diffusion came from an academic research lab.
| niemandhier wrote:
| Companies like Facebook can harm their competitors by
| releasing models.
|
| Facebook is not a major player in the LLM field--the
| technological advantage of OpenAI is too large--BUT they can
| reduce the expected gains of their competition by providing
| less powerful alternatives for free.
| mikek wrote:
| Apple comes to mind.
| BryantD wrote:
| Agreed. Apple will run models remotely if necessary, but
| from a PR perspective they align with their stated
| intentions when they can run locally.
| flir wrote:
| At a guess: assuming quality scales with size, the model in the
| data centre is always going to outcompete the model on the
| device. So in any situation where you've got bandwidth
| >4800bps, why would you choose the model on the device?
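typest's natural-language-to-SQL workflow earlier in the thread still ends
with a human (or mechanical) check of the generated query. A minimal sketch
of that check in Python, where `llm_to_sql` is a hypothetical stand-in for
the model call (a real version would prompt a model with the schema and the
question):

```python
import sqlite3

def llm_to_sql(question: str) -> str:
    """Hypothetical stand-in for an LLM call that translates a
    natural-language question into SQL. The returned query here is
    hard-coded for illustration."""
    return "SELECT name, total FROM orders WHERE total > 100 ORDER BY total DESC"

# Even with a capable model, generated SQL should be checked before
# trusting the results. A cheap mechanical check: ask SQLite to plan
# the query against the real schema, which catches syntax errors and
# references to nonexistent tables or columns without running anything.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT, total REAL)")

sql = llm_to_sql("Which orders were over $100, biggest first?")
try:
    conn.execute("EXPLAIN " + sql)  # plans the query without executing it
    print("generated SQL is valid against the schema")
except sqlite3.Error as e:
    print("rejected generated SQL:", e)
```

The planning step only verifies that the query is well-formed; whether it
actually answers the question still takes a human who knows SQL, which is
flir's point above.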
| CuriouslyC wrote:
| Your own fine-tuning, no restrictions on output,
| privacy/security, and if you have a reason to produce a lot
| of output it'll be cheaper. Use ChatGPT if you only want to
| use it occasionally, you don't care about privacy/security in
| this context, the output restrictions don't bother you, and
| having the best possible language model is the most important
| thing to you.
| warning26 wrote:
| Really neat!
|
| _> Second, the instruction data is based on OpenAI's text-
| davinci-003, whose terms of use prohibit developing models that
| compete with OpenAI._
|
| Wow, that seems really sketchy on the part of OpenAI. Even
| considering their overall lack of openness, this clause feels
| particularly egregious.
| kir-gadjello wrote:
| Charitably speaking, the researchers had little time to execute
| this, so they just ended up using the well-known OpenAI API.
| Still, it would be very useful if someone used LLaMA-65B
| instead of text-davinci-003 here.
|
| Someone should ask the researchers, either via email or via a
| GitHub pull request; it shouldn't even be that hard to do.
| flangola7 wrote:
| That has to run afoul of competition/antitrust laws and be
| unenforceable. Imagine if Ford tried to tell people they can't
| use their pickups to carry tools around on a new Honda plant
| construction site.
| [deleted]
| macintux wrote:
| Active discussion on Alpaca:
| https://news.ycombinator.com/item?id=35136624
|
| Also: https://news.ycombinator.com/item?id=35139450
| dang wrote:
| Thanks! Macroexpanded:
|
| _Alpaca: A strong open-source instruction-following model_ -
| https://news.ycombinator.com/item?id=35136624
|
| Also recent and related:
|
| _Large language models are having their Stable Diffusion
| moment_ - https://news.ycombinator.com/item?id=35111646 -
| March 2023 (355 comments)
___________________________________________________________________
(page generated 2023-03-13 23:00 UTC)
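On the fine-tuning data dougmwne and karmasimida discuss above: Alpaca's
52,000 generated examples are single-turn instruction/input/output records,
which is why multi-turn conversation doesn't fit the format naturally. A
sketch of one such record and how it gets flattened into a training prompt
(field contents invented for illustration; the prompt template is
paraphrased from Alpaca's published repo, not quoted verbatim):

```python
import json

# One Alpaca-style training record: a task description, optional
# context, and the target response. The real dataset holds 52K of
# these, generated by text-davinci-003. Contents here are made up.
record = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "The battery life on this laptop is fantastic.",
    "output": "Positive",
}

# Fine-tuning flattens each record into a single prompt/response pair,
# so the model only ever learns one exchange at a time -- one reason
# multi-turn dialogue is harder to bolt on afterward.
prompt = (
    "Below is an instruction that describes a task, paired with an "
    "input that provides further context.\n\n"
    f"### Instruction:\n{record['instruction']}\n\n"
    f"### Input:\n{record['input']}\n\n"
    "### Response:\n"
)
print(json.dumps(record, indent=2))
```

This record format is also what makes dougmwne's "third catch" feel
tractable: generating more such records is an API loop plus filtering,
not a research project.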