[HN Gopher] Stanford Alpaca, and the acceleration of on-device LLM development
       ___________________________________________________________________
        
       Stanford Alpaca, and the acceleration of on-device LLM development
        
       Author : Kye
       Score  : 95 points
       Date   : 2023-03-13 19:54 UTC (3 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | swyx wrote:
        | I feel like there has to be another shoe to drop here; this
        | seems almost too good to be true.
       | 
       | > Alpaca shows that you can apply fine-tuning with a feasible
       | sized set of examples (52,000) and cost ($600) such that even the
       | smallest of the LLaMA models--the 7B one, which can compress down
       | to a 4GB file with 4-bit quantization--provides results that
       | compare well to cutting edge text-davinci-003 in initial human
       | evaluation.
       | 
        | This is the most exciting thing. The cost of fine-tuning is
        | rapidly coming down, which means everyone will be able to
        | train their own models for their use cases.
        | 
        | Looking for the contrarians on HN: what is being left unsaid
        | here that people like myself and Simon might be getting too
        | optimistic about? What are the known downsides that people in
        | academia already know about?
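         
          A quick back-of-the-envelope check on the 4GB figure quoted
          above (a sketch assuming ~4 bits per weight; real quantization
          formats add a little overhead for per-block scale factors):
         
              # Rough size estimate for a 4-bit quantized 7B model.
              params = 7_000_000_000         # LLaMA 7B
              bits_per_weight = 4
              size_gb = params * bits_per_weight / 8 / 1e9
              print(f"{size_gb:.1f} GB")     # ~3.5 GB, close to the cited 4GB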
        
         | atleastoptimal wrote:
         | "initial human evaluation" is codeword for "cherrypicked
         | prompts given to people who don't know how to trick an LLM"
        
           | sebzim4500 wrote:
           | I don't understand why that is a bad thing. If your goal is
           | to make an AI assistant, then you should be optimizing for
           | giving answers that real users find useful, not trying to
           | impress other AI researchers.
        
           | ChubbyGlasses wrote:
            | I always found this to be a strange point of view on
            | LLMs. IMO, it's not humans tricking/gaming the AI;
            | rather, ChatGPT has tricked you into believing it's
            | smarter than it actually is. (In human terms, ChatGPT is
            | just more articulate than LLaMA.)
            | 
            | It's a subtle distinction, but I think it shapes and
            | reflects whether you view AI as a tool for humans or as a
            | replacement.
        
         | dougmwne wrote:
          | The first catch is that someone needed to pay the enormous
          | up-front cost to train the base model, then release it under
          | a flexible enough license for your use case.
         | 
         | The second catch is that you would get much higher quality out
         | of the 65b model, but would need to lay out a few thousand for
         | the hardware.
         | 
          | The third catch is that you need the fine-tuning data, but
          | that seems easier than ever to create out of more capable
          | LLMs.
        
         | blueblimp wrote:
          | It still seems unclear how much quality loss there is compared
         | to the best models. What's really needed is systematic
         | evaluation of the output quality, but that's tricky and
         | relatively expensive (compared to automated benchmarks), so I
         | understand why it hasn't happened yet.
         | 
         | Edit: I just tried it with a single task of my own (that I've
         | successfully used with ChatGPT and Bing) and it flubbed it
         | horribly, so this model at least is noticeably inferior to the
         | SOTA, which is not surprising given how small it is.
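         
            "Systematic evaluation" here usually means something like a
            blind pairwise comparison: show raters one output from each
            model for the same prompt, hide which model produced which,
            and tally preferences. A minimal sketch (rate_pair is a
            hypothetical stand-in for however the human judgments get
            collected):
         
                import random
         
                def win_rate(prompts, model_a, model_b, rate_pair):
                    """Fraction of prompts where raters prefer model_a.
                    model_a, model_b: callables mapping prompt -> text.
                    rate_pair: returns 0 or 1 for the preferred output."""
                    wins = 0
                    for prompt in prompts:
                        outputs = [model_a(prompt), model_b(prompt)]
                        order = [0, 1]
                        random.shuffle(order)  # blind the rater to identity
                        pick = rate_pair(prompt, outputs[order[0]],
                                         outputs[order[1]])
                        if order[pick] == 0:
                            wins += 1
                    return wins / len(prompts)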
        
           | yunyu wrote:
            | I assume you haven't tried Alpaca (which hasn't been
            | released), only LLaMA. See the instruction fine-tuning
            | section in the article.
        
             | karmasimida wrote:
              | It currently only supports a single instruction/response
              | format, right? Multi-turn conversations will be more
              | challenging to handle.
              | 
              | I am optimistic about 30B or 65B catching up with
              | OpenAI, but 7B is unlikely to have the same quality.
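         
                For reference, the single-turn format in question: the
                Alpaca training data wraps each example in a fixed
                template, roughly like this (the no-input variant; a
                second variant adds an "### Input:" section for context):
         
                    Below is an instruction that describes a task.
                    Write a response that appropriately completes the
                    request.
         
                    ### Instruction:
                    {instruction}
         
                    ### Response: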
        
         | Kye wrote:
          | The big failure mode is that they can hallucinate nonsense that
         | isn't obviously nonsense. You have to check any facts against
         | expert sources. At that point, you could just email an expert
         | who can use their own LLM to whip up an answer and check the
         | facts themselves.
        
           | simonw wrote:
           | That's a big problem if you're using a language model as a
           | search engine. The trick is to learn how to use them for the
           | things that they're good for outside of that.
        
             | typest wrote:
              | ^^ This. For instance, LLMs are really good at turning
              | natural language into SQL. And if you know SQL, you can
              | read the result and make sure it looks right, which is
              | much faster and easier than writing the SQL by hand.
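         
                A minimal sketch of that workflow against the OpenAI
                completions API of the time (the table schema, question,
                and prompt wording here are illustrative, not from the
                thread):
         
                    import openai  # the 2023-era 0.x client
                    # openai.api_key = "sk-..."
         
                    schema = "orders(id, customer_id, total, created_at)"
                    question = "Total sales per customer last month?"
                    prompt = (f"Given the table {schema}, write a SQL "
                              f"query to answer: {question}\nSQL:")
         
                    resp = openai.Completion.create(
                        model="text-davinci-003",
                        prompt=prompt,
                        max_tokens=200,
                        temperature=0,
                    )
                    sql = resp["choices"][0]["text"].strip()
                    print(sql)  # review before running against real data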
        
               | flir wrote:
               | But that's still "You have to check any facts against
               | expert sources"! You just have the advantage of being
               | your own personal expert.
        
         | porcc wrote:
          | We saw this happen with Stable Diffusion, and it's not
          | surprising to see it happening here. There is a lot of
          | interest in these models because they are within striking
          | distance (a single order of magnitude) of running inference
          | and training on consumer-level hardware, so a lot of energy
          | is going into the optimizations that can get us there.
          | 
          | Generally speaking, research is not done with consumer usage
          | in mind, so what this is (and what Dreambooth etc. were for
          | Stable Diffusion) is the gap between research software and
          | accessible software being bridged.
        
         | smoldesu wrote:
         | > what is being left unsaid here that people like myself and
         | Simon might be getting too optimistic about?
         | 
         | The past week has felt like a wake-up call to enthusiasts.
          | Running models locally has been possible for a while (even
         | small, fairly coherent ones), and the majority of
         | "improvements" recently have come from implementing the leaked
          | LLaMA model.
         | 
         | The results from 7B are an improvement on what we had a year
         | ago, but not by much. We're learning that there's room to
         | optimize these models, but _also_ that size matters. ChatGPT
         | and 7B are both great at bullshitting, but you can feel the
         | difference in model size during regular conversation. Adding
         | insult to injury, it will almost always be faster to query an
         | API for AI results than it will be to run it locally.
         | 
         | Analysis: Things are moving at a clip right now, but people
         | expecting competitive LLMs running locally on their smartphone
         | will be disappointed for quite a while. As the technology
         | improves, it's also safe to assume that we'll find ways to
         | scale model intelligence with greater resources, and the status
         | quo will look much different than it does today.
        
           | atleastoptimal wrote:
            | > it will almost always be faster to query an API for AI
            | > results than it will be to run it locally.
            | 
            | True, and remotely called APIs will always be the main
            | driver of the AI craze. Only niche hobbyists will be
            | running them locally.
            | 
            | There is no company on the planet that would benefit from
            | providing people local means to run LLMs. As a result,
            | hacks and leaks will be the only way individuals can run
            | LLMs outside of heavily monitored remote API calls.
        
             | flangola7 wrote:
             | Who said anything about companies? Companies don't benefit
             | by giving people free access to buildings full of books and
             | knowledge, yet here they are.
        
               | atleastoptimal wrote:
                | In the last 50 years of AI research, has any academic
                | institution ever provided open-source, easy-to-use
                | tools like the stuff big companies have put out in the
                | past 5 years?
        
               | simonw wrote:
               | Stable Diffusion came from an academic research lab.
        
             | niemandhier wrote:
             | Companies like Facebook can harm their competitors by
             | releasing models.
             | 
              | Facebook is not a major player in the LLM field; the
              | technological advantage of OpenAI is too large. But they
              | can reduce the expected gains of their competition by
              | providing less powerful alternatives for free.
        
             | mikek wrote:
             | Apple comes to mind.
        
               | BryantD wrote:
                | Agreed. Apple will run models remotely if necessary,
                | but from a PR perspective, running locally aligns with
                | their stated intentions whenever they can manage it.
        
         | flir wrote:
         | At a guess: assuming quality scales with size, the model in the
         | data centre is always going to outcompete the model on the
         | device. So in any situation where you've got bandwidth
         | >4800bps, why would you choose the model on the device?
        
           | CuriouslyC wrote:
            | Your own fine-tuning, no restrictions on output,
            | privacy/security, and if you have a reason to produce a
            | lot of output it'll be cheaper. Use ChatGPT if you only
            | want to use it occasionally, you don't care about
            | privacy/security in this context, the output restrictions
            | don't bother you, and having the best possible language
            | model is the most important thing to you.
        
       | warning26 wrote:
       | Really neat!
       | 
        |  _> Second, the instruction data is based on OpenAI's text-
       | davinci-003, whose terms of use prohibit developing models that
       | compete with OpenAI._
       | 
       | Wow, that seems really sketchy on the part of OpenAI. Even
       | considering their overall lack of openness, this clause feels
       | particularly egregious.
        
         | kir-gadjello wrote:
          | Charitably speaking, the researchers had little time to
          | execute this, so they just ended up using the well-known
          | OpenAI API. Still, it would be very useful if someone used
          | LLaMA-65B instead of text-davinci-003 here.
          | 
          | Someone should ask the researchers, either via email or via
          | a GitHub pull request; it shouldn't even be that hard to do.
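         
            The swap itself would be mechanically simple: the data-
            generation pipeline just needs a text-completion function,
            so pointing it at a local model is a matter of replacing one
            call. A sketch of the idea, not the actual Alpaca pipeline
            (generate is a hypothetical wrapper around local LLaMA-65B
            inference):
         
                import random
         
                def make_examples(seed_tasks, generate, n=52_000):
                    """Self-instruct-style data generation.
                    seed_tasks: list of {"instruction": ..., "output": ...}
                    generate: prompt -> completion, e.g. local LLaMA-65B
                    in place of the text-davinci-003 API."""
                    examples = []
                    while len(examples) < n:
                        # Few-shot prompt: show 3 seed tasks, ask for one more.
                        shots = "\n\n".join(
                            f"Instruction: {t['instruction']}\nOutput: {t['output']}"
                            for t in random.sample(seed_tasks, 3))
                        completion = generate(shots + "\n\nInstruction:")
                        instr, _, out = completion.partition("\nOutput:")
                        if instr.strip() and out.strip():
                            examples.append({"instruction": instr.strip(),
                                             "output": out.strip()})
                    return examples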
        
         | flangola7 wrote:
          | That has to run afoul of competition/antitrust laws and be
          | unenforceable. Imagine if Ford tried to tell people they
          | can't use its pickups to carry tools around on a new Honda
          | plant construction site.
        
       | [deleted]
        
       | macintux wrote:
       | Active discussion on Alpaca:
       | https://news.ycombinator.com/item?id=35136624
       | 
       | Also: https://news.ycombinator.com/item?id=35139450
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _Alpaca: A strong open-source instruction-following model_ -
         | https://news.ycombinator.com/item?id=35136624
         | 
         | Also recent and related:
         | 
         |  _Large language models are having their Stable Diffusion
         | moment_ - https://news.ycombinator.com/item?id=35111646 - March
         | 2023 (355 comments)
        
       ___________________________________________________________________
       (page generated 2023-03-13 23:00 UTC)