[HN Gopher] Chinchilla's Death
       ___________________________________________________________________
        
       Chinchilla's Death
        
       Author : KolmogorovComp
       Score  : 56 points
       Date   : 2023-09-04 18:31 UTC (4 hours ago)
        
 (HTM) web link (espadrine.github.io)
 (TXT) w3m dump (espadrine.github.io)
        
       | newfocogi wrote:
        | While the article makes good observations, this would appear to
        | be a major oversight by leading research labs if they could have
        | just kept the gas pedal down on smaller models for longer and
        | gotten better performance. This is Hacker News - can we get
        | someone from OpenAI, DeepMind, or Meta AI to respond and explain
        | why cutting off the smaller models at a lower total compute
        | budget is justified?
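        | 
        | Rough arithmetic for what that cutoff means (my numbers, using
        | the commonly cited approximations, not anything from the
        | article): Chinchilla's rule of thumb is ~20 training tokens per
        | parameter, and training compute is roughly C = 6*N*D FLOPs for
        | N params and D tokens, so each model size gets a fixed
        | "optimal" stopping point:
        | 
        |   # Illustrative sketch; the constants are the usual
        |   # approximations attributed to the Chinchilla paper.
        |   def train_flops(n_params, n_tokens):
        |       return 6 * n_params * n_tokens  # ~6 FLOPs/param/token
        | 
        |   def chinchilla_optimal_tokens(n_params):
        |       return 20 * n_params  # ~20 tokens per parameter
        | 
        |   for n in (1e9, 7e9, 70e9):
        |       d = chinchilla_optimal_tokens(n)
        |       print(f"{n/1e9:.0f}B params: stop at ~{d/1e9:.0f}B tokens"
        |             f" (~{train_flops(n, d):.1e} FLOPs)")
        | 
        | The article's claim is that training small models well past that
        | point keeps paying off.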
        
         | v64 wrote:
          | The Llama 1 paper [1] was one of the earlier works to
          | question the assumption that more params = a better model.
          | Since then they've released Llama 2, and this post offers
          | more evidence reinforcing their hypothesis.
          | 
          | I wouldn't say it was an oversight that other labs missed
          | this. It's easier to just increase the param count of a model
          | over the same training set than to gather the larger training
          | set a smaller model needs. And at first, increasing model
          | size did seem to be the way forward; it's only since hitting
          | diminishing returns that we've begun exploring other options,
          | and the Llamas are early evidence of another way forward.
          | 
          | [1] https://arxiv.org/abs/2302.13971
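          | 
          | To put rough numbers on it (my arithmetic, using the common
          | ~20 tokens/param reading of Chinchilla, not figures from the
          | post):
          | 
          |   # How far past "Chinchilla-optimal" the small Llamas were
          |   # trained; token counts are from the Llama papers.
          |   runs = {"Llama 1 7B": (7e9, 1.0e12),   # params, tokens
          |           "Llama 2 7B": (7e9, 2.0e12)}
          |   for name, (n, d) in runs.items():
          |       d_opt = 20 * n  # compute-optimal token count
          |       print(f"{name}: ~{d/d_opt:.0f}x the 'optimal' tokens")
          | 
          | Both were still improving at the end of training, which is
          | exactly the regime the linked post digs into.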
        
       | binarymax wrote:
        | What is the likelihood of overfitting the smaller models? It's
        | not obvious which criteria and hyperparameters would prevent
        | that.
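        | 
        | The usual signal would be a held-out loss that stops tracking
        | the training loss (and with single-epoch pretraining over a
        | large corpus, classic overfitting is less likely to begin
        | with). A crude sketch of that check, not anything the article
        | specifies:
        | 
        |   def diverging(train_losses, val_losses, patience=3):
        |       # True if held-out loss rose for `patience` consecutive
        |       # evals while the training loss kept falling.
        |       if len(val_losses) <= patience:
        |           return False
        |       val_rising = all(val_losses[i] < val_losses[i + 1]
        |                        for i in range(-patience - 1, -1))
        |       train_fell = train_losses[-1] < train_losses[-patience - 1]
        |       return val_rising and train_fell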
       | 
        | If there's no overfitting and the results get reproduced, then
        | this is a very promising find.
        
       ___________________________________________________________________
       (page generated 2023-09-04 23:00 UTC)