[HN Gopher] Chinchilla's Death
___________________________________________________________________

Chinchilla's Death

Author : KolmogorovComp
Score  : 56 points
Date   : 2023-09-04 18:31 UTC (4 hours ago)

(HTM) web link (espadrine.github.io)
(TXT) w3m dump (espadrine.github.io)

| newfocogi wrote:
| While the article makes good observations, this would appear to
| be a major oversight by leading research labs if they could have
| just kept the gas pedal down on simpler models for longer and
| gotten better performance. This is HackerNews - can we get
| someone from OpenAI, DeepMind, or MetaAI to respond and explain
| why cutting off the smaller models at a lower total compute
| budget is justified?
|
| v64 wrote:
| The Llama 1 paper [1] was one of the earlier models to question
| the assumption that more params = better model. Since then
| they've released Llama 2, and this post offers more evidence
| that reinforces their hypothesis.
|
| I wouldn't say it was an oversight by other labs that they
| missed this. It's easier to just increase the params of a model
| over the same training set than to gather the larger training
| set a smaller model needs. And at first, increasing model size
| did seem to be the way forward, but we've since hit diminishing
| returns. Now that we've hit that point, we've begun exploring
| other options, and the Llamas are early evidence of another way
| forward.
|
| [1] https://arxiv.org/abs/2302.13971
|
| binarymax wrote:
| What is the likelihood of overfitting the smaller models? It's
| not obvious what criteria and hyperparameters would prevent
| that.
|
| If there's no overfitting and the results are reproduced, then
| this is a very promising find.
___________________________________________________________________
(page generated 2023-09-04 23:00 UTC)
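
The compute tradeoff the thread debates can be made concrete with the
parametric loss fit from the Chinchilla paper (Hoffmann et al. 2022,
arXiv:2203.15556), L(N, D) = E + A/N^alpha + B/D^beta, with training
compute approximated as C ~= 6*N*D. The sketch below is not part of the
thread: the constants are the paper's reported fit, and the comparison
points are illustrative. It holds compute fixed at roughly Chinchilla's
budget and compares a 70B-parameter model against smaller models trained
on correspondingly more tokens.

    # Parametric loss fit from Hoffmann et al. 2022 (arXiv:2203.15556):
    #   L(N, D) = E + A / N**alpha + B / D**beta
    # with training compute approximated as C ~= 6 * N * D.
    # The constants below are the paper's reported fitted values.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def loss(n_params: float, n_tokens: float) -> float:
        """Predicted pretraining loss at n_params parameters, n_tokens tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    # Hold compute fixed near Chinchilla's budget (~70B params, ~1.4T tokens)
    # and reallocate it toward smaller models trained on more tokens.
    C = 6 * 70e9 * 1.4e12
    for n in (70e9, 13e9, 7e9):
        d = C / (6 * n)  # tokens affordable at the same compute budget
        print(f"N = {n/1e9:5.0f}B params, D = {d/1e12:5.1f}T tokens, "
              f"predicted L = {loss(n, d):.3f}")

Under this fit the three predicted losses land within a few hundredths
of each other, which is why an under-sized model trained well past its
"optimal" token count can stay competitive: the point the article and
the Llama results press on.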
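On binarymax's overfitting question, the standard criterion is to track
held-out loss alongside training loss during the long run; a validation
loss that stalls or rises while training loss keeps falling is the
overfitting signal. A minimal sketch of that check, assuming hypothetical
loss curves and a made-up helper name (overfitting_report):

    def overfitting_report(train_losses: list[float],
                           val_losses: list[float],
                           patience: int = 3) -> str:
        """Flag overfitting if validation loss has not improved for
        `patience` consecutive evaluations while training loss still falls."""
        best = min(val_losses)
        evals_since_best = len(val_losses) - 1 - val_losses.index(best)
        train_still_improving = train_losses[-1] < min(train_losses[:-1])
        if evals_since_best >= patience and train_still_improving:
            return f"likely overfitting: val loss stale for {evals_since_best} evals"
        return "no overfitting signal yet"

    # Example: training loss keeps dropping while validation loss stalls.
    print(overfitting_report(
        train_losses=[2.10, 1.95, 1.84, 1.76, 1.70, 1.65],
        val_losses=[2.20, 2.08, 2.02, 2.03, 2.04, 2.05],
    ))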