[HN Gopher] Training and aligning LLMs with RLHF and RLHF altern...
___________________________________________________________________
 
Training and aligning LLMs with RLHF and RLHF alternatives
 
Author : rasbt
Score  : 65 points
Date   : 2023-09-10 14:04 UTC (8 hours ago)
 
(HTM) web link (magazine.sebastianraschka.com)
(TXT) w3m dump (magazine.sebastianraschka.com)
 
| scoresmoke wrote:
| Discussions about LLM alignment often miss the topics of data
| quality and quantity. It turns out that current models like
| Llama 2 use 10K+ prompts and responses for supervised
| fine-tuning (SFT) and 100K+ human preference pairs. While the
| preferences are fairly easy to annotate, producing a good SFT
| dataset is not easy.
| 
| https://evalovernite.substack.com/p/rlhf-math-aint-enough
| 
| https://doi.org/10.5281/zenodo.8186168
| jamesblonde wrote:
| I read here that Yann LeCun claimed that even with RLHF, LLMs
| will still hallucinate - that it's an unavoidable consequence
| of their autoregressive nature.
| 
| https://www.hopsworks.ai/dictionary/rlhf-reinforcement-learn...
| ShamelessC wrote:
| That goes without saying.
| 
| edit: I don't like your linked article at all. Subtly
| misleading and/or misinformed. Like Yahoo News, but for ML.
| 
| to clarify: No one (certainly not OpenAI) suggested that RLHF
| was useful for reducing hallucinations. It's not for that. The
| insinuation that it was designed (at least partially) for that
| purpose and yet "failed" is a faulty one. Hallucinations are a
| known issue with large language models, and while I appreciate
| LeCun reiterating that, even lesser researchers than LeCun are
| aware of that fact.
| og_kalu wrote:
| Likely yes. But "solving" hallucinations is not really
| important as long as mitigating them to some sufficiently low
| level is possible.
| phillipcarter wrote:
| Moreover, it's all about the use case. If you need a high
| degree of reliability and reproducibility, don't use LLMs! Not
| yet, at least. That's fine, though, because they offer a ton
| of value in solving problems where that isn't needed.
| 3abiton wrote:
| I wonder if a new metric will be introduced for evaluating
| LLMs: a hallucination score.
| bugglebeetle wrote:
| > If you need a high degree of reliability and
| reproducibility, don't use LLMs!
| 
| This is true of pretty much all of machine learning. LLMs are
| just getting singled out because their outputs are not getting
| the same level of validation that typically occurs with older
| approaches. BERT models will also spit out wacky stuff,
| depending on how they're trained/fine-tuned/used, etc.
| bugglebeetle wrote:
| For many NLP tasks (which is what I mostly use LLMs for),
| hallucinations can be prevented with simple, procedural checks
| against the input or a controlled vocabulary. For example, for
| NER tasks, you can just check whether the extracted entities
| are valid relative to either of the two.
| Geee wrote:
| What datasets does OpenAI use for RLHF? Is the assumption
| correct that it's "time & labor intensive"? Couldn't you take
| responses from HN / Reddit / Stack Exchange / Quora, etc.,
| where answers are already ranked, and train the reward model
| on those?
___________________________________________________________________
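A minimal sketch (in Python) of the kind of procedural check
bugglebeetle describes for NER-style tasks: keep only extracted
entities that actually occur in the source text or in a controlled
vocabulary. The entity list, source text, and vocabulary below are
made-up examples, and the substring/vocabulary test is just one
plausible way to implement such a check.

# Sketch of a procedural check against hallucinated entities in an
# NER-style task: keep only extracted entities that literally occur
# in the source text or appear in a controlled vocabulary.

from typing import Iterable

CONTROLLED_VOCAB = {"OpenAI", "Llama 2", "RLHF"}  # example vocabulary

def validate_entities(entities: Iterable[str],
                      source_text: str,
                      vocab: set = CONTROLLED_VOCAB) -> list:
    """Drop any extracted entity that is neither in the source text
    nor in the controlled vocabulary."""
    source_lower = source_text.lower()
    kept = []
    for ent in entities:
        in_source = ent.lower() in source_lower  # simple substring check
        in_vocab = ent in vocab
        if in_source or in_vocab:
            kept.append(ent)
    return kept

if __name__ == "__main__":
    text = "Llama 2 uses 10K+ prompts for supervised fine-tuning."
    extracted = ["Llama 2", "GPT-5"]  # "GPT-5" is a hallucination here
    print(validate_entities(extracted, text))  # -> ['Llama 2']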
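Geee's idea of reusing already-ranked answers is roughly how some
public preference datasets (e.g. ones built from Stack Exchange
votes) have been assembled. Below is a hedged sketch of turning
vote-ranked answers into (prompt, chosen, rejected) records for
reward-model training; the input schema and the minimum score gap
are illustrative assumptions, not any specific dataset's format.

# Sketch: turning vote-ranked forum answers into preference pairs
# for reward-model training. The input format (a question plus
# answers with vote scores) is an assumed, simplified schema.

from itertools import combinations

def make_preference_pairs(question, answers, min_score_gap=2):
    """For each pair of answers whose vote scores differ by at least
    min_score_gap, emit a (prompt, chosen, rejected) record."""
    ranked = sorted(answers, key=lambda a: a["score"], reverse=True)
    pairs = []
    for better, worse in combinations(ranked, 2):
        if better["score"] - worse["score"] >= min_score_gap:
            pairs.append({
                "prompt": question,
                "chosen": better["text"],
                "rejected": worse["text"],
            })
    return pairs

if __name__ == "__main__":
    q = "How does RLHF differ from supervised fine-tuning?"
    answers = [
        {"text": "RLHF optimizes against a learned reward model...",
         "score": 42},
        {"text": "They are the same thing.", "score": 1},
    ]
    print(make_preference_pairs(q, answers))

A reward model would then be trained to score the "chosen" response
above the "rejected" one for the same prompt, typically with a
pairwise ranking loss.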