[HN Gopher] DeepSpeed Chat: Easy, fast and affordable RLHF train...
___________________________________________________________________

DeepSpeed Chat: Easy, fast and affordable RLHF training of ChatGPT-
like models

Author : quantisan
Score  : 40 points
Date   : 2023-04-12 21:48 UTC (1 hour ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| teruakohatu wrote:
| Does RLHF help with training an LLM to produce better (more
| accurate) results for a particular problem domain (e.g. customer
| support for a particular company), or is it only helpful for
| training the LLM to be a chat agent in general, or a chat agent
| with guard rails?

| brofallon wrote:
| To use RLHF you need a dataset that includes instructions with
| good & bad answers -- do many of those exist? I know there are a
| few datasets of plain instructions-with-responses, but I'm not
| aware of any that have both good and bad (or ranked) responses.
| Is that trivial to produce, or an important missing element here?

| sdenton4 wrote:
| All of the UX interfaces have little up/down thumb icons --
| that's where the boolean feedback comes from. If people stop
| using those, sentiment analysis on the human responses will
| likely go a long way.

| summarity wrote:
| Also see the example repo README:
| https://github.com/microsoft/DeepSpeedExamples/tree/master/a...
|
| > With just one click, you can train, generate and serve a 1.3
| billion parameter ChatGPT model within 1.36 hours on a single
| consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX
| node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training
| for a 13 billion parameter ChatGPT model in 13.6 hours. On
| multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes
| with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66
| billion parameter ChatGPT model in under 9 hours.
| > Finally, it enables 15X faster training than existing RLHF
| systems.
|
| > The following are some of the open-source examples that are
| powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX,
| Huggingface-PEFT
|
| (disclaimer: MSFT/GH employee, not affiliated with this project)

| nacs wrote:
| > single consumer-grade NVIDIA A6000 GPU with 48GB memory
|
| I wouldn't call an A6000 "consumer-grade" -- it's about $5000.
|
| A top-of-the-line consumer-grade GPU would be an Nvidia RTX
| 4090/3090 with 24GB of VRAM.

| tinco wrote:
| Microsoft: invests $10 billion in a company. Also Microsoft:
| here are the tools you need to DIY one of the premium features of
| the company we just invested $10 billion in, for free.
|
| Not that reproducing GPT-4 is going to be easy with this, but
| it'll definitely remove some major hurdles. I read a report about
| the difficulties HuggingFace had producing their BLOOM model, and
| a lot of it was the sort of straightforward systems engineering
| that goes into tooling like this.
|
| Is the BLOOM model considered a failure by the community? If you
| read the introduction, it was supposed to include improvements
| over GPT-3, but it performs much worse -- I guess because of
| lower-quality training data? I wonder what sort of company would
| have data of high enough quality that they could use this project
| to fine-tune a public model to the point where it would be better
| in some scenario than plain old GPT-4. Especially when you can
| just inject extra info into the GPT-4 prompt, like Phind does,
| for example. What even is the use of fine-tuning given that GPT-4
| exists?
___________________________________________________________________
(page generated 2023-04-12 23:00 UTC)
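
Editor's note on the ranked-dataset question raised in the thread: reward
models for RLHF are commonly trained on (chosen, rejected) response pairs
using a Bradley-Terry style pairwise ranking loss. The sketch below is a
minimal illustration in plain Python, not DeepSpeed code; the scores are
made-up stand-ins for the scalar outputs a reward model would produce.

```python
import math


def pairwise_ranking_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry style loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the preferred answer scores higher than the
    rejected one, and grows as the ranking is violated.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-(chosen_reward - rejected_reward)))
    return -math.log(sigmoid)


# Toy ranked pairs: (score for the better answer, score for the worse one).
# In a real pipeline these would come from a reward-model head.
pairs = [(2.0, -1.0), (0.5, 0.4), (-1.0, 1.0)]
losses = [pairwise_ranking_loss(c, r) for c, r in pairs]

# The third pair is mis-ranked (the rejected answer scored higher),
# so its loss dominates and pushes the model to fix the ordering.
print([round(loss, 3) for loss in losses])
```

This is why datasets with merely good responses are not enough for the
reward-modeling stage: the loss is defined over a *comparison*, so each
training example needs at least two ranked answers to the same prompt.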