[HN Gopher] DeepSpeed Chat: Easy, fast and affordable RLHF train...
       ___________________________________________________________________
        
       DeepSpeed Chat: Easy, fast and affordable RLHF training of ChatGPT-
       like models
        
       Author : quantisan
       Score  : 40 points
       Date   : 2023-04-12 21:48 UTC (1 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | teruakohatu wrote:
       | Does the RLHF help with training a LLM model to produce better
       | (more accurate) results for a particular problem domain (eg.
       | Customer support for a particular company) or is it only helpful
       | in training the LLM to be a chat agent in general or a chat agent
       | with guard rails?
        
       | brofallon wrote:
       | To use RLHF you need a dataset that includes instructions with
       | good & bad answers - do many of those exist? I know there are a
       | few datasets of just plain instructions-with-responses, but I'm
       | not aware of any that have both good and bad (or ranked)
       | responses. Is that trivial, or an important missing element here?
        
         | sdenton4 wrote:
         | All of the UX interface have little up/down thumb icons...
         | that's where the boolean feedback comes from. If people stop
         | using that, sentiment analysis on the human responses will
         | likely go a long way.
        
       | summarity wrote:
       | Also see the example repo README:
       | https://github.com/microsoft/DeepSpeedExamples/tree/master/a...
       | 
       | > With just one click, you can train, generate and serve a 1.3
       | billion parameter ChatGPT model within 1.36 hours on a single
       | consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX
       | node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training
       | for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-
       | GPU multi-node systems (cloud scenarios),i.e., 8 DGX nodes with 8
       | NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion
       | parameter ChatGPT model under 9 hours. Finally, it enables 15X
       | faster training over the existing RLHF systems
       | 
       | > The following are some of the open-source examples that are
       | powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX,
       | Huggingface-PEFT
       | 
       | (disclaimer: MSFT/GH employee, not affiliated with this project)
        
         | nacs wrote:
         | > single consumer-grade NVIDIA A6000 GPU with 48GB memory
         | 
         | I wouldn't call an A6000 "consumer-grade" -- it's about $5000.
         | 
         | Top of the line consumer grade GPU would be a Nvidia RTX
         | 4090/3090 with 24GB VRAM.
        
       | tinco wrote:
       | Microsoft: invests 10 billion in company. Also Microsoft: here's
       | the tools you need to DIY one of the premium features the company
       | we just invested 10 billion in for free.
       | 
       | Not that reproducing GPT-4 is going to be easy with this, but
       | it'll definitely get rid of some major hurdles. I read a report
       | about the difficulties HuggingFace had with producing their Bloom
       | model, and a lot of it was the sort of straight forward systems
       | engineering that goes into tooling like this.
       | 
       | Is the Bloom model considered a failure by the community? If you
       | read the introduction it was supposed to include improvements
       | over GPT3, but it performs much worse, I guess because of lower
       | quality training data? I wonder what sort of company would have
       | high enough quality data that they could use this project to fine
       | tune a public model to the point where it would be better in some
       | scenario than plain old GPT4 would be. Especially when you can
       | just inject extra info in to the GPT4 prompt, like phind does for
       | example. What even is the use of fine tuning given GPT 4 exists?
        
       ___________________________________________________________________
       (page generated 2023-04-12 23:00 UTC)