[HN Gopher] Godot-dodo - Finetuning LLaMA on single-language com...
       ___________________________________________________________________
        
       Godot-dodo - Finetuning LLaMA on single-language comment:code data
       pairs
        
       Author : minosu
       Score  : 12 points
       Date   : 2023-04-23 22:33 UTC (26 minutes ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | brucethemoose2 wrote:
       | This is fabulous.
       | 
        | Just want to add that there are efforts to improve training
        | speed, like this: https://github.com/Lightning-AI/lit-llama/issues/62
       | 
       | So the practical cost/dataset size for language finetunes is
       | bound to get better rapidly.
        
       | minosu wrote:
        | This repository presents finetuned LLaMA models that address
        | the limited ability of existing language models to generate
        | code for less popular programming languages.
       | 
        | gpt-3.5-turbo and gpt-4 have proven to be excellent coders,
        | but fall off sharply when asked to generate code in languages
        | other than Python, JavaScript, etc. The godot-dodo approach
        | to addressing this: finetune smaller models on a single one
        | of these less popular languages, using human-written code
        | scraped from MIT-licensed GitHub repositories, with existing
        | GPT models generating an instruction for each code snippet.
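        | 
        | A minimal sketch of that pairing step (not the exact
        | godot-dodo pipeline; the prompt wording and the
        | scraped_snippets list are placeholders), using the pre-1.0
        | openai Python client:
        | 
        |   import json
        |   import openai  # assumes OPENAI_API_KEY is set
        | 
        |   scraped_snippets = [
        |       'func _ready():\n\tprint("hello")',  # placeholder
        |   ]
        | 
        |   def build_pair(snippet):
        |       resp = openai.ChatCompletion.create(
        |           model="gpt-3.5-turbo",
        |           messages=[{
        |               "role": "user",
        |               "content": "Write a one-sentence instruction "
        |                          "that this GDScript code fulfills:"
        |                          "\n\n" + snippet,
        |           }],
        |       )
        |       instruction = resp["choices"][0]["message"]["content"]
        |       # the human-written code stays as the target output;
        |       # only the instruction is model-generated
        |       return {"instruction": instruction.strip(),
        |               "output": snippet}
        | 
        |   pairs = [build_pair(s) for s in scraped_snippets]
        |   with open("dataset.json", "w") as f:
        |       json.dump(pairs, f, indent=2)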
       | 
        | This differs from the dataset generation approach used by
        | projects such as stanford-alpaca or gpt4all in that the
        | output values of the training set remain high-quality human
        | data, while following the same instruction-following format.
        | This will likely prove more effective the more obscure the
        | language. In this case GDScript was used, the scripting
        | language of the popular open-source game engine Godot, but
        | the same approach can be applied to any other language.
       | 
        | Performance is promising, with the 7-billion-parameter
        | finetune outperforming GPT models at producing syntax that
        | compiles on first try, while being somewhat less capable at
        | following complex instructions.
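        | 
        | A compiles-on-first-try check can be approximated by parsing
        | each generated snippet with a local Godot binary. A minimal
        | sketch, assuming a Godot 4 executable on PATH (flag names
        | are from Godot's CLI help and worth verifying against your
        | version):
        | 
        |   import subprocess
        |   import tempfile
        | 
        |   def compiles(gdscript):
        |       with tempfile.NamedTemporaryFile(
        |               "w", suffix=".gd", delete=False) as f:
        |           f.write(gdscript)
        |           path = f.name
        |       # --check-only parses the script for errors
        |       # without executing it
        |       result = subprocess.run(
        |           ["godot", "--headless", "--check-only",
        |            "--script", path],
        |           capture_output=True,
        |       )
        |       return result.returncode == 0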
       | 
       | A comprehensive evaluation comparing all models can be found
        | here: https://github.com/minosvasilias/godot-dodo/tree/main/models
        
       ___________________________________________________________________
       (page generated 2023-04-23 23:00 UTC)