[HN Gopher] StarCoder and StarCoderBase: 15.5B parameter models ...
       ___________________________________________________________________
        
       StarCoder and StarCoderBase: 15.5B parameter models with 8K context
       length
        
       Author : belter
       Score  : 71 points
       Date   : 2023-05-15 21:06 UTC (1 hour ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | fbodz wrote:
        | Has anyone figured out a way to fine-tune this with 24 GB of
        | VRAM? I have tried with DeepSpeed etc. but no luck. It seems to
        | be just out of reach, with fine-tuning requiring 26 GB.
        
         | csdvrx wrote:
         | Have you tried quantization? It's often a cheap and simple way
         | to reduce the VRAM requirements.
         | 
          | What hardware are you using? (CPU, RAM, GPU, VRAM)
          | 
          | Have you considered using llama.cpp for mixed CPU+GPU
          | inference (if you have enough RAM)?
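          | 
          | For example, a rough sketch of 8-bit loading with
          | transformers + bitsandbytes (untested on StarCoder itself,
          | so treat the details as assumptions):
          | 
          |     # Load in int8 to roughly halve VRAM vs fp16.
          |     # Needs transformers, accelerate and bitsandbytes.
          |     from transformers import (
          |         AutoModelForCausalLM, AutoTokenizer)
          | 
          |     model_id = "bigcode/starcoder"
          |     tok = AutoTokenizer.from_pretrained(model_id)
          |     model = AutoModelForCausalLM.from_pretrained(
          |         model_id,
          |         load_in_8bit=True,   # bitsandbytes int8 weights
          |         device_map="auto",   # spill to CPU if GPU fills up
          |     )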
        
       | jimlongton wrote:
       | (Possibly naive question) This is marketed as open source. Does
       | that mean I can download the model and run it locally? If so,
       | what kind of GPU would I need?
        
         | pyrophane wrote:
         | Here is a good reference:
         | 
         | https://huggingface.co/docs/transformers/perf_train_gpu_one
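          | 
          | As a very rough back-of-the-envelope (my numbers, not from
          | that page): the weights alone for a 15.5B-parameter model
          | need something like this, before activations or any
          | optimizer state:
          | 
          |     # Rough weight-only memory estimates, 15.5B params.
          |     params = 15.5e9
          |     for n, b in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
          |         print(f"{n}: ~{params * b / 2**30:.0f} GiB")
          |     # fp16: ~29 GiB, int8: ~14 GiB, int4: ~7 GiB
          | 
          | So a single 24 GB card realistically means a quantized
          | model for inference, and quite a bit more for training.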
        
       | cs702 wrote:
       | It's great to see this!
       | 
       | A big THANK YOU to everyone who made it possible.
       | 
       | I'm looking forward to playing with it -- and also, eventually,
       | inevitably, running a quantized, super-efficient version on my
       | laptop.
        
       | simonw wrote:
       | This is trained on The Stack, which is available here:
       | https://huggingface.co/datasets/bigcode/the-stack/
       | 
       | Interesting to note that The Stack is 6TB - the whole of the
       | RedPajama LLM training set (a lot more than just code) is only
       | 2.6TB.
       | 
       | To get an idea what that training data looks like, I grabbed the
       | first 300MB SQL file from
       | https://huggingface.co/datasets/bigcode/the-stack/tree/main/...
       | and then dumped the first 1,000 rows from that into JSON and
       | loaded it into Datasette Lite:
       | 
       | https://lite.datasette.io/?json=https://gist.github.com/simo...
       | 
       | Here's a query that shows a random row - hit the blue "Run SQL"
       | button to see another one:
       | https://lite.datasette.io/?json=https://gist.github.com/simo...
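        | 
        | If you'd rather poke at it from Python, something like this
        | should work with the datasets library (streaming, so you don't
        | pull the whole ~6TB; the data_dir path is my guess at the repo
        | layout, and the dataset is gated so you may need to be logged
        | in):
        | 
        |     # Stream a few rows of the SQL subset of The Stack.
        |     import itertools, json
        |     from datasets import load_dataset
        | 
        |     ds = load_dataset(
        |         "bigcode/the-stack",
        |         data_dir="data/sql",   # assumed layout
        |         split="train",
        |         streaming=True,
        |     )
        |     rows = list(itertools.islice(ds, 3))
        |     print(json.dumps(rows[0], default=str)[:500])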
        
         | vlovich123 wrote:
          | Something tells me that I haven't trained on 6 TB of code,
          | yet I can meaningfully outperform any AI. That suggests
          | there's still something structurally missing in the training
          | efficiency. I wonder whether this carries over to things
          | like chess/go: for a computer trained on the same number of
          | games as a human, can the computer still outperform the
          | human?
        
           | mysterydip wrote:
           | I wonder how curated the input data is. Just on the surface
           | of it, there's a lot of spaghetti code out there that people
           | may have shared. I once saw a codebase that used three
           | different implementations of a date/time structure and
           | overloaded operators to convert between them. Or people
           | rolling their own crypto, sort, or random functions,
           | reimplementing data structures, etc.
        
           | RangerScience wrote:
            | Is this training just to understand code, or is it
            | training to understand code _and_ language?
           | 
           | (If we're comparing you to the model, is the model starting
           | at "baby" or "teenager"?)
        
       | bootloop wrote:
        | My biggest interest in this is the ability to ask questions
        | about large code-bases. Being able to generate small functions
        | or explain single code sections is nice, but being able to ask
        | bigger architectural questions would be really helpful for all
        | kinds of engineers (in particular at a large company).
        | 
        | I have seen approaches that merge context across multiple
        | levels, but that can only do so much. Is it viable to
        | fine-tune a model on a specific code-base so it has knowledge
        | across all files? Does anyone have more info on this kind of
        | problem space?
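        | 
        | To make the question concrete, the kind of thing I have in
        | mind is parameter-efficient fine-tuning on a single repo,
        | roughly like this (a sketch with the peft library; the target
        | modules and hyperparameters are guesses):
        | 
        |     # LoRA adapter on top of the frozen base model.
        |     from transformers import AutoModelForCausalLM
        |     from peft import LoraConfig, get_peft_model
        | 
        |     base = AutoModelForCausalLM.from_pretrained(
        |         "bigcode/starcoder")
        |     cfg = LoraConfig(
        |         r=16, lora_alpha=32, lora_dropout=0.05,
        |         target_modules=["c_attn"],  # guessed attn proj name
        |         task_type="CAUSAL_LM",
        |     )
        |     model = get_peft_model(base, cfg)
        |     model.print_trainable_parameters()
        |     # ...then train on chunks of the repo as plain text.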
        
       | freeqaz wrote:
        | Looks like the model is on Hugging Face, for anybody who is
        | curious to play with it:
        | https://huggingface.co/bigcode/starcoder
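        | 
        | A minimal way to try it once you have access (assumes enough
        | GPU memory or a quantized load; the prompt is just an
        | example):
        | 
        |     # Simple code completion via the pipeline API.
        |     from transformers import pipeline
        | 
        |     gen = pipeline("text-generation",
        |                    model="bigcode/starcoder",
        |                    device_map="auto")
        |     out = gen("def fibonacci(n):", max_new_tokens=48)
        |     print(out[0]["generated_text"])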
        
       | ftxbro wrote:
        | Do I need to make an account on Hugging Face to get the model?
        | I would prefer not to, and to just download a zip like you can
        | on GitHub.
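        | 
        | For what it's worth, the closest thing to a plain zip-style
        | download seems to be huggingface_hub's snapshot_download,
        | though if the repo is gated a token (and so an account) may
        | still be required:
        | 
        |     # Fetch all model files into a local folder.
        |     from huggingface_hub import snapshot_download
        | 
        |     path = snapshot_download("bigcode/starcoder",
        |                              local_dir="./starcoder")
        |     print(path)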
        
       | meghan_rain wrote:
        | tl;dr: how does it compare to Copilot/GPT-4?
        
         | bavell wrote:
         | From the summary:
         | 
         | "We perform the most comprehensive evaluation of Code LLMs to
         | date and show that StarCoderBase outperforms every open Code
         | LLM that supports multiple programming languages and matches or
         | outperforms the OpenAI code-cushman-001 model."
         | 
          | So I'd assume it's not up to par with GPT-4 or Copilot.
          | Can't wait to see it evolve from here!
        
       | [deleted]
        
       | nr2x wrote:
        | Given that some of my own open-source code is no doubt in GPT
        | and Bard, which feels wrong given their fees and limitations,
        | I'm VERY VERY excited for this!
        
       ___________________________________________________________________
       (page generated 2023-05-15 23:00 UTC)