[HN Gopher] StarCoder and StarCoderBase: 15.5B parameter models ...
___________________________________________________________________
StarCoder and StarCoderBase: 15.5B parameter models with 8K context
length

Author : belter
Score  : 71 points
Date   : 2023-05-15 21:06 UTC (1 hour ago)

(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)

| fbodz wrote:
| Has anyone figured out a way to fine-tune this with 24 GB of VRAM?
| I have tried with DeepSpeed etc. but no luck. It seems to be just
| out of reach, with fine-tuning requiring 26 GB.

| csdvrx wrote:
| Have you tried quantization? It's often a cheap and simple way to
| reduce the VRAM requirements.
|
| What hardware are you using (CPU, RAM, GPU, VRAM)?
|
| Have you considered using llama.cpp for mixed CPU+GPU inference
| (if you have enough RAM)?

| jimlongton wrote:
| (Possibly naive question) This is marketed as open source. Does
| that mean I can download the model and run it locally? If so,
| what kind of GPU would I need?

| pyrophane wrote:
| Here is a good reference:
|
| https://huggingface.co/docs/transformers/perf_train_gpu_one

| cs702 wrote:
| It's great to see this!
|
| A big THANK YOU to everyone who made it possible.
|
| I'm looking forward to playing with it -- and also, eventually,
| inevitably, running a quantized, super-efficient version on my
| laptop.

| simonw wrote:
| This is trained on The Stack, which is available here:
| https://huggingface.co/datasets/bigcode/the-stack/
|
| Interesting to note that The Stack is 6TB - the whole of the
| RedPajama LLM training set (a lot more than just code) is only
| 2.6TB.
|
| To get an idea what that training data looks like, I grabbed the
| first 300MB SQL file from
| https://huggingface.co/datasets/bigcode/the-stack/tree/main/...
| and then dumped the first 1,000 rows from that into JSON and
| loaded it into Datasette Lite:
|
| https://lite.datasette.io/?json=https://gist.github.com/simo...
|
| Here's a query that shows a random row - hit the blue "Run SQL"
| button to see another one:
| https://lite.datasette.io/?json=https://gist.github.com/simo...

| vlovich123 wrote:
| Something tells me that I haven't trained on 6 TB of code and yet
| can meaningfully outperform any AI. That tells me there is still
| something structurally missing from the training efficiency. I
| wonder if this carries over to things like chess or Go: for a
| computer trained on the same number of games as a human, is the
| computer still able to outperform the human?

| mysterydip wrote:
| I wonder how curated the input data is. Just on the surface of
| it, there's a lot of spaghetti code out there that people may
| have shared. I once saw a codebase that used three different
| implementations of a date/time structure and overloaded operators
| to convert between them. Or people rolling their own crypto,
| sort, or random functions, reimplementing data structures, etc.

| RangerScience wrote:
| Is this training just to understand code, or is it training to
| understand code _and_ language?
|
| (If we're comparing you to the model, is the model starting at
| "baby" or "teenager"?)

| bootloop wrote:
| My biggest interest in this is that I would like to be able to
| ask questions about large code-bases. Being able to generate
| small functions or explain single code sections is nice, but
| being able to ask bigger architectural questions would be really
| helpful for all kinds of engineers (in particular in a large
| company).
|
| I have seen approaches that merge context across multiple levels,
| but that can only do so much. Is it viable to fine-tune a model
| on a specific code-base so it has knowledge across all files?
| Does anyone have more info on this kind of problem space?
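A note on the fine-tuning questions above (fbodz, csdvrx, bootloop): one
common low-VRAM recipe is to load the base model with 8-bit quantization
and train small LoRA adapters rather than all 15.5B weights. Below is a
minimal sketch of that approach, assuming the Hugging Face transformers,
peft, bitsandbytes and datasets libraries; the my_codebase.jsonl file and
the LoRA hyperparameters are illustrative placeholders, and whether the
result actually fits in 24 GB depends on batch size and sequence length.

    # Sketch: 8-bit base model + LoRA adapters (parameter-efficient tuning).
    # Assumes transformers, peft (>= 0.4), bitsandbytes and datasets are
    # installed and that access to the gated bigcode/starcoderbase
    # checkpoint has been granted on Hugging Face.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_id = "bigcode/starcoderbase"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # load_in_8bit quantizes the weights (roughly halving their memory vs
    # fp16); device_map="auto" spreads layers across GPU(s) and CPU RAM.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, load_in_8bit=True, device_map="auto")
    model = prepare_model_for_kbit_training(model)

    # Train only small low-rank adapters on the attention projection,
    # not the 15.5B base weights.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["c_attn"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()

    # my_codebase.jsonl is a hypothetical file with one {"text": ...}
    # record per source file from the code-base the model should learn.
    data = load_dataset("json", data_files="my_codebase.jsonl", split="train")
    data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                        max_length=2048),
                    remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        train_dataset=data,
        args=TrainingArguments(output_dir="starcoder-lora",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=16,
                               num_train_epochs=1, logging_steps=10),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()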
| freeqaz wrote:
| Looks like the model is on HuggingFace here, for anybody who is
| curious to play with it:
| https://huggingface.co/bigcode/starcoder

| ftxbro wrote:
| Do I need to make an account on huggingface to get the model? I
| would prefer not to, and just download a zip like you can on
| GitHub.

| meghan_rain wrote:
| tl;dr: how does it compare to Copilot/GPT-4?

| bavell wrote:
| From the summary:
|
| "We perform the most comprehensive evaluation of Code LLMs to
| date and show that StarCoderBase outperforms every open Code LLM
| that supports multiple programming languages and matches or
| outperforms the OpenAI code-cushman-001 model."
|
| So I'd assume it's not up to par with GPT-4 or Copilot. Can't
| wait to see it evolve from here!

| [deleted]

| nr2x wrote:
| Given that some of my own open source code is no doubt in GPT and
| Bard, which feels wrong given the fees and limitations, I'm VERY
| VERY excited for this!
___________________________________________________________________
(page generated 2023-05-15 23:00 UTC)
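An addendum to jimlongton's and ftxbro's questions about running the
model locally: a minimal sketch, assuming the Hugging Face transformers
library with bitsandbytes available. Note that bigcode/starcoder appears
to be a gated repository, so accepting the license (and therefore having
a Hugging Face account) is likely required before the weights can be
downloaded; the prompt below is just a placeholder.

    # Sketch: download the weights and generate a completion locally.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "bigcode/starcoder"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # device_map="auto" spreads the ~15.5B parameters over the available
    # GPU(s) and CPU RAM; load_in_8bit trades a little quality for roughly
    # half the weight memory, which helps on a single consumer GPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", load_in_8bit=True)

    prompt = "def fibonacci(n: int) -> int:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))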