[HN Gopher] Show HN: Tabby - A self-hosted GitHub Copilot
       ___________________________________________________________________
        
       Show HN: Tabby - A self-hosted GitHub Copilot
        
        I would like to introduce Tabby, a self-hosted alternative
        to GitHub Copilot that you can run on your own hardware.
        While GitHub Copilot has made coding more efficient and
        less time-consuming by suggesting and completing code for
        developers, it raises concerns around privacy and security.
        Tabby is in its early stages, and we are excited to receive
        feedback from the community.  Its GitHub repository is
        located here: https://github.com/TabbyML/tabby.  We have
        also deployed the latest Docker image to Hugging Face for a
        live demo: https://huggingface.co/spaces/TabbyML/tabby.
        Tabby is built on top of the popular Hugging Face
        Transformers / Triton FasterTransformer backend and is
        designed to be self-hosted, giving you complete control
        over your data and privacy. In Tabby's next feature
        iteration, you will be able to fine-tune the model to meet
        your project requirements.
        
       Author : wsxiaoys
       Score  : 292 points
       Date   : 2023-04-06 16:40 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | arco1991 wrote:
       | Very interesting to have a self-hosted version of Copilot. Will
       | definitely try this out.
        
       | mska wrote:
       | Out of curiosity, how do companies offering self-hosted/on-prem
       | solutions monetize their offerings?
       | 
       | Do they rely on legal contracts to prevent customers from using
       | the software for free or modifying it for their own purposes?
        
         | sebzim4500 wrote:
         | Presumably the same way as every software company before SaaS
         | took over a few years ago.
        
         | ushakov wrote:
         | VC money, mostly
        
         | wongarsu wrote:
         | Most of them do it by charging money for the product, just like
         | every other software company in the last four decades.
         | 
         | If we are talking about open source self-hosted specifically,
         | it's mostly consulting, paid support, and offering managed
         | hosting.
        
       | moonchrome wrote:
        | Anyone dare to guess how fast GH Copilot would be if it ran
        | locally?
        | 
        | My main problem with Copilot is latency/speed - I would shell
        | out for a 4090 if it meant I could use a local Copilot model
        | that's super fast/low latency/explores deep suggestions.
        
         | atq2119 wrote:
         | One of the weaknesses of transformers is that inference is
         | inherently serial, one token at a time, and the entire model
         | needs to be read once per token. This means that inference
         | (after prompt processing) is bounded by memory bandwidth
         | instead of compute.
         | 
         | That said, local solutions always tend to be lower latency than
         | cloud ones just because you get to skip the network.
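          | 
          | A rough back-of-the-envelope sketch of that bound (the numbers
          | here are assumptions from spec sheets, not measurements):
          | 
          |     # each generated token streams all the weights from VRAM once,
          |     # so memory bandwidth / model size gives an upper bound on speed
          |     params = 1e9           # assumed 1B-parameter local model
          |     bytes_per_param = 2    # fp16
          |     bandwidth = 1008e9     # RTX 4090 spec: ~1008 GB/s
          |     model_bytes = params * bytes_per_param
          |     print(bandwidth / model_bytes)  # ~504 tokens/s, ignoring overhead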
        
       | jslakro wrote:
       | Considering it's an alpha version I think a VSCode extension is
       | the missing part
        
         | wsxiaoys wrote:
         | There it is:
         | https://marketplace.visualstudio.com/items?itemName=TabbyML....
        
       | DanHulton wrote:
       | Copilot has so far been pretty useful to me as a "sometimes
       | smarter intellisense". It'll frequently correctly guess the
       | arguments I want for a function (and their types), and every once
       | in a while I'll type `a.map(` and it'll auto-fill the
       | transformation code I was planning on writing.
       | 
       | The simpler the task I'm trying to do, the better chance it has
       | of being correct, but that's also the part where I feel I get the
       | most benefit from it, because I already thoroughly understand
       | exactly what I'm writing, why I'm writing it, and what it needs
       | to look like, and Copilot sometimes saves me the 5-30s it takes
       | to write it. Over a day, that adds up and I can move marginally
       | faster.
       | 
       | It's definitely not a 100x improvement (or even a 10x
       | improvement), but I'm glad to have it.
       | 
       | If this works as well, locally, to escape the privacy issue, I'll
       | be thrilled. Checking it out.
        
         | jrockway wrote:
         | Yeah, I think it's pretty good at this. I've also done "write a
         | test like the one above but for Bar instead of Foo" and it got
         | the right answer. It would be better to not duplicate code like
         | that in most cases, but sometimes it's OK.
         | 
          | I recently needed a script to grab all my GitHub repos from the
          | GitHub API and produce a CSV with the columns URL, name,
          | description, and it could not do that at all. Just hallucinated
          | libraries and APIs that did not exist.
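          | 
          | (For the curious, the working version is short - a sketch using
          | the public REST API; the username is a placeholder and you'd
          | paginate for real accounts:)
          | 
          |     import csv, requests
          | 
          |     repos = requests.get(
          |         "https://api.github.com/users/SOME_USER/repos",
          |         params={"per_page": 100},
          |     ).json()
          |     with open("repos.csv", "w", newline="") as f:
          |         w = csv.writer(f)
          |         w.writerow(["url", "name", "description"])
          |         for r in repos:
          |             w.writerow([r["html_url"], r["name"], r["description"]])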
        
         | dgunay wrote:
         | Copilot also helps a lot with verbose, RSI-inducing languages
         | like C++. On days where my hands are cold or my hands are just
         | not feeling as physically limber as usual, it's a big relief
         | when it autocompletes a long series of symbols for me.
         | 
         | I know regular old LSP/Intellisense helps here but you are
         | still often constrained to only autocompleting one token at a
         | time, and have to type it at least partially in most cases.
        
         | greenie_beans wrote:
         | i feel the same way, but the `a.map(` autofill is hit or miss
         | for me. it'll interrupt my train of thought and it'll take me a
         | minute to remember what i was trying to do, then i'll have to
         | review the code to make sure it's right. and often there's a
         | better way to do it, so i end up refactoring.
        
         | Rapzid wrote:
         | I was frustrated with it after a week and about quit it but am
         | developing a more nuanced perspective.
         | 
          | TL;DR: I can't imagine using it with a language that doesn't
          | have great type checking to catch its BS, and they really need
          | to tune how it interacts with traditional intellisense.
         | 
         | Been using it in VSCode with C# and Typescript. It gets in the
         | way of the normal intellisense forcing you to use a shortcut to
         | get out and back to intellisense.
         | 
          | For me this was really getting in the way, because working one
          | line at a time when you know what you need isn't its strong
          | suit.
         | 
          | Stubbing out methods or even classes, or offering a "possibly
          | correct" solution in areas you are fuzzy about, is a strong
          | point. So is stubbing out objects, methods, etc. using your
          | existing code as a reference point...
         | 
         | But possible is the key. It's a bullshitter so it could
         | randomly invent parameters or interfaces that are plausible,
         | but not real. It also trips up on casing regularly.
         | 
         | All this to say; you gotta review everything it does and be on
         | the lookout. Without the tooling help of a typed lang like c#
         | or typescript this would be much harder.
         | 
         | I thought I was reaching a productive flow with this but then I
         | loaded my proj in Rider, which didn't have copilot installed,
         | and banged out some code and man it was frictionless in
         | comparison.
         | 
         | Feel like copilot should be opt in via a shortcut, not opt out
         | as it is. They really need to work out how to reduce the
         | friction of using intellisense when it's the best option which
         | it often is. But the copilot creators seem dead set on it
         | constantly back-seat driving.
        
         | [deleted]
        
         | yunwal wrote:
         | > to escape the privacy issue
         | 
         | Genuine question, do you not use GitHub for things other than
         | copilot? It seems to me either the privacy issues of copilot
         | are overblown or the privacy issues of GitHub itself are
         | underblown, because they both end up with basically the same
         | data.
        
           | littlestymaar wrote:
           | > Genuine question, do you not use GitHub for things other
           | than copilot?
           | 
            | I certainly don't use GitHub for _everything_.
        
           | wongarsu wrote:
           | I have files open in VSCode that aren't committed to Git, and
           | that I wouldn't be allowed to commit there even in private
           | repos (customer data covered by GDPR).
           | 
           | In this context I think it's important to note the
           | distinction between "Copilot for Individuals" and "Copilot
           | for Business", because for twice the money you potentially
           | get a lot more privacy:
           | 
           | > What data does Copilot for Individuals collect?
           | 
           | > [...] Depending on your preferred telemetry settings,
           | GitHub Copilot may also collect and retain the following,
           | collectively referred to as "code snippets": source code that
           | you are editing, related files and other files open in the
            | same IDE or editor, URLs of repositories and file paths.
           | 
           | > What data does Copilot for Business collect?
           | 
           | > [...] GitHub Copilot transmits snippets of your code from
           | your IDE to GitHub to provide Suggestions to you. Code
           | snippets data is only transmitted in real-time to return
           | Suggestions, and is discarded once a Suggestion is returned.
           | Copilot for Business does not retain any Code Snippets Data.
           | 
           | https://github.com/features/copilot
        
             | blibble wrote:
             | unless that's directly in the contract, I'd not trust
             | Microsoft to discard anything that might be of value in a
             | training dataset at some point in the future
             | 
             | say, if training is determined to be fair use
        
           | colinsane wrote:
           | "privacy" sits in a broader sea that includes "control",
           | "customization" and "lock-in". i've been using the GPT family
           | of tools less than otherwise because they're constantly
           | changing in ways that do or will disrupt any workflow i
           | integrate them into: new guard-rails are added such that new
           | versions explicitly don't support use-cases the old versions
           | do, etc.
           | 
           | self-hosting solves those problems: if done right, it gives
           | me a thing which is _static_ until i explicitly choose to
           | upgrade it, so i can actually integrate it into more longterm
           | workflows. if enough others also do this, there's some
           | expectation that when i do upgrade it i'll have a few options
           | (models) if the mainline one degrades in some way from my
           | ideal.
           | 
           | LLMs are one of the more difficult things to self-host here
           | due to the hardware requirements. there's enough interest
           | from enthusiasts right now that i'm hopeful we'll see ways to
           | overcome that though: pooling resources, clever forms of
           | caching, etc.
        
           | moonchrome wrote:
            | This - if you're hosting code on GH, you're already covered in
            | the worst case, unless you're worried they're collecting other
            | telemetry.
        
             | RussianCow wrote:
             | No, you're not, unless you never edit any code that you
             | don't commit. I write scratch code all the time, and I can
             | think of at least a few times where I hard-coded things
             | into these files that probably should never leave my work
             | computer.
        
         | japhyr wrote:
         | Yes, I went into all of this thinking the tools need to handle
         | complex code in order to really be useful. But that's the code
         | I want to be most careful about, so it's the code I really want
         | to architect myself, and understand thoroughly.
         | 
         | Making the simpler code much faster to develop leaves me a lot
         | more time and focus for the complex work.
         | 
         | These tools really are amazing, even with their limitations.
        
       | covi wrote:
       | This is awesome, and glad to see that SkyPilot is useful in
       | distributing Tabby on any cloud!
       | https://github.com/TabbyML/tabby/blob/main/deployment/skypil...
        
       | syntaxing wrote:
       | What sort of resources do we need to run this, particularly VRAM?
       | Also, how does this compare to Fauxpilot?
        
         | wsxiaoys wrote:
          | Tabby's philosophy is to achieve a completion rate comparable
          | to Codex/Copilot using a model size of less than 1B parameters,
          | with support for BF16/FP16 that reduces VRAM requirements to
          | 2GB or less. This may seem impossible, given that the model is
          | 10 times smaller than Codex, but it is definitely achievable,
          | especially in an on-premises environment where customers want
          | to keep their code behind a firewall.
         | 
         | Related research works include [1]. (Hints: combine code search
         | and LLM).
         | 
         | [1]: RepoCoder: Repository-Level Code Completion Through
         | Iterative Retrieval and Generation
         | https://arxiv.org/abs/2303.12570
         | 
          | This also reveals Tabby's roadmap beyond other OSS work like
          | Fauxpilot :)
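          | 
          | To make the idea concrete, here is a minimal sketch (not
          | Tabby's actual code) of RepoCoder-style retrieval-augmented
          | completion - the repo_index retriever is a stand-in:
          | 
          |     def build_prompt(prefix: str, repo_index) -> str:
          |         # fetch repository snippets similar to the code being edited
          |         snippets = repo_index.search(prefix, top_k=3)
          |         context = "\n".join(
          |             f"# from {s.path}\n{s.text}" for s in snippets
          |         )
          |         # extra context lets a small model punch above its weight
          |         return f"{context}\n{prefix}"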
        
           | moffkalast wrote:
           | > philosophy is to achieve
           | 
           | That doesn't answer the question, can anyone without more
           | VRAM than sense actually run it as-is or should we wait until
           | they reach their allegedly impossible aspirational goal?
           | 
           | The very first line of this sort of post should be the specs
           | required and if the trained model weights are actually
           | available, otherwise it's just straight up clickbait.
        
             | sp332 wrote:
              | Models are usually trained in fp16, meaning 16 bits = 2
              | bytes per parameter. So a 1B model would take 2GB to begin
              | with, and can be optimised down from there with
              | sparsification and/or quantization. A 50% reduction in size
              | might be noticeably worse than the original, but still
              | useful for boilerplate or other highly predictable
              | patterns.
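              | 
              | In numbers (a quick sketch):
              | 
              |     def vram_gb(params_billion: float, bits: int) -> float:
              |         # parameters * bytes per parameter, reported in GB
              |         return params_billion * 1e9 * (bits / 8) / 1e9
              | 
              |     print(vram_gb(1, 16))  # fp16: 2.0 GB
              |     print(vram_gb(1, 8))   # int8-quantized: 1.0 GB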
        
             | nacs wrote:
              | The post says it uses "2GB or less" of VRAM.
             | 
             | A 1B parameter transformer model is on the low/tiny-end of
             | model size these days.
        
       | vikp wrote:
       | How does this compare to fauxpilot -
       | https://github.com/fauxpilot/fauxpilot? Fauxpilot also uses
       | Triton with fastertransformers and GPT-J style models (codegen).
        
         | wsxiaoys wrote:
         | Linked my replies to others' questions: [1], [2]
         | 
         | [1] https://news.ycombinator.com/item?id=35471882
         | 
         | [2] https://news.ycombinator.com/item?id=35471390
         | 
         | I hope these can be of help and answer your questions as well!
        
       | GartzenDeHaes wrote:
       | After using Github Copilot for a couple of weeks, it doesn't seem
       | to do much other than Stackoverflow/blog copy paste. Does anyone
       | get much else from it when writing non-boilerplate, non-tutorial,
       | and non-stackoverflow types of code?
        
         | moonchrome wrote:
         | What's wrong with using it for boilerplate ?
         | 
         | It costs 10$/month and probably saves me hours in stupid
         | boilerplate, not just because it reduces typing but because the
         | kind of typing it fixes is the boring stuff that gets me out of
         | flow.
         | 
          | And boilerplate is a vague term. A good example of Copilot
          | "AI" features:
          | 
          |     function(a, b, c, d) {
          |       if(a ...) { ..
          |       }
          |       if(b  // starts generalizing from a case but is very context aware
          |       // here it's already doing c and d
          |     }
         | 
         | and we can be talking about very non-trivial permutations here
         | where it's obvious from context what you want to do but
         | generalizing to cover all cases would be more work than just
         | writing it.
         | 
         | And tests - sometimes it can generate a correct test with mocks
         | just from function name, sometimes it's garbage - but it's
         | really easy to glance at which it is.
         | 
          | My current experience with Copilot after ~2 months is that
          | it's really good at such boilerplate, it's annoying when it's
          | late or fails at what I expect of it, and it makes generating
          | boilerplate code so cheap that it can lead to a lot of
          | duplication.
         | 
          | If you're good enough to know when you should apply DRY and
          | when not, you'll get a feel for when Copilot is useful, and it
          | will be a decent productivity boost IMO. If they made it
          | faster and more reliable (just for the cases I already use it
          | for) I would pay >$100/month out of pocket for sure.
        
         | phamilton wrote:
         | My favorite usage of it is when writing database migrations.
         | I'll write my `up` case and then it will write the `down` case
         | for me. For migrations adding multiple columns, indexes,
         | triggers, etc. it's good about getting the right order for the
         | down case and generating the right commands.
         | 
         | Is it writing meaningful code for me? Nah. Is it helpful?
         | Certainly.
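          | 
          | As a concrete (hypothetical) example, in Alembic-style
          | migrations I write `upgrade` and it fills in `downgrade` with
          | the operations reversed in the right order:
          | 
          |     from alembic import op
          |     import sqlalchemy as sa
          | 
          |     def upgrade():
          |         op.add_column("users", sa.Column("last_login", sa.DateTime()))
          |         op.create_index("ix_users_last_login", "users", ["last_login"])
          | 
          |     def downgrade():
          |         # reverse order: drop the index before the column it covers
          |         op.drop_index("ix_users_last_login", table_name="users")
          |         op.drop_column("users", "last_login")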
        
         | lispisok wrote:
          | That's pretty much my experience. I'll go entire days accepting
          | zero Copilot suggestions. Another thing I don't like is that if
          | there is some bad code in any of the files it is parsing in my
          | project, it will suggest similar bad code.
        
         | dpkirchner wrote:
         | I haven't used Copilot. I find ChatGPT a lot more useful than
         | SO because I'm less likely to bump in to "duplicate" questions
         | (that are rarely duplicates) and where the only interactions
         | are criticism over the question's format or structure. I assume
         | using Copilot lets you avoid similar problems.
        
         | in3d wrote:
         | What programming language are you using? It's better in more
         | popular languages.
        
           | GartzenDeHaes wrote:
           | C#, #5 on the TIOBE index
        
             | egeozcan wrote:
             | I love C# and I used it extensively for a long time, but I
             | think Copilot shines with JS/TS and Python because of the
             | massive amount of code published in open source projects.
             | 
             | Not that I find it as useful as most people do, but there
             | is a difference...
             | 
             | It's also not that great with go, but surprisingly (to me)
             | still better than C#.
        
               | riceart wrote:
                | A bit ironic, since Stack Overflow began life as the
                | C#/.NET Q&A site - and never mind that Copilot is a
                | Microsoft joint.
        
         | SkyPuncher wrote:
          | I suspect you're trying to feed it too large/vague prompts.
         | 
         | I've found it amazing when working in semi-familiar programming
         | languages. I'm primarily a Ruby dev, but currently working in
         | Python. The languages are close enough that it's an easy
         | transition, but they do a lot of fundamentals differently.
         | 
         | Previously, it'd be extremely disruptive to my thought-process
         | to have to go lookup basic functions, like checking array
         | lengths or running a regex on a string.
         | 
         | Now, I just write a comment like `# check array length` or `#
         | on x_var, run this regex "\s+"`. Copilot spits back what I need
         | (or at least close enough to avoid having to break context).
         | 
         | Even in core languages, I'm finding it very useful for writing
         | system calls or niche functionality that I don't use
         | frequently. My mental model knows what needs to be done, but I
         | don't remember the exact calls off the top of my head.
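          | 
          | E.g. (a sketch - the comment is what I type, the lines below
          | are the kind of thing Copilot fills in; x_var is just a
          | placeholder):
          | 
          |     import re
          | 
          |     items = ["a", "b", "c"]
          | 
          |     # check array length
          |     length = len(items)
          | 
          |     # on x_var, run this regex "\s+"
          |     x_var = "hello   world"
          |     parts = re.split(r"\s+", x_var)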
        
         | anakaine wrote:
          | I've been using it to help write Python for some scientific
          | calcs which I know nobody else is likely to have published on
          | Stack Overflow or GitHub. Certainly not implemented the way I'm
          | implementing them.
         | 
         | Copilot is fantastic at predicting where I'm heading next once
         | I give it a little bit of a start. It helps me work out
         | unfamiliar functions that I might need to chain, or syntax I'm
         | not entirely familiar with.
         | 
         | I'd say it's a big winner for me.
        
         | akrymski wrote:
          | I actually still prefer SO, as I get to see different solutions
          | to the same problem and the related discussion. Copilot is
          | great for boilerplate stuff though, which I'd guess is the
          | majority of lines for most projects these days.
        
           | hn_user2 wrote:
            | If you like seeing options, you might be interested in keeping
            | the Copilot pane open while you code. It usually shows about 4
            | candidate completions it is considering, and you can click
            | which one you want.
           | 
           | Not necessarily better than SO but I find it nice to see
           | options.
        
         | michaelmior wrote:
          | I haven't used Copilot recently, but I have been using
          | Codeium[0], which is similar. I find it often does a reasonably
         | good job at completing similar structures in my own projects
         | where I can't imagine it has seen similar examples. It doesn't
         | always do a great job, but it does save me time.
         | 
         | [0] https://codeium.com/
        
           | dalmo3 wrote:
           | I migrated from copilot to codeium as I found it much faster
           | and just as smart. Not sure if the speedup comes from network
           | latency (I'm in New Zealand) or actual model calculations.
           | That was pre GPT4 though, and I'm willing to try copilot
           | again soon.
        
       | lfkdev wrote:
        | This exact name is already used by a big open-source project,
        | https://github.com/eugeny/tabby (50k stars); maybe consider
        | changing the name for better SEO.
        
         | wsxiaoys wrote:
         | Thanks, I'm just so into the puns: Tab-by-ML / TabbyML / Tabby
        
           | boesboes wrote:
           | love it
        
       | akrymski wrote:
       | What LLM is this using? Or did you train your own?
        
         | wsxiaoys wrote:
         | Tabby is tested against various open-source models based on
         | GPT-J/GPT-NeoX, including https://huggingface.co/CarperAI/FIM-
         | NeoX-1.3B and
         | https://huggingface.co/Salesforce/codegen-350M-multi.
         | 
         | In the meantime, I'm training language-specific base models
         | using NeoX blocks. I hope to release them soon.
         | 
          | The Hugging Face Space demo is running on a model derived from
          | Salesforce/codegen-350M-multi.
        
       | wongarsu wrote:
       | Trying the demo I got "value is not a valid enumeration member;
       | permitted: 'unknown', 'python', 'javascript'". Trying some
       | clearly identifiable Rust code with language set to 'unknown' I
       | got a completion in Java back. The completion made sense, was
       | properly indented and syntactically correct, it just was the
       | wrong programming language.
       | 
       | Is this a limitation of the hosted demo or the chosen model, or
       | do I simply have to wait a bit until my favorite niche language
       | is supported?
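        | 
        | (For reference, a minimal sketch of the kind of request involved
        | - the endpoint path and field names here are guesses from the
        | pydantic-style error message, not taken from any docs:)
        | 
        |     import requests
        | 
        |     resp = requests.post(
        |         "https://tabbyml-tabby.hf.space/v1/completions",  # guessed URL
        |         json={"language": "unknown", "prompt": "fn main() {"},
        |     )
        |     print(resp.json())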
        
       | nathancahill wrote:
       | This looks exactly like what I need for a project I've been
       | working on. How do you get your own code in to the model? Or is
       | that the future fine-tuning step you're talking about?
        
         | wsxiaoys wrote:
          | Tabby collects the source code from the related repository and
          | builds it into a dataset. I already have some proof-of-concept
          | pipelines built ([1] and [2]), but I still need some time to
          | polish the data pipeline.
         | 
         | [1]:
         | https://github.com/TabbyML/tabby/blob/main/tabby/tools/repos...
         | 
         | [2]:
         | https://github.com/TabbyML/tabby/blob/main/tabby/tools/train...
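          | 
          | Very roughly, the shape of it (a sketch, not the actual
          | pipeline code):
          | 
          |     import json
          |     import pathlib
          | 
          |     # walk a repository and dump source files into a JSONL dataset
          |     with open("dataset.jsonl", "w") as out:
          |         for path in pathlib.Path("my_repo").rglob("*.py"):
          |             record = {"path": str(path), "content": path.read_text()}
          |             out.write(json.dumps(record) + "\n")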
        
           | nathancahill wrote:
            | OK, that makes sense! Is there a prompt token limit like with
            | the OpenAI models? Codex, I believe, has an 8k prompt/response
            | limit.
        
           | skerit wrote:
           | You can train this on all the code in your own repositories?
           | I would assume that makes the completions a lot better?
        
       | zoba wrote:
       | My assumption is that this would not be fast enough for practical
       | use on M1/M2 Macbooks. Is that correct?
        
       | simonw wrote:
        | Anyone know of a quick workaround for this?
        | 
        |     % docker run \
        |         -it --rm \
        |         -v ./data:/data \
        |         -v ./data/hf_cache:/home/app/.cache/huggingface \
        |         -p 5000:5000 \
        |         -e MODEL_NAME=TabbyML/J-350M \
        |         tabbyml/tabby
        |     Unable to find image 'tabbyml/tabby:latest' locally
        |     latest: Pulling from tabbyml/tabby
        |     docker: no matching manifest for linux/arm64/v8 in the manifest list entries.
        |     See 'docker run --help'.
       | 
       | I have an M2 Mac. I believe Docker is capable of running images
       | compiled for different architectures using QEMU style
       | workarounds, but is that something I can do with a one-liner or
       | would I need to build a new image from scratch?
       | 
       | Previous experiments with Docker and QEMU:
       | https://til.simonwillison.net/docker/emulate-s390x-with-qemu
        
         | markng wrote:
         | On my M1, --platform linux/amd64 seems to work (I think I
         | needed to turn rosetta on for this for some other stuff, I
         | don't recall.)
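          | 
          | i.e. something like this (assuming emulation is enabled in
          | Docker Desktop; flags copied from the parent comment):
          | 
          |     docker run --platform linux/amd64 \
          |       -it --rm -p 5000:5000 \
          |       -e MODEL_NAME=TabbyML/J-350M \
          |       tabbyml/tabby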
        
           | markng wrote:
           | (that said, I am currently downloading this image, and I've
           | had problems running x86-64 stuff in docker sometimes)
        
           | andrewmunsell wrote:
           | It doesn't seem to work for me on an M1, even with that flag
           | 
           | > OSError: [Errno 38] Function not implemented
        
       | myin wrote:
       | Great to have a self-hosted solution, both for data privacy and
       | quality improvement potentials.
        
       | grudg3 wrote:
        | Hi, will this work on AMD GPUs? I have plenty of VRAM available.
        
       | nmstoker wrote:
       | Is this serious? There's very little to go on in the repo to
       | establish how well thought through this is.
       | 
       | I don't want to mark them down for poor language skills but the
       | style of the comments on the TabbyML GitHub profile suggests a
       | rather casual approach, and when combined with a lack of any
       | serious documentation or even basic details beyond a sketched
       | architecture diagram, I kind of wonder... Is there any particular
       | context others can point to that I may be overlooking?
        
         | nmstoker wrote:
         | And to clarify, I'm not trying to be down on the authors of the
         | repo, it's more a surprise this has got so high on HN without
         | much to support it
        
           | yieldcrv wrote:
            | the interest is in client-side LLMs and similar technologies
           | 
           | this is enthusiast level territory that doesn't require
           | professional window dressing at the moment
           | 
           | if you're the kind of person that needs that then you aren't
           | the target audience yet
        
             | nmstoker wrote:
             | I don't need marketing and window dressing, I'm interested
             | in evidence of it genuinely working but there's very little
             | here.
             | 
             | The idea of client side LLMs is kind of obviously a winner,
             | who wouldn't want that and who hasn't thought of that, so
             | this shouldn't shoot to the top on that basis.
             | 
             | The repo looks close to thrown together vapourware. I'm
             | keen not to undermine their efforts since we all aspire to
             | create great things but the fact this has risen so high
             | with so little supporting evidence says more about the
             | current mood here.
        
               | yieldcrv wrote:
                | it's useful client-side LLMs that people can potentially
                | package as a commercial offering, and this is an ongoing
                | gap in the market
               | 
               | So all sorts of vaporware will fill that, and people will
               | commend other people that seemingly take the initiative
               | instead of just thinking about it
        
       | uglycoyote wrote:
       | I'm confused about the premise here. The power of self-hosting
       | such a thing is presumably that you would be able to train it on
       | your own company's codebase as a corpus of examples to help other
       | people in the company know how to navigate the specifics of your
       | codebase.
       | 
       | But there's nothing in the introductory materials about how to
       | train this thing.
        
         | RhodesianHunter wrote:
         | Many of these that I've seen are more focused on keeping
         | company data private than on training the model(s) on company
         | data.
         | 
         | Which is pretty entertaining given how many of these companies
         | host their code on Github already.
        
         | PantaloonFlames wrote:
         | Yes, how could I train it on my codebase? Does the team
         | acknowledge this is a thing people would want to do?
         | 
         | Or am I misunderstanding the idea here?
        
         | sp332 wrote:
         | Sometimes it's just about avoiding boilerplate. But the last
         | sentence of the post here says that fine-tuning is coming in
         | the next release.
        
       ___________________________________________________________________
       (page generated 2023-04-06 23:00 UTC)