[HN Gopher] CodeCompose: A large-scale industrial deployment of ...
       CodeCompose: A large-scale industrial deployment of AI-assisted
       code authoring
       Author : azhenley
       Score  : 112 points
       Date   : 2023-06-03 13:38 UTC (9 hours ago)
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
       | m3kw9 wrote:
       | Who else is tired of these trillion-dollar companies that talk
       | and talk and have no products?
         | neuronexmachina wrote:
         | It's a paper about a (for now) internal product:
         | "In this paper we present CodeCompose, an AI-assisted code
         | authoring tool developed and deployed at Meta internally.
         | CodeCompose is based on the InCoder LLM that merges generative
         | capabilities with bi-directionality. We have scaled up
         | CodeCompose to serve tens of thousands of developers at Meta,
         | across 10+ programming languages and several coding surfaces."
         | Looks like the InCoder model it's based on can be downloaded
         | here: https://huggingface.co/facebook/incoder-6B
           | Veuxdo wrote:
           | Do papers that companies publish about internal tools serve
           | any purpose other than PR?
             | chromatin wrote:
             | There are many scholars working in industry (from software
             | engineering to biotech) who believe in the ideals of
             | information sharing and publication of research.
               | [deleted]
             | IshKebab wrote:
             | Of course it does. They give lots of details about the
             | tools and their use which is obviously helpful to anyone
             | wanting to do something similar.
             | A great example is
             | https://www.uber.com/blog/research/keeping-master-green-
             | at-s...
             | I think maybe Gitlab Merge Trains predate it, but it was
             | definitely influential.
             | lmeyerov wrote:
             | It was a useful read for me, esp seeing numbers on fine-
             | tuning & use. We are piloting a DB analyst tool where users
             | can use natural language to do DB queries, generate AI
             | analyses, make interactive GPU charts, etc, so many nearby
             | questions we think about a lot. Previously as a PhD
             | publishing on program synthesis, most of our writeups were
             | much smaller scale wrt live user evals, so all combined...
             | super cool to see.
             | For FB... It probably helps keep the team valued internally
             | + helps with retention & recruiting. For PhD trained types,
             | this kind of paper is almost table stakes.
             | Less obvious... FB has been laying off teams like this
             | despite productivity ROI intuitions, so if I was there, I'd
             | be careful to quantify current + future ROI - I'm sure
             | there are key #'s not being shared.
             | williamstein wrote:
             | This paper says: "Customized for the organization:
             | CodeCompose is fine-tuned on Meta's internal code
             | repository, and is thus able to handle Meta-specific
             | languages such as Hack and Flow." If you work at an org
             | that might want to build their own LLM trained on their own
             | internal code base, then the lessons of this paper would be
             | of value to you.
               | Veuxdo wrote:
               | Makes sense. For Meta, though, by publishing papers like
               | this are they hoping for something else other than PR? My
               | only other guess would be attracting talent.
               | bee_rider wrote:
               | Lots of companies have research departments and release
               | papers; the people working in these departments have some
               | academic roots at least. The incentives for releasing
               | papers are:
               | * To raise your profile and reputation generally
               | * The specific publish or perish incentives in academia
               | * Because you really think you've done something
               | interesting and novel, and want to share it with the
               | world
               | Only the middle one is removed when going to industry.
       | alan-stark wrote:
       | The abstract says _..we present metrics from our large-scale
       | deployment of CodeCompose that shows its impact on Meta's
       | internal code authoring experience over a 15-day time window,
       | where 4.5 million suggestions were made by CodeCompose.
       | Quantitative metrics reveal that (i) CodeCompose has an
       | acceptance rate of 22% across several languages, and (ii) 8% of
       | the code typed by users of CodeCompose is through accepting code
       | suggestions from CodeCompose. Qualitative feedback indicates an
       | overwhelming 91.5% positive reception for CodeCompose._
       | In other terms, out of 4.5 million suggestions about 80% were
       | off, yet there is 91% positive reception. That's 3.6 million
       | rejected suggestions that potentially distracted programmers from
       | doing their work. Yet users are happy. Is there a contradiction
       | in these figures?
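       | A quick check of that arithmetic, as a rough sketch: it uses the
       | exact 22% acceptance rate quoted from the abstract instead of
       | the rounded 80%, and it assumes every suggestion that was not
       | accepted counts as rejected (the variable names are
       | illustrative, not from the paper).

```python
# Figures quoted from the abstract.
total_suggestions = 4_500_000
acceptance_rate = 0.22

# Assumption: a suggestion is either accepted or rejected, nothing else.
accepted = round(total_suggestions * acceptance_rate)
rejected = total_suggestions - accepted

print(f"accepted: {accepted:,}")
print(f"rejected: {rejected:,} ({rejected / total_suggestions:.0%})")
```

       | With the exact rate, the rejected count comes out slightly
       | under the 3.6 million estimate.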
         | YetAnotherNick wrote:
         | If you take a random question from Stack Overflow, my guess is
         | that 80% of them don't have a correct answer, yet I am very
         | happy Stack Overflow exists.
           | Mountain_Skies wrote:
           | I've had Bing provide me with code from SO that was from the
           | question, which was code that was explicitly stated to not
           | work and the poster wanted to know what was wrong with it.
           | Bing's AI didn't understand this and claimed it was a
           | solution.
         | afro88 wrote:
         | Think of it like traditional code completion. It's mostly wrong
         | but still useful. You either type through it, or tab/arrow to
         | select the correct completion.
         | AI code completion (like GitHub Copilot) is like this. Still a
         | time saver overall, even with a low acceptance rate.
         | alan-stark wrote:
         | Reading these answers reminded me why I love HN - actually
         | thoughtful perspectives :) Guess a lot boils down to two
         | variables - (a) suggestion UX quality and (b) definition of
         | 'rejection' event. I skimmed through the paper and it turns out
         | that 91% figure is based on feedback from 70 people and
         | anonymous feedback wasn't allowed. So, 'overwhelming 91%
         | favorable' can be paraphrased to '64 people out of the total
         | 16k user base said they liked it'. Would be interesting to see
         | indirect metrics like retention on day 15.
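         | The paraphrase can be reproduced with a few lines, as a sketch
         | only: the 70 respondents and the 16k user base are the figures
         | given in this comment, 91.5% is from the abstract, and the
         | variable names are made up for illustration.

```python
# Figures as stated in the comment and the abstract.
respondents = 70
positive_share = 0.915
user_base = 16_000  # "16k" as given in the comment; treated as exact here

positive = round(respondents * positive_share)
share_of_users = positive / user_base
response_rate = respondents / user_base

print(f"positive responses: {positive}")
print(f"share of all users who said they liked it: {share_of_users:.2%}")
print(f"share of all users who gave feedback: {response_rate:.2%}")
```

         | Both shares are well under 1% of the user base, which is why
         | the survey figure says little about the other users.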
           | idiotsecant wrote:
           | Quite an insightful comment. In an institution that large
           | it's surprising there were only 64 brown nosers. I expect out
           | of 16k captive audience employees you could probably get 64
           | people to give a positive opinion of replacing paychecks with
           | meta store scrip.
         | seanmcdirmid wrote:
         | A lot of the time, suggestions are provided but not used
         | because you already knew the answer and typed fast enough not
         | to take it.
         | moonchrome wrote:
         | It's easy to:
         | - anticipate when the suggestions are likely to be useless and
         | not even bother
         | - scan the proposals to see if they are what you want in cases
         | it's useful
         | It's a boilerplate generator and you're happy when it saves you
         | tedious mental effort.
         | rychco wrote:
         | I treat it the same way I do pre-LLM LSP suggestions, which is
         | basically inline documentation lookup. 'Oh what was that
         | function name for inserting something at the end? PushB- no,
         | InsertAft- no, App - end! Yea that's it'
         | In this case it gave me 3 suggestions but I only accepted 1. I
         | could see this taking 5-10 suggestions for an LLM when it's
         | not something as straightforward as a function name. It's still
         | very useful despite this low acceptance rate.
         | fnordpiglet wrote:
         | I'd say it's hard to argue with the positive impression of the
         | engineer using it. If they find its suggestions helpful it's
         | not a distraction, it's helpful.
         | Using GitHub copilot daily I find its suggestions often
         | nonsense but interesting to see regardless. Often for
         | boilerplate it's spot on and it saves me dozens of lines of
         | typing. But it also suggests stuff on every key stroke many of
         | which I just type through, similar to intellisense. Assuming
         | Meta's code thingy is better, I would find myself in that 91%,
         | as I'm already there with what's available to the general
         | public.
         | My only gripe, fwiw, with copilot in vscode is it interferes
         | with intellisense. Often I want to see the code completion from
         | both, but copilot jumps in before intellisense and the
         | intellisense never renders and I use it as an inline api
         | reference. Sometimes it's so frustrating I have to turn off
         | copilot. But, copilot is generally useful enough that I
         | reenable it once I've understood the api stuff I'm unsure of.
         | There's some escape backspace period dance I can do that
         | sometimes lets intellisense win. I've not dug deeply enough
         | into vscode configuration to know if there's some parameter to
         | tweak the race conditions. I'd note that when intellisense
         | renders first copilot still renders its suggestions but the
         | other way doesn't work.
         | pavlov wrote:
         | I think the 8% number better explains why users were so
         | overwhelmingly happy. Assuming the suggestions in general are
         | not distractingly wrong, then 8% of code automatically written
         | is a decent amount of time saved researching solutions.
           | visarga wrote:
           | Interesting that 91% find it useful but only 8% of the code
           | is generated by LLM. This is even with an LLM tuned on the
           | internal codebase. This will give a mild boost but not
           | replace anyone.
           | layer8 wrote:
           | But only 22% are accepted for those 8%, which means that the
           | 78% code suggestions that are not accepted correspond to an
           | equivalent of over 28% of all code written. Not sure that
           | having to spend the time evaluating an additional 28% of code
           | in vain amounts to an overall win.
           | Though I guess the success rates when using Stack Overflow
           | aren't too dissimilar.
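           | The 28% figure can be re-derived as follows. This is a
           | back-of-the-envelope sketch that assumes accepted and
           | rejected suggestions are about the same length on average,
           | which the quoted abstract does not state.

```python
# Figures quoted from the abstract.
accepted_share_of_code = 0.08  # 8% of typed code came from accepted suggestions
acceptance_rate = 0.22         # 22% of suggestions were accepted

# Assumption: accepted and rejected suggestions have similar average length,
# so rejected volume scales with the ratio of rejected to accepted counts.
rejected_equivalent = accepted_share_of_code * (1 - acceptance_rate) / acceptance_rate

print(f"rejected suggestions, as a share of all code written: {rejected_equivalent:.1%}")
```

           | If rejected suggestions tend to be longer or shorter than
           | accepted ones, the estimate shifts accordingly.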
         | cloudking wrote:
         | Have you tried GitHub Copilot? You don't have to accept the
         | code suggestions, so they don't really distract you or get in
         | the way once you get used to the UX.
           | tablatom wrote:
           | I find them extremely distracting. Evaluating a suggestion
           | is, for me, an entirely different mental process from the
           | creative process I'm in the middle of. The tagline that
           | copilot helps you stay in the flow is very much not my
           | experience.
           | I am well aware that others are having a different experience
           | with it.
             | bredren wrote:
             | The Industrial Challenges section of the paper addresses
             | specific areas of flow disruption they focused on.
             | Some folks may never accept AI code completion /
             | suggestions (like some prefer vim over modern IDEs) but at
             | least people working on this stuff can describe the known
             | points to focus on.
             | cloudking wrote:
             | I've found I am naturally ignoring the large complex
             | suggestions because they usually have mistakes, and
             | accepting the small easy suggestions. I respect your
             | experience though, to each their own.
               | irthomasthomas wrote:
               | Mine doesn't even make complex suggestions. I can't get
               | it to suggest more than one line at a time. Wonder what's
               | different? I'm on the beta.
               | baq wrote:
               | The thing can generate whole unit tests if you leave it a
               | one-line description in a comment next to the function
               | you want tested. It's actually amazing.
               | cloudking wrote:
               | For example, sometimes I'll start out with a code comment
               | for a function, hit enter and the next line suggestion
               | will be the entire function.
       | wpride wrote:
       | Everyone in this space seems to be building on the LSP and
       | classic auto-complete in particular as their UI. But I've found
       | this to be non-ideal.
       | - As mentioned in this paper I definitely do not want the AI
       | suggestion crowding out a suggestion generated directly from the
       | type bindings
       | - I often do want the AI to write an entirely new block of
       | boilerplate. To do this you have to write a comment string
       | targeted at the AI, then delete this afterwards
       | - Sometimes I'd just like the AI to explain to me what some code
       | does without writing anything
       | - This isn't something I always want on; I find myself turning
       | the plugin on and off depending on the context
       | Overall I think we need a novel UX to really unlock the AI's
       | helpfulness
         | anotherpaulg wrote:
         | I have been enjoying a chat based AI coding modality. I built
         | some tooling that gets rid of the need to cut & paste code
         | between the chat and your files. This makes chatting about code
         | changes much more ergonomic. My tool also integrates directly
         | with git, which provides a safety net. It's easy to undo
         | changes if the AI does something silly.
         | Here are some chat transcripts that give a flavor of what it's
         | like to code with AI this way:
         | https://aider.chat/examples/
         | My tool is open source, and currently only works if you have a
         | gpt-4 api key.
         | fnordpiglet wrote:
         | In vscode the Genie extension does these things and you can
         | provide your own contextual hooks with custom prompts. It's
         | particularly good at explaining syntax and semantic errors.
         | florbo wrote:
         | This echoes my sentiment exactly. My biggest gripe is when type
         | suggestions are replaced with AI suggestions, as I more often
         | just want to auto-complete a method/attribute. I frequently
         | find myself toggling AI suggestions via hotkey.
         | As for getting a suggestion by writing comments, an "insert
         | from prompt" action perhaps, or just a separate prompt
         | pane/popup/whatever-you-prefer combined with using good ol'
         | copy+paste would suffice.
         | stepanhruda wrote:
         | Does it need to be that novel of a UX?
         | If you want to know what some code does, just select it & hit a
         | keyboard shortcut (or right click and choose explain from
         | menu).
         | If you want AI to write code for you, write a comment starting
         | with a specific word, it suggests the implementation and you
         | can choose to accept & replace the comment with it.
         | rytill wrote:
         | What kind of novel UX are you imagining?
       | Animats wrote:
       | Hm. It seems to be like automated Stack Overflow. Only 8% of the
       | code comes from the AI system, but it's useful for getting
       | examples of how to do something.
       | Hallucination about API calls was reported as a problem. I've
       | seen that one. There's an amusing, and seriously annoying,
       | tendency for these systems to make up some plausible API call
       | that does what you need, but doesn't exist. Maybe something
       | should collect up such suggestions as proposals for new API
       | calls.
         | freeone3000 wrote:
         | The future of API design -- "yes it would make SENSE if that
         | existed, but it doesn't" => now it does
       | ChatGTP wrote:
       | It's a funny game because they all need their own clones of each
       | model / product.
       | Feels like tech is making billions but is a little lost?
       | zeedude wrote:
       | Limit training to stackoverflow input and wham! we have automated
       | modern programming ;)
       | regularfry wrote:
       | I would very much like a local code assist tool. Assuming
       | integration with editors is my problem, what's best in class this
       | week if a) I have a respectable GPU; b) I don't, and need CPU-
       | only?
         | wsxiaoys wrote:
         | Check out https://github.com/TabbyML/tabby, which is fully
         | self-hostable and comes with niche features.
         | On M1/M2, it offers a convenient single binary deployment,
         | thanks to Rust. You can find the latest release at
         | https://github.com/TabbyML/tabby/releases/tag/latest
         | (Disclaimer: I am the author)
         | MisterAV wrote:
         | On Visual Studio there's an extension (by Microsoft) called
         | IntelliCode which is a small AI assistant that runs locally on
         | the CPU. It doesn't come close to these new large GPU models
         | but it's quite handy. It looks into what you're typing on the
         | current line and the previous activity along with the current
         | project and tries to predict the full line or even the same
         | change on multiple lines if that makes sense.
           | azhenley wrote:
           | We recently published a paper on IntelliCode and share some
           | of the usage numbers.
           | https://austinhenley.com/pubs/Vaithilingam2023ICSE_IntelliCo.
           | ..
           | Disclaimer: I'm one of the co-authors.
         | mormegil wrote:
         | I have just installed Fauxpilot
         | <https://github.com/fauxpilot/fauxpilot> (nVidia GPU-only) and
         | it works... OK. Still evaluating and I'm basically sceptical
         | of the whole concept, but... let's see.
         | Filligree wrote:
         | Nothing even comes close to copilot. I realise you said
         | "local", but if you insist on that you're going to be
         | disappointed.
       | synthiq wrote:
       | For anyone interested in related research, I used
       | https://mirrorthink.ai to find some background on the state-of-
       | the-art.
       | (disclaimer: this is AI generated, but grounded on contents of
       | papers, with real references, so I'd say it is still
       | constructive)
       | The state-of-the-art in code generation has seen significant
       | advancements with the deployment of large language models (LLMs)
       | in various code authoring tools. One such example is the study on
       | GitHub Copilot, Amazon CodeWhisperer, and ChatGPT [1], which
       | evaluates the code quality of these AI-assisted code generation
       | tools. The study reveals that ChatGPT generates correct code
       | 65.2% of the time, while GitHub Copilot and Amazon CodeWhisperer
       | achieve 46.3% and 31.1% correctness, respectively. These results
       | indicate that LLMs have made substantial progress in generating
       | high-quality code, but there is still room for improvement.
       | Other research in the field has explored various techniques to
       | enhance code generation and assistance. For instance, RepoCoder
       | [2] focuses on repository-level code completion by integrating
       | code generation and retrieval models in an iterative paradigm.
       | This approach considers the repository-level context, including
       | customized information such as API definitions and identifier
       | names, to improve code completion suggestions. Serenity [3]
       | leverages library-based Python code analysis for code completion
       | and automated machine learning. The authors explore the potential
       | of data flow analysis produced by Serenity to improve code
       | completion when combined with neural models.
       | In addition to these advancements, the field has seen progress in
       | incorporating contextual information into code completion models.
       | The paper on enriching source code with contextual data [4]
       | investigates the impact of incorporating contextual information
       | on the performance of code completion models. The authors conduct
       | an empirical study to analyze the effectiveness of this approach.
       | These achievements, along with the advancements in LLMs,
       | contribute to the ongoing progress in code generation and
       | assistance. As the field continues to evolve, it is expected that
       | AI-assisted tools will become increasingly sophisticated and
       | effective in assisting developers with various aspects of the
       | software development process.
       | [1] Evaluating the Code Quality of AI-Assisted Code Generation
       | Tools: An Empirical Study on GitHub Copilot, Amazon
       | CodeWhisperer, and ChatGPT - 2023:
       | https://arxiv.org/abs/2304.10778
       | [2] RepoCoder: Repository-Level Code Completion Through Iterative
       | Retrieval and Generation - 2023: https://arxiv.org/abs/2303.12570
       | [3] Serenity: Library Based Python Code Analysis for Code
       | Completion and Automated Machine Learning - 2023:
       | https://arxiv.org/abs/2301.05108
       | [4] Enriching Source Code with Contextual Data for Code
       | Completion Models: An Empirical Study - 2023:
       | https://arxiv.org/abs/2304.12269
       | fabmilo wrote:
       | I would like to work in this code copilot space; I think it will
       | be one of the fastest applications of LLMs in the near future. I
       | have been working on a tool to autogenerate docstrings from a
       | Python method in Google format.
       | bolinfest wrote:
       | If you want to skip the paper and watch the video:
       | https://youtu.be/ANDJ0TKjyWw
       | Disclaimer: I am the person in the video.
         | muglug wrote:
         | It was a great video, and a great paper.
         | As someone who writes quite a lot of Hack, I'm selfishly
         | interested in whether you plan to open-source this work (not
         | the weights, obviously, but everything else).
       (page generated 2023-06-03 23:00 UTC)