[HN Gopher] Copilot regurgitating Quake code, including sweary comments
       ___________________________________________________________________
        
       Copilot regurgitating Quake code, including sweary comments
        
       Author : bencollier49
       Score  : 1063 points
       Date   : 2021-07-02 11:52 UTC (11 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | kklisura wrote:
        | Phew! Our jobs are safe!
        
         | unknownOrigin wrote:
         | _snickers_
        
         | CookieMon wrote:
         | though our companies will one day be competing with product
         | manufacturers in China who get to use it to its fullest
        
       | crescentfresh wrote:
       | Direct url to the "gif" in that twitter post:
       | https://video.twimg.com/tweet_video/E5R5lsfXoAQDRkE.mp4
       | 
       | I could not figure out how to show it larger on the twitter UI. I
       | don't have a twitter account so that may be the problem.
        
       | Thomashuet wrote:
       | It seems like a very sensible answer from copilot since the
       | prompt includes "Q_" which makes it obvious that the programmer
       | is specifically looking for the Quake version of this function.
       | 
       | To me it doesn't show that copilot will regurgitate existing code
       | when I don't want it to, just that if I ask it to copy some
       | famous existing code for me it will oblige.
        
         | cjaybo wrote:
         | Apparently you haven't seen many of the demos that people are
         | showing off? Because saying that this only occurs when the
         | author is explicitly asking for copied code is blatantly false.
        
           | Thomashuet wrote:
           | No I haven't. If you think the other demos are more
           | interesting please link to them. I'm just saying that this
           | demo is biased and that we can't draw any conclusion from it.
            | Actually the author has just confessed to optimizing it for
            | entertainment in a sister comment. That doesn't mean that the
           | claim is false but it doesn't show that it is true either.
        
             | the_mitsuhiko wrote:
             | I think you misunderstood my comment. The same code gets
             | generated if you call the function `float fast_rsqrt` or
             | `float fast_isqrt` for instance. I intentionally wanted it
             | to be looking like `Q_rsqrt` so that people pick up on it
             | quicker.
        
               | heavyset_go wrote:
               | Thanks for making the video in the OP.
               | 
               | Do you have more examples like this that I can share with
               | those who don't use Twitter, like a repo or blog post?
        
         | OskarS wrote:
         | You're not wrong, but the very idea that it will regurgitate
         | copyrighted code _at all_ (and especially at this length, word
         | for word), means that it will be totally unacceptable for many
         | places. In fact, it is arguably not acceptable to use anywhere
         | if you care deeply about copyright.
        
         | advisedwang wrote:
          | The claim for AI systems like this is that they have actually
          | learned something and are generating code from scratch.
          | Oftentimes the authors will claim regurgitation is simply not
          | possible, and this example shows that's a lie.
         | 
         | Many arguments on the benefits, legality and power of AI
         | systems rely on this claim.
         | 
         | To turn around now and say it's OK to regurgitate in the right
         | setting is to move the goalposts.
        
           | caconym_ wrote:
           | > Oftentimes the authors will claim regurgitation is simply
           | not possible
           | 
           | Do the Copilot authors claim this?
           | 
           | I get that you're suggesting that Copilot may benefit from
           | absolute claims made by the authors of other, similar systems
           | (or their proponents), but I also don't think it's reasonable
           | to exclude nuance and the specifics of Copilot from ongoing
           | discussions on that basis. The Copilot authors have publicly
           | acknowledged the regurgitation problem, and by their account
           | are working on solutions to it (e.g. attribution at
           | suggestion-generation time) that don't involve sweeping it
           | under the rug.
        
             | rckstr wrote:
             | They did! In the faq which I can't find anymore they said:
             | 
             | >GitHub Copilot is a code synthesizer, not a search engine:
             | the vast majority of the code that it suggests is uniquely
             | generated and has never been seen before. We found that
             | about 0.1% of the time, the suggestion may contain some
             | snippets that are verbatim from the training set.
        
               | caconym_ wrote:
               | This actually seems like an explicit acknowledgement that
               | regurgitation _is_ possible, and not remotely a claim
               | that it is  "simply not possible".
               | 
               | It stands to reason that cases where people are
               | intentionally trying to produce regurgitation will
               | strongly overlap with the minority of cases where it
               | actually happens. So I think we are probably suffering
               | from some selection bias in discussions on HN and similar
               | forums--that might be unavoidable, and it certainly
               | stimulates some interesting discussion, but we should try
               | to avoid misrepresenting the product as a whole and/or
               | what its creators have said about it.
        
               | sombremesa wrote:
               | I think only Github's lawyers would interpret what GP
               | posted the way you did. Looks like weasel wording to make
               | such an interpretation possible, while making customers
               | believe that code is more or less synthesized in
               | realtime. "Snippets" makes one think one or two lines of
               | code, not entire functions and classes.
        
               | caconym_ wrote:
               | I think that until somebody shows that Copilot is willing
               | to copy _distinctive_ code fragments verbatim,
               | unprompted, with a high occurrence rate, I 'm not going
               | to start accusing Github of building an engine to
               | cynically exploit the IP rights of open source copyright
               | holders for profit. I've seen no evidence of that, and in
               | absence of evidence I prefer to remain neutral and open-
               | minded.
               | 
               | How would that work, anyway? Rare, distinctive code forms
               | seem much more difficult for an ML thing to suggest with
               | a high-ish confidence level, since there won't be much
               | training data. The Quake thing makes sense because it's
               | one of the most famous sections of code in the world, and
               | probably exists in thousands of places in the public
               | Github corpus.
               | 
               | I'm emphasizing _distinctive_ because a lot of
               | boilerplate takes up a lot of room, but still doesn 't
               | make a reasonable argument for copyright infringement
               | when yours looks like somebody else's.
        
               | sombremesa wrote:
               | It looks like you're responding to the wrong comment. I
               | don't recall alleging that Github is "building an engine
               | to cynically exploit the IP rights of open source
               | copyright holders for profit".
        
               | caconym_ wrote:
               | > I think only Github's lawyers would interpret what GP
               | posted the way you did. Looks like weasel wording to make
               | such an interpretation possible,
               | 
               | So what are you suggesting here, except that Github is
               | attempting a legal sleight-of-hand to hide real
               | infringement?
               | 
               | > while making customers believe that code is more or
               | less synthesized in realtime.
               | 
               | What are you suggesting here except that Github is
               | (essentially) lying to customers, making them believe
               | something that is substantially untrue?
               | 
               | When I say "building an engine to cynically exploit the
               | IP rights of open source copyright holders for profit", I
               | am talking about a scenario in which they are sweeping
               | legitimate IP concerns under the rug with bad faith legal
               | weaselry and misrepresentation of how the product
               | functions, etc., to chase profit. I do not see how that
               | is substantially different from the implications of your
               | comment, especially in the context of this subthread.
               | 
               | Could you enlighten me as to how your intended meaning
               | substantially differs from my interpretation? If you
               | don't mean to accuse Github of malfeasance, we probably
               | don't have much to discuss.
        
             | ohazi wrote:
             | Nat Friedman explicitly stated that it shouldn't
             | regurgitate [0]:
             | 
             | > It shouldn't do that, and we are taking steps to avoid
             | reciting training data in the output
             | 
             | He's being woefully naive. To put it bluntly, we don't know
             | how to build a neural network that isn't capable of
             | spitting out training data. The techniques he pointed to in
             | other threads are academic experiments, and nobody seems to
             | have a credible explanation for why we should believe that
             | they work.
             | 
             | [0] https://news.ycombinator.com/item?id=27677177
        
               | caconym_ wrote:
               | "Shouldn't" isn't the same as "doesn't".
               | 
               | I'm not anything close to an ML expert, and I have no
               | opinion on whether what they're aiming for is possible,
               | but this document^[1] (linked in your linked comment)
               | states explicitly that they are aware of the recitation
               | issue and are taking steps to mitigate it. So, in the
               | context of the comment I replied to, I think Github is
               | very far from claiming that recitation is "simply not
               | possible".
               | 
               | ^[1] https://docs.github.com/en/github/copilot/research-
               | recitatio...
        
               | ohazi wrote:
               | That kind of bullshit phrasing can only get you so far.
               | 
               | It's like if some corporate PR department told you "we're
               | aware of the halting problem, and are taking steps to
               | mitigate it." You would rightly laugh them out of the
               | room.
               | 
               | It's not going to work, and the people making these
               | statements either don't understand how much they don't
               | understand, or are deluding themselves, or are actively
               | lying to us.
               | 
               | An honest answer would be something like "We are aware
               | that this is a problem, and solving it is an active area
               | of research for us, and for the machine learning
               | community at large. While we believe that we will
               | eventually be able to mitigate the problem to an
               | acceptable degree, it is not yet known whether this
               | category of problem can be fully solved."
        
               | caconym_ wrote:
               | You're using some pretty strong language here, but do you
               | have any more substantive criticisms of the analysis they
               | present at
               | https://docs.github.com/en/github/copilot/research-
               | recitatio... ? They seem to think the incidence of
               | meaningful (i.e. substantively infringing) recitation is
               | very low, and that their solution in those cases will be
               | attribution rather than elimination.
               | 
               | Again, I'm not an ML expert, but that sounds a lot more
               | reasonable to me than announcing one's intention to solve
               | the halting problem.
        
               | ohazi wrote:
               | They had some people use the thing for a while, and
               | concluded "Hey look, it doesn't seem to quote verbatim
               | very often. Yay!" There is nothing in there that
               | describes any sort of mitigation. The three sentences
               | about an attribution search at the very end are
               | aspirational at best, and are presented as "obvious" even
               | though it's not at all clear that such a fuzzy search can
               | be implemented reliably.
               | 
               | I use the halting problem as an analogy because their
               | naive attempts to address this problem feel a lot like
               | naive attempts to get around the halting problem ("just
               | do a quick search for anything that looks like a loop,"
               | "just have a big list of valid programs," etc.). I can
               | perform a similar analysis of programs that I run in my
               | terminal and come to a similar "Hey look, most of them
               | halt! Yay!" conclusion. I can spin a story about how most
               | of the ones that don't halt are doing so intentionally
               | because they're daemons.
               | 
               | But this approach is inherently flawed. I can use a fuzz
               | tester to come up with an infinite number of inputs that
               | cause something as simple as 'ls' to run forever.
               | 
               | Similarly, I can come up with an infinite number of
               | adversarial inputs that attempt to make Copilot spit out
               | training data. Some of them will work. Some of them will
               | produce something that's close enough to training data to
               | be a concern, but that their "attribution search" will
               | fail to catch. That's the "open research question" that
               | they need to solve.
               | 
               | We _don 't have_ a general solution to this problem yet,
               | and we may never have one. They're trying to pass off a
               | hand-wavey "we can implement some rules and it won't be a
               | problem most of the time" solution as adequate. I don't
               | see any reason to believe that it will be adequate. Every
               | attempt I've seen at using logic to try and coax a
               | machine learning model into not behaving pathologically
               | around edge cases has fallen flat on its face.
        
               | caconym_ wrote:
               | > The analysis you're citing is just that -- a
               | statistical analysis. They had some people use the thing
               | for a while, and concluded "Hey look, it doesn't seem to
               | quote verbatim very often. Yay!" There is nothing in
               | there that describes any sort of mitigation.
               | 
               | > The three sentences about an attribution search at the
               | very end are aspirational at best, and are presented as
               | "obvious" even though it's not at all clear that such a
               | fuzzy search can be implemented reliably.
               | 
               | I agree with all of this, though I do think that the
               | attribution strategy they describe sounds a lot easier
               | than solving the halting problem or entirely eliminating
               | recitation in their model. Obviously, the proof will be
               | in the pudding.
               | 
               | Maybe you and others are reacting to them framing this as
               | "research", as if they're trying to prove some
               | fundamental property of their model rather than simply
               | harden it against legally questionable behavior in a more
               | practical sense. I think a statistical analysis is fine
               | for the latter, assuming the sample is large enough.
        
               | csande17 wrote:
               | The biggest issue with that analysis is that their model
               | is clearly very able to copy code and change the variable
               | names, copying code and changing variable names is very
               | clearly still "copying", and the analysis doesn't seem to
               | include that in its definition of "recitation event".
        
               | caconym_ wrote:
               | I'd fully expect it to copy code and change variable
               | names in a lot of cases--if it wants to achieve the goal
               | of filling in boilerplate, how could it do anything else?
               | That's pretty much the definition of boilerplate: it's
               | largely the same every time you write it.
               | 
               | What's less clear to me is that Copilot regularly does
               | that sort of thing with code distinctive enough that it
               | could reasonably be said to constitute copyright
               | infringement. If somebody's actually shown that it does,
               | I'd love to see that analysis.
        
         | the_mitsuhiko wrote:
          | I was able to trigger it without the Q prompt before. It just
          | made for a nicer-looking gif that way.
         | 
         | I got it to produce more GPL code too, that one is just not
         | entertaining.
        
       | throwaway_egbs wrote:
       | Welp, guess I'll be taking all my code off of GitHub now, lest it
       | be copied verbatim while ignoring my licenses.
       | 
       | (I'm no John Carmack, but still.)
        
       | throwaway_egbs wrote:
       | This reply from @AzureDevOps is bizarre: "We understand. However,
       | the way to report this issues related to Windows 11 is through
       | our Windows Insider even from another device. Thanks in advance."
       | 
       | I think I'm gonna give "AI" a few more years.
       | 
       | https://twitter.com/AzureDevOps/status/1411018079849619458
        
       | bob1029 wrote:
       | This is pretty clearly just a search engine with more parameters.
       | 
       | I thought there was something more going on with copilot, but the
       | fact that it is regurgitating arbitrary code comments tells me
       | that there is zero semantic analysis going on with the actual
       | code being pulled in.
        
         | josefx wrote:
         | They openly claim it is an AI. What about the state of AI
         | currently in use made you think that there was any intelligence
         | behind it?
        
         | thegeomaster wrote:
         | It is decidedly not "just a search engine with more
         | parameters." Language models are just prone to repeating
         | training examples verbatim when they have a strong signal with
         | the prompt. Arguably, in this case, it is the most correct
         | continuation.
        
         | saynay wrote:
         | It's more that the model is so large it is capable of
         | memorizing a lot. This can be seen in other language models
         | like GPT-3 as well.
         | 
         | Comments, I suspect, will be more likely to be memorized since
         | the model would be trained to make syntactically correct
         | outputs, and a comment will always be syntactically correct.
         | That would mean there is nothing to 'punish' bad comments.
        
           | username90 wrote:
           | The model in this case is just a lossy compression of github,
           | and you search that.
        
       | sydthrowaway wrote:
        | What causes this in a net? I'm guessing the RNN gets into a
        | catastrophic state...
        
         | salawat wrote:
         | Neural nets aren't magic. You actually need quite a bit of
         | complexity and modeling of interrelated problem spaces to get
         | anything more than a childlike naivete or trauma savant-like
         | mastery of one particular area with crippling deficiencies
         | elsewhere.
        
         | otabdeveloper4 wrote:
         | > catastrophic state
         | 
         | No, overfitting is the normal state for neural nets.
        
         | captainmuon wrote:
         | I would say overfitting - the net doesn't "understand" the code
         | in any meaningful sense. It just finds fitting examples and
         | jumbles them a bit.
         | 
         | Understanding would mean to have an internal representation
         | related to the intention of the user, the expected behavior,
         | and say the AST of the code. My pessimistic interpretation of
         | this and many other recent AI applications is that it is a
         | "better markov chain".
        
           | LeanderK wrote:
           | a markov chain can have an internal representation related to
           | the intention of the user. I guess this example just got
           | copied a lot and is therefore included multiple times in the
           | training data, forcing the network to memorise it. Neural
            | networks always memorise things that appear too frequently.
            | Memorized artifacts in an otherwise working neural network are
            | usually seen as a "bug" (since the training allowed the
            | network to cheat), not as proof that the network didn't
            | generalise.
        
         | michaelt wrote:
         | This is the network working as designed.
         | 
         | I mean, if you wrote an autocomplete system for written english
         | and asked it to complete the sentence "O Romeo, Romeo" what
         | would you expect to happen?
         | 
         | You'd expect it to complete to "O Romeo, Romeo, wherefore art
         | thou Romeo?" - a very famous quote.
         | 
         | How else could you produce the single right output for that
         | unique input, other than memorising and regurgitating?
        
           | NumberCruncher wrote:
           | > You'd expect it to complete to "O Romeo, Romeo, wherefore
           | art thou Romeo?" - a very famous quote.
           | 
           | What about completing it to "O Romeo, Romeo, brave Mercutio
           | is dead", based on the context, as advertised?
        
       | villgax wrote:
       | That salesforce engineer is the average developer when prompting
       | to write an About me section lol
        
       | ronenlh wrote:
       | There are a lot of good points made against copilot. But I'm
       | optimistic in that it will improve with time. At worst it's an
       | efficient code copy-pasting tool, but at best it could be the
       | next level of abstraction.
        
       | littlestymaar wrote:
       | Copilot was only trained on public code right? Because that would
       | be an absolute PR disaster if some proprietary code was leaked
       | that way.
        
         | ezekg wrote:
         | I'm curious if you can find certain keywords that may leak
         | private code...
        
           | imnitishng wrote:
           | Waiting for the access to test this out
        
       | nadsat2021 wrote:
       | I don't understand why this is an issue. And now I have a tool to
       | do my coding, as well as my social media contributions:
       | "'Sweary comments' is not grammatically correct English." ->
       | GPT-2 ->              "Sweary comments" is not grammatically
       | correct English. The "sweary" meaning is used as slang by those
       | in charge, especially the ones who get a free ride through the
       | media, for the sake of a quick buck, or the sake of getting a
       | better job with a promotion. The word is commonly used in a
       | social context and comes from an actual "sweary" comment. The
       | statement or phrase, "You're too shy!" is derived from this
       | phrase: "I wish he were as outgoing as you."
       | 
       | Am I right?
        
       | PaulHoule wrote:
       | This kind of thing devalues the work that programmers do.
       | 
       | The head example they show is using a sentiment analysis API
       | which is about the most useless use of technology there is.
        
       | saiojd wrote:
       | Based on all the negative comments so far, and based on this
       | website's aptitude at predicting the viability of a product, it
       | really seems like Copilot is bound to be a success.
        
         | DantesKite wrote:
         | Right?
         | 
         | Even given all its initial problems, I don't see a world where
         | people completely avoid using it.
        
         | sydthrowaway wrote:
         | We're gonna Dropbox this thing all the way to the top!
        
         | stusmall wrote:
         | No wireless. Less space than a nomad. Lame.
        
           | saiojd wrote:
            | Yeah. I get why people's initial reaction is to dislike it
            | tbh. Honestly I doubt the utility will be huge for experts;
            | most likely it will just alleviate having to remember how a
            | certain language implements a specific concept.
        
       | sktrdie wrote:
       | I'm going to go against the flow here and say that worrying about
       | this is similar to worrying about the license we give to snippets
       | of code we copy-paste from other licensed code.
       | 
       | The reality is that we never attribute the original source
       | because we copy-paste it, change it up a bit, and make it our
       | own. Literally everybody does this.
       | 
       | I still care about licensing and proper attribution but the
       | reality is that a snippet of code is not something so easy to
       | attribute. Should we attribute all kinds of ideas, even the very
       | small ones? How quickly is an idea copied, altered & reused? Can
        | we attribute all the thoughts humans have?
        
       | nadsat2021 wrote:
       | "sweary comments"
       | 
       | When I hear phrases like that, I worry more about human
       | intelligence. "I'm a little tea pot, short and stout," said
       | social media.
       | 
        | I watched Kubrick's "A Clockwork Orange" again this week, after a
       | certain amount of fearful anticipation.
       | 
       | When Alex said "eggy weggie", it clicked. It's like Burgess time-
       | traveled to 2021 to document our modern infantilization and
       | antisocialization. He forgot to include the Internet, loss of
       | humor, and emerging AI, but I guess he was overwhelmed by the
       | enormity of the "baddiwad".
       | 
       | Later on, droogs.
        
       | Osiris wrote:
       | I assumed it was trained on source code that was explicitly
       | licensed with a permissive license. Are they training it using
       | private unlicensed repos also?
        
       | GhostVII wrote:
       | Sounds like we need another tool called "Auditor" that scans your
       | code to see if it violates copyright laws.
        
       | omgwtfbbq wrote:
        | The uproar over Copilot is kind of hilarious. Maybe it's SWEs
        | realizing that they might not be as irreplaceable as they seem,
        | but it's an awful lot of salty comments. If anything I think
       | Copilot is a really cool PoC and shows just how close we are
       | getting to automating large portions of the code writing process
       | which we should all welcome as more cycles can be spent on
       | architecture and system design.
        
       | ethbr0 wrote:
       | The irony is that we're whinging about a tool that generates code
       | that will be difficult to understand _in the future_...
       | 
       | ... and the example is mathematically- and floating-point-spec
       | obtuse enough that it was incomprehensible at the time _it was
       | written_. (As evidenced by id comments)
        
       | maliker wrote:
       | Copilot transitions programmers from writing code to reading
        | auto-generated code. And the feeling is that reading code is 10x
        | harder than writing it? Seems like a rich source of problems.
       | 
       | (However, I'm still definitely going to try this out once I get
       | off the waitlist.)
        
       | rgbrenner wrote:
       | So this makes it official... this post[0] and the comments on the
       | announcement[1] concerned about licensing issues were absolutely
       | correct... and this product has the possibility of getting you
       | sued if you use it.
       | 
        | Unfortunately for GitHub, there's no turning back the clock.
       | Even if they fix this, everyone that uses it has been put on
       | notice that it copies code verbatim and enables copyright
       | infringement.
       | 
       | Worse, there's no way to know if the segment it's writing for you
       | is copyrighted... and no way for you to comply with license
       | requirements.
       | 
       | Nice proof of concept... but who's going to touch this product
       | now? It's a legal ticking time bomb.
       | 
       | 0. https://news.ycombinator.com/item?id=27687450
       | 
       | 1. https://news.ycombinator.com/item?id=27676266
        
         | sktrdie wrote:
         | If they get rid of licensed stuff it should be ok no? I really
          | want to use this, and it seems inevitable that we'll need it just
         | as google translate needs all of the books + sites + comments
         | it can get a hold of.
        
           | ianhorn wrote:
           | Unlicensed code just means "all rights reserved." You'd need
           | to limit it to permissively licensed code and make sure you
           | comply with their requirements.
        
           | runeb wrote:
           | How would they do that?
        
             | oauea wrote:
             | Read the LICENSE file in each repo.
        
               | rovr138 wrote:
               | What guarantees it's intact?
        
           | [deleted]
        
           | mmastrac wrote:
           | Well... the whole training set is licensed, so you can't
           | really get rid of it. I think that the technology they are
           | using for this is just not ready.
        
             | fragmede wrote:
             | Just retrain the model using properly licensed code?
             | ("just" is doing a ton of heavy lifting, but let's be real,
             | that's not impossibly hard)
        
               | [deleted]
        
           | eCa wrote:
           | Which licenses would be acceptable for the training material,
           | though? If it produces near-verbatim copies of e.g.
           | MIT-licensed material, then attribution is required. The same
           | goes for many other open source-friendly licenses.
           | 
           | On the other hand, if only permissive licenses that also
           | don't require attribution are used, well, then for a start,
           | the available corpus is much smaller.
        
         | eganist wrote:
         | Adding to this:
         | 
         | I run product security for a large enterprise, and I've already
         | gotten the ball rolling on prohibiting copilot for all the
         | reasons above.
         | 
         | It's too big a risk. I'd be shocked if GitHub could remedy the
         | negative impressions minted in the last day or so. Even with
         | other compensating controls around open source management, this
         | flies right under the radar with a C-130's worth of adverse
         | consequences.
        
           | fragmede wrote:
           | Do you also block stack overflow and give guidance to never
           | copy code from that website or elsewhere on the Internet? I'm
           | legitimately curious - my org internally officially denounces
           | the copying of stack overflow snippets. Thankfully for my
           | role it's moot as I mostly work with an internal non-public
           | language, for better or worse, and I have no idea how well
           | that's followed elsewhere in the wider company.
        
             | samtheprogram wrote:
             | Anything posted to Stack Overflow has a specific (Creative
             | Commons IIRC) license associated with it. The same is not
             | true of GitHub Copilot, and in fact their FAQ doesn't
             | specify a license at all, probably because they are
             | technically unable to since it is trained on a wide variety
             | of code from differing licenses (and code not written by a
             | human is currently a grey area for copyright). The FAQ
             | simply says to use it at your own risk.
        
             | summerlight wrote:
             | Google (and most of other big techs I guess?) also
             | explicitly prohibit employees from use of stack overflow
             | code snippets.
        
               | Noumenon72 wrote:
               | I tried Googling this and couldn't find it. I also don't
               | want to believe it because it seems like the world
               | suddenly turned into an apocalyptic hellscape with no
               | place for developers like me. Do you have a source?
        
             | gunapologist99 wrote:
             | Apples and oranges: Stack overflow snippets are explicitly
             | granted under a permissive license, as long as you
             | attribute.
             | 
             | https://stackoverflow.com/help/licensing
             | 
             | It appears that the code that copilot is using is created
             | under a huge variety of licenses, making it risky.
             | 
             | On the other hand, a small snippet in a function that is
             | derived from many existing pieces of other code may fall
             | under fair use, even if it is not under an open source
             | license of some sort.
        
               | rorykoehler wrote:
               | It just seems bizarre that this wasn't flagged internally
               | at Microsoft. They have tons of compliance staff.
        
               | mustacheemperor wrote:
               | Maybe we'll even get a sneak peek at Windows 11's source
               | code. Time to start writing a Win32 API wrapper and see
               | what the robot comes up with!
        
               | snicker7 wrote:
               | That's because Microsoft doesn't dare use this for
               | production code (presumably).
               | 
               | They are 100% okay with letting their competitors get
               | into legal hot water.
        
               | rorykoehler wrote:
               | It's surely a bit of a liability grey area?
        
               | ngcazz wrote:
               | Could bet they baked in the legal fees and are taking a
               | calculated risk
        
               | comex wrote:
               | Except that CC-BY-SA is not a permissive license; the SA
               | part is a form of copyleft. It's just that nobody
               | enforces it. From the text [1]:
               | 
               | - "[I]f You Share Adapted Material You produce [..] The
               | Adapter's License You apply must be a Creative Commons
               | license with the same License Elements, this version or
               | later, or a BY-SA Compatible License."
               | 
               | - "Adapted Material means material [..] that is _derived
               | from_ or based upon the Licensed Material " (emphasis
               | added)
               | 
               | - "Adapter's License means the license You apply to Your
               | Copyright and Similar Rights in Your contributions to
               | Adapted Material in accordance with the terms and
               | conditions of this Public License."
               | 
               | - "You may not offer or impose any additional or
               | different terms or conditions on, or apply any Effective
               | Technological Measures to, Adapted Material that restrict
               | exercise of the rights granted under the Adapter's
               | License You apply."
               | 
               | A program that includes a code snippet is unquestionably
               | a derived work in most cases. That means that if you
               | include a Stack Overflow code snippet in your program,
               | and fair use does not apply, then you have to license the
               | _entire program_ under the CC-BY-SA. Alternately, you can
               | license it under the GPLv3, because the license has a
               | specific exemption allowing you to relicense under the
               | GPLv3.
               | 
               | For open source software under permissive licenses, it
               | may actually be okay to consider the entire program as
               | licensed under the CC-BY-SA, since permissive licenses
               | are typically interpreted as allowing derived works to be
               | licensed under different licenses; that's how GPL
               | compatibility works. But you'd have to be careful you
               | don't distribute the software in a way that applies any
               | Effective Technological Measures, aka DRM. Such as via
               | app stores, which often include DRM with no way for the
               | app author to turn it off. (It may actually be better to
               | relicense to the GPL, which 'only' prohibits adding
               | additional terms and conditions, not the mere use of DRM.
               | But people have claimed that the GPL also forbids app
               | store distribution because the app store's terms and
               | conditions count as additional restrictions.)
               | 
               | For proprietary software where you do typically want to
               | impose "different terms or conditions", this is a dead
               | end.
               | 
               | Note that copying extremely short snippets, or snippets
               | which are essentially the only way to accomplish a task,
               | may be considered fair use. But be careful; in Oracle v.
               | Google, Google's accidental copying of 9 lines of utterly
               | trivial code [2] was found to be neither fair use nor "de
               | minimis", and thus infringing.
               | 
               | Going back to Stack Overflow, these kinds of surprising
               | results are why Creative Commons itself does not
               | recommend using its licenses for code. But Stack Overflow
               | does so anyway. Good thing nobody ever enforces the
               | license!
               | 
               | See also:
               | https://opensource.stackexchange.com/questions/6777/can-
               | i-us...
               | 
               | [1] https://creativecommons.org/licenses/by-
               | sa/4.0/legalcode
               | 
               | [2] https://majadhondt.wordpress.com/2012/05/16/googles-9
               | -lines/
        
               | [deleted]
        
               | wrs wrote:
               | Yes. In a past life, after researching the situation, we
               | had to find and remove all the code copied from Stack
               | Overflow into our codebase. I can't fathom why SO won't
               | fix the license.
               | 
               | What makes it even worse is if you try to do the right
               | thing by crediting SO (the BY part) you're putting a red
               | flag in the code that you should have known you have to
               | share your code (the SA part).
        
               | aasasd wrote:
               | In addition to other licensing gotchas, a ton of SO
               | snippets are copied wholesale from elsewhere--docs or
               | blog posts. So it's pretty likely that the poster can't
               | license them in the first place because they never
               | checked the source's license requirements.
        
             | mediaman wrote:
             | Who really copies stack overflow snippets verbatim? It's
             | usually just easier to refer to it for help figuring out
             | the right structure and then adapt it for your own needs.
             | Usually it needs customization for your own application
             | anyway (variables, class instances, etc).
        
               | canadev wrote:
               | Yeah! I've uh, ... never copied a bit of code into my
               | repo verbatim, right?
               | 
               | yeah right. I wish.
               | 
               | (Not saying every dev does this)
        
               | TillE wrote:
               | I've copied plenty of Microsoft sample code verbatim,
               | because the Win32 API sucks and their samples usually get
               | the error handling right.
               | 
               | But, I can't think of a single scenario where I've copied
               | something from Stack Overflow. I'm searching for the idea
               | of how to solve a problem, and typically the relevant
               | code given is either too short to bother copying, or it's
               | long and absolutely not consistent with how I want to
               | write it.
        
               | Noumenon72 wrote:
               | "Too short to bother copying"? I copy single words of
               | text to avoid typing and typos. I would never type out
               | even a single line of code when I could paste and edit.
        
               | blooalien wrote:
               | I don't think I've _ever_ copied code directly from any
               | of the Stack* sites. I generally read all the answers
               | (and comments) and then use what I learn to write my own
               | (hopefully better) code specific to my needs.
        
               | corobo wrote:
               | Yeah my experience has always been "ohhh that solution
               | makes sense" then I go write it myself
               | 
               | If nothing else this whole copilot thing is helping ease
               | some chronic imposter syndrome
        
               | bartread wrote:
               | Ha! Well, I think a lot of people copy code from
               | StackOverflow verbatim once at least - including me.
               | 
               | Of course it turned out the code I'd blindly inserted
               | into my project contained a number of bugs. In one or two
               | cases, quite serious ones. This, even though it was the
               | accepted answer.
               | 
               | It was probably more effort to fix up the code I'd copy
               | pasta'd than write it from scratch. Since then I've never
               | copied and pasted from StackOverflow verbatim.
        
               | baud147258 wrote:
               | I think I did a few times, usually for languages that I
               | wasn't going to spend too much time with (so no benefits
               | in figuring out how to do it from the answers) and for
               | specific tasks.
        
         | jpswade wrote:
         | Not only this but a huge amount of publicly available code is
         | truly terrible and should never really be used other than as
         | a point of reference or guidance.
        
         | Kiro wrote:
         | No-one cares about this. People have no clue about licenses and
         | just copy-paste whatever. If someone gets access to their code
         | and sees all the violations, they're screwed anyway.
        
           | jerf wrote:
           | Ask your legal department about that. Sure, engineers don't
           | care about licensing at all, but we are not the only players
           | here.
        
           | [deleted]
        
         | [deleted]
        
         | __MatrixMan__ wrote:
         | Is it still a legal concern if I'm just coding because I want
         | to solve a problem and I'm not trying to use it to do business?
        
           | maclockard wrote:
           | If you publish the code anywhere, potentially. You could be
           | (unknowingly) violating the original license if the code was
           | copied verbatim from another source.
           | 
           | How much of a concern this is depends heavily on what the
           | original source was.
        
             | kevin_thibedeau wrote:
             | Distributing binaries to third parties is enough to trigger
             | a license violation. For internal corporate tools, it would
             | be less of an issue as "distribution" hasn't happened.
        
             | lolinder wrote:
             | And the problem with copilot is that you have no way of
             | knowing. If it changes even a little bit of the code, it's
             | basically ungoogleable but still potentially in violation.
        
           | saurik wrote:
           | Yes: not all code on GitHub is licensed in a way that lets
           | you use it _at all_. People focus on GPL as if that were the
           | tough case; but, in addition to code (like mine) under AGPL
           | (which you need to not use in a product that exposes similar
           | functionality to end users) there is code that is merely
           | published under  "shared source" licenses (so you can look,
           | but not touch) and even literally code that is stolen and
           | leaked from the internals of companies--including
           | Microsoft!... this code often gets taken down later, but it
           | isn't always noticed and either way: it is now part of
           | Copilot :/--that, if you use this mechanism, could end up in
           | your codebase.
        
         | eximius wrote:
         | Seems like the liability should also be on _Copilot itself_,
         | as a derivative work.
        
       | fourseventy wrote:
       | Ahh yes the infamous "evil floating point bit level hacking" code
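For context, that comment comes from Quake III Arena's famous fast inverse square root, the routine Copilot was shown reproducing. A rough Python transliteration of the bit trick (the magic constant and single Newton step are from the original C, translated here for illustration):

```python
import struct

def q_rsqrt(number: float) -> float:
    """Approximate 1/sqrt(x), Quake III style."""
    x2 = number * 0.5
    # "Evil floating point bit level hacking": reinterpret the 32-bit
    # float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", number))[0]
    i = 0x5F3759DF - (i >> 1)  # the infamous magic constant
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    return y * (1.5 - x2 * y * y)  # one Newton-Raphson refinement step

print(q_rsqrt(4.0))  # roughly 0.5 (within about 0.2%)
```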
        
       | tyingq wrote:
       | They have 4 hand-picked examples on their homepage:
       | https://copilot.github.com/
       | 
       | One has the issue with form encoding:
       | https://news.ycombinator.com/item?id=27697884
       | 
       | The python example is using floats for currency, in an expense
       | tracking context.
       | 
       | The golang one uses a word ("value") for a field name that's been
       | a reserved word since SQL-1999. It will work in popular open
       | source SQL databases, but I believe it would bomb in some servers
       | if not delimited...which it is not.
       | 
       | The ruby one isn't outright terrible, but shows a very
       | Americanized way to do street addresses that would probably
       | become a problem later.
       | 
       | And these are the hand-picked examples. This product seems like
       | it needs some more thought. Maybe a way to comment, flag, or
       | otherwise call out bad output?
        
         | xyzzy_plugh wrote:
         | > The golang one uses a word ("value") for a field name that's
         | been a reserved word since SQL-1999. It will work in popular
         | open source SQL databases, but I believe it would bomb in some
         | servers if not delimited...which it is not.
         | 
         | In their defense they created the table with this column before
         | invoking the autocomplete, so they sort of reap what they sow
         | here.
         | 
         | It could at least auto-quote the column names to remove the
         | ambiguity, but it's not a compiler, is it.
        
         | mempko wrote:
         | These are great examples. I wrote about how this will propagate
         | all sorts of bugs.
         | 
         | But my argument was that it's good enough that developers may
         | get complacent and not review the autocomplete closely enough.
         | But maybe I'm wrong! Maybe it's not that good yet.
        
         | shadowgovt wrote:
         | Now that they have an AI that can be trained to replicate code,
         | it looks like the next step is training it to replicate good
         | code. That will be non-trivial, since step one is identifying
         | good code and they may not have much big data signal to draw
         | from for that.
         | 
         | We know you can't use StackOverflow upvotes. However, they
         | should have enough signal to identify what snippets of code
         | have been most frequently copy-pasted from one project to
         | another.
         | 
         | Question is whether that serves as a good proxy for good code
         | identification.
        
         | slver wrote:
         | > And these are the hand picked examples. This product seems
         | like it needs some more thought.
         | 
         | Everyone's self-preservation instincts kicking in to attack
         | Copilot is kinda amusing to watch.
         | 
         | Copilot is not supposed to produce excellent code. It's not
         | even supposed to produce final code, period. It produces
         | suggestions to speed you up, and it's on you to weed out stupid
         | shit, which is INEVITABLE.
         | 
         | As a side note, Excel also uses floats for currency, so as
         | usual there's a huge gap between best practice and the real
         | world.
        
           | Supermancho wrote:
           | > Everyone's self-preservation instincts kicking in to attack
           | Copilot is kinda amusing to watch
           | 
           | Nobody is threatened by this, assuredly. As with IDEs giving
           | us autocomplete, duplication detection, etc this can only be
           | helpful. There is an infinite amount of code to write for the
           | foreseeable future, so it would be great if copilot had more
           | utility.
        
           | mkr-hn wrote:
           | Have you met programmers? Even those who care about quality
           | are often under a lot of pressure to produce. Things slip
           | through. Before, it was verbatim copies from Stack Overflow.
           | Now it'll be using Copilot code as-is.
        
             | slver wrote:
             | So, nothing new, is your point?
        
               | mkr-hn wrote:
               | Then why are you complaining? Unless something is new
               | that warrants you getting mad about people getting mad at
               | technology.
        
               | saiojd wrote:
               | Not the parent, but people really like to get riled up on
               | the same topics, over and over again, which quickly
               | monopolizes and derails all conversation. Facebook bad, UIs
               | suck, etc. We can now add to the list, "AI will never
               | reduce demand for software engineering".
        
           | volta83 wrote:
           | So how do you know if the code that Copilot regurgitates is
           | almost a 1:1 verbatim copy of some GPL'ed code or not?
           | 
           | Because if you don't realize this, you might be introducing
           | GPL'ed code into your proprietary code base, and that might
           | end up forcing you to distribute all of the other code in
           | that code base as GPL'ed code as well.
           | 
           | Like, I get that Copilot is really cool, and that software
           | engineers like to use the latest and bestest, but even if the
           | code produced by Copilot is "functionally" correct, it might
           | still be a catastrophic error to use it in your code base due
           | to licenses.
           | 
           | This issue looks solvable. Train 2 copilots, one using only
           | BSD-like licensed software, and one using also GPL'ed code,
           | and let users choose, and/or warn when the snippet has been
           | "heavily inspired" by GPL'ed code.
           | 
           | Or maybe just train an adversarial neural network to detect
           | GPL'ed code, and use it to warn on snippets, or...
        
             | the_rectifier wrote:
             | You have the same issue with MIT because it requires
             | attribution
        
             | slver wrote:
             | It's very easy: don't use copilot code verbatim, and you
             | won't have GPL code verbatim.
        
               | volta83 wrote:
               | > It's very easy: don't use copilot
               | 
               | Fixed that for you.
               | 
               | Verbatim isn't the problem / solution. If you take a
               | GPL'ed library and rename all symbols and variables, the
               | output is still a GPL'ed library.
               | 
               | Just seeing the GPL'ed code spat out by Copilot
               | and writing different code "inspired" by it can result in
               | GPL'ed code. That's why "clean room"s exist.
               | 
               | Copilot is going to make for a very interesting law case
               | to follow, because probably until somebody sues, and
               | courts decide, nobody will have a definitive answer of
               | whether it is safe to use or not.
        
               | throw_2021-07 wrote:
               | Stack Overflow content is licensed under CC-BY-SA. Terms
               | [1]:
               | 
               | * Attribution -- You must give appropriate credit,
               | provide a link to the license, and indicate if changes
               | were made. You may do so in any reasonable manner, but
               | not in any way that suggests the licensor endorses you or
               | your use.
               | 
               | * ShareAlike -- If you remix, transform, or build upon
               | the material, you must distribute your contributions
               | under the same license as the original.
               | 
               | In over a decade of software engineering, I've seen many
               | reuses of Stack Overflow content, occasionally with links
               | to underlying answers. All Stack Overflow content use
               | I've seen would clearly fail the legal terms set out by
               | the license.
               | 
               | I suspect Copilot usage will similarly fail a stringent
               | interpretation of underlying licenses, and will similarly
               | face essentially no enforcement.
               | 
               | [1] https://creativecommons.org/licenses/by-sa/4.0/
        
             | guhayun wrote:
             | The solution might be simpler than we think: just tell the
             | algorithm.
        
             | didibus wrote:
             | Doesn't this go beyond license and into copyright?
             | 
             | The license lets you modify the program, but the copyright
             | still enforces that you can't copy/paste code from it to
             | your own project no?
        
           | pydry wrote:
           | It's true I probably wouldn't have laughed quite as loudly if
           | there weren't a chorus of smug economists telling us that
           | tools like this are gonna put me out of a job.
        
             | slver wrote:
             | Business types hate dealing with programmers, that's a
             | fact. And these claims of "we'll replace programmers"
             | happen with certain precise regularity.
             | 
             | Ruby on Rails was advertised as so simple, startup founders
             | who can't program were making their entire products in it
             | in a few days, with zero experience. As if.
        
           | j-pb wrote:
           | If I want random garbage in my codebase that I have to fix
           | anyway, I might as well hire an underpaid intern/junior.
           | 
           | It's easier to write correct code than to fix buggy code. For
           | the former you have to understand the problem, for the latter
           | you have to understand the problem, and a slightly off
           | interpretation of it.
        
           | tyingq wrote:
           | _"self-preservation"_
           | 
           | My suggestion was a way to comment or flag, not to kill the
           | product. These were particularly notable to me because
           | someone hand-picked these 4 to be the front page examples of
           | what a good product it was.
        
           | saiojd wrote:
           | I agree with you. This is basically like autocomplete on a
           | cellphone keyboard (useful because typing is hard on a
           | phone), but for programming (useful because what we type
           | tends to involve more memorization than prose).
        
           | tyingq wrote:
           | >As a side note, Excel also uses floats for currency
           | 
           | It's still problematic, but the defaults and handling there
           | avoid some issues. So, for example:
           | 
           | Excel: =1.03-.42 produces 0.61, by default, even if you
           | expand out the digits very far.
           | 
           | Python: 1.03-.42 produces 0.6100000000000001, by default.
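A quick sketch of that difference in Python, with the format width mirroring Excel's 15-significant-digit display rounding:

```python
# Raw IEEE-754 double arithmetic vs. Excel-style display rounding.
raw = 1.03 - 0.42
print(raw)            # 0.6100000000000001
print(f"{raw:.15g}")  # 0.61 (rounded to 15 significant digits)
```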
        
             | slver wrote:
             | Excel rounds doubles to 15 digits for display and
             | comparison. The exact precision of doubles is something
             | like 15.6 digits, those remaining 0.6 digits causing some
             | of those examples floating (heh) around.
        
               | okl wrote:
               | That depends
               | https://randomascii.wordpress.com/2012/03/08/float-
               | precision...
        
               | slver wrote:
               | A lot of these edge cases are about theoretical concerns
               | like "how many digits we need in decimal to represent an
               | exact IEEE binary float".
               | 
               | In practice a double is 15.6 digits precise, which Excel
               | rounds to 15 to eliminate some weirdness.
               | 
               | In their documentation they do cite their number type as
               | 15 digit precision type. Ergo that's the semantic they've
               | settled on.
        
         | ssss11 wrote:
         | "Maybe a way to comment, flag, or otherwise call out bad
         | output?"
         | 
         | A copilot for copilot? :)
        
         | TeMPOraL wrote:
         | - The Go one (averaging) is non-idiomatic, and has a nasty bug
         | in it: https://news.ycombinator.com/item?id=27698287
         | 
         | - The JavaScript one (memoization) is a bad implementation, it
         | doesn't handle some argument types you'd expect it to handle:
         | https://news.ycombinator.com/item?id=27698125
         | 
         | You can tell a lot about what to expect, if there are so many
         | bugs in the very examples used to market this product.
        
         | gentleman11 wrote:
         | > The python example is using floats for currency.
         | 
         | Dumb question, but what is the proper way to handle currency?
         | Custom number objects? Strings for any number of decimal
         | places?
        
           | spamizbad wrote:
           | For Python, I prefer decimal.Decimal[1]. When you serialize,
           | you can either convert it to a string (and then have your
           | deserializer know the field type and automatically encode it
           | back into a decimal) OR just agree all numeric values can
           | only be ints or decimals. You can pass
           | parse_float=decimal.Decimal to json.loads[2] to make this
           | easier.
           | 
           | My most obnoxious and spicy programming take is that ints and
           | decimals should be built-in and floats should require
           | imports. I understand why though: Decimal encoding isn't
           | anywhere near as standardized as other numeric types like
           | integers or floating-point numbers.
           | 
           | [1] https://docs.python.org/3/library/decimal.html [2]
           | https://docs.python.org/3/library/json.html
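A minimal sketch of that parse_float approach (the payload and field names here are invented for illustration):

```python
import json
from decimal import Decimal

# Deserialize monetary fields straight into Decimal, skipping float.
payload = '{"description": "coffee", "amount": 4.70}'
record = json.loads(payload, parse_float=Decimal)

total = record["amount"] + Decimal("0.30")
print(total)  # 5.00, exact decimal arithmetic with no binary rounding

# Serialize back via str so the value never round-trips through float.
print(json.dumps({"amount": str(total)}))  # {"amount": "5.00"}
```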
        
             | dragonwriter wrote:
             | > My most obnoxious and spicy programming take is that ints
             | and decimals should be built-in and floats should require
             | imports
             | 
             | I don't care about making inexact numbers require imports,
             | but the most natural literal formats should produce exact
             | integers, decimals, and/or rationals.
        
           | dangerbird2 wrote:
           | Either a fixed-point decimal (i.e. an integer whose units
           | represent 1/100, 1/1000, etc. of a dollar), or a ratio type
           | if you need arbitrary precision.
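The fixed-point option can be sketched in a few lines of Python (the helper name is invented for illustration):

```python
# Fixed-point money: amounts live as integer cents, and formatting
# happens only at the display edge, so no binary-float drift creeps in.
def fmt_cents(cents: int) -> str:
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return f"{sign}${cents // 100}.{cents % 100:02d}"

balance = 103 - 42         # exact integer arithmetic: 61 cents
print(fmt_cents(balance))  # $0.61
```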
        
             | Quekid5 wrote:
             | > ratio type if you need arbitrary precision.
             | 
             | This is the better default, so I'd ditch the qualifier,
             | personally. At the very least when it comes to the
             | persistent storage of monetary amounts. People often start
             | out _thinking_ that they won't need arbitrary precision
             | until that _one little requirement_ trickles into the
             | backlog...
             | 
             | Arbitrary-precision rationals handle all the arithmetic
             | you could reasonably want to do with monetary amounts and
             | it lets you decide where to round _at display time_ (or
             | when generating a final invoice or whatever), so there's
             | no information loss.
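In Python, the standard library's fractions.Fraction gives exactly this; a small sketch of the round-only-at-display-time idea:

```python
from fractions import Fraction

# Three items at exactly one third of a dollar each.
unit_price = Fraction(1, 3)
total = 3 * unit_price  # Fraction(1, 1): exactly one dollar

# Round only when displaying, e.g. to whole cents.
cents = round(total * 100)
print(f"${cents // 100}.{cents % 100:02d}")  # $1.00
```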
        
           | SmooL wrote:
           | Yeah, you probably want to use some sort of decimal package
           | for a configurable amount of precision, and then use strings
           | when serializing/storing the values
        
           | wodenokoto wrote:
           | A lot of good answers, but they mostly relate to accounting
           | types of problems (which, granted, is what you need to do with
           | currency data 99% of the time)
           | 
           | I'd just add that if you are building a price prediction
           | model, floats are probably what you need.
        
             | tyingq wrote:
             | The example code is the start of an expense tracking
             | tool...
        
           | stickfigure wrote:
           | Create a Money class, or use one off the shelf. It should
           | store the currency and the amount. There are a few popular
           | ways of storing amounts (integer cents, fixed decimal) but it
           | should not be exposed outside the Money class.
           | 
           | There's plenty of good advice in this subthread for how to
           | represent currency inside your Money abstraction, but
           | whatever you do, keep it hidden. If you pass around numbers
           | as currency values you will be in for a world of pain as your
           | application grows.
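A minimal sketch of that kind of Money wrapper in Python (integer cents hidden inside a frozen dataclass; all names are illustrative, not from any particular library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """A currency amount; the integer-cents representation stays private."""
    _cents: int
    currency: str = "USD"

    def __add__(self, other: "Money") -> "Money":
        # Refuse to silently mix denominations.
        if self.currency != other.currency:
            raise ValueError("cannot mix currencies")
        return Money(self._cents + other._cents, self.currency)

    def __str__(self) -> str:
        return f"{self._cents // 100}.{self._cents % 100:02d} {self.currency}"

total = Money(1999) + Money(550)
print(total)  # 25.49 USD
```

Callers only ever see `Money` values; if the internal representation later changes to a decimal type, nothing outside the class needs to know.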
        
           | pizza234 wrote:
           | This is a complex topic, mainly for two reasons: 1. it works
           | on two layers (storage and code) 2. there is a context to
           | take care of.
           | 
           | [Modern] programming languages have decimal/rational data
           | types, which (within limits) are exact. Where this is not
           | possible, and/or it's undesirable for any reason, just use an
           | int and scale it manually (e.g. 1.05 dollars = int 105).
           | 
           | However, point 2 is very problematic and important to
           | consider. How do you account for 3 items that cost 1/3 of
           | a dollar each (e.g. if in a bundle)? What if they're sold
           | separately? This really depends on the requirements.
           | 
           | My 20 cents: if you start a project, start storing currency
           | in an exact form. Once a project grows, correcting the FP
           | error problem is a big PITA (assuming it's realistically
           | possible).
        
             | himinlomax wrote:
             | > How do you account for 3 items that cost 1/3 of a
             | dollar each (e.g. if in a bundle)?
             | 
             | You never account for fractional discrete items, it makes
             | no sense. A bundle is one product, and a split bundle is
             | another. For products sold by weight or volume, it's
             | usually handled with a unit price, and a fractional
             | quantity. That way the continuous values can be rounded but
             | money that is accounted for needs not be.
        
             | XorNot wrote:
             | The problem is also stupid people and companies.
             | 
             | My last job they wanted me to invoice them hours worked,
             | which was some number like 7.6.
             | 
             | This number plays badly when you run it through GST and
             | other things - you get repeaters.
             | 
             | So I looked up common practice here, even tried asking
             | finance who just said "be exact", and eventually settled on
             | that below 1 cent fractions I would round up to the nearest
             | cent in my favour for each line item.
             | 
             | First invoice I hand them, they manually tally up all the
             | line items and hours, and complain it's over by 55 cents.
             | 
             | So I change it to give rounded line items but straight
             | multiplied to the total - and they complain it doesn't
             | match.
             | 
             | Finally I just print decimal exact numbers (which are
             | occasionally huge) and they stop complaining - because
             | excel is now happy the sums match when they keep second
             | guessing my invoices.
             | 
             | All of this of course was irrelevant - I still had to put
             | hours into their payroll system as well (which they checked
             | against) and my contract specifically stated what my day
             | rate was to be in lieu of notice.
             | 
             | So how should you do currency? Probably in whatever form
             | that matches how finance are using excel, which does it
             | wrong.
        
               | hobs wrote:
               | I wish this was untrue, but I have spent years hearing
               | the words "why dont my reports match?" - no amount of
               | logic, diagrams, explaining, the next quarter or instance
               | - "why dont my reports match?"
               | 
               | BECAUSE EXCEL SUCKS MY DUDE.
        
               | zdragnar wrote:
               | Well, they did say to be exact, and you handed them
               | approximations, so...
        
               | mokus wrote:
               | The "exact" version they wanted was full of
               | approximations too. They just didn't have enough
               | numerical literacy to understand how to say how much
               | approximation they are ok with.
               | 
               | I guarantee nothing in anyone's time accounting system is
               | measured to double-precision accuracy. Or at least, I've
               | never quite figured out the knack myself for stopping
               | work within a particular 6 picosecond window.
        
               | XorNot wrote:
               | Sure, but at the end of the day someone had to pay me an
               | integer amount of cents. They wanted a total which was a
                | normal dollar figure. But when you sum up 7.6 times
                | whatever a whole lot, you _might_ get a nice round
                | number or you might get a repeating decimal.
               | 
               | What's notable is clearly no one had actually thought
               | this through at a policy level - the answer was "excel
               | goes brrrr" depending on how they want to add up and
               | subtotal things.
        
             | dralley wrote:
             | >[Modern] programming languages have decimal/rational data
             | types
             | 
             | This caveat is kind of funny, in light of COBOL having
             | support for decimal / fixed precision data types baked
             | directly into the language.
             | 
             | It's not a problem with "non-modern" languages, it's a
             | problem with C and many of its successors. That's precisely
             | why many "non-modern" languages have stuck around so long.
             | 
             | https://medium.com/the-technical-archaeologist/is-cobol-
             | hold...
             | 
             | Additionally, mainframes are so strongly optimized for
             | hardware-accelerated fixed point decimal computing that for
             | a lot of financial calculations it can be legitimately
             | difficult to match their performance with standard
             | commercial hardware.
        
               | caleb-allen wrote:
               | It is quite simple to do the same in Julia
        
               | adwn wrote:
                | > _It's not a problem with "non-modern" languages,
                | it's a problem with C and many of its successors._
               | 
               | Not really. Any semi-decent modern language allows the
               | creation of custom types which support the desired
               | behavior and often some syntactic sugar (like operator
                | overloading) to make their usage more natural. Take
                | C++, for example, the archetypal "C successor": it's
                | almost trivial to define a class which stores a fixed-
                | precision number, overloads the +, -, *, etc.
                | operators to make it as convenient as a built-in
                | type, and to put it in a library. In my book, this is
                | vastly superior to making
               | such a type a built-in, because you can never satisfy
               | everyone's requirements.
        
               | pjmlp wrote:
                | It is also trivial to keep making C mistakes with a
                | C++ compiler; no matter how many ISO revisions it
                | goes through, the lack of safety due to C copy-paste
                | compatibility will never be fixed.
        
               | adwn wrote:
               | > _[...] no matter how many ISO revisions it will still
               | have, lack of safety due to C copy-paste compatibility
               | will never be fixed._
               | 
               | Okay, no idea how that's relevant to "built-in decimal
               | types" vs "library-defined decimal types", but if it
               | makes you feel better, you can do the same in Rust or
               | Python, two languages which are "modern" compared to
               | COBOL, don't inherit C's flaws, and which enable defining
               | custom number types/classes/whatever together with
               | convenient operator overloading.
        
               | pjmlp wrote:
               | Rust I agree, Python not really as the language doesn't
               | provide any way to keep invariants.
        
               | adwn wrote:
                | > _Python not really as the language doesn't provide
                | any way to keep invariants_
               | 
               | Again, how is that relevant? If there's no way to enforce
               | an invariant in _custom data types_ , then there's also
               | no way to enforce invariants in _code using built-in data
               | types_.
        
               | pjmlp wrote:
               | It is surely relevant.
               | 
               | Rust provides the mechanisms to enforce them, while in
               | Python, like all dynamic languages, everything is up for
               | grabs.
        
               | adwn wrote:
               | What I meant [1] was: In Python, invariants are enforced
               | by conventions, not by the compiler. If that's not
               | suitable for a given use case, then Python is _entirely_
                | unsuited for that use case, regardless of whether it
                | provides built-in decimal types or user-defined
                | decimal types. That's why I said that your objection
                | regarding
               | invariant enforcement is irrelevant to this discussion.
               | 
                | [1] (but was too lazy to write out)
        
               | [deleted]
        
           | kyrra wrote:
           | To pile on, here's a copy/paste from when this was asked a
           | few days ago:
           | 
           | Googler, opinions are my own. Over in payments, we use micros
           | regularly, as documented here:
           | https://developers.google.com/standard-
           | payments/reference/gl...
           | 
            | GCP on the other hand has standardized on unit + nano.
            | They use this for money and time. So unit would be 1
            | second or 1 dollar, and the nanos field allows more
            | precision. You can
           | see an example here with the unitPrice field:
           | https://cloud.google.com/billing/v1/how-tos/catalog-
           | api#gett...
           | 
           | Copy/paste the GCP doc portion that is relevant here:
           | 
           | > [UNITS] is the whole units of the amount. For example if
           | currencyCode is "USD", then 1 unit is one US dollar.
           | 
           | > [NANOS] is the number of nano (10^-9) units of the amount.
           | The value must be between -999,999,999 and +999,999,999
           | inclusive. If units is positive, nanos must be positive or
           | zero. If units is zero, nanos can be positive, zero, or
           | negative. If units is negative, nanos must be negative or
           | zero. For example $-1.75 is represented as units=-1 and
           | nanos=-750,000,000.
        
           | ronnier wrote:
           | In its base unit. So cents in USD. Which can be an int64
           | 
           | Or if your language has something specific built in, use
           | that.
        
             | umanwizard wrote:
             | Not necessarily. It depends on the application.
        
             | ainar-g wrote:
             | > Or if your language has something specific built in, use
             | that.
             | 
             | Unless your language is PostgreSQL's dialect of SQL,
             | apparently. https://wiki.postgresql.org/wiki/Don%27t_Do_Thi
             | s#Don.27t_use...
        
               | pilif wrote:
                | It has the same issue that the other suggestion of
                | your parent comment had: it can't deal with fractions
                | of cents, which is an issue you will most likely run
                | into before you run into floating point rounding
                | issues.
        
               | fredros wrote:
               | Of course for databases you should use a decimal.
        
             | tzs wrote:
             | > In its base unit. So cents in USD. Which can be an int64.
             | 
              | Note that if you use cents in the US so that everything
              | is an integer, then as long as you do not have to deal
              | with amounts outside the range [-$90 trillion, $90
              | trillion] you can also use double: a double can exactly
              | represent every integer number of cents up to 2^53
              | (about 9x10^15) in magnitude.
             | 
              | This may be faster than int64 on some systems,
              | especially ones that do not provide int64 in hardware
              | or in the language runtime, so you'd have to implement
              | it yourself.
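That cutoff can be checked directly (Python floats are IEEE-754 doubles; a quick sketch):

```python
# Doubles represent every integer up to 2**53 exactly...
limit = 2**53  # 9_007_199_254_740_992 cents, roughly $90 trillion
assert float(limit) == limit
assert float(limit - 1) == limit - 1

# ...but above that, consecutive integers start to collide:
assert float(limit + 1) == float(limit)  # 2**53 + 1 is not representable
```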
        
           | marcosdumay wrote:
           | Each country has a law or something similar that states how
           | people should calculate over prices.
           | 
           | The usual is to use decimal numbers with fixed precision (the
           | actual precision varies from one country to another), and I
           | don't know of any modern exception. But as late as the 90's
           | there were non-decimal monetary systems around the world, so
           | if you are saving any historic data, you may need something
           | more complex.
        
           | umanwizard wrote:
           | Depends what you're doing. In fact it's not _always_ wrong to
           | use floats for currency. For accounting you should probably
           | use a fixed-precision decimal type.
        
             | jacobsenscott wrote:
             | If someone asks how to handle money the best answer is
             | integers or fixed precision decimals. There may be a valid
             | case for using floats, but if someone asks they shouldn't
             | be using floats.
             | 
             | Also I'm hard pressed to come up with a case where floats
             | would work. Can you give an example?
        
               | umanwizard wrote:
               | > Can you give an example?
               | 
               | The answer is the same as _any_ time you should use
               | floats: where you don't care about answers being exact,
               | either (1) because calculation speed is more important
               | than exactness, or (2) because your inputs or
               | computations involve uncertainty anyway, so it doesn't
               | matter.
               | 
               | This is more likely to be the case in, say, physics than
               | it is in finance, but it's not impossible in the latter.
               | For example, if you are a hedge fund and some model
               | computes "the true price of this financial instrument is
               | 214.55", you certainly want to buy if it's being sold for
               | 200, and certainly don't if it's being sold for 250, but
               | if it's being sold for 214.54, the correct interpretation
               | is that _you aren't sure_.
               | 
               | When people say "you should never use floats for
               | currency", their error is in thinking that the only
               | applications for currency are in accounting, billing, and
               | so on. In those applications, one should indeed use a
               | decimal type, because we do care about the rounding
               | behavior exactly matching human customs.
        
               | tyingq wrote:
               | That's fair, though the example code I mentioned is the
               | start of an expense tracker.
        
               | umanwizard wrote:
               | Fair enough -- in that case, you should definitely use
               | either a decimal type or an integer.
        
               | jacobsenscott wrote:
               | Good answer. I've only ever worked on accounting style
               | financial apps, so I've didn't think of those types of
               | cases.
        
               | naniwaduni wrote:
               | You can't use a generic decimal type in that case either!
               | You need a special-purpose type that rounds exactly
               | matching the conventions you're following. This is
               | necessarily use-, culture-, and probably currency-
               | specific.
        
               | bidirectional wrote:
               | Most things in front office use floats in my experience,
               | e.g. derivative pricing, discounting, even compound
               | interest. None of these things are going to be any better
               | with integers or fixed-precision, but maybe harder to
               | write and slower.
        
               | stevesimmons wrote:
               | Yes, the risk management/instrument pricing part in the
               | "Front Office" uses floats, because the calculations
               | involve compound interest and discount rates.
               | 
               | And the downstream parts for trade confirmation ("Middle
               | Office"), settlement and accounting ("Back Office") used
               | fixed precision. Because they are fundamentally
               | accounting, which involves adding things up and cross-
               | checking totals.
               | 
               | These two parts have a very clear boundary, with strictly
               | defined rounding rules when the floating point
               | risk/trading values get turned into fixed point
               | accounting values.
        
           | lordgilman wrote:
           | Integer cents or an arbitrary precision decimal type.
        
             | shagie wrote:
             | Having worked on a POS system, the issue of using cents
             | alone is if you've got something like "11% rebate" and you
             | need to deal with fractional cents.
             | 
              | The arbitrary precision decimal type should be the
              | default answer for currency until it is shown that the
              | requirements do not, and at no time in the future
              | _ever_ will, require fractional units of the smallest
              | denomination.
             | 
             | As an aside, this may be constrained by the systems that
             | the data is persisted into too... the Buffett Overflow is a
             | real thing ( https://news.ycombinator.com/item?id=27044044
             | ).
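That fractional-cent situation is easy to reproduce (a sketch with Python's stdlib `decimal`; the 11% figure comes from the comment above, the rest is illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

price_cents = 1099                               # $10.99 held as integer cents
rebate = Decimal(price_cents) * Decimal("0.11")  # 120.89 cents: fractional!
print(rebate)

# Keep full precision internally; round only where a business rule says to.
rebate_cents = rebate.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
assert rebate_cents == 121
```

Integer cents alone would have to round at this step, silently losing the 0.89.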
        
           | _ZeD_ wrote:
           | python has the `decimal` module in the stdlib
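For instance, a minimal contrast with binary floats:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly:
assert 0.1 + 0.2 != 0.3

# Decimal values constructed from strings stay exact in base 10:
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
```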
        
           | tyingq wrote:
           | There's no one answer, but decimal counts of the smallest
           | unit that needs to be measured is common. Like pennies in the
           | US, or maybe "number of 1/10 pennies" if there's things like
           | gasoline tax.
        
             | bpicolo wrote:
             | You can use integers instead of decimal if you're using the
             | smallest unit.
        
             | bencollier49 wrote:
             | Say what you like about COBOL, but it got this stuff right.
        
           | bidirectional wrote:
           | Every front office finance project I have ever worked on has
           | used floating point, so take the dogma with a grain of salt.
           | It depends entirely on the context.
        
             | jhugo wrote:
             | They probably just accumulate the rounding errors into an
             | account and write it off periodically without even
             | realising why it happens.
        
               | bidirectional wrote:
               | No, it's just that we're in the realm of predictions and
               | modelling, not accounting. If you're constructing a curve
               | to forecast 50 years of interest rates from a limited set
               | of instruments, you're already accepting a margin of
               | error orders of magnitude greater than the inaccuracies
               | introduced by floating point.
               | 
               | The models also use transcendental functions which cannot
               | be accurately calculated with fixed point, rationals,
               | integers etc.
        
               | jhugo wrote:
               | Makes sense; I wasn't aware of the meaning of "front
               | office" as a term of art in finance.
        
               | BruiseLee wrote:
               | It's not like decimal or fixed point does not suffer from
               | rounding errors either. In fact for many calculations,
               | binary floating point gives more accurate answers.
               | 
               | In accounting there are specific rules that require
               | decimal system, so one must be very careful with the
               | floating point if it is used.
        
             | dirkt wrote:
              | And they all suffer from rounding error problems?
             | 
             | I mean, fixed point and a specific type for currency (which
             | also should include the denomination, while we are at it)
             | are not rocket science. Spreadsheets get that right, at
             | least.
        
               | bidirectional wrote:
               | Excel uses IEEE-754 floating point, so I don't get what
               | you mean with the spreadsheet comment. It has formatting
               | around this which rounds and adds currency symbols, but
               | it's floating point you're working with.
               | 
               | Rounding error doesn't matter on these types of financial
               | applications. It's the less glamorous accounting work
               | that has to bother with that.
               | 
               | They're not rocket science, but they're unnecessary, and
               | would still be off anyway. Try and calculate compound
               | interest with your fixed point numbers.
        
           | dragonwriter wrote:
           | > Dumb question, but what is the proper way to handle
           | currency?
           | 
           | In python, for exact applications (not many kinds of
           | modeling, where floats are probably right), decimal.Decimal
           | is usually the right answer, but fractions.Fraction is
           | sometimes more appropriate, and if you are using NumPy or
           | tools dependent on it, using integers (representing decimals
           | multiplied by the right power of 10 to get the minimum unit
           | in the ones position) is probably better.
        
           | trevor-e wrote:
           | Someone already mentioned there's a `decimal` package in
           | Python that's better suited for currency. Back when I was a
           | Java developer we used this: https://docs.oracle.com/javase/7
           | /docs/api/java/math/BigDecim...
        
           | kkirsche wrote:
           | The Decimal class is one way if you roll your own. py-moneyed
           | seems to be a well maintained library though I haven't used
           | it.
           | 
           | Disclaimer: I only work with currency in hobby projects.
        
           | thayne wrote:
           | An integer of the smallest denomination. For example, cents
           | for the American dollar. And you probably would want to wrap
           | it in a custom type to simplify displaying it properly, and
            | maybe handle different currencies. If your language has a
            | fixed point type that might also be appropriate, but
            | that's pretty rare, and wouldn't work for currencies that
            | aren't decimal (like the old British pound system).
        
             | biztos wrote:
             | Do they still use fractional cents (or whatever) in
             | finance?
             | 
             | https://money.howstuffworks.com/personal-
             | finance/financial-p...
        
             | TchoBeer wrote:
             | What if I'm calculating sales tax? Can't use an integer
             | anymore.
        
               | kyllo wrote:
               | Yes, you can. There are algorithms for rounding up,
               | rounding down, rounding to nearest, and banker's
               | rounding, on the results of integer division. This is a
               | solved problem.
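One such algorithm sketched in pure integer arithmetic (the 8.25% rate and the helper name are illustrative; half-up is just one of the rounding rules mentioned above):

```python
def sales_tax_cents(price_cents: int, rate_num: int = 825,
                    rate_den: int = 10_000) -> int:
    """Tax as integer cents, rounding half up, using only integer math."""
    numerator = price_cents * rate_num       # exact: no floats involved
    quotient, remainder = divmod(numerator, rate_den)
    # Round half up: bump the quotient when remainder >= half the divisor.
    return quotient + (1 if 2 * remainder >= rate_den else 0)

assert sales_tax_cents(1999) == 165  # $19.99 * 8.25% = 164.9175c -> 165c
assert sales_tax_cents(100) == 8     # $1.00  * 8.25% = 8.25c   -> 8c
```

Banker's rounding would differ only in the tie case (`2 * remainder == rate_den`), where it rounds toward the even quotient instead of always up.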
        
         | eulers_secret wrote:
         | I haven't seen anyone mention this issue for some reason, but
          | in fetch_tweets.py:
          | 
          |     fetch_tweets_from_user(user_name):
          |         ...
          |         tweets = api.user_timeline(screen_name=user,
          |             count=200, include_rts=False)
         | 
          | 'user' isn't defined, should be user_name, right? Side
          | note, 'copilot' is a decent name for this (though copilots
          | are usually very competent, more so than this right now).
          | You _must_ check the suggestions carefully. Maybe it'll
          | make folks better at code review, lol.
        
         | rjknight wrote:
         | > This product seems like it needs some more thought. Maybe a
         | way to comment, flag, or otherwise call out bad output?
         | 
         | Wait for your colleagues to use it, fix the bad code in the
         | pull request, and wait for copilot to learn from the new
         | training data you just provided!
        
           | more_corn wrote:
            | This is actually a good idea that is missing from nearly
            | every machine learning product. How do you back-propagate
            | lessons from user interaction into future training of the
            | model? It can be done; I just can't think of a place I've
            | seen it done.
        
             | adriancr wrote:
             | It would be viewed as IP theft by most companies to upload
             | private code to this for use by others
        
               | dgb23 wrote:
               | It would have to be in the same range of what is
               | suggested, small patches and opt in.
               | 
               | If snippets are a legal problem, then Copilot is
               | problematic by default, since it suggests code that may
               | or may not be sourced from free software.
        
               | adriancr wrote:
               | Even free software snippets have clauses like GPL or
               | attribution.
               | 
               | Putting GPL code in proprietary codebase would cause a
               | company massive headaches...
               | 
               | So I agree copilot is problematic by default, liability
               | to lawsuits for employers and forced open sourcing,
               | liability to IP lawsuits as well which will end up on
               | employees shoulders.
        
             | TeMPOraL wrote:
             | It's tricky, because once you start accepting user
             | feedback, you need to _moderate_ it, or else someone will
             | poison your model for fun and profit.
        
           | giantg2 wrote:
           | But what about all the bad training data provided too?
        
         | amelius wrote:
         | What are the statistics of Copilot based on a validation set?
         | How often does it get code right?
         | 
         | I want to see hard statistics, not 4 hand-picked examples.
        
           | rubatuga wrote:
           | Yeah. And like how would you even devise a metric? Like
           | compile it down to assembly and see if it's similar logic?
        
             | amelius wrote:
             | Well, this is the question which the producers of the tool
             | should answer.
             | 
             | You can't just release a ML tool onto the public if you
             | haven't validated it first.
        
           | mnky9800n wrote:
            | That's what I thought when I first started working in
            | text generation too. It's highly annoying that people
            | pitch their successful models with hand-picked examples.
            | It's literally the opposite of STATISTICAL learning, imo.
        
         | foobiekr wrote:
         | Copilot appears to be "give more efficiency leverage to the
         | worst kind of coder."
        
           | codyb wrote:
           | Hmm... I mean, these all seem like mistakes I could make and
           | I don't think I'm the "worst kind of coder".
           | 
           | The currency one I learned a while back, but it's not like I
           | intuited using integers by default.
           | 
           | Value being a reserved keyword, I'm not sure I'd know that
           | and I do Postgres work as part of my myriad duties at the
           | startup I work at. Maybe I'd make that mistake in a
           | migration, maybe I have already.
           | 
            | In a way, is it much different than what we do now as
            | engineers? I'm hard pressed to call it much of an
            | engineering discipline considering most teams I work on
            | barely do design reviews before they launch into writing
            | code, documentation
           | and meeting minutes are generally an afterthought, and the
           | code review process while decent isn't perfect either and
           | often times relies on arcane knowledge derived over months
           | and years of wrangling with particular <framework, project,
           | technology>.
           | 
           | It's pretty neat, presumably it'll learn as people correct
           | it, and it'll get better over time. I mean it's not even
           | version one.
           | 
           | I get the concerns, but I think they're a bit overblown, and
           | this'll be really useful for people who want to learn how to
           | code. Sure they'll run into some bugs, but, I mean, they were
           | going to do that anyways.
        
             | voakbasda wrote:
             | Is this any worse? Maybe not. Is it better? Absolutely not.
             | 
             | This kind of tool will only further entrench the production
             | of mediocre, bug-ridden code that plagues the world. As
                | implemented, this will not be a solution; it is an
                | express lane in the race to the bottom.
        
               | pasquinelli wrote:
                | it _is_ a race to the bottom, and people are trying
                | to win. any skilled trade is being turned into an
                | unskilled job. it might suck, the results might suck,
                | but it's more profitable, and that's what matters.
        
           | ticviking wrote:
           | I'm not really sure that type of tool could really be
           | anything else.
           | 
           | How would a model become aware of all of the various edge
           | cases that depend on which SQL database you use or
           | differences in language versions over time?
        
             | sbr464 wrote:
             | Can it submit pull requests to itself with if/else boolean
             | logic/hacks?
        
             | gmadsen wrote:
             | a large data set covering exactly what you just mentioned?
        
             | TeMPOraL wrote:
              | > _I'm not really sure that type of tool could really be
             | anything else._
             | 
             | It can't be, because they've chosen to use a deep learning
             | approach. That makes it a dead end right from the start.
             | 
             | > _How would a model become aware of all of the various
             | edge cases that depend on which SQL database you use or
             | differences in language versions over time?_
             | 
             | A lot of things that we call "edge cases" are only a
             | problem for humans. They're not "edge cases" from the point
             | of view of the grammar / semantics of programming languages
             | and libraries. The way a hypothetical, better Copilot could
             | work, is by having directly encoded grammars and semantics
             | metadata corresponding to popular languages and tools. It
              | could generate code in a principled and introspectable way,
             | by having a model of the computation it wants to express
             | and encoding it in a target language.
             | 
             | Of course, such hypothetical Copilot is a harder task -
             | someone would have to come up with a structure for
             | explicitly representing understanding of the abstract
             | computation the user wants to happen, and then translate
             | user input into that structure. That's a lot of drudgery,
             | and from my vague understanding of the "classical" AI
             | space, there might be a bunch of unsolved problems on the
             | way.
             | 
             | Real Copilot uses DNNs, because they let you ignore all
             | that - you just keep shoving code at it, until the black-
             | box model starts to give you mostly correct answers. The
             | hard work is done automagically. It makes sense for some
             | tasks, less for others - and I think code generation is one
             | of those things where black-box DNNs are a bad idea.
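The alternative TeMPOraL describes can be made concrete with a toy sketch. The three-node expression language below is our own illustration, not anything Copilot does: instead of predicting tokens, the tool would hold an explicit model of the computation and derive target-language text from it by traversal, so the output is introspectable and correct by construction.

```go
package main

import "fmt"

// A tiny model of an abstract computation: expressions held as
// explicit structure rather than strings of tokens.
type Expr interface{ emit() string }

type Num struct{ v int }
type Add struct{ a, b Expr }
type Mul struct{ a, b Expr }

func (n Num) emit() string { return fmt.Sprint(n.v) }
func (x Add) emit() string { return "(" + x.a.emit() + " + " + x.b.emit() + ")" }
func (x Mul) emit() string { return "(" + x.a.emit() + " * " + x.b.emit() + ")" }

func main() {
	// The model of the computation (2 + 3) * 4, as structure.
	e := Mul{Add{Num{2}, Num{3}}, Num{4}}
	// Encoding into a target language is a plain traversal; every
	// emitted token is traceable to a node in the model.
	fmt.Println(e.emit()) // prints ((2 + 3) * 4)
}
```

A real system would need a far richer model (types, effects, library semantics), which is exactly the drudgery the comment describes.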
        
               | jhgb wrote:
               | > The way a hypothetical, better Copilot could work, is
               | by having directly encoded grammars and semantics
               | metadata corresponding to popular languages and tools. It
                | could generate code in a principled and introspectable way,
               | by having a model of the computation it wants to express
               | and encoding it in a target language.
               | 
               | But that sounds like too much work, let's just throw a
               | lot of data into an NN and see what comes out! /s
               | 
               | > and introspectable
               | 
               | Which most importantly means "debuggable", I assume. From
                | what I can tell, there doesn't seem to be any way to ad-
                | hoc fix an NN's output.
               | an NN's output.
        
           | heroHACK17 wrote:
           | This is my thought as well. I get the "make productive
           | engineers even more productive" angle, but productive
           | engineers' bottleneck isn't coding. Sure, coding up a
           | boilerplate Go web server is tedious, but I have done it so
           | many times that it takes me two seconds now.
           | 
           | On the flip side, coding can be the bottleneck for the worst
           | kind of coder. When I first started coding, coding was hard
            | simply because I had very few reps and was just learning
           | to understand how to code common solutions, data structures,
           | libraries, etc. Fast forward a few years and, if I were still
           | struggling to understand these concepts, Copilot is a
           | lifeline.
        
             | hamandcheese wrote:
             | I'm gonna have to disagree - coding can and does take
             | significant amounts of time even when I know exactly what
             | problem I am solving.
             | 
             | I admit that at many organizations there are so many other
             | factors and bottlenecks, but it's not uncommon that I find
             | myself 8+ hours deep into a coding task that I had expected
             | would be much shorter.
             | 
             | On the other hand, usually that's due to refactoring or
             | otherwise not being satisfied with the quality of my
             | initial solution, so copilot probably wouldn't help...
        
           | captn3m0 wrote:
           | I find it is reducing my research time by providing a decent
           | starting solution space. Especially for boring stuff where
           | you just need to google the signature of some standard
           | library function.
        
           | majormajor wrote:
           | It takes what should be your method of last resort -
           | copypaste - and makes it the first thing you try.
           | 
           | All the steps in between - looking at the docstring for the
           | function you're calling, googling for more general
           | information, looking at and _deciding not to use_ not-
           | applicable or poorly-written SO answers - get pushed aside.
            | So instead of you having to convince yourself "yes, it's
           | safe to copy-paste these lines from SO, they actually fit my
           | problem" you're presented with magic and I think the burden
           | for rejecting it is going to be higher once it's in your
           | editor than when you're just reading it on a SO post or
           | Github snippet.
           | 
           | Even for a newcomer looking to learn, working on simple stuff
           | that it has great completions for, it seems like it will
           | sabotage your long-term growth, since it takes all the _why_
           | and the reasoning out of it. Autocomplete for a function name
            | isn't that relevant to gaining a deeper understanding.
           | Knowing _why_ a certain block of code is passed in in a
           | certain style, or needs to be written at all? Probably that
           | is.
        
             | majormajor wrote:
             | Thinking about it more: there's a very small subset of
             | problems that I think this is actually great for. And I do
             | run into this somewhat often: relatively new libraries or
             | frameworks that don't really care about thorough
             | documentation so they only show you a few happy path
             | snippets and nothing about how to do something more
             | interesting, so you have to bridge the gap between "this
              | one line in the doc obviously doesn't work for me, but I'd
             | like to figure it out without reading all their source code
             | from scratch..." - getting more example snippets barfed up
             | onto my screen from other people who've figured it out
             | before could be a sort of replacement for the library
             | writers having provided documentation in the first place.
             | But ... this is a somewhat insane way to work around a
             | problem of shitty code documentation, and is still
             | insufficient in a couple ways:
             | 
             | * some poor bastard is going to have to be the first person
             | to figure out how to do something, so that copilot itself
             | can know
             | 
             | * any non-code nuances around "oh, if you do that, your
             | memory usage is going to explode" or "oh, by the way, if
             | you do that, make sure you don't do your own threading"
             | will still fail to be communicated.
        
           | groby_b wrote:
           | I've called it "The Fully Mechanized Stack Overflow Brigade"
           | before, and everything that comes to light supports that
           | assessment.
           | 
           | On the upside, think of the consultancy fees you can charge
           | to clean up those messes.
        
         | bierjunge wrote:
         | The Golang example would not even compile, because `sql` is not
         | imported.
        
           | IncRnd wrote:
           | That's for the best. We don't want products that pretend to
            | write code for us, while copying others' code without
           | attribution and that may not even work.
        
         | stevelosh wrote:
         | The golang one also silently drops rows.Err() on the floor.
         | 
         | https://golang.org/pkg/database/sql/#Rows
        
         | jacurtis wrote:
         | > The ruby one isn't outright terrible, but shows a very
         | Americanized way to do street addresses that would probably
         | become a problem later.
         | 
         | As someone who has been coding up address storage and
         | validation for the past week in my current job, that one really
         | made me laugh. Mostly because it tries to simplify all the
         | stuff I have been analyzing and mulling over for a week into a
         | single auto-complete.
         | 
          | Spoiler: GitHub Copilot's solution simply won't work. It would
          | barely handle Americanized addresses, and even then it wouldn't
          | be ideal. As for internationalizing it, this thing isn't even
          | close.
         | 
         | I get what Copilot is trying to do. But at the same time I
         | don't get it. Because from my experience, typing code is the
         | fastest part of my job. I don't really have a problem typing. I
         | spend most of my time thinking about the problem, how to solve
         | it, and considering ramifications of my decisions before ever
         | putting code in the IDE. So Copilot comes around and it
         | autocompletes code for me. But I still have to read what it
          | suggested, make edits to it, and consider whether this is
          | solving the problem appropriately. I'm still doing everything I
          | used to do, except it saved me from typing out a block of code
          | initially. I most likely still have to rebuild, edit, or change
         | the function somewhat. So it just saves me from typing that
         | first pass. Well that's the easy part of the job.
         | 
         | I have never had a manager come to me and ask why a project is
         | taking so long where I could answer "it just takes so long to
         | type out the code, i wish I had a copilot that could type it
         | for me". That's why we call it software engineering and not
         | coding. Coding is easy. Software engineering is hard. Github
         | Copilot helps with coding, but doesn't help with Software
         | Engineering.
        
           | reaperducer wrote:
           | _I spend most of my time thinking about the problem, how to
           | solve it, and considering ramifications of my decisions
           | before ever putting code in the IDE. So Copilot comes around
           | and it autocompletes code for me. But I still have to read
           | what it suggested, making edits to it, and consider if this
           | is solving the problem appropriately._
           | 
            | So, rather than helping people program better, all it's done
           | is replace a bunch of the offshore cut-and-paste shops with
           | "AI."
        
           | neutronicus wrote:
           | A lot of my job is thinking hard about how to do [X],
           | incidentally needing to remember how to do [trivial thing Y]
           | and looking it up.
           | 
           | Like, I did it before, remember that it was trivial, I just
           | forget the snippet and I have to break focus to look it up -
           | often by scrolling through my own commit history to try and
           | find the time I did [trivial thing Y] four months ago.
           | 
           | I do kind of wish I could automate that. Skipping the actual
           | typing of the snippet is sort of gravy on top of that.
        
             | nitrogen wrote:
             | _It would be nice if there were a way to automate the
             | "remembering what that one function is called and what
             | order the parameters are in" portion of my job._
             | 
             | IME the best thing for this is looking at the method
             | listing in the docs for the classes I'm using. E.g. for
             | Ruby, it's usually looking at the methods in Enumerable,
             | Enumerator, Array, or Hash. Or I'll drop a _binding.pry_
              | into the function, run it, and then type _ls_ to see
              | what's in scope.
        
               | greyfox wrote:
               | this sounds super interesting, is there a video or upload
               | somewhere that i can watch this being performed in real
               | time?
        
               | nitrogen wrote:
               | I very briefly show some of the interactivity of Ruby+Pry
               | here: https://youtu.be/Gy7l_u5G928?t=805 (the overall
               | code segment starts at
               | https://www.youtube.com/watch?v=Gy7l_u5G928&t=626s)
               | 
               | I'd be happy to hear about better demonstrations, and
               | there's also Pry's website (https://pry.github.io/) where
               | they link to some screencasts.
        
               | shados wrote:
               | Even in the 90s that was a solved problem in Visual Basic
               | with autocomplete. That a lot of dev environments "lost"
               | the ability to do it is mind boggling. With that said,
                | doesn't RubyMine let you do that with autocomplete with
               | the prompt giving you all the info you need? (I haven't
               | done Ruby in a long time).
               | 
               | Still, having to look up the doc or run the code to
               | figure out how to type it is orders of magnitude slower
               | than proper auto complete (be it old school Visual Studio
               | style, or something like Copilot).
        
               | nitrogen wrote:
               | _orders of magnitude slower than proper auto complete_
               | 
               | Having worked extensively with verbose but autocomplete-
               | able languages like Java, compact dynamic languages like
               | Ruby, and a variety of others including C, Scala, and
               | Kotlin, I've come to the conclusion that, for me,
               | autocomplete is a crutch and I develop deeper
               | understanding and greater capabilities when I go to the
               | docs. IDE+Java encourages sprawl, which just further
               | cements the need for an IDE. Vim+Ruby+FZF+ripgrep+REPL
               | encourages me to design code that can be navigated
               | without an IDE, which ultimately results in cleaner
               | designs.
               | 
               | If there's _any_ lag whatsoever in the autocomplete, it
               | breaks my flow state as well. I can maintain flow better
               | when typing out code than when it just pops into being
                | after some hundreds of milliseconds delay. Plus, there's
               | always the chance for serendipity when reading docs. The
               | docs were written by the language creators for a reason.
               | Every dev should be visiting them often.
        
               | shados wrote:
               | That's totally cool but the grandparent was talking about
               | remembering shit they already knew. Not everyone has a
                | fantastic memory, and remembering whether the arguments
                | are A then B or B then A doesn't deepen your understanding of a
               | language. Most of the time the autocomplete and the
               | official doc use the exact same source anyway, formatted
               | the same way, with the same info.
               | 
               | But if it works for you, more power to you!
        
           | lwhi wrote:
           | >Because from my experience, typing code is the fastest part
           | of my job. I don't really have a problem typing. I spend most
           | of my time thinking about the problem, how to solve it, and
           | considering ramifications of my decisions before ever putting
            | code in the IDE
           | 
           | So very true.
           | 
           | [1] Understanding the problem > [2] thinking about all
           | possible solutions > [3] working out which solution fits best
           | > [4] working out which implementations are possible > [5]
           | working out the most suitable implementation
           | 
           | ... and finally, [6] implementing via code.
        
           | ph0rque wrote:
           | > I spend most of my time thinking about the problem, how to
           | solve it...
           | 
           | A few years ago, I got a small but painful cut on my
           | fingertip. I thought I would have a hard time on the job as a
           | dev. To my surprise, I realized I spend 90-95% of my time
           | thinking, and only 5-10% of the time typing. It turned out to
           | be almost a non-issue.
        
           | shados wrote:
           | > I don't really have a problem typing.
           | 
            | I'm absolutely with you and want to upvote that part of the
            | comment x100. Unfortunately it's often considered a fairly
            | spicy opinion.
           | 
           | Entire frameworks (Rails) are built around the idea of typing
           | as little as possible. Others can't even be mentioned without
           | the topic of boilerplate/keystroke count causing a flame war
           | (Redux).
           | 
           | A lot of engineers equate their value with the amount of
           | lines they can pump out, so there's definitely a demand for
           | tools like these.
           | 
            | There's also some legitimate stuff. There are a lot of very
            | silly things I have to google every time because I have a
            | bad memory. It saves the step of googling. In a way, it was
           | the same debate around autocomplete at the very beginning,
           | but pushed to the next level. Autocomplete turned out to be a
           | very good thing (even though new languages and tools keep
           | coming out without it).
        
             | theshadowknows wrote:
              | I never commit to memory something that I can easily
              | google (with a high-quality solution)
        
               | [deleted]
        
           | amluto wrote:
           | As the owner of a fairly normal American address that is
            | corrupted by the UPS address validation service, this
           | is a good time to remind everyone: accept the address that
           | your customer enters. If you offer a service to try to
           | improve your customer's address, keep in mind that it's a
           | value added service, it may be wrong, and you MUST test the
           | flow in which your customer tells your service to accept the
           | address as entered. And maybe even collect examples in which
           | the address change is accepted to make sure it does something
           | useful.
           | 
           | Vendors have lost sales to me because they were too
           | incompetent to allow me to ship things to my actual address.
           | Oops.
           | 
           | P.S. for the US, you need to offer at least two lines for the
           | address part. And you need to accept really weird things that
           | don't seem to parse at all. I know people with addresses that
           | have a PO Box number and a PMB number _in the same address_.
           | Lose one and your mail gets lost.
           | 
           | P.P.S. If you offer discounted shipping using something like
           | SurePost, make sure you let your customers pay a bit extra to
           | use a real carrier. There are addresses that are USPS-only
           | and there are addresses that work for all carriers except
           | USPS (and SurePost, etc). Let your customer tell you how to
           | ship to them. Do not second-guess your customer.
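A sketch in Go of the shape this advice implies (all names here are illustrative, not from any particular codebase): the address the customer typed is the record of truth, and a validator's output is only a suggestion they may decline.

```go
package main

import "fmt"

// Address keeps what the customer typed: at least two free-form
// lines, with no attempt to parse street/number/unit apart.
type Address struct {
	Name    string
	Line1   string
	Line2   string // PMB, PO Box, suite -- anything that doesn't "parse"
	City    string
	Region  string
	Postal  string
	Country string
}

// Suggestion is what an address validation service proposes.
type Suggestion struct {
	Proposed Address
}

// Resolve returns the address to actually ship to. The
// "use as entered" path must exist and must be tested.
func Resolve(entered Address, s *Suggestion, customerAccepted bool) Address {
	if s != nil && customerAccepted {
		return s.Proposed
	}
	return entered
}

func main() {
	entered := Address{Line1: "PO Box 123", Line2: "PMB 456", City: "Springfield"}
	sugg := &Suggestion{Proposed: Address{Line1: "123 Main St", City: "Springfield"}}
	// The customer declined the "correction", so their entry wins.
	fmt.Println(Resolve(entered, sugg, false).Line1) // prints PO Box 123
}
```

The point of the sketch is what it leaves out: no regex on `Line1`, and no code path where the validator can overrule the customer.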
        
           | JamesAdir wrote:
           | Isn't address storage and validation a solved problem? Why is
           | it so complicated?
        
             | mgsouth wrote:
             | Ex:
             | 
             | 412 1/2 E E NE
             | 
             | 412 1/2 A E
             | 
             | 1E MAIN
             | 
             | 1 E MAIN
             | 
             | FOO & BAR
             | 
             | 123 ADAM WEST RD
             | 
             | 123 ADAM WEST
             | 
             | 123 EAST WEST
        
             | ezfe wrote:
             | You are right that USPS maintains a database of canonical
             | delivery points. However, it's inevitable this database
             | might not be correct or up to date.
             | 
             | If you don't want to validate, then yes addresses are just
             | a series of text fields. However, mapping them to that
             | delivery point is where the problems arise.
        
         | gameswithgo wrote:
         | post process with some language aware heuristics maybe
        
       | amelius wrote:
       | Co-pilot fixes the wrong problem.
       | 
       | It should be a tool capable of one-shot learning.
       | 
       | I.e., I'm in the middle of a refactoring operation and have to do
       | lots of repetitive work; the tool should help me by understanding
       | what I'm trying to do after I give it 1 example.
        
       | marcodiego wrote:
        | Now, consider that Quake is GPL'ed. Any proprietary software
        | using such code will have to bow to the license terms.
        
       | anyonecancode wrote:
       | I think copilot is solving the wrong problem. A future of
       | programming where we're higher up the abstraction tree is
       | absolutely something I want to see. I am taking advantage of that
       | right now -- I'm a decently good programmer, in the sense that I
       | can write useful, robust, reliable software, but I'm pretty high
       | up the stack, working in languages like Java or even higher up
       | the stack that free me from worrying about the fine details of
       | memory allocation or the particular architecture of the hardware
       | my code is running on.
       | 
       | Copilot is NOT a shift up the abstraction tree. Over the last few
        | years, though, I've realized that the concept of typing is. Typed
       | programming is becoming more popular and prominent beyond just
       | traditional "typed" languages -- see TypeScript in JS land,
       | Sorbet in Ruby, type hinting in Python, etc. This is where I can
       | see the future of programming being realized. An expressive type
       | system lets you encode valid data and even valid logic so that
       | the "building blocks" of your program are now bigger and more
       | abstract and reliable. Declarative "parse don't validate"[1] is
       | where we're eventually headed, IMO.
       | 
       | An AI that can help us to both _create_ new, useful types, and
       | then help us _choose_ the best type, would be super helpful. I
       | believe that's beyond the current abilities of AI, but can
       | imagine that in the future. And that would be amazing, as it
       | would then truly be moving us up the abstraction tree in the same
       | way that, for instance, garbage collection has done.
       | 
       | [1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
       | va...
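For readers who don't follow the link, the core of "parse, don't validate" is that a check produces a new, more informative type rather than a boolean, so downstream code cannot forget that the check happened. A minimal Go sketch (the email example is ours, not from the linked post):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EmailAddress can only be built via ParseEmail, so any value of
// this type carries its invariant with it: parse, don't validate.
type EmailAddress struct {
	local, domain string
}

// ParseEmail turns untrusted input into a typed value, or fails.
func ParseEmail(s string) (EmailAddress, error) {
	at := strings.LastIndex(s, "@")
	if at <= 0 || at == len(s)-1 {
		return EmailAddress{}, errors.New("not an email address: " + s)
	}
	return EmailAddress{local: s[:at], domain: s[at+1:]}, nil
}

func (e EmailAddress) String() string { return e.local + "@" + e.domain }

// Downstream code takes EmailAddress, not string: the type system
// guarantees the check already happened, exactly once.
func sendWelcome(to EmailAddress) string {
	return "welcome mail queued for " + to.String()
}

func main() {
	e, err := ParseEmail("ada@example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(sendWelcome(e)) // prints welcome mail queued for ada@example.org
}
```

The building block here is bigger than a string plus a scattered `isValid` call, which is the "bigger and more abstract and reliable" quality the comment describes.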
        
         | shadowgovt wrote:
          | A taller abstraction tree trades off against specialization:
          | the deeper the abstractions, the more one has to understand
         | when the abstractions break or when one chooses to use them in
         | novel ways.
         | 
         | This is something I'm interested in regarding this approach...
         | When it works as intended, it's basically shortening the loop
         | in the dev's brain from idea to code-on-screen _without_ adding
         | an abstraction layer that someone has to understand in the
         | future to interpret the code. The result is lower density, so
         | it might take longer to read... Except what we know about
          | linguistics suggests there's a balance between density and
         | redundancy for interpreting information (i.e. the bottleneck
         | may not be consuming characters, but fitting the consumed data
         | into a usable mental model).
         | 
         | I think the jury's out on whether something like this or the
         | approach of dozens of DSLs and problem-domain-shifting
         | abstractions will ultimately result in either more robust or
         | more quickly-written code.
         | 
         | But on the topic of types, I'm right there with you, and I
         | think a copilot for a dense type forest (i.e. something that
         | sees you writing a {name: string; address: string} struct and
         | says "Do you want to use MailerInfo here?") would be pretty
         | snazzy.
        
         | krick wrote:
         | Yeah, but generating tons of stupid verbose code that nobody
         | will be able to read and understand is more fun. Also, your
         | superiors will be sure you are a valuable worker if you write
         | more code.
        
       | mkl95 wrote:
       | Copilot is one of the worst ideas that have made it to production
       | in recent years. I predict it will be quite successful
       | considering Microsoft's track record.
        
       | dempsey wrote:
       | I've always wondered this about the realistic photo generators.
       | How do we know they're generating new faces and not just
       | regurgitating ingested faces?
        
       | antpls wrote:
       | One has to admit, Copilot raises many questions regarding global
       | code quality, reviewing processes and copyright. It's a marketing
       | success.
        
         | qayxc wrote:
         | Honestly, I see this exact issue as the main accomplishment of
         | Copilot. It shows that the black-box machines are to be
         | considered harmful and are incompatible with the current
         | intellectual property and privacy frameworks.
         | 
         | This issue goes way beyond just code - imagine GPT-like systems
         | being used in medical diagnosis and results can suddenly depend
          | on the date of the CT-scan or the patient's name, because the
         | black-box simply regurgitates training data...
        
       | [deleted]
        
       | nightowl_games wrote:
       | Almost feels like a developer cultural thing to hate on something
        | like this. If you don't like it, don't use it. If you don't want
       | your team using it, become senior and then set the rules.
       | 
       | Kinda seems like maybe there's some level of insecurity at play
        | here in the criticism. Like an "I coulda come up with that but
        | it's a bad idea" type of hater philosophy.
        
       | ChrisMarshallNY wrote:
       | I've always assumed that we would eventually have a low-code, or
       | no-code junior dev replacement, and was wondering if this was it.
       | GH and MS actually have _[Ed. had?]_ some cred for this kind of
       | thing.
       | 
       | Nope. Game over. Play again?
        
         | marcosdumay wrote:
         | Most low-code and no-code platforms go for junior dev
         | empowerment, and senior dev replacement. This one also seems to
         | be aimed at empowering juniors, but looks like it missed the
         | senior replacement by miles.
        
           | [deleted]
        
         | jgilias wrote:
          | Copying GPLed code as your own and passing it off under an MIT
          | license is not too far-fetched a thing for a junior dev to
          | do.
         | 
         | Jokes aside, to have a proper junior dev replacement you need
         | something that is able to learn and grow to eventually become a
         | senior dev, an architect, or a CTO. That is the most important
         | value of a junior dev. Not the ability to produce subpar code.
        
           | ChrisMarshallNY wrote:
           | Depends on who you ask.
           | 
           | I think a lot of modern software development shops, these
           | days, exist only to make their founder[s] as rich as
           | possible, as quickly as possible.
           | 
           | If they are willing to commit their entire future to a
           | lowest-bid outsourcing shop, then I don't think they are too
           | concerned about playing the long game.
           | 
           | Also, the software development industry, as an aggregate, has
           | established a pervasive culture, based around developers
           | staying at companies for 18-month stints. I don't think many
           | companies feel it's to their advantage to incubate people who
           | will bail out, as soon as they feel they have greener
           | pastures, elsewhere.
        
       | abeppu wrote:
       | I may be over-reading, but I think this kind of example not only
       | demonstrates the pragmatic legal issues, but also the fundamental
       | weaknesses of a solely text-oriented approach to suggesting code.
       | It doesn't really seem to have a representation of the problem
       | being solved, or the relationship between things it generates and
       | such a goal. This is not surprising in a tool which claims to
       | work at least a little for almost all languages (i.e. which isn't
       | built around any firm concept of the language's semantics).
       | 
       | I'd be much more excited by (and less unnerved by) a tool which
       | brought program synthesis into our IDEs, with at least a partial
       | description of intended behavior, especially if searching within
       | larger program spaces could be improved with ML. E.g. here's an
       | academic tool from last year which I would love to see
       | productionized. https://www.youtube.com/watch?v=QF9KtSwtiQQ
        
         | computerex wrote:
         | I think it's pretty clear that program synthesis good enough to
         | replace programmers requires AGI.
         | 
         | This solely text based approach is simply "easy" to do, and
         | that's why we see it. I think it's cool and results are
         | intriguing but the approach is fundamentally weak and IMO
         | breakthroughs are needed to truly solve the problem of program
         | synthesis.
        
         | whimsicalism wrote:
         | > fundamental weaknesses of a solely text-oriented approach to
         | suggesting code.
         | 
         | I don't think it is clear that such "fundamental weaknesses"
         | exist. A text-based approach can get you incredibly far.
        
           | abeppu wrote:
           | I mean, the cases where it tries to assign copyright to
            | another person in a different year highlight that context
           | other than the other text in the file is semantically
           | extremely important, and not considered by this approach.
           | Merely generating text which looks appropriate to the model
           | given surrounding text is ... misguided?
           | 
           | If you think about it, program synthesis is one of the few
           | problems in which the system can have a perfectly faithful
           | model of the dynamics of the problem domain. It can run any
           | candidate it generates. It can examine the program graph. It
           | can look at what parts of the environment were changed. To
           | leave all that on the table in favor of blurting out text
           | that seems to go with other text is like the toddler who
           | knows that "five" comes after "four", but who cannot yet
           | point to the pile of four candies. You gotta know the
           | referents, not just the symbols. No one wants a half-broken
           | Chinese Room.
        
             | whimsicalism wrote:
             | > generating text which looks appropriate to the model
             | given surrounding text is ... misguided?
             | 
             | Agreed - it represents a failure to adequately
             | model/understand the task, but I don't think it is a
             | "fundamental weakness" of text-based 'Chinese room'
             | approaches.
             | 
             | > You gotta know the referents, not just the symbols. No
             | one wants a half-broken Chinese Room.
             | 
             | "Knowing the referents" is not at all clearly defined. It's
             | totally possible that, under the constraint of optimizing
             | for next-word prediction, the model could develop an
             | understanding of what the referents are.
             | 
             | You shouldn't underestimate the level of complex behavior
             | emerging from a big enough system under optimization. After
             | all, all the crazy stuff we do - coding, art, etc. is
             | produced by a system under evolutionary optimization
             | pressure to make more of itself.
        
               | abeppu wrote:
               | > "Knowing the referents" is not at all clearly defined.
               | It's totally possible that, under the constraint of
               | optimizing for next-word prediction, the model could
               | develop an understanding of what the referents are.
               | 
               | Well, in this case, it would have been good to understand
               | that "V. Petkov" is a person unrelated to the project
               | being written, and that "2015" is a year and not the one
               | we're currently in. Sometimes the referent will be a
               | method defined in an external library, which perhaps has
               | a signature, and constraints about inputs, or properties
               | which apply to return values.
               | 
               | > You shouldn't underestimate the level of complex behavior
               | emerging from a big enough system under optimization.
               | After all, all the crazy stuff we do - coding, art, etc.
               | is produced by a system under evolutionary optimization
               | pressure to make more of itself.
               | 
               | I think this can verge into a kind of magical thinking.
               | Yes, humans also look like neural nets, and we might even
               | be optimizing for something. But we learn to program (and
               | we do our best job programming) by having a goal for
               | program behavior, and we use interactive access to try to
               | run something, get an error, set a break point, try
               | again, etc. I challenge anyone to try to learn to "code"
               | by never being given any specific tasks, never
               | interacting with docs about the language, an interpreter,
               | a compiler, etc, but merely to try to fill in the blank
               | in paper code snippets. You might learn to fill in some
               | blanks. I highly doubt you would learn to code.
               | 
               | This is totally a case where the textual representation
               | of programs is easier to get and train against, and that
               | tail is being allowed to wag the dog to frame both the
               | problem and the product.
               | 
               | None of this is to say that high-bandwidth DNN approaches
               | don't have a place here -- but I think we should be
               | looking at language-specific models where the DNN
               | receives information about context (including some
               | partial description of behavior) and outputs of the DNN
               | are something like the weights in a PCFG that is used in
               | the program search.
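The search loop this comment gestures at can be sketched with a toy example. Below is a hypothetical Python illustration (the grammar, weights, and function names are all made up, not from any real tool): a weighted PCFG over tiny arithmetic expressions, where in the proposed scheme a neural model would emit the production weights from context, and candidates are accepted only if they actually satisfy input/output examples.

```python
import random

# Toy weighted PCFG over arithmetic expressions in one variable x.
# In the scheme sketched above, a neural model would emit these
# production weights from context; here they are simply hand-set.
PRODUCTIONS = [
    (("x",), 0.40),
    (("1",), 0.20),
    (("(", "E", "+", "E", ")"), 0.25),
    (("(", "E", "*", "E", ")"), 0.15),
]

def sample(rng, depth=0):
    """Sample one expression string from the weighted grammar."""
    prods, weights = zip(*PRODUCTIONS)
    if depth > 3:  # bound recursion: only terminals when nested deeply
        prods, weights = prods[:2], weights[:2]
    chosen = rng.choices(prods, weights=weights)[0]
    return "".join(
        sample(rng, depth + 1) if sym == "E" else sym for sym in chosen
    )

def synthesize(examples, tries=50000, seed=0):
    """Search for an expression consistent with all (x, y) examples.

    Unlike pure text completion, every candidate is executed and
    checked against the intended behavior before being accepted.
    """
    rng = random.Random(seed)
    for _ in range(tries):
        expr = sample(rng)
        if all(eval(expr, {"x": x}) == y for x, y in examples):
            return expr
    return None
```

For instance, `synthesize([(1, 2), (2, 4), (3, 6)])` finds some doubling expression such as `(x+x)`. The point is not the toy grammar but the feedback loop: candidates are run, not merely pattern-matched against surrounding text.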
        
       | guhayun wrote:
       | Copilot needs to be trained only on appropriately licensed
       | code, so that it doesn't reproduce it.
        
       | _tom_ wrote:
       | I'm reminded of the old saying:
       | 
       | The best person to have on your team is a productive, high-
       | quality coder.
       | 
       | The worst is a productive, low-quality coder.
       | 
       | Copilot looks like it would give us more of the latter.
        
       | ezoe wrote:
       | Similar story.
       | 
       | He tried to write a quine in Ruby, and Copilot ended up
       | conjuring up a copyright claim comment and a fake licensing
       | term.
       | https://twitter.com/mametter/status/1410459840309125121
        
       | gok wrote:
       | Quake and GitHub are both owned by Microsoft now, perhaps we can
       | assume this is a relicense?
        
         | Jyaif wrote:
         | Wow, Quake _is_ owned by Microsoft. This is mind blowing, and a
         | little sad.
        
           | josefx wrote:
           | It belongs to id software -> ZeniMax -> Xbox Game Studios ->
           | Microsoft.
        
         | johndough wrote:
         | Is it possible that Copilot just put Quake's source code into
         | the public domain?
         | 
         | From the Copilot FAQ:
         | 
         |     Who owns the code GitHub Copilot helps me write?
         | 
         |     GitHub Copilot is a tool, like a compiler or a pen. The
         |     suggestions GitHub Copilot generates, and the code you
         |     write with its help, belong to you, and you are
         |     responsible for it. We recommend that you carefully test,
         |     review, and vet the code, as you would with any code you
         |     write yourself.
         | 
         | Copilot can probably recite most of Quake's source code and
         | according to the FAQ, the output of Copilot belongs to the
         | user.
         | 
         | I think a point where this argumentation might fail is that
         | Quake's source code does not belong to Github directly, but
         | instead both Github and Quake belong to Microsoft. However, I
         | am not a lawyer, so I might be wrong.
        
       | [deleted]
        
       | sdevonoes wrote:
       | Not a problem. Just don't use Copilot :)
        
       | hi41 wrote:
       | I saw the gif on Twitter. Sorry, I am not able to understand what
       | is going on. Is copilot a character in the Quake game?
        
         | dpassens wrote:
         | Copilot seems to be an AI tool to generate code for you[0]. In
         | the gif, it's copying code from Quake, which is GPLv2 or later.
         | If copying GPLed code wasn't bad enough, it then adds a MIT-
         | like license header.
         | 
         | [0] https://copilot.github.com/
        
       | yk wrote:
       | So, what would happen if I train a neural network to recreate
       | Disney movies?
        
         | avaldes wrote:
         | Isn't it already?
        
       | oiu45hunegn wrote:
       | This reminds me of an issue that came up when I was working with
       | an intelligence agency, training machine translation.
       | 
       | If you think about language in general, individual words aren't
       | very sensitive. The word for bomb in any language is public
       | knowledge. But when you start getting to jargony phrases, some
       | might be unique to an organization. And if you're training your
       | MT on translated documents surreptitiously intercepted from West
       | Nordistan's nuclear program, and make your MT model public, the
       | West Nordistanis might notice - "hey, this accurately translates
       | our non-public documents that contain rather novel phrases ... I
       | think someone's been listening to us!"
        
       | dredmorbius wrote:
       | Backstory?
       | 
       | WTF is "Copilot"?
        
         | gregsadetsky wrote:
         | It's a new product launched by GitHub in association with
         | OpenAI
         | 
         | https://news.ycombinator.com/item?id=27676266
        
         | loloquwowndueo wrote:
         | "Your AI pair programmer" - auto-completes entire functions
         | while you're coding. https://copilot.github.com/
        
           | dredmorbius wrote:
           | Thanks.
        
       | klohto wrote:
       | I'm really dumbfounded by the Copilot team decision to not
       | exclude GPL licensed code.
       | 
       | Why was this direction chosen? Is the inclusion of GPL really
       | worth the risk and potential Google v. Oracle lawsuit? I'd like
       | to know the reasoning.
        
         | throwaway287391 wrote:
         | Isn't it entirely possible that they _did_ exclude GPL licensed
         | code, but somebody somewhere has violated copyright and copy-
         | pasted that snippet into non-GPL-licensed code that they
         | trained on?
         | 
         | They could try to trace every single code snippet they train on
         | to its "true source" and use the license for that, but that's
         | not very well-defined, and is a lot harder, and it's never
         | going to be 100%.
        
           | another-dave wrote:
           | Which raises another question: ideally Copilot wouldn't be
           | trained on "somebody somewhere", but is that happening?
           | 
           | To use the old trope -- if the majority of programmers can't
           | implement Fizzbuzz, but they do have a Github profile, are
           | they being included too?
           | 
           | Hopefully there's some quality bar for the training set, i.e.
           | some subset of "good" code (e.g. release candidate tags from
           | fairly established OSS tools/frameworks in different
           | languages) rather than any old code on the internet.
        
           | ttt0 wrote:
           | Nope. They did include GPL code.
           | 
           | > Once, GitHub Copilot suggested starting an empty file with
           | something it had even seen more than a whopping 700,000
           | different times during training -- that was the GNU General
           | Public License.
           | 
           | https://docs.github.com/en/github/copilot/research-
           | recitatio...
        
             | pessimizer wrote:
             | Looks like Copilot is smart enough to understand its own
             | licensing situation. It should continue to suggest this for
             | any empty file.
        
         | goodpoint wrote:
         | Apache / MIT / BSD all have restrictions, e.g. an attribution
         | clause.
         | 
         | Excluding GPL does not solve the problem.
        
         | Anon1096 wrote:
         | Why would excluding GPL'd code be enough to not violate
         | licenses? I don't understand why people think MIT or other
         | licenses are free for alls to take code as they wish. The MIT
         | license includes an attribution clause. And, as the linked
         | video shows, Copilot is more than happy to take its code and
         | put your pet license and copyright notice on instead. Isn't
         | that equally as infringing as stealing GPL code? The idea of
         | mining GitHub for training data was doomed from the start
         | copyright-wise, as there's so much code that's misattributed,
         | wrongly-licensed, or unlicensed.
        
           | NavinF wrote:
           | Has anyone ever been sued IRL for using MIT/Apache/... code?
           | Or are we stuck in imaginary land where this is something to
           | be worried about?
           | 
           | Btw the GPLv2 death penalty is rather unique and I don't
           | think anyone will deny that including GPL code in proprietary
           | code is a hell of a lot worse in every way (liability,
           | ethically, etc) than including permissively licenced code and
           | forgetting to attribute it
        
           | ghaff wrote:
           | At some level though, this suggests that the only way to be
           | safe if you're writing a program (outside of a Copilot
           | context) is probably simply not to look at GitHub (or maybe
           | Stack Overflow and other code sources) except for, perhaps,
           | using properly attributed entire functions. If you take a
           | couple of lines of code and tweak them a bit, are you now
           | required to attach copyright attribution? IANAL, but I'm
           | guessing not.
        
           | aj3 wrote:
           | Copilot is a tool. If you take copilot's suggestions
           | uncritically and push them to GitHub, that's on you.
        
             | croes wrote:
             | Yeah, because I always check the code of my programming
             | partner for license violations.
             | 
             | That's more Trainee than Copilot.
        
               | aj3 wrote:
               | If you use it as a programming partner it will simply
               | autofill whatever you're writing line-by-line. You're not
               | forced to use code completion at a whole-function level
               | and it's not even the suggested use-case.
        
             | GhostVII wrote:
             | Sure but if you have to audit every suggestion to see if it
             | violates copyright laws that's not a particularly useful
             | tool.
        
               | aj3 wrote:
               | Depends. If you find useful code on Github, Stack
               | Overflow or anywhere else in the internet, you still need
               | to check whether it is compatible with your licensing or
               | not.
        
               | TeMPOraL wrote:
               | If you find useful code on Github or StackOverflow, you
               | can check for the license directly there, or you can try
               | to find where it was copied from, and look for a license
               | there.
               | 
               | Copilot isn't copying, it's regurgitating patterns from
               | its training dataset. The result may be subject to a
               | license you don't know about, but modified enough that
               | you won't find the original source. The result can be a
               | _blend_ of multiple snippets with varying licenses. And
               | there's no way to extract attribution from Copilot - DNN
               | models can give you an output for your input, they can't
               | tell you which exact parts of the training dataset were
               | used to generate that output.
        
               | FemmeAndroid wrote:
               | But Copilot won't accurately tell you if it's directly
               | copying code, and if so what the license is. If it
               | provides MIT licensed code that I then need to include,
               | how do I know that? Do I need to search for each set of
               | lines of code it provides on GitHub?
               | 
               | When a person gets code from another source on the
               | internet, they generally know where the code has come
               | from.
        
               | aj3 wrote:
               | In a real world scenario you wouldn't be mindlessly
               | pressing Tab right after linebreak and accepting the
               | first suggestion that comes your way. While entertaining,
               | nobody gets paid to do that.
               | 
               | What you get paid for is to write your own code. When you
               | write your own code, generally you think first and then
               | type. Well, with Copilot you think first and then start
               | typing a few symbols before seeing automatic suggestions.
               | If they are right, you accept changes and if they happen
               | to be similar to any other code out there, you deal with
               | it exactly the same as if you typed those lines yourself.
        
               | user-the-name wrote:
               | But it is not the same as if you typed it yourself.
               | 
               | If you happen to type code that is similar to copyright
               | code, that is generally considered legally OK.
               | 
               | If you copypaste copyrighted code, that is not legally
               | OK.
               | 
               | If you accept that same code from an autocomplete tool,
               | that can easily be seen as equivalent to the latter case
               | rather than the former.
        
             | user-the-name wrote:
             | Then name a usage of the tool that is legally sound. I
             | cannot think of one.
        
               | aj3 wrote:
               | Code completion that can suggest the whole line instead
               | of a single word (e.g. often it guesses function
               | parameters and various math operations when you haven't
               | even typed the function name yet).
        
           | summerlight wrote:
           | At least that will reduce the chance of license violation
           | and make a good legal argument that any uncovered
           | violations were "unintentional" incidents.
        
       | [deleted]
        
       | LeicaLatte wrote:
       | Curious if Microsoft is training Co-Pilot on my private
       | repositories.
        
       | bencollier49 wrote:
       | This does make me wonder if this is susceptible to the same form
       | of trolling as that MS AI got. Commit a load of grossly offensive
       | material to multiple repos, and wait for Copilot to start
       | parroting it. I think they're going to need some human
       | moderation.
        
         | lawl wrote:
         | Way better. It's susceptible to copyright trolling.
         | 
         | Put up repos with snippets for things people might commonly
         | write. Preferably use javascript so you can easily "prove" it.
         | Write a crawler that crawls and parses JS files to search for
         | matching stuff in the AST. Now go full patent troll, eh, I
         | mean copyright troll.
        
           | handrous wrote:
           | 1) Write a project heavily using Copilot (hell, automate it
           | and write thousands of them, why not?)
           | 
           | 2) AGPL all that code.
           | 
           | 3) Search for large chunks of code very similar to yours, but
           | written after yours, licensed more liberally than AGPL.
           | Ideally in libraries used by major companies.
           | 
           | 4) Point the offenders to your repos and offer a "convenient"
           | paid dual-license to make the offenders' code legal for
           | closed-source use, so they don't have to open source their
           | entire product.
           | 
           | 5) Profit?
        
             | armatav wrote:
             | 6) Arms race with someone who trained an obfuscation
             | version that goes through your AGPL code and tweaks it to
             | not be in violation.
        
               | SSLy wrote:
               | I love living in cyberpunk already.
        
         | gruez wrote:
         | Offensive code is the least of my worries. What about
         | vulnerable/exploitable code?
        
           | tjpnz wrote:
           | Given that code is easier to write than it is to read, this
           | one is troubling.
           | 
           | I certainly wouldn't want to be using this with languages
           | like PHP (or even C for that matter) with all the decades of
           | problematic code examples out there for the AI to learn from.
        
           | macNchz wrote:
           | This was my first thought when reading about Copilot...it
           | feels almost certain that someone will try poisoning the
           | training data.
           | 
           | Hard to say how straightforward it'd be to get it to produce
           | consistently vulnerable suggestions that make it into
           | production code, but I imagine an attacker with some
           | resources could fork a ton of popular projects and introduce
           | subtle bugs. The sentiment analysis example on the Copilot
           | landing page jumped out to me...it suggested a web API and
           | wrote the code to send your text there. Step one towards
           | exfiltrating secrets!
           | 
           | Never mind the potential for plain old spam: won't it be fun
           | when growth hackers have figured out how to game the system
           | and Copilot is constantly suggesting using their crappy,
           | expensive APIs for simple things!? Given the state of Google
           | results these days, this feels like an inevitability.
        
             | joe_the_user wrote:
             | Targeted attacks to elicit output only in a given context
             | are generally possible with AIs. And here, writing an
             | implementation of a difficult and vulnerable process seems
             | easy. Bad implementations of various hard things become
             | common 'cause people cut and paste the code without looking
             | closely since they don't understand it anyway.
             | 
             | //Implement elliptic curve cryptography below
             | 
             | //Sanitize input for SQL call below
             | 
             | Etc.
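To make the second prompt concrete, here is a minimal, hypothetical sketch (using Python's stdlib sqlite3; none of this is actual Copilot output) contrasting the string-spliced query a model trained on bad examples might emit with the parameterized form:

```python
import sqlite3

# Tiny in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name):
    # Vulnerable: attacker-controlled text is spliced into the SQL.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Parameterized: the driver treats the value as data, not SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # -> [('alice',)]  (leaks every row)
print(find_user_safe(payload))    # -> []            (no match, as intended)
```

If the training data contains enough string-spliced queries, there is nothing stopping a purely statistical model from suggesting the first form under a "sanitize input" comment.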
        
           | bencollier49 wrote:
           | Yep, trivial to implement as an attack.
        
           | guhayun wrote:
           | Just ask it to prioritize safety
        
           | littlestymaar wrote:
           | 1- re-upload all the shell scripts you can find, after having
           | inserted `rm -rf --no-preserve-root /` every other line
           | 
           | 2- ...
           | 
           | 3- profit
        
         | raffraffraff wrote:
         | Perhaps they think that any code that passed a review and got
         | merged = human moderated
        
         | NullPrefix wrote:
         | Coding with Adolf?
        
           | heavyset_go wrote:
           | Jojo Rabbit except Adolf is in the cloud and not in a kid's
           | imagination.
        
             | mhh__ wrote:
             | YC submission when?
        
       | tyingq wrote:
       | Just the MP4, since it's hard to read in the smaller size:
       | https://video.twimg.com/tweet_video/E5R5lsfXoAQDRkE.mp4
        
       | hawski wrote:
       | Copilot may do more to move open source projects off GitHub
       | than the news that Microsoft was the buyer did. Now you can
       | host your code on GitHub to get your license violated, or
       | DMCA-ed in the long run once your code has become part of some
       | big proprietary project. At least it makes me think about my
       | choice of code hosting more than anything that happened before.
        
       | woah wrote:
       | It looks like the author of the linked tweet intended for it to
       | reproduce the Quake code, by using the exact same function name
       | and comment. Whatever the merits of Copilot, in this case the
       | human intended to write the Quake function into their file, and
       | put the wrong license on it.
        
       | king_magic wrote:
       | Yep, Copilot is insanely poorly thought out. Astonishing they'd
       | release something as half-baked as this.
        
       | [deleted]
        
       | toss1 wrote:
       | License Laundering
       | 
       | Like Money Laundering for cash equivalents, or Trust-Washing for
       | disinformation, but for hijacking software IP.
       | 
       | It might not be the intended use case, but that winds up being
       | the practical result.
       | 
       | (on a related note, it would make me want to run GPT-* output
       | through plagiarism filters, but maybe they already do that before
       | outputting?)
        
       | rasz wrote:
       | "0.1% of the time" indeed
        
       | unknownOrigin wrote:
       | I'm honestly kinda amazed this is as upvoted here as it is.
       | Typically anything ML-related is upvoted to the top positions and
       | any dissent harshly ridiculed. Anyways... it appears those who
       | thought about this as if it was a glorified code search engine
       | were close to being right.
        
         | qayxc wrote:
         | I still don't think it's _just_ a glorified code search engine.
         | 
         | Context-sensitive data retrieval is undoubtedly a part of it,
         | though, and the question is how big and relevant that part is
         | and what the consequences are.
         | 
         | To me the biggest issue is that it's impossible to tell whether
         | the suggestions are verbatim reproductions of training material
         | and thus problematic.
         | 
         | It goes to show that this tool and basically every tool relying
         | on the same or similar technology must now be assumed to do
         | this, and thus any code suggestion must be regarded as
         | plagiarism until proven otherwise. As a consequence, such
         | tools are now
         | off-limits for commercial or open source development...
        
       | coolspot wrote:
       | Time to write a GPL-licensed Win32 and Win64 -compatible OS with
       | the help of CoPilot...
        
       | fencepost wrote:
       | So what I'm reading here is that "Tay for code" is maybe going to
       | need to be rethought and perhaps trained differently?
        
       | 0-_-0 wrote:
       | This is a very famous function [0] and likely appears multiple
       | times in the training set (Google gives 40 hits on GitHub),
       | which makes it more likely to be memorized by the network.
       | 
       | [0]:
       | https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
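For readers who haven't seen it, the function under discussion can be sketched as follows. This is a Python re-creation of the bit-level trick as described on the Wikipedia page, not the original C source (whose licensing is the subject of this thread):

```python
import struct

def q_rsqrt(number):
    """Approximate 1/sqrt(number) via the famous bit-level hack,
    re-created in Python after the Wikipedia description."""
    x2 = number * 0.5
    # Reinterpret the float32 bit pattern as a 32-bit integer.
    i = struct.unpack("<i", struct.pack("<f", number))[0]
    # The magic constant and shift produce a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<i", i))[0]
    # One iteration of Newton's method refines the estimate
    # (relative error stays well under 0.2%).
    y = y * (1.5 - (x2 * y * y))
    return y
```

Because this exact code (and its sweary comments) appears so many times in public repositories, it is a natural candidate for verbatim memorization by a language model.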
        
         | 0-_-0 wrote:
         | It's worth keeping in mind that what a neural network like this
         | (just like GPT3) is doing is generating the most probable
         | continuation based on the training dataset. Not the _best_
         | continuation (whatever that means), simply the most likely one.
         | If the training dataset has mostly bad code, the most likely
         | continuation is likely to be bad as well. I think this is still
         | valuable, you just have to think before accepting a suggestion
         | (just like you have to think before writing code from scratch
         | or copying something from Stack Overflow).
        
           | abecedarius wrote:
           | > the most probable continuation based on the training
           | dataset
           | 
           | This is not wrong, but it's easy to misread it as implying
           | little more than a glorified Markov model. If it's like
           | https://www.gwern.net/GPT-3 then it's already significantly
           | cleverer, and so you should expect to sometimes get the kind
           | of less-blatant derivation that companies aim to avoid using
           | a cleanroom process or otherwise forbidding engineers from
           | reading particular sources.
        
           | dematz wrote:
           | I have no idea how this or GPT3 works or how to evaluate
           | them, but couldn't you argue that it's working as it should?
           | You tell copilot to write a fast inverse square root, it
           | gives you the super famous fast inverse square root. It'd be
           | weird and bad if this _didn't_ happen.
           | 
           | As far as licenses go, idk. Presumably it could delete
           | associated comments and change variable names or otherwise
           | obscure where it's taking code from. Maybe this part is
           | shady.
        
             | tirpen wrote:
             | Maybe I could build a robot that goes out in the city and
             | steals cars.
             | 
             | As far as licenses go, idk. Presumably it could delete the
             | number plate and repaint the car or otherwise obscure where
             | it's taking the car from. Maybe this part is shady.
             | 
             | Maybe.
        
             | 0-_-0 wrote:
             | > couldn't you argue that it's working as it should?
             | 
             | Let's say that it's doing exactly what it was trained to
             | do.
        
           | bee_rider wrote:
           | In particular, fast approximate inverse square root is an x86
           | instruction, and not a super new one. I'd be surprised if it
           | wasn't in every major instruction set.
           | 
           | This is an interesting issue. I suspect training on datasets
           | from places like Github would be likely to provide lots of
           | "this is a neat idea I saw in a blog post about how they did
           | things in the '90s" code.
        
       | LeanderK wrote:
       | I think the problem might be in the training data. Famous code
       | examples are probably copied a lot and therefore appear multiple
       | times in the training data, prompting the neural network to
       | memorise it completely.
        
         | avian wrote:
         | Famous code examples are also much more likely to be noticed.
         | For all I know, the thing might be spewing random GPL'd code
         | from the long tail of GitHub all the time and nobody notices
         | because it was written by some random guy and not John Carmack.
        
           | mkl wrote:
           | Carmack didn't write this:
           | https://en.wikipedia.org/wiki/Fast_inverse_square_root
        
           | LeanderK wrote:
           | Well, it's sure speculation on my part what the root cause
           | is, but i think OpenAI is already trying to ensure the
           | network generalises. It's just common behaviour for neural
           | networks to memorise frequent samples, so I think my guess is
           | quite realistic. I don't think OpenAI would not notice large-
           | scale memorisation in their model. But as long as they don't
           | publish more details it's just guesswork.
           | 
           | Just keep in mind that it's a statistical tool. You can't
           | really formally prove that it won't memorise, but I think
           | with enough work you can make it unlikely enough that it won't
           | matter. It's their first iteration.
        
             | visarga wrote:
             | Hash 10-grams and make a bloom filter. It will not generate
             | more than 10 GPLed tokens from a source.
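The suggestion above could be sketched like this; a hypothetical Python illustration (the class, parameters, and sample corpus are made up, not from any actual Copilot pipeline): hash every 10-gram of the protected corpus into a Bloom filter, then veto any generated output whose 10-grams hit the filter, modulo false positives.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter; sizes here are illustrative only."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes independent positions by salting one hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

def ngrams(tokens, n=10):
    """All n-token windows of a token stream."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Index every 10-gram of the GPL corpus (toy example); at generation
# time, reject any suggestion whose 10-grams are found in the filter.
gpl_filter = BloomFilter()
for gram in ngrams("int main ( void ) { return Q_rsqrt ( x ) ; }".split()):
    gpl_filter.add(gram)
```

Bloom filters never give false negatives, so every indexed 10-gram is guaranteed to be caught; the cost is a tunable rate of false positives on innocent code.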
        
         | deckard1 wrote:
         | Also the Pareto principle. 80% of code is shit that you
         | _don't_ want to copy. The vast majority of GitHub is awful
         | hacks and insecure code that should not be touched with a
         | ten-foot pole.
        
         | cblconfederate wrote:
         | Is this function used verbatim in multiple projects? I know
         | it's famous, but how often does one use an approximation of
         | inverse sqrt instead of the readily available CPU instruction
         | in the past 20 years?
        
       | ocdtrekkie wrote:
       | Probably an excellent reminder that both Google and Microsoft
       | decided to use your private emails for a training set to create
       | Smart Reply behavior that can "write emails for you", and they
       | swore up and down there's no way that could ever leak private
       | information.
       | 
       | We need legislation banning companies from ingesting data into AI
       | training sets without explicit permission.
        
         | nebuke wrote:
         | Makes me wonder if github are using private repos in their
         | training data.
        
           | ocdtrekkie wrote:
           | GitHub clearly stated they only used publicly available repos
           | in this project. However, as many people are rightfully
           | pointing out, those projects might still be either closed
           | source or copylefted, and if Copilot regurgitates chunks of
           | those projects, people who use it may be subject to
           | infringement lawsuits in the future.
        
       | grawprog wrote:
       | I'm not surprised to be honest. I've played around with AI
       | dungeon, which also uses GPT-3. It regularly reproduces content
       | directly from its training material, including even comments
       | attached to the stories they trained the ai on.
        
       | Tarucho wrote:
       | Is Copilot aimed at programmers or at non-technical hiring
       | managers?
       | 
       | I mean, it plays right into the narrative devaluing programming
       | that has been going around for the last couple of years. To the
       | "anyone can code" narrative we are adding "all the more so, if
       | they have an AI-assisted Copilot".
        
       | KoftaBob wrote:
       | Seems like this is less of an "AI that intelligently generates
       | code based on context given" and more of a "google search
       | autocomplete for code".
        
       | thinkingemote wrote:
       | From the GPLv2 licensed code:
       | 
       | https://github.com/id-Software/Quake-III-Arena/blob/master/c...
       | 
       | Copilot repeats it almost word for word, including comments, and
       | adds an MIT-like license at the top.
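For reference, the function being reproduced (as it appears in Quake III's `q_math.c` and on Wikipedia, sweary comments included) is roughly the following. One portability tweak is made here: the original reads the float's bits through a `long` pointer cast, which is undefined behaviour and breaks outright where `long` is 8 bytes, so this sketch uses `uint32_t` and `memcpy` instead.

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root, per the Quake III source / Wikipedia,
   with memcpy replacing the original's type-punning pointer casts. */
float Q_rsqrt(float number)
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y  = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);             /* evil floating point bit level hacking */
    i = 0x5f3759df - (i >> 1);            /* what the fuck? */
    memcpy(&y, &i, sizeof y);
    y = y * (threehalfs - (x2 * y * y));  /* 1st iteration of Newton's method */
    return y;
}
```

The magic constant plus one Newton step keeps the relative error under about 0.2%, which was good enough for lighting calculations in 1999.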
        
         | arksingrad wrote:
         | I guess this confirms John Carmack to be an AI
        
           | OskarS wrote:
           | Apparently Carmack was not the original author, the origin I
           | believe is SGI somewhere in the deep dark 90s.
        
             | Haga wrote:
              | It was an optimization for a fluid simulation originally.
        
           | Fordec wrote:
           | I get why some people were saying it made them a better
           | programmer. Of course it did, it's copy-pasting Carmack code.
        
         | Thomashuet wrote:
          | Actually, the indentation of the first comment and the lack
          | of preprocessor directives show it's not copied from this
          | code directly but from Wikipedia (https://en.wikipedia.org/wi
          | ki/Fast_inverse_square_root#Overv...), so it could be that
          | the Quake source code is not part of the training set but the
          | Wikipedia version is.
        
           | SamBam wrote:
           | While I strongly doubt they would use Wikipedia as a training
           | set, has anyone done a search of GitHub code to see if other
           | projects have copied-and-pasted that function from Wikipedia
           | into their more-permissive codebases?
        
             | edgyquant wrote:
             | It's GPT though and the GPT models were trained on data
             | from Wikipedia
        
             | ajayyy wrote:
             | It is probably based off GPT-3 with a layer on top trained
             | for programming specifically, like what is done with AI
             | dungeon.
        
               | an_opabinia wrote:
               | Wait until people on the toxic orange site find out what
               | has happened to AI Dungeon.
        
               | SamBam wrote:
               | I'm out of the loop.
        
               | grawprog wrote:
               | https://gitgud.io/AuroraPurgatio/aurorapurgatio
               | 
               | https://www.reddit.com/user/non-taken-name
        
               | zxzax wrote:
               | I don't get it, that seems like standard fare for an
               | R-rated movie? And then it seems like some complained
               | because they decided to start editing it down to a PG-13
               | movie?
        
               | grawprog wrote:
                | Essentially, from my understanding: there was a data
                | leak they never commented on. They instituted a poorly
                | made content filter without saying anything, and the
                | filter frequently has false positives and negatives.
                | Someone discovered they trained the game on content
                | the filter was designed to block, meaning the AI
                | itself would frequently output filter-triggering
                | material. More people found out their private
                | unpublished stories were being read by third parties
                | after a job ad, the stories were posted on 4chan, and
                | people recognized stories they wrote that had
                | triggered the filter among those posted. Then they
                | started instituting no-warning bans.
                | 
                | I might have missed something, but that's the gist of
                | it.
        
             | armatav wrote:
              | It's pre-trained, partially, on Wikipedia. GPT-2 did this
              | sort of thing all the time: surfacing examples from the
              | fine-tuning training set by default is native to the
              | architecture.
        
             | bootlooped wrote:
             | Almost 2000 results for one of the comment lines. I'm not
             | going to read through those or check the licenses, but I
             | think it's safe to say that block of code exists in many
             | GitHub code bases, and it's likely many of those have
             | permissive licenses. Given how famous it is (for a block of
             | code) it's not unexpected.
             | 
             | https://github.com/search?q=%22evil+floating+point+bit+leve
             | l...
             | 
             | A question that popped into my head is: if the machine sees
             | the same exact block of code hundreds of times, does that
             | suggest to it that it's more acceptable to regurgitate the
             | entire thing verbatim? Not that this incident is totally
             | 100% ok, but if it was doing this with code that existed in
             | only a single repo that would be much more concerning.
        
               | Animats wrote:
               | _if the machine sees the same exact block of code
                | hundreds of times, does that suggest to it that it's
               | more acceptable to regurgitate the entire thing
               | verbatim?_
               | 
                | From a copyright standpoint, quite possibly. This is
                | called the "scènes à faire" doctrine: if some things
                | have to be there in a roughly standard form to do a
                | standard job, it applies.
               | 
               | [1]
               | https://en.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire
        
         | nojito wrote:
         | It's up to the end user to accept the suggestions.
        
           | user-the-name wrote:
           | And it is completely impossible for the user to do so.
           | 
           | So, the tool is worthless if you want to use it legally.
        
             | nojito wrote:
             | Doubtful.
             | 
             | You can be almost certain it's being widely used or will be
             | widely used shortly.
             | 
              | The conversations around Copilot are eerily similar to
              | the conversations around the first autocomplete tools.
        
               | gnulinux wrote:
               | It's more like a writer using an autocomplete tool to
               | write the first chapter to their novel.
        
               | caconym_ wrote:
               | As someone who gets paid to write code (nominally) and
               | has also written a few novels, I don't agree with this
               | characterization. From what I've seen of Copilot, it's
               | more like having a text editor generate your next
               | sentence or paragraph^[1]. The idea (as I see it) is that
               | you might use it to generate some prose "boilerplate",
               | e.g. environmental descriptions, and hack up the results
               | until you're satisfied.
               | 
               | It's content generation at a fragmentary level where each
               | "copied" chunk does not form a substantive whole in the
               | greater body of the new work. Even if you were training
               | it on other authors' works rather than just your own, as
               | long as it wasn't copying _distinctive_ sentences
                | wholesale, I think there's a strong argument for it
               | falling under fair use--if it's even detectable.
               | 
               | On the other hand, if it regurgitated somebody else's
               | paragraph wholesale, I don't think that would be fair
               | use. Somewhere in-between is where it gets fuzzy, and
               | really interesting; it's also where internet commenters
               | seem to prefer flipping over the board and storming out
               | convinced they're _right_ to exploring the issues with a
               | curious and impartial mind. I see way too much unreasoned
               | outrage and hyperbolic misrepresentation of the Copilot
                | tool in these threads, and it's honestly kind of
               | embarrassing.
               | 
               | As far as this analogy goes, it's worth noting that the
               | structure of a computer program doesn't map onto the
               | structure of a piece of fiction (or any work of prose) in
               | a straightforward way. Since so much of code _is_
               | boilerplate, I would (speculatively, in the copyright law
               | sense) actually give more leeway to Copilot in terms of
               | absolute length of copied chunks than I would for a prose
               | autocompleter. For instance, X program may be licensed
                | under the GPL, but that doesn't mean X's copyright
               | holder(s) can sue somebody else because their program
               | happened to have an identical expression of some RPC
               | boilerplate or whatever. It would be like me suing
               | another author because their work included some of the
               | same words that mine did.
               | 
               | ^[1] At least one tool like this (using GPT-3) has been
               | posted on HN. At this point in time I wouldn't use it,
               | but I have to admit that it was sort of cool.
        
               | user-the-name wrote:
               | That does not seem like a response to what I just said?
               | 
               | I said that it is impossible for the user to check that
               | the code copilot gives is OK, license-wise, and
               | therefore, they can not be sure that it is legally OK to
               | include in any project.
        
           | freshhawk wrote:
           | And it's up to the end user to evaluate the tool that makes
           | the suggestions.
        
           | flatiron wrote:
            | As someone who does code reviews, the thought that the
            | developer didn't write the code submitted for merge would
            | never cross my mind.
        
           | croes wrote:
           | Good luck checking every code line for license violations
        
             | duckmysick wrote:
             | SaaS idea: code linter, but for licenses.
        
               | adrianN wrote:
               | Extend Fossology: https://www.fossology.org/
        
               | SahAssar wrote:
               | That's one of blackduck's offerings:
               | https://www.synopsys.com/software-integrity/open-source-
               | soft...
               | 
                | At a previous job we had an audit from them; it seemed
                | not to be too accurate, but probably good enough for
                | companies to cover their asses legally.
        
             | dr_kiszonka wrote:
             | There will be a VSCode extension for that.
        
               | TheDong wrote:
               | It's impossible to automate checking for code license
               | violations.
               | 
               | If you and I write the exact same 10 lines of code, we
               | both have independent and valid copyrights to it. Unlike
               | patents, independent derivation of the same code _is_ a
               | defense for copyright.
               | 
               | If I write 10 lines of code, publish it as GPL (but don't
               | sign a CLA / am not assigning it to an employer), and
               | then re-use it in an MIT codebase, I can do that because
               | I retained copyright, and as the copyright holder I can
               | offer the code under multiple incompatible licenses.
               | 
               | There's no way for a machine to detect independent
               | derivation vs copying, no way for the machine to know who
               | the original copyright holder was in all cases, and
               | whether I have permission from them to use it under
               | another license (i.e. if I email the copyright holder and
               | they say 'yeah, sure, use it under non-gpl', it suddenly
               | becomes legal again)...
               | 
               | It's not a problem computers can solve 100% correctly.
        
               | croes wrote:
               | Same trust issue
        
               | atatatat wrote:
               | It's people for your lawyers to blame, all the way down!
               | 
               | /s
        
               | croes wrote:
                | It's the same problem as with self-driving cars: who
                | gets sued? The company that provides the service/car,
                | or the programmer/driver? I think the latter.
        
         | rmorey wrote:
         | This exact code is all over github, >1k hits
         | 
         | https://github.com/search?q=%22i++%3D+%2A+%28+long+%2A+%29+%...
        
           | ajklsdhfniuwehf wrote:
            | That will make a great defense in copyright court.
            | 
            | "Your honor, I would like to plead not guilty, on the
            | basis that I only robbed that bank because I saw that
            | everyone was robbing banks in the next city."
            | 
            | ...on the other hand, that was the exact defense tried by
            | the Capitol rioters. So I don't know anything anymore.
        
       | mattowen_uk wrote:
       | With apologies to Martin Niemoller[1]:
       | 
       |     First the automation came for the farmers, and I did not
       |     speak out -- Because I was not a farmer.
       | 
       |     Then the automation came for the factory workers, and I did
       |     not speak out -- Because I was not a factory worker.
       | 
       |     Then the automation came for the accountants, and I did not
       |     speak out -- Because I was not an accountant.
       | 
       |     Then the automation came for me (a programmer) -- and there
       |     was no one left to speak for me.
       | 
       | ---
       | 
       | [1] https://en.wikipedia.org/wiki/First_they_came_...
        
         | flatiron wrote:
         | Honestly I've automated a large chunk of my day job. The trick
         | is keeping it secret!
        
           | mattowen_uk wrote:
           | Well... wait until all the programmer salaries crash to
           | minimum wage because Management believe that "CoPilot does
           | most of the work anyway".
        
             | Rooster61 wrote:
             | Then wait for them to realize how brittle the code is when
             | nobody is considering the context into which this code is
             | being foisted. They'll TRIPLE our salaries! :D
        
       | timdaub wrote:
       | I felt the need to write an article about this whole situation
       | too:
       | 
       | "Built on Stolen Data":
       | https://rugpullindex.com/blog#BuiltonStolenData
        
       | GenerocUsername wrote:
       | A closed beta with only a few previews out in the wild has bugs.
       | Unbelievable.
       | 
       | I cannot believe GitHub would do this.
        
       | rozularen wrote:
       | Guess someone had to try it
        
       | mouzogu wrote:
       | I can't even get intellisense to work correctly half the time.
        
       | [deleted]
        
       | swiley wrote:
       | So I guess we can just copy around copyrighted source now? Great!
       | Now we can share all the proprietary driver and DSP code from
       | Qualcomm.
        
         | supernintendo wrote:
         | I wonder when someone will try to use the "it came from
         | Copilot" defense to get away with stealing copyrighted code.
        
       | louthy wrote:
       | This is utterly damning. I have already instructed my team that
       | Copilot can never be used for our projects. Compromising the
       | product because of unknowable license demands isn't acceptable in
       | the professional world of software engineering.
       | 
       | But if we put the licensing to one side for a moment...
       | 
       | 1/ Everything I've seen it generate so far is 'imperative hell'.
       | It is practically a 'boilerplate generator'. That might be useful
       | for pet projects, smaller code bases, or even unit-test writing.
       | But large swathes of application code looking like the examples
       | I've seen so far is hard to manage.
       | 
       | 2/ The boilerplate is what bothers me the most (as someone who
       | believes in the declarative approach to software engineering).
       | The future of programming and programming languages should be
       | an attempt to step up to a higher level of abstraction; that has
       | historically been the way we reach higher levels of
       | productivity. As applications get larger and code-bases grow
       | significantly we need abstraction, not more boilerplate.
       | 
       | 3/ As someone who develops a functional framework for C# [1], I
       | could see Copilot essentially side-lining my ideas and my
       | approach to writing code in C#. Not just style, but choice of
       | types, etc. I wonder if the fallout of Copilot's 'one true way'
       | of generating code was ever considered? It appears to
       | force a style that is at odds with many who are looking for more
       | robust code. At worst it will homogenise code "people who wrote
       | that, also wrote this" - stifling innovation and iterative
       | improvements in the industry.
       | 
       | 4/ Writing code is easy. Reading and understanding code written
       | by another developer is hard. Will we spend most of our time as
       | code-reviewers going forwards? Usually, you can ask the author
       | what their intentions were, or why they think their approach is
       | the correct one. Copilot (as far as I can tell) can't justify its
       | decisions. So, beyond the simple boilerplate generation, will
       | this destroy the art of programming? I can imagine many juniors
       | using this as a crutch, and potentially never understanding the
       | 'why'.
       | 
       | I'm not against productivity tools per se; it's certainly a neat
       | trick, and a very impressive feat of engineering in its own
       | right. I am however dubious that this really adds value to
       | professional code-bases, and actively may decrease code quality
       | over time. Then there's the grey area of licensing, which I feel
       | has been totally brushed to one side.
       | 
       | [1] https://github.com/louthy/language-ext
        
         | ethangk wrote:
         | This is a little off topic, but your framework looks really
         | interesting! How come you opted for building a functional
         | framework in C#, vs using F#? I couldn't see anything in the
         | README about what was specifically frustrating about F#? I ask
         | because we're looking at introducing it at my company.
        
           | louthy wrote:
           | I cofounded a company in 2005, the primary product is a
           | never-ending C# web-application project. As the code-base
           | grew to many millions of lines of code I started to see the
           | very real problems of software engineering in the OO
           | paradigm, and had the _functional programming enlightenment
           | moment_.
           | 
           | We started building some services in F#, but still had a
           | massive amount of C# - and so I wanted the inertia of my team
           | to be in the direction of writing declarative code. There
           | wasn't really anything (outside of LINQ) that did that, so I
           | set about creating something.
           | 
           | We don't write F# any more and find functional C# (along with
           | the brilliant C# tooling) to be very effective for us
           | (although we also now use PureScript and Haskell).
           | 
           | I do have a stock wiki post on the repo for this though [1].
           | You might not be surprised to hear it isn't the first time
           | I've been asked this :)
           | 
           | [1] https://github.com/louthy/language-ext/wiki/%22Why-
           | don't-you...
        
             | ethangk wrote:
             | Ha, it's good to see I'm full of original thoughts.
             | 
             | That post in the wiki sums it up perfectly, much
             | appreciated!
        
         | bostonsre wrote:
         | I'm not sure we should throw the baby out with the bath water
          | here due to the large blurbs it stubs in when it doesn't
         | have a lot to go on in mostly empty files. It is a preview
         | release. They are working on proper attribution of suggested
         | code and explainability [1]. Having a stochastic parrot that
         | types faster than I do would be useful in a lot of cases.
         | 
         | Yes, better layers of abstraction could make us more productive
         | in the future, but we're not there yet. By all means, don't
         | accept the larger blurbs it proposes, but there is productivity
         | to be gained in the smaller suggestions. If it correctly
         | intuits the exact rest of the line that you were thinking of,
         | it will save time and not make you lose understanding of the
         | program.
         | 
         | In some areas complete understanding and complete code
         | ownership is required but in a lot of places, it's not. If it
         | produces the work of a moderately skilled developer it would be
         | sufficient. I don't remember all code I write as time passes.
         | If it produces work that I would have produced, then I don't
          | see how that's any different from work that was produced by my
         | past self.
         | 
          | It may feel offensive, but a lot of the comments against it
          | sound like rage against the machine, and the arguments sound
          | pretty similar to those made in the past by industrialization
          | opponents whose jobs were automated away. I'm not sure we're
          | all as unique snowflakes as we like to think we are.
         | Sure, there will be some code that requires an absolute master
         | that is outside the capabilities of this tool. But I'd guess
         | there is a massive amount of code that doesn't need that
         | mastery.
         | 
         | [1] https://docs.github.com/en/github/copilot/research-
         | recitatio...
        
           | vlovich123 wrote:
           | I think it depends on how you look at it.
           | 
           | For small snippets that have likely been already written by
           | someone else, this probably works great. For those though,
           | the time savings is probably at most 5-10 min down to 1 or
           | less. The challenge is that that's not where my time goes
           | unless I'm working in an unfamiliar language.
           | 
           | As someone who writes a lot of code quickly, I'm usually
           | bottlenecked by reviews. For more complex changes I'm
           | bottlenecked by understanding the problem and experimenting
           | with solutions (and then reviews, domain-specific tests
           | usually, fixing bugs etc). Writing code isn't like waiting
           | for code to compile since I'm not actually ending up task
           | switching that frequently.
           | 
           | This does sound like a fantastic tool when I'm not familiar
           | with the language although I wonder if it actually generates
           | useful stuff that integrates well as the code gets larger (eg
           | can I say "write an async function that allocates a record
           | handle in the file with ownership that doesn't outlive the
           | open file's lifetime"). I'm sure though that this is what a
           | lot of people are overindexing on. For things like that I
           | expect normal evolution of the product will work well. For
           | things like "cool, understand your snippets but also weight
           | my own codebase higher and understand the specifics of my
           | codebase", I think there's a lot of groundbreaking research
           | that would be required. That is what I see as a true
           | productivity boost - I'd make this 100% required for anyone
           | joining the codebase. The more mentorship can be offloaded,
           | the lower the cost is to growing teams. OSS projects can more
           | easily scale similarly.
        
         | 0xbadcafebee wrote:
         | > programming languages should be an attempt to step up to a
         | higher level of abstraction
         | 
         | Adding abstraction buries complexity. If all you do is keep
         | adding more abstractions, you end up with an overcomplicated,
         | inefficient mess. Which is part of why application sizes are so
         | bloated today. People just keep adding layers, as long as they
         | have room for more of them. Everything gets less efficient and
         | definitely not better.
         | 
         | The right way to design better is to iterate on a core design
         | until it cannot be any simpler. All of the essential complexity
         | of software systems today comes from 40 year old conventions.
         | We need a redesign, not more layers.
         | 
         | One example is version management. Most applications today
         | _can_ implement versioned functions and keep multiple versions
         | in an application, and track dependencies between external
         | applications. Make a simple DAG of the versions and let apps
         | call the versions they were designed against, or express what
         | versions are compatible with what, internally. This would make
         | applications infinitely backwards-compatible.
         | 
         | The functionality exists right now in GNU Libc. You can
         | literally do it today. But rather than do that, we stumble
         | around replacing entire environments of specific versions of
         | applications and dependencies, because we can't seem to move
         | the entire industry forward to new ideas. Redesign is hard,
         | adding layers is easy.
        
           | louthy wrote:
           | > Adding abstraction buries complexity. If all you do is keep
           | adding more abstractions, you end up with an overcomplicated,
           | inefficient mess. Which is part of why application sizes are
           | so bloated today. People just keep adding layers, as long as
           | they have room for more of them. Everything gets less
           | efficient and definitely not better.
           | 
           | Presumably you're writing code in binary then? This is a non-
           | argument, because there's evidence that it's worked.
           | Computers were first programmed with switches and punch
           | cards, then tape, then assembly, then low level languages
           | like C, then memory managed languages etc.
           | 
           | Abstraction works when side-effects are controlled.
           | Composition is what we're after, but we must compose the
           | bigger bits from smaller bits that don't have surprises in.
           | This works well in functional programming, a good example
           | would be monadic composition: monads remove the boilerplate
           | of dealing with asynchrony, value availability, list
           | iteration, state management, environment management, etc.
           | Languages that have first-class support for these tend to
           | have significantly less boilerplate.
           | 
            | The efficiency argument is off too. Most software
           | engineering teams would trade some efficiency for more
           | reliable and bug free code. At some point (and I would argue
           | we're way past it) programs become too complex for the human
           | brain to comprehend, and that's where bugs come from. That's
           | why we're overdue an abstraction lift.
           | 
           | Tools like Copilot almost tacitly agree, because they're
           | trying to provide a way of turning the abstract into the
           | real, but then all you see is the real, not the abstract.
           | Continuing the assault on our weak and feeble grey matter.
           | 
           | I spent the early part of my career obsessing over
           | performance on crippled architectures (Playstation 3D engine
           | programmer). If I continued to write applications now like I
           | did then, nothing would go out the door and my company
           | wouldn't exist.
           | 
           | Of course there are times when performance matters. But the
           | vast majority of code needs to be correct first, not the most
           | optimal it can be for the architecture.
        
         | mdellavo wrote:
         | Generating code has never been a problem for developers :)
         | 
         | I'd be more interested in a tool that notices patterns and
         | boilerplate. It could offer a chance for generalization,
         | abstraction or use of a common pattern from the codebase. This
         | is of course much harder.
        
         | bohemian99 wrote:
         | My question is would Copilot be useful if you could choose the
         | codebase it would be drawing from? Almost as an internal
         | company tool?
        
           | freshhawk wrote:
           | That would actually be potentially useful, it could do a kind
           | of combination of autocompletion of internal libraries,
           | automatic templates for common patterns and internal
           | style/linting type tasks all in one. Certainly augmenting
           | those other things.
           | 
           | It would be interesting to see how much code you would need
           | before it was useful (and how good it would have to be:
           | does even a small error rate cost so much that it erases
           | other gains, given that so many of the potential errors in
           | using this type of tool are very subtle?)
        
           | saurik wrote:
           | If you find yourself copying code someone else in your
           | organization wrote rather than abstracting it to a function
           | in a shared library or building a more declarative framework
           | to manage the problem, something horrible has happened.
        
             | maccard wrote:
             | Sometimes boilerplate is unavoidable. As an example, how do
             | you send a GET request with libcurl in C with an
             | authorization header? I can't tell you offhand, but I can
             | tell you the file in my codebase that does have it, because
             | I've duplicated the logic for two separate systems.
        
               | saurik wrote:
               | So you are saying you would rather every project in the
               | world have at least one--if not, thanks to making it
               | easier via Copilot, many--copies of this code rather than
               | one shared library that provides a high-level abstraction
               | for libcurl?... At least for your own code, how did you
               | end up with two copies of duplicated logic rather than a
               | shared library of functionality?
        
               | maccard wrote:
               | > So you are saying you would rather every project in the
               | world have at least one--if not, thanks to making it
               | easier via Copilot, many--copies of this code.
               | 
               | Absolutely not, not at all. I'm suggesting that copying
               | and pasting happens, particularly in the context of a
               | single project.
               | 
               | > At least for your own code, how did you end up with two
               | copies of duplicated logic rather than a shared library
               | of functionality?
               | 
               | At what point is it worth introducing an abstraction
               | rather than copying? Using my libcurl example, you can
               | create an abstraction over the ~10 lines of
               | initialization, but if you need to change it to a POST,
               | then you're just implementing an abstraction over
               | libcurl, which is just silly.
        
               | saurik wrote:
               | If you have 10 lines of repeated code with one line
               | changed to make it GET vs POST, introducing an
               | abstraction isn't "silly": it is simultaneously both
               | ergonomic and advantageous, as if you ever need to add
               | another line of code to that initialization--which
               | totally happens, due to various security extensions you
               | might want to make to what TLS settings you accept, or to
               | tune performance parameters related to connection
               | caching, or to add a header to every request (for any
               | number of reasons from debugging to authentication)--you
               | can do it in one place instead of umpteen number of
               | places. And like... further to the point: I use libcurl
               | as a _fallback_ on Linux, but if you want to correctly
               | support the user's settings for proxy servers--which are
               | sometimes needed for your requests to work at all--my
               | code is abstracted so I can plug in _entirely different
               | backends_ to libcurl, such as Apple's CFNetwork. You act
               | like abstraction is somehow a bad thing or a complex
               | thing, when it should absolutely take you less time to
               | wrap duplicated code into a function than to duplicate
               | it.
        
           | tyingq wrote:
           | That sounds interesting, though it still feels like it would
           | need work. Like a way to annotate suggestions with comments,
           | or flag them. Definitive licensing shown for each snippet. A
           | way to mark deprecated code as deprecated to the training
           | algorithm, etc.
        
           | louthy wrote:
           | It would certainly alleviate the license concerns. If it was
           | possible to train it to a level (that produces effective
           | output), then sure.
           | 
           | As a thought experiment, I thought "what would happen if we
           | trained it on our 15 million lines of product code + my
           | language-ext project". It would almost certainly produce
           | something that looks like 'us'.
           | 
           | But:
           | 
           | * It would also trip over a million or so lines of generated
           | code
           | 
           | * And the legacy OO code
           | 
           | * It will 'see' some of the extreme optimisations I've had to
           | build into language-ext to make it performant. Something like
           | the internals of the CHAMP hash-map data-structure [1]. That
           | code is hideously ugly, but it's done for a good reason. I
           | wouldn't want to see optimised code parroted out upfront.
           | Maybe it wouldn't pick up on it, because it hasn't got a
           | consistent shape like the majority of the code? Who knows.
           | 
           | Still, I'd be more willing to allow my team to use it if I
           | could train it myself.
           | 
           | [1] https://github.com/louthy/language-ext/blob/main/LanguageExt...
        
             | carlmr wrote:
             | > legacy OO code
             | 
             | Aside from OO vs FP, a concern I'd have is that it would
             | encourage and enforce idiosyncrasies in large corporate
             | codebases.
             | 
             | If you've ever worked for a large corporation on their
             | legacy code, you know you don't want any of that to be
             | suggested to colleagues.
             | 
             | This would enforce bad behaviors and make it even harder
             | for fresh developers to argue against it.
        
               | louthy wrote:
               | > This would enforce bad behaviors and make it even
               | harder for fresh developers to argue against it.
               | 
               | I think this is a significant point. It maintains the
               | status quo. We change our guidance to devs every other
               | year or so. New language features become available, old
               | ones die, etc. But we're not rewriting the entire code-
               | base every time, we know if we hit old code, we refactor
               | with the new guidance; but we don't do it for the sake of
               | it, so there's plenty of code that I wouldn't want in a
               | training set (even if I wrote it myself!)
        
         | [deleted]
        
         | goodpoint wrote:
         | 5/ Boilerplate is easy to write but expensive to maintain in
         | large quantities. Proper abstraction/templating requires
         | careful thinking. Copilot encourages the first and discourages
         | the second.
         | 
         | 6/ Copilot learns from the past. It can only favor popularity
         | and familiarity in code patterns over correctness and
         | innovation.
        
         | hallqv wrote:
         | Neural net: "It's all in the training data, stupid."
        
         | reader_mode wrote:
         | >2/ The boilerplate is what bothers me the most (as someone who
         | believes in the declarative approach to software engineering).
         | The future for programming and programming languages should be
         | an attempt to step up to a higher level of abstraction, that
         | has been historically the way we step up to higher levels of
         | productivity. As applications get larger and code-bases grow
         | significantly we need abstraction, not more boilerplate.
         | 
         | Just the other day someone on copilot threads was arguing that
         | this kind of boilerplate optimizes for readability... It's like
         | Java Stockholm syndrome and the old myth of easy to approach =
         | easy to read (how long it took them to introduce var).
         | 
         | I've always viewed code generators as a symptom of language
         | limitations (which is why they were so popular in Java land)
         | that lead to unmaintainable code, this seems like a fancier
         | version of that - with all the same drawbacks.
        
           | 3pt14159 wrote:
           | I'm all for abstracting. I like Rails, for example. That
           | said, it gets _truly_ difficult to add or change stuff at the
           | more abstract layers. For example, adding recursive querying
           | to an existing ORM is _tough_. And on the rare occasion that
           | there is a bug in the abstract layer, debugging that from the
           | normal application code is also tough.
           | 
           | I understand why some corporations prefer dumb boilerplate
           | everywhere for some applications. If there is an outage it's
           | usually easy to fix quickly. Sometimes it's not, if it's an
           | issue in the boilerplate (say, Feb 29 rolls around and all of
           | the boilerplate assumed a 28 day month) that means a huge
           | update all across the system, but that rarely happens in
           | practice.
        
             | reader_mode wrote:
             | I would say ORM is tough with code gen or with
             | metaprogramming because it maps two mismatched paradigms
             | (OOP and relational) and tries to paper over the
             | differences.
             | 
             | I do agree on the debugging aspect - especially in dynamic
             | languages - metaprogramming stack traces can be really hard
             | to follow.
        
           | slumdev wrote:
           | Tools like xsd or T4 (in the .NET ecosystem) are great time-
           | savers, but you would never consider directly modifying the
           | code they generate. You would leave the generated code
           | untouched (in case it ever needed to be generated again) and
           | subclass it to make whatever changes you intend.
           | 
           | I think Copilot is so unfortunate because it's not building
           | abstractions and expecting you to override parts of them.
           | It's acting as an army of monkeys banging out Shakespeare on
           | a typewriter. And the code it generates is going to require
           | an army to maintain.
        
             | hu3 wrote:
             | Linq2Db is a great example of T4 code generation that
             | works. It creates partial classes from database schema.
             | Together with C# I have strongly typed database access.
             | 
             | https://github.com/linq2db/linq2db
        
             | reader_mode wrote:
             | Even there I feel like code generators are just a band aid
             | around the fact that metaprogramming facilities suck. If
             | you would never modify the generated code, why generate it
             | in the first place? You could argue that stack traces are
             | easier to follow but TBH generated code is rarely pretty in
             | that regard as well.
             | 
             | For example, I think the F# idea of type providers > code
             | generators.
        
           | ethbr0 wrote:
           | Code generators = out of practice, out of mind
        
         | mimixco wrote:
         | Awesome summary and thanks for trying it for the rest of us!
         | 
         | Copilot sounded terrible in the press release. The idea that a
         | computer is going to pick the right code for you (from
         | comments, no less) is really just completely nuts. The belief
         | that it could be better than human-picked code is really way
         | off.
         | 
         | You bring up a really important point. When you use a tool like
         | Copilot (or copypasta of any kind), you are introducing the
         | _additional_ burden of understanding that other person's code
         | -- which is worse than trying to understand your own code or
         | write something correct from scratch.
         | 
         | I think you've hit the nail on the head. Stuff like Copilot
         | makes programming worse and more difficult, not better and
         | easier.
        
           | res0nat0r wrote:
           | Isn't the entire point of this to _suggest_ code you may use,
           | not to just blindly accept as correct without thinking?
        
           | petercooper wrote:
           | While I accept most of the concerns, it's better than your
           | comment suggests. I see some promise for it as a tool for
           | reminding you of a technique or inspiring you to a different
           | approach than you've seen before.
           | 
           | For example, I wrote a comment along the lines of "Find the
           | middle point of two 2D positions stored in x, y vectors" and
           | it came up with two totally different approaches in Ruby -
           | one of which I wouldn't have considered. I did some similar
           | things with SQL, and some people might find huge value in it
           | suggesting regexes, too, because so many devs forget the
           | syntax and a reminder may be all it takes to get out of a
           | jam.
           | 
           | I'm getting old enough now to see where these sorts of
           | prompts will be a game changer, especially when dabbling in
           | languages I'm not very proficient in. For example, I barely
           | know any Python, so I just created a simple list of numbers,
           | wrote a "Sort the numbers into reverse order" comment, and it
           | immediately gave me the right syntax that I'd otherwise have
           | had to Google, taking much longer.
           | 
           | Maybe to alleviate the concerns it could be sandboxed into a
           | search engine or a separate app of its own rather than
           | sitting constantly in my main editor - I would find that a
           | fair compromise which would still provide value but require
           | users to engage in more reflection as to what they're using
           | (at least to a level that they would with using SO answers,
           | say).
        
             | rob74 wrote:
             | Yeah, but... I mean, I guess we all agree that copying code
             | from, let's say StackOverflow without checking if it really
             | does what you want it to do is a bad thing? Now here we
             | have a tool that basically automates that (except it's
             | copying from GitHub, not StackOverflow), and that's
             | supposed to be a good thing? Even if its AI is smarter, you
             | would still have to check the code it suggests, and that
             | can actually be harder than writing it yourself...
        
               | ethbr0 wrote:
               | The big boost, that I think parent is alluding to, is for
               | rusty (not Rust!) languages in the toolbox, where you may
               | not have the standard library and syntax loaded into your
               | working memory.
               | 
               | As a nudge, it's a great idea. As a substitute for
               | vigilance, it's a terrible idea.
               | 
               | I suspect that's why they named it Copilot instead of
               | Autopilot, but it's unfortunately more likely to be used
               | as the latter, humans being humans.
        
               | [deleted]
        
             | toss1 wrote:
             | Right, so it might occasionally be useful as a search tool
             | for divergent ideas of different approaches to a problem,
             | and your suggestion to sandbox it in a separate area works
             | for that.
             | 
             | But that does not seem to be its advertised or configured
             | purpose, sitting in your main editor.
        
             | mimixco wrote:
             | This is good stuff. As a search engine, it could very well
             | be useful. As another poster pointed out, if some context
             | or explanation were provided along with the source
             | suggestions, its utility as a reference would really grow.
             | 
             | I totally agree with you that prompted help is a big deal
             | and just going to get bigger. We have developed a language
             | for fact checking called MSL that works exactly this way in
             | practice -- suggesting multiple options rather than just
             | inserting things.
             | 
             | One of the things that interests me about this thread is
             | the whole topic of UI vs. AI and how much help really comes
             | from giving the user options (and a good UI to discover
             | them) vs how much is "AI" or really intelligence. I think
             | the intelligence has to belong to the user, but a computer
             | can certainly sift through a bunch of code to find a search
             | engine result and, those results could be better than you
             | get now from Google &Co.
        
               | osmarks wrote:
               | If they're using something like GPT-3 on the backend,
               | which they probably are, it probably _can't_ provide any
               | explanations or context (unless the output is memorized
               | training data, like this); the output can be somewhat
               | novel code not from any particular source, and while it
               | might be possible to find relevant information on similar
               | code, this would be a hard problem too.
               | 
               | EDIT: they appear to be interested in making it look for
               | similar code, see here:
               | https://docs.github.com/en/github/copilot/research-recitatio...
        
           | vmception wrote:
           | Hm odd takes here.
           | 
           | It's really weird for software engineers to judge something
           | by its current state and not by its potential state.
           | 
           | To me, it's clearly solvable by Copilot filtering the input
           | code by that repository's license. It should only be certain
           | open source licenses, maybe even user-selectable, or
           | code-creators can optionally sublicense their code to Copilot
           | in a very permissive way.
           | 
           | Secondly, a way for the crowd to code review suggestions
           | would be a start.
        
             | gpm wrote:
             | Practically every open source license requires attribution,
             | if copilot has a licensing issue, training a model on only
             | repositories with the same license won't fix it except for
             | the extremely rare licenses which do not require
             | attribution.
        
               | vmception wrote:
               | why not? it can just generate an attribution file or
               | reminder
        
               | gpm wrote:
               | Because it's an opaque neural network on the backend, it
               | doesn't know if or from whom it copied code.
        
               | buu700 wrote:
               | Could they handle this by generating a collective
               | attribution file that covers every (permissively
               | licensed) repository that Copilot learned from?
               | 
               | Of course this would be massive, so from a practical
               | consideration the attribution file that Copilot generates
               | in the local repository would have to just link to the
               | full file, but I don't think that would be an issue in
               | and of itself.
        
               | gpm wrote:
               | Maybe? Might depend on the license, I doubt the courts
               | would be amused.
               | 
               | Almost certainly a link would not suffice: basically
               | every license requires that the attribution be directly
               | included with the modified material. Links can rot, can
               | be inaccessible if you don't have internet access, can
               | change out from underneath you, etc.
               | 
               | (I am not a lawyer, btw)
        
               | buu700 wrote:
               | Makes sense. Maybe something like git-lfs/git-annex would
               | be sufficient to address the linking issue, but it seems
               | like the bigger concern is whether a court would accept
               | this as valid attribution. In a sense it reminds me of
               | the LavaBit stunt with the printed key.
        
               | joeyh wrote:
               | I think a judge could be persuaded that a list of every
               | known human does not constitute a valid attribution of
               | the actual author, even though their name is on the list.
               | The purpose of an attribution is to acknowledge the
               | creator of the work, and such a list fails at that.
        
               | buu700 wrote:
               | Makes sense. That's probably the best interpretation
               | here. Any other decision would make attribution lists
               | optional in general for all practical purposes.
        
             | mimixco wrote:
             | I've been in the business a long time and I just don't
             | believe in generalized AI at all. Writing code requires
             | general (not artificial) intelligence. All of these "code
             | helping" tools break down quickly because they may be
             | searching for and finding relevant code blocks (the
             | "imperative hell" referred to by another commenter), but
             | they don't understand the _context_ or the overall behavior
             | and goals of the program.
             | 
             | Writing to overall goals and debugging actual behavior are
             | the real work of programmers. Coming up with syntax or
             | algorithms comes 3rd and 4th on the priority list because,
             | let's face it, it's not that hard to find a reference for
             | correct syntax or the overall recipe implied by an
             | algorithm. Once you understand those, you can write the
             | correct code for your project.
             | 
             | I do think Copilot has potential as a search engine and
             | reference tool -- if it can be presented that way. But the
             | idea of a computer actually coming up with the right code
             | in the full context of the program seems like fantasy.
        
               | gpm wrote:
               | If we're coming up with potential uses, I think they got
               | the direction wrong.
               | 
               | Don't tell me what to do, tell me what not to do. "this
               | line doesn't look like something that belongs in a code
               | base", "this looks like a line of code that will be
               | changed before the PR is merged". Etc.
        
               | mimixco wrote:
               | _That_ would be fantastic! Imagine if it could catch
               | common errors before you make them. So many things in
               | loops and tests that we mess up all the time. My favorite
               | is to confuse iterating through an array vs an object in
                | JS. I'd love to have Gazoo step in and say, "Don't you
                | mean _this_, David?"
        
             | slumdev wrote:
             | > It's really weird for software engineers to judge
             | something by its current state and not by its potential
             | state.
             | 
             | No, we're not afraid of Copilot replacing us. The thought
             | is ridiculous, anyway. If it actually worked, we would be
             | enabled to work in higher abstractions. We'd end up in even
             | higher demand because the output of a single engineer would
             | be so great that even small businesses would be able to
             | afford us.
             | 
             | Yes, we are afraid of Copilot making the entire industry
             | worse, the same way that "low-code" and "no-code" solutions
             | have enabled generations of novices to produce volumes of
             | garbage that we eventually have to clean up.
        
               | vmception wrote:
                | Sounds like projecting, because that's not what I was
                | referring to.
                | 
                | I'm saying Copilot can be better with very simple tweaks.
        
           | [deleted]
        
           | onion2k wrote:
           | _Stuff like Copilot makes programming worse and more
           | difficult, not better and easier._
           | 
           | Copilot makes programming worse and more difficult if you're
           | aiming for a specific set of coding values and style that
           | Copilot doesn't generate (yet?). If Copilot generates the
           | sort of code that you would write, and it does for _a lot_ of
           | people, then it's definitely no worse (or better) than
           | copying something from SO.
           | 
           | The author of a declarative, functional C# framework likely
           | has very different ideas to what code should be than some PHP
           | developer just trying to do their day-to-day job. We
           | shouldn't abandon tools like Copilot just because they don't
           | work out at the more rigorous ends of the development
           | spectrum.
        
             | serf wrote:
             | >If Copilot generates the sort of code that you would
             | write, and it does for a lot of people, then it's
             | definitely no worse (or better) than copying something from
             | SO.
             | 
             | Disagree.
             | 
             | Most SO copy-paste must be integrated into your project --
             | maybe it expects different inputs, maybe it expects or
             | works with different variables -- whatever, it must be
             | partially modified to work with the existing code-base that
             | you're working with.
             | 
             | Copilot does the integration tasks for you. When one might
             | have had to read through the code from SO to understand it
             | enough to integrate it, the person using Copilot need not
             | even invest that much understanding.
             | 
             | Because of these workflow differences, it seems to me as if
             | Copilot enables an even more low-quality workflow than
             | offered by copy-pasting from SO and patching together
             | multiple code-styles and paradigms while hoping for the
             | best; Copilot does that without even the wisdom that an SO
             | user might have that 'this is a bad idea.'
        
               | buu700 wrote:
               | I'm not firmly for or against the concept of Copilot, but
               | it's fascinating to me that it will introduce an entirely
               | new class of bugs. Rather than specific mistakes in
               | certain blocks of code and edge case errors in handling
               | certain inputs, now we're going to have
               | lazy/overworked/junior developers getting complacent and
               | committing code they haven't reviewed that isn't even
               | close to their intent. Like you could have a backend
               | method that was supposed to run a database query, but
               | instead it sends the content of an arbitrary variable in
               | a POST request to a third-party API or invokes a shell to
               | run `rm -rf /`.
        
               | marcosdumay wrote:
               | To me, the most interesting aspect is the new class of
               | supply chain security vulnerabilities it will create. How
                | people will act to exploit or protect [1] against those
               | be very interesting.
               | 
                | [1] I don't expect "not using a tool that generates bad
                | code" to be the top option.
        
             | nightpool wrote:
             | The arguments that the GP makes are not based on a specific
             | style or value of coding. Instead, they're based on the
             | simple truth that it is harder to understand code that
             | somebody else wrote.
             | 
             | In some cases the benefits of doing so outweigh the costs
             | (such as using a stack overflow answer that's stood the
             | test of time for something you don't know how to do), but
             | with Copilot you don't even get the benefit of upvotes,
             | human intent, or crowdsourced peer review.
        
             | mimixco wrote:
             | I don't think they work out past trivial applications. Any
             | non trivial app requires an understanding of a much larger
             | part of the codebase than a tool like Copilot is looking at
             | at any one time.
             | 
             | Copilot does not understand the code _in toto_ and is
             | therefore really useless for debugging (70% of all coding)
             | and probably useless for anything other than very simple
             | parts of an app.
        
               | onion2k wrote:
               | _Any non trivial app requires an understanding of a much
               | larger part of the codebase than a tool like Copilot is
               | looking at at any one time._
               | 
               | I don't think that's important. Copilot, at least as it's
               | been demo'd so far judging by the examples, is to help
               | you write small, standalone functions. It shouldn't need
               | to know about the rest of the application. Just as the
               | functions that you write yourself shouldn't need to know
               | about the rest of the application either.
               | 
               | If your functions need a broad understanding of the
               | codebase as a whole how the heck do you write tests that
               | don't fail the instant anything changes?
        
               | mimixco wrote:
               | The reality of code is that stuff breaks when connected
               | to other stuff, as it eventually must be for real work to
               | happen. There's no getting around that.
               | 
               | Since that's where the work of programming is, debugging
               | connected applications (not writing fresh, unencumbered
               | code, a rare luxury), a tool that offers no help for that
               | is, well, not much help.
        
         | GlennS wrote:
         | I'm inclined to agree with you, and actually I'm rather
         | mistrustful of even basic autocomplete ever since a colleague
         | caught me using it without even looking at the screen!
         | 
         | But I wonder...
         | 
         | Is this a difference of programmer culture?
         | 
         | I think there are people who write successful computer programs
         | for successful businesses without delving into the details.
         | Without considering all the things that might go wrong. Without
         | mapping the code they're writing to concepts.
         | 
         | Lots of people.
         | 
         | What would they do with this?
        
           | louthy wrote:
           | > What would they do with this?
           | 
           | Not get a job working for me ;)
           | 
           | More seriously, when I think back to when I was first
           | learning programming - in the heady days of 1985 - I would
           | often copy listings out of computing magazines, make a
           | mistake whilst doing it, and then have no idea what was
           | wrong. The only way was to check character by character. I
           | didn't have the deeper understanding yet, and so I couldn't
           | contribute to solving the problem in any real way.
           | 
           | If they're at that level as a programmer, to the point where
           | their code is being written for them and they don't really
           | understand it, then they're going to make some serious
           | mistakes eventually.
           | 
           | If you want to step up as a dev, understanding is key.
           | Programming is hard and gets harder as you step up and bite
           | off bigger and more complex problems. If you're relying on
           | the tools to write your code, then your job is one step away
           | from being automated. That should be enough to light a fire
           | under your ambition!
        
             | biztos wrote:
             | I also typed stuff in from magazines in the 80's, and my
             | fast but imperfect typing really helped me learn
             | programming: I often had to stop, go back to the first
             | page, and actually _read_ the damned thing in order to make
             | it work.
        
         | code4you wrote:
         | Great points. Really makes me question why so many developers
         | were excited / worried about programming jobs being automated
         | away by this technology. I really doubt that many jobs are
         | going to be displaced by what is at best an improvement to
         | autocomplete/intellisense and at worst an unreliable, copyright
         | infringing boilerplate generator. Also agree with point #3 - I
         | could see Copilot steering devs away from new code patterns
         | toward whatever was most commonly seen in the existing
         | codebases it was trained on. Doesn't seem good for innovation
         | in that sense.
        
       | influx wrote:
       | I get why marketing calls machine learning "AI". I don't get why
       | engineers would think this is.
       | 
       | Dumb.
        
         | squeaky-clean wrote:
         | I still consider anything with more than 3 if-statements to be
         | AI. We just need more sensible expectations about what AI can
         | do haha.
        
         | SEMW wrote:
         | > I don't get why engineers would think this is.
         | 
         | This claim that "AI" only means artificial general / human-
         | equivalent intelligence completely ignores the long history of
         | how that term has been used, by computer science researchers,
         | for the last 70-odd years, to include everything from Shannon's
         | maze-solving algorithms, to Prolog-y systems, to simple
         | reinforcement learning, and so on.
         | 
         | https://web.archive.org/web/20070103222615/http://www.engagi...
         | 
         | It's true that there has been linguistic drift in the direction
         | of the definition getting narrower (to the point where it's a
         | joke that some people use 'AI' to mean whatever computers can't
         | do _yet_). And you can have reasons to prefer your own very-
         | narrow definition. But claiming that your own definition is the
         | only valid one to the point that anyone using a wider
         | definition (one that has a long etymological history, and which
         | remains in widespread use) are "dumb" is... not how language
         | works.
        
           | influx wrote:
           | It hasn't been AI the entire time. It's borderline fraud,
           | tbh.
        
         | konfusinomicon wrote:
         | it's the marketing magic bullet. each person shot is entranced
         | by its promises, and given unlimited ammo to spread its lies.
         | few possess armor capable of stopping them
        
       | yepthatsreality wrote:
       | Co-pilot is just a lowest-common-denominator solution with
       | flashy tabbing.
        
       | axiosgunnar wrote:
       | I hate to be the one that says this but I think it's true:
       | 
       | "So you are an SWE and you take a break from work to go to
       | Hackernews to complain that Github's Copilot, which is an AI-
       | based solution meant to help SWEs, is utter shit and completely
       | unusable.
       | 
       | And then you go back to writing AI-based solutions for some other
       | profession. Which is totally not shit or anything."
       | 
       | Can anybody put this more elegantly?
        
         | MajorBee wrote:
         | A variation of the Gell-Mann Amnesia effect?
         | 
         | "Briefly stated, the Gell-Mann Amnesia effect is as follows.
         | You open the newspaper to an article on some subject you know
         | well. In Murray's case, physics. In mine, show business. You
         | read the article and see the journalist has absolutely no
         | understanding of either the facts or the issues. Often, the
         | article is so wrong it actually presents the story backward--
         | reversing cause and effect. I call these the "wet streets cause
         | rain" stories. Paper's full of them. In any case, you read with
         | exasperation or amusement the multiple errors in a story, and
         | then turn the page to national or international affairs, and
         | read as if the rest of the newspaper was somehow more accurate
         | about Palestine than the baloney you just read. You turn the
         | page, and forget what you know."
         | 
         | https://www.goodreads.com/quotes/65213-briefly-stated-the-ge...
        
           | edgyquant wrote:
           | I would do this with Reddit posts. I'd see the top comment
           | under something I was familiar with and see it was full of
           | holes or just incorrect but then I'd go to a post about
           | something I didn't know all that well and take the top
           | comment at face value.
        
         | joe_the_user wrote:
         | SWEs create AI based solutions to X 'cause people pay them.
         | Entrepreneurs and investors are the one who actually think
         | they're the answer to everything.
         | 
         | Also, Copilot might (or might not) be useless or even interfere
         | with real work. But it's probably low on the scale of awful
         | things SWEs have helped create. The AI parole app is a thing
         | that should haunt the nightmare of whoever created it, for
         | example. But lots of AI apps, while useless, are probably
         | also harmless, so building them might not be the worst thing.
        
         | SamBam wrote:
         | "'I never thought leopards would eat MY face,' sobs woman who
         | voted for the Leopards Eating People's Faces Party."
        
         | Hamuko wrote:
         | > _And then you go back to writing AI-based solutions for some
         | other profession._
         | 
         | I don't know what you're talking about, I'm a webshit
         | developer.
        
         | [deleted]
        
         | skinkestek wrote:
         | You mean like the insanely annoying AIs that replaced Google
         | search? The idiotic one that files Javascript books under "Law"
         | in Amazon or the insulting one who runs Ad Sense and thinks my
         | wife isn't good enough and I am stupid enough to leave her for
         | some mail order bride?
        
           | donkeybeer wrote:
           | Javascript books under "Law" is hilarious
        
             | drdaeman wrote:
             | I'm in for JavaScript Penal Code. Make that unwarranted
             | type-coercing operator use punishable by law.
        
               | axiosgunnar wrote:
               | Instead of prison you go to callback hell
        
           | shadilay wrote:
           | Maybe the Google AI is a polygamist and thinks you ought to
           | have a 2nd wife?
        
         | marcosdumay wrote:
         | There are probably good ways to apply AI to software
         | development (has anybody tried to build a linter already?). It
         | is this product that is very bad.
         | 
         | The same certainly applies to other tasks.
        
         | thinkingemote wrote:
         | The most common example of this would probably be complaining
         | about advertising whilst working for a business that depends on
         | advertising to survive.
         | 
         | Ultimately it's a kind of Kafkaesque trap that modern living
         | has us all in, to a greater or lesser extent.
        
           | srcreigh wrote:
           | That's a bit different. Advertising is like a race to the
           | bottom, where everybody takes part in order to survive. You
           | can do that while wishing it could somehow not be that way.
           | Same with environmental issues.
           | 
           | The GP comment by contrast is about hypocrisy. I personally
           | found it funny that I didn't ever read about (or consider)
           | copyright violations of deep learning until they tried to do
           | it with code :-)
           | 
           | Of course programmers would find the problem with AI as soon
           | as it exploited _them_.
        
         | [deleted]
        
         | TeMPOraL wrote:
         | Dunno. I go to HN because it's the one place where I can whine
         | about AI being total bullshit, for the exact reasons as we're
         | now complaining about wrt. Copilot.
        
       | rpmisms wrote:
       | I like tabnine. It's an autocomplete tool and doesn't pretend to
       | be anything more.
        
       | celeritascelery wrote:
       | From the Copilot FAQ:
       | 
       | > The technical preview includes filters to block offensive words
       | 
       | And somehow their filters missed f*k? That doesn't give a lot
       | of confidence in their ability to filter more nuanced text. Or
       | maybe it only filters truly terrible offensive words like
       | "master".
        
         | spoonjim wrote:
         | Blocks offensive words, but doesn't block carefully crafted
         | malware.
        
         | minimaxir wrote:
         | In my testing of Copilot, the content filters only work on
         | _input_, not output.
         | 
         | Attempting to generate text from code containing "genocide"
         | just has Copilot refuse to run. But you can still coerce
         | Copilot to return offensive output given certain innocuous
         | prompts.
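
The input-versus-output asymmetry described above can be sketched in a few lines. This is hypothetical illustrative code (the blocklist, function names, and model stand-in are all assumptions), not Copilot's actual filter:

```python
# Sketch of an input-only content filter: the blocklist is checked
# against the user's prompt but never against the model's completion.

BLOCKLIST = {"genocide"}

def fake_model(prompt: str) -> str:
    # Stand-in for the language model: it can emit memorized training
    # text, which may itself contain words the filter would block.
    return "void genocide_all_entries(void) { /* clears the table */ }"

def complete(prompt: str) -> str:
    # Input-side filter: refuse to run on a blocked prompt...
    if any(word in prompt.lower() for word in BLOCKLIST):
        raise ValueError("prompt rejected by content filter")
    # ...but the completion is returned unchecked.
    return fake_model(prompt)
```

A symmetric design would run the same blocklist check over the completion (ideally over prompt and completion together) before showing it to the user.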
        
           | aasasd wrote:
           | Maybe Github just doesn't have many repos to control death
           | factories and execution squads?
        
           | Jackson__ wrote:
           | Interesting how this continues to be an issue for GPT3 based
           | projects.
           | 
           | A similar thing is happening in AI Dungeon, where certain
           | words and phrases are banned to the point of suspending a
           | user's account if they are used a certain number of times,
           | yet it will happily output them when they are generated by
           | GPT3 itself, and then punish the user if they fail to
           | remove the offending pieces of text before continuing.
        
           | Closi wrote:
           | Ahh, so it's the most pointless interpretation of the phrase
           | "filters to block offensive words", where it is stopping the
           | user from causing offense to the AI rather than the other way
           | around.
        
             | verdverm wrote:
             | They probably don't want to repeat Microsoft's incident
             | with Tay, though they seem to have created their own
             | incident, which dooms the product if it wasn't already.
        
             | derefr wrote:
             | I believe the concept is to stop users from prompting the
             | AI to generate offensive stuff specifically, and then
             | publishing the so-generated stream of offensive stuff as
             | negative PR for GitHub, in the same way the generated
             | stream of offensive stuff coming from Microsoft's AI was a
             | big PR disaster.
        
               | bambax wrote:
               | Maybe, but even if so, filtering the output would also
               | prevent this.
        
               | stingraycharles wrote:
               | I suppose you're referring to the AI Twitter bot that
               | initially was very lovely, until within a day 4chan had
               | turned it into a nazi. That was both very naive and
               | hilarious.
               | 
               | https://spectrum.ieee.org/tech-talk/artificial-
               | intelligence/...
               | 
               | The big difference in this case, however, is that this
               | AI was constantly learning from user input, which I do
               | not think is the case for Copilot.
        
             | raffraffraff wrote:
             | Easily offended AI is exactly what the world needs
        
               | GenerocUsername wrote:
               | We have too many easily offended NPCs as it is.
        
           | krick wrote:
           | Lol, how does _that_ make any sense? I mean, all these word
           | blacklists are always pretty stupid, but at least you can
           | usually see the motivation behind them. But in this case I'm
           | not even sure what they tried to achieve; this is absolutely
           | pointless.
        
         | throwaway2037 wrote:
         | Just to be clear for other readers: Are you being sarcastic
         | about the last sentence that mentions the term 'master'? I hope
         | not.
         | 
         | As I understand, this movement (for lack of a better term!)
         | started in the United States, which has a long and complicated
         | history of slavery. In the last few years in my various jobs
         | (all outside the United States), there has been a concerted
         | effort to remove any instances of "master" and "slave" and
         | replace them with terms like "primary" and "secondary".
         | 
         | For co-workers not familiar with the history of slavery in the
         | United States, there is always a pause, and then some confusion
         | about the changes. After explaining the historical context, 99%
         | of people reply: "Oh, I understand. Thank you to explain."
        
           | andrewzah wrote:
           | The word master has many usages. One specific context
           | (master/slave) is inappropriate, but that doesn't mean every
           | other context is unusable now.
           | 
           | Github changing master->main was the epitome of virtue
           | signaling. This literally does not affect black people at
           | all, nor does it do -anything- to help with racial inequality
           | in the US. It's actually quite patronizing and tone-deaf
           | that, instead of all the things -Microsoft- could be doing
           | to help with racial inequality, they're putting in as
           | little effort as possible.
           | 
           | Congrats on granting power over words to unreasonable people
           | who ignore things like context in language and common sense.
        
           | mdoms wrote:
           | I don't work in USA and I don't intend to. Your history of
           | slavery is none of my concern, especially when I'm just
           | trying to do my work.
           | 
           | The word 'master' is useful for me, and I don't believe for a
           | nanosecond that anyone, American or not, is ACTUALLY offended
           | by it. I believe that some people (mostly affluent white
           | Americans) are searching for things that they think they
           | SHOULD be offended by.
        
           | slackfan wrote:
           | And in my historical context power fists that your ideology
           | used were used by a regime that murdered millions. In the
           | past 100 years.
        
           | blindmute wrote:
           | > After explaining the historical context, 99% of people
           | reply: "Oh, I understand. Thank you to explain."
           | 
           | A similar percentage then think to themselves, privately,
           | "well that's pretty stupid."
        
           | Isinlor wrote:
           | Why is there no push-back against using the word Slave that
           | originates from word "Slav" due to enslavement of Slavic
           | people?
           | 
           | By analogy, it is as if you were using the word African to
           | mean "a person owned by someone else".
           | 
           | https://www.etymonline.com/word/slave
           | 
           | @edit The fact that people down vote this highlights that the
           | whole issue is just virtue signaling.
        
           | RicardoLuis0 wrote:
           | while the word 'master' can indeed be used in the sense of
           | "master and slave", its use in git is more akin to the use of
           | 'master' in "master record", and doesn't refer to 'ownership'
           | in any way
        
             | [deleted]
        
           | sseagull wrote:
           | Everyone has a line of how much they are willing to change
           | their language, though. There will always come a point where
           | someone will think some change is "silly", even though the
           | old term may have upset some people. And almost every term
           | has some sort of baggage associated with it.
           | 
           | There was a post going around somewhere of a college's
           | earnest attempt to change some language (like avoiding
           | "give it a shot" because of the association of "shot" with
           | guns).
           | Would renaming all the various things we call "triggers" be
           | ok, so we don't upset victims of gun violence?
           | 
           | So the master->main change was the line for some people, not
           | others.
        
             | andrewzah wrote:
             | As a matter of principle I don't think we should be moving
             | towards ignoring any and all contexts of words. Granting
             | this power of word banning to random arbiters is quite
             | crazy. In this case, master was moreso changed because it
             | -could- be deemed offensive, not that it -actually- is
             | offensive by itself. Not one person that I've spoken to
             | about it has actually cared.
             | 
             | Words having multiple usages is not really a novel concept.
             | If we ban words based on them potentially being
             | offensive, we'll end up with no words at all as people
             | move on to using different words, and so forth.
             | 
             | It is not silly to have pushback when someone wants to
             | grant themselves power over language usage. Dropping usage
             | of a word should have a strong, tenable argument and larger
             | community support than 0.00000001% of people caring.
        
             | LAC-Tech wrote:
             | > Everyone has a line of how much they are willing to
             | change their language, though.
             | 
             | But that line is constantly moving though. People are
             | forced to adapt, or they are ostracised socially and
             | economically.
             | 
             | If prestigious organisations, people and institutions
             | decide "master/slave" is an immoral thing to say, I have no
             | choice. Eventually I'll need to fall in line or my
             | livelihood will be at risk.
        
           | username90 wrote:
           | > For co-workers not familiar with the history of slavery in
           | the United States, there is always a pause, and then some
           | confusion about the changes. After explaining the historical
           | context, 99% of people reply: "Oh, I understand. Thank you to
           | explain."
           | 
           | Most people answer like this when they realize you are an
           | unreasonable person who refuses to listen. Happens all the
           | time, like "Oh, I understand (you are one of those). Thank
           | you for explaining!", and remember that they need to stop
           | using this word when working with you.
        
             | bestcoder69 wrote:
             | By rolling your eyes you accept my terms and conditions.
        
             | LAC-Tech wrote:
             | Imagine 'explaining' the historical context to someone
             | from, say, Brazil.
        
           | pydry wrote:
           | Changing master to main was something Github did when they
           | were taking heat for their contract with ICE. It was a nice
           | bit of misdirection that cost them nothing, achieved nothing
           | and garnered praise in some quarters.
           | 
           | ICE, of course, runs an _actual_ concentration camp which has
           | a slightly more troublesome history than the word master.
           | 
           | Language policing is to racism what recycling is to global
           | warming - an attempt to shift the focus away from elite
           | responsibility for systemic issues to "personal
           | responsibility" and forestall meaningful reform by placing
           | emphasis on largely non-threatening symbolic gestures.
        
             | samatman wrote:
             | y'know it really seems like both purpose and outcome need
             | to be closely examined here, if we're going to be
             | emphasizing _actual_ next to concentration camps.
             | 
             | what's the paradigm of a concentration camp? if we go
             | straight for Auschwitz we'll get nowhere, how about the
             | Boer concentration camps? Origin of the term after all.
             | 
             | What was the purpose? To _concentrate_ the Boer
             | population during a total war against them, so they
             | couldn't supply and hide the belligerents.
             | 
             | What was the outcome? Tens of thousands of preventable
             | deaths, mostly from disease. Success in the war, from the
             | British perspective.
             | 
             | So, let me turn my spectacles to your example of, may I
             | quote?
             | 
             | > an _actual_ concentration camp
             | 
             | Which appears to be a migrant detention center. To put it
             | succinctly, migrants who enter the country without filling
             | out paperwork, and get caught, end up in one of these
             | places for months-to-years while USG figures out what to do
             | with them.
             | 
             | So a Boer concentration camp is filled by the British
             | riding into a farmstead or town, kidnapping the women and
             | children, and driving them out to a field and sticking them
             | in a tent. A migrant detention center is filled when
             | someone enters the United States without following the
             | rules which govern that sort of behavior, and then gets
             | caught.
             | 
             | Where is the war?
             | 
             | Where is the excess death?
             | 
             | Ah well. I'm out of time and patience to express my
             | contempt for your abuse of language and disrespect for the
             | real horrors which you cheapen with this kind of facile
             | speech.
             | 
             | Enjoy the 4th of July.
        
               | bloomark wrote:
               | Your vacuous argument about what is an _actual
               | concentration camp_ is out of place. This wasn't a
               | discussion about concentration camps, it was about
               | github's attempted misdirection, and their facetious show
               | of supporting inclusion, by eliminating the term
               | "master".
               | 
               | https://news.ycombinator.com/item?id=26487854
        
               | pydry wrote:
               | Is this an indirect way of saying that you support ICE?
               | 
             | Coz if so I'd really rather hear it straight rather than
             | indirectly via an attempt to police my language.
        
             | SahAssar wrote:
             | I get what you mean, but in a discussion about semantics it
             | might be unhelpful to dilute the term "concentration camp",
             | especially if prefixed with "actual" in italics. That is
             | unless you actually mean that ICE camps serve the same
             | purpose and are equivalent to nazi concentration camps.
        
               | pydry wrote:
               | The Nazis ran what would more accurately be termed
               | extermination camps.
               | 
               | Though what they did certainly bore a strong
               | resemblance to the Boer War concentration camps,
               | Manzanar, etc., whose purpose was to "concentrate"
               | people into one place rather than industrially
               | slaughter them.
        
               | pmkiwi wrote:
               | To be correct, both existed.
               | 
               | A camp like Ravensbruck was a concentration camp (for
               | women) while Auschwitz-Birkenau was both a concentration
               | and extermination camp.
               | 
               | https://upload.wikimedia.org/wikipedia/commons/b/be/WW2_H
               | olo...
        
               | SahAssar wrote:
               | I don't know if I've ever heard anyone use the term
               | "concentration camp" without qualifiers to refer to
               | anything else than the nazi concentration camps (or
               | something equivalent).
               | 
               | Maybe it's just me, but I think it would have been more
               | clear if you said internment camp if your intent was to
               | refer to the broader context and not invoke a comparison
               | to nazis.
        
               | pydry wrote:
               | Wikipedia redirects concentration camp to:
               | 
               | https://en.wikipedia.org/wiki/Internment
               | 
               | Where it also makes the point that the nazi camps were
               | primarily extermination camps.
               | 
               | Maybe take it up with them and get back to me if you feel
               | truly passionate about this issue.
               | 
               | >Maybe it's just me, but I think it would have been more
               | clear
               | 
               | Gosh, it's awfully ironic that this sentence would happen
               | in a thread about how language policing is used as a
               | distraction from _important_ issues.
               | 
               | Is it more important to you how people _use_ the term
               | concentration camp, or the fact that ICE locks up
               | children in internment/concentration/[insert favorite
               | word here] camps?
        
               | SahAssar wrote:
               | > So, is it more important to you how people use the term
               | concentration camp or the fact that ICE lock up children
               | in internment/concentration/[ insert favorite word here ]
               | camps?
               | 
               | Well, that escalated quickly.
               | 
               | I don't think I ever said anything for or against what
               | ICE is doing, in fact I tried not to because the only
               | thing I wanted to say was that when using the words
               | "literally concentration camps" people might read that as
               | "camps designed to kill people" since that is the way
               | I've been taught it (in history classes) and heard it (in
               | general use).
               | 
               | I don't even live in the US so I have no say in this in a
               | democratic sense. If I did I'd be against the way
               | migrants are treated and want more humane treatment, but
               | I don't think that should be relevant to what I said.
        
               | pydry wrote:
               | Your primary worry was that somebody _might_ read that
               | sentence and believe that the US is gassing immigrants?
               | 
               | Seems unlikely.
        
               | SahAssar wrote:
               | You seem to think I have some political motive, I don't.
               | I just saw a comment that from my perspective and
               | historical education seemed to equate two things that I
               | regard as different and said that it might be helpful to
               | not conflate those. It seems like you did not intend to
               | conflate them and it is a difference in what you and I
               | read into the term "actual concentration camp".
               | 
               | From my perspective this conversation is as if someone
               | said "working for XCompany is actual slavery" and I said
               | "Perhaps don't use 'actual slavery' as a term for
               | something that isn't that?"
        
               | junon wrote:
               | Historians themselves call what ICE is doing a
               | concentration camp. So your experience is very much
               | localized.
        
               | hdhjebebeb wrote:
               | It seems like a distinction without a difference; this
               | article, for example, uses them interchangeably:
               | https://www.commondreams.org/views/2019/06/21/brief-
               | history-...
        
               | dragonwriter wrote:
               | Nazi "concentration camps" were not actual concentration
               | camps (a thing which long predates the Nazi camps), they
               | were extermination camps for which "concentration camp"
               | was a minimizing euphemism.
               | 
               | US WWII "internment" and "relocation" centers were actual
               | concentration camps ("relocation center" was itself a
               | euphemism, but "internment" referred to a formal legal
               | distinction impacting treaty obligations.)
        
               | SahAssar wrote:
               | Sure, but I don't know if I've ever heard anyone use the
               | term "concentration camp" without qualifiers to refer
               | to anything other than the Nazi concentration camps (or
               | something equivalent).
               | 
               | If someone says that something is "_literally_ a
               | concentration camp" I think that most people will think
               | of ovens and genocide.
               | 
               | Perhaps it's a regional thing, but that is how I
               | interpreted it.
        
               | [deleted]
        
               | sombremesa wrote:
               | It's not so much a regional as a political thing. Want it
               | to sound worse? Use concentration camp. Want it to sound
               | better? Use internment camp (or in some cases, re-
               | education facility).
        
               | michael1999 wrote:
               | Or "Reserve".
        
               | dragonwriter wrote:
               | Relevant to that, the US WWII internment camps were...
               | placed on land taken from reservations (with disputedly
               | adequate compensation for its use).
        
               | [deleted]
        
               | bambax wrote:
               | Why is this downvoted... It's simply the truth.
        
               | kanzenryu2 wrote:
               | There were only a handful of mass extermination camps.
               | There were tens of thousands of concentration camps.
               | https://encyclopedia.ushmm.org/content/en/article/nazi-
               | camps....
        
           | okamiueru wrote:
           | Pretty sure they were being sarcastic. I also don't find your
           | arguments persuasive in the slightest, and I find myself
           | being skeptical of these recent moral outcries. I'm skeptical
           | of their sincerity, and I don't buy it. "Master" has an
           | etymological background far more diverse than the dichotomy
           | with "slave". I can wholeheartedly say that I've not once thought
           | to make that association. It's been a title for centuries.
           | Master blacksmith, etc. (See
           | https://en.wikipedia.org/wiki/Master for a list)
           | 
           | Another example of what seems like fake moral outcry is
           | "blackface". I mean what the word is applied to now, not
           | its actual meaning. Racist ridicule through ethnic
           | stereotyping: that was "blackface". Yet, for some reason,
           | context doesn't matter anymore, and we end up removing
           | episodes of Community because someone painted their face
           | in a cosplay of a dark elf, as an exact commentary on this.
           | 
           | There is significant systemic racism in the US that affects
           | almost everything. In order to deal with it, the very first
           | step is being able to properly identify racism. Context
           | matters. Renaming "Master" branches is not progress.
           | Ostracising a kid for dressing up as Michael Jackson isn't
           | either.
           | 
           | Whenever I see outrage over such things I cynically think
           | that the person is probably white, and probably doing it for
           | attention. One thing is for sure, it only serves to detract
           | from the real issues.
        
             | rorykoehler wrote:
             | Check out the recent Marc Rebillet stream with Flying Lotus
             | and Reggie Watts. They absolutely destroy the bs around the
             | use of the word master. I think both FL and RW will be
             | quite representative of how African Americans (and the rest
             | of the world) feel about this.
        
               | okamiueru wrote:
               | Do you have a timestamp? As enjoyable as it is to
               | listen to each of them, the stream was mostly music and
               | almost two hours long.
        
               | rorykoehler wrote:
               | The next couple of minutes from here
               | https://youtu.be/0J8G9qNT7gQ?t=3984
        
       | greyfox wrote:
       | Very interesting that this was posted as I literally JUST watched
       | an even MORE interesting youtube upload about this very bit of
       | code just last weekend.
       | 
       | Here's the very fun video if anyone wants to take a look:
       | 
       | https://www.youtube.com/watch?v=p8u_k2LIZyo
        
       | cblconfederate wrote:
       | Clearly, swearing is the only right way to write that function
        
       | stefan_ wrote:
       | Even includes the commented out code. Clearly Copilot has gained
       | a deep understanding of code and is not simply the slowest way to
       | make a terrible, opaque search engine ever!
        
         | mrfredward wrote:
         | From the tweet it looks like an awesome search feature. Just
         | type what you want to search for right inline, and it can
         | drop the result in without you ever changing windows or
         | moving a hand to the mouse.
         | 
         | Problem is you don't know whose code you're stealing, which
         | leads to all sorts of legal, security, and correctness issues.
        
         | aj3 wrote:
         | Does GitHub Copilot write perfect code?
         | 
         | No. GitHub Copilot tries to understand your intent and to
         | generate the best code it can, but the code it suggests may not
         | always work, or even make sense. While we are working hard to
         | make GitHub Copilot better, code suggested by GitHub Copilot
         | should be carefully tested, reviewed, and vetted, like any
         | other code. As the developer, you are always in charge.
         | 
         | https://copilot.github.com/
         | 
         | EDIT: the text above is a direct quote from the Copilot website
        
           | danparsonson wrote:
           | > ...may not always work, or even make sense...
           | 
           | Naively, as someone who just heard of this - that sounds
           | worse than useless. If you can't trust its output and have to
           | verify every line it produces _and_ that the combination of
            | those lines does what you wanted, surely it's quicker just
           | to write the code yourself?
        
             | aj3 wrote:
             | Then write the code yourself. It's not like you're forced
             | to use this demo.
        
               | danparsonson wrote:
               | Well, you're right. I was somehow expecting there might
               | be a silver lining I'd missed but perhaps not.
        
               | cjaybo wrote:
               | Not exactly a confidence-inspiring reply from someone who
               | just identified themselves as representing the project
               | here!
        
               | aj3 wrote:
               | I don't work for Github (nor MS) and do not represent
               | Copilot.
        
             | vultour wrote:
             | Just today I needed to quickly load a file into a string in
             | golang. I haven't done that in a while, so I had to go look
             | up what package and function to use for that. I'd love a
             | tool that would immediately suggest a line saying
             | `ioutil.ReadFile()` after defining the function. I would
             | never accept a full-function suggestion from Copilot,
             | similarly to how I never copy and paste code verbatim from
             | StackOverflow. Using it as hints for what you might want to
             | use next seems like a nice productivity boost.
        
           | edgyquant wrote:
           | It's quite literally stealing code from repos under a GPL
           | license and suggesting it to people regardless of the
           | license (if any) they're using. I do not see how this is
           | legal.
        
             | aj3 wrote:
             | I disagree with this attitude. Many demos such as this one
             | with Quake code are intentionally looking for (funny)
             | outliers by bending the rules. But this is not how anyone
             | would use the system in a real scenario (no one should
             | select a license by typing "// Copyright\t" and selecting
             | whatever gets auto-completed), so it doesn't really
             | demonstrate any new limits besides what you could
             | reasonably expect anyway (and what's mentioned on the
             | Copilot's landing page).
             | 
             | Basically, in order to fall victim to this "code theft"
             | (or any other "footguns" from Twitter threads) you'd need
             | to be actively working against all the best practices and
             | common sense. If you actually use it as a productivity tool
             | (the way it is marketed) you'll remain in full control of
             | your code.
        
       | comodore_ wrote:
       | Funny, the YouTube algo blessed me with an in-depth video (~1y
       | old) about this Quake function yesterday.
        
       | gumby wrote:
       | stack overflow at its automated finest.
       | 
       | Or should we call it the Tesla of software?
        
       | FlyingSnake wrote:
       | The rate at which these bots implode implies something about
       | the whole AI/ML zeitgeist.
        
       | meling wrote:
       | What I would love even more than copilot helping me write code is
       | a copilot to write my tests for the code I write.
        
       | danuker wrote:
       | They could train on solely MIT-licensed code, and dump ALL the
       | copyright notices of code used for training into a file. Problem
       | solved.
        
         | Uehreka wrote:
         | Plenty of people probably copy-paste GPL code with the comments
         | and stick MIT on it. This kind of thing violates the GPL, and
         | I'm pretty sure (IANAL) that such code is "fruit of the
         | poisonous tree": if you then copy it, you too can be held
         | responsible. Sure, you might not get caught, but it's a rough
         | situation if you do.
        
         | rebolek wrote:
         | Have you read the MIT license? It explicitly says: "The above
         | copyright notice and this permission notice shall be included
         | in all copies or substantial portions of the Software."
        
       | dgellow wrote:
       | Another fascinating one: an "About me" page generated by
       | Copilot links to a real person's GitHub and Twitter accounts!
       | 
       | https://twitter.com/kylpeacock/status/1410749018183933952
        
         | bencollier49 wrote:
         | That's bonkers. And the beauty of it is that now someone could
         | realistically make a GDPR erasure request against the neural
         | net. I do hope they're able to reverse the data out.
        
           | qayxc wrote:
           | Since the information is encoded in model weights, I doubt
           | that erasure is even possible. Only post-retrieval filtering
           | would be an option.
           | 
            | It only goes to show that opaque black-box models have
           | no place in the industry. The networks leak information left
           | and right, because it's way too easy to just crawl the web
           | and throw terabytes of unfiltered data at the training
           | process.
        
             | ohazi wrote:
             | I think the fact that there's no way to delete the data in
             | question without throwing away the entire model is a
             | feature...
             | 
              | The strategic goal of a GDPR erasure request would be to
             | force GitHub to nuke this thing from orbit.
        
             | bencollier49 wrote:
             | > Only post-retrieval filtering would be an option.
             | 
             | And illegal, if the original information remains.
             | 
             | I assume that there must be a process for altering the
             | training data set and rerunning the entire thing.
        
               | gmueckl wrote:
               | The problem is that the information is in an opaque
               | encoding that nobody can reverse engineer today. So it's
               | impossible to prove that a certain subset of data has
               | been removed from the model.
               | 
               | Say, you have a model that repeats certain PII when
               | prompted in a way that I figure out. I show you the
               | prompt, you retrain the model to give a different, non-
               | offensive answer. But now I go and alter the prompt and
               | the same PII reappears. What now?
        
               | computerex wrote:
               | Yes, but the compute costs required for training are
               | probably in the range of hundreds of thousands of USD
               | to potentially millions, not to mention potentially
               | months of training time.
        
       ___________________________________________________________________
       (page generated 2021-07-02 23:00 UTC)