[HN Gopher] An example of LLM prompting for programming
       ___________________________________________________________________
        
       An example of LLM prompting for programming
        
       Author : mpweiher
       Score  : 398 points
       Date   : 2023-04-18 11:21 UTC (11 hours ago)
        
 (HTM) web link (martinfowler.com)
 (TXT) w3m dump (martinfowler.com)
        
       | paphillips wrote:
       | One initial reaction to the prompting style is how similar it is
       | to a human-to-human interaction. For example, a team lead
       | communicating requirements to a wider team composed of less
       | experienced engineers may also follow this type of iterative
       | exchange, continuing until he or she is satisfied that the team
       | understands the work to be done and has the guide rails to be
       | successful.
       | 
       | I recently heard a description about the way this technology will
       | change technical work that resonated: we will become more like
       | the movie director, and less like the actors.
        
       | dpflan wrote:
       | This got me wondering about best techniques for integrating LLM
       | code assistants into day-to-day software development, and hence
       | Ask HN: What is your GitHub Copilot (code LLM assistant)
       | workflow?
       | 
       | Please share your experience here:
       | https://news.ycombinator.com/item?id=35613576
       | 
       | I'd like to learn what is working and useful.
        
       | yoyohello13 wrote:
       | I feel like from an information theory perspective there is a
       | lower bound on how little we can write to get a sufficiently
       | specific spec for the AI to generate correct code.
       | 
       | This example seems like almost as much work as just writing the
        | code myself. I think English is just too fuzzy; maybe
        | eventually we will get a language tailored to AI that puts
        | more specific limits on the meanings of words. But then how is
        | it all that different from Python?
        
         | blackbear_ wrote:
         | > a lower bound on how little we can write to get a
         | sufficiently specific spec for the AI to generate correct code.
         | 
          | Interesting thought. I believe a lower bound for the number
          | of bits must be at least the negative log (in base 2) of the
          | probability of such code appearing "in the wild", and larger
          | if the training set is biased and/or the model is not fully
          | trained.
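          | 
          | A minimal sketch of that bound (assuming you could somehow
          | estimate p, the probability the model assigns to the target
          | code):
          | 
          |     import math
          | 
          |     def min_prompt_bits(p: float) -> float:
          |         # Shannon: picking out an outcome of probability
          |         # p takes at least -log2(p) bits of information.
          |         return -math.log2(p)
          | 
          |     # e.g. code with a one-in-a-million chance "in the
          |     # wild" needs roughly 20 bits of prompt, minimum.
          |     print(min_prompt_bits(1e-6))  # ~19.93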
        
       | gitgud wrote:
       | A useful approach, but this is a tiny green field project. I'm
       | not so sure it would work in a large existing proprietary system,
        | where you shouldn't describe too much of the "_NDA-protected
        | context_"...
        
         | themodelplumber wrote:
         | To me, this is likely an area where we'll see future coders
         | tested:
         | 
         | Interviewer: Here is a very specific project. And this part
         | here is NDA covered. We have provided a context prompt with all
          | the generalities. Let's say you are new here and we need you
         | effective today. Show us how you'll cover the last mile with
         | the LLM by writing prompts that do not violate NDA but get the
         | needed work done. Then whiteboard for your team a prompt schema
         | & policy that you think will work for this project.
         | 
         | I.e. a creativity exercise at the very least. You want someone
         | who can code _for a prompt, to solve coverage problems_, and
         | this is still coding.
         | 
         | For now I think a lot of people will hoard this kind of prompt
         | info/leverage-pattern stuff when they discover it. It's not
         | about the individual prompts.
        
       | nice_byte wrote:
       | do people enjoy working this way? wasting time verbalizing your
       | thoughts, stating the obvious, wordsmithing to get the thing to
       | "understand" what you _actually_ want?
        
       | choeger wrote:
        | That's refinement-style programming from a novel angle, but still
       | clearly refinement-style.
        
       | chewbacha wrote:
       | I guess this is neat but I'd rather write code myself.
        
         | [deleted]
        
         | Sateeshm wrote:
         | Yes. I would rather write the code myself too. But it's a good
         | idea to use it to explore solutions or alternate
         | implementations
        
           | killingtime74 wrote:
           | It's like talking to a student or an intern. Which is not bad
           | normally because we are also educating them.
        
         | koheripbal wrote:
         | I'd rather farm all my own food, build my own house, and teach
         | my own kids, but I don't have infinite time each day.
        
           | nsxwolf wrote:
           | The prompt was almost as much work as the code, and there was
           | no way to write that prompt without a CS education and/or
           | years of development experience.
        
             | grrdotcloud wrote:
             | I have found that this applies to many Crafts.
             | 
             | Cutting wood is easy. Simple really. Crafting an attractive
             | and functional chair requires discipline. Designing it?
             | Brilliance.
        
           | throwaway290 wrote:
           | If you are optimizing for an end goal rather than enjoying
           | the process... What is that goal? Why does it matter?
        
           | goatlover wrote:
           | Presumably you have time to write your own code as a
           | developer, since you're not being paid to be a farmer or
           | carpenter?
        
       | chrisco255 wrote:
       | > He's using a generic application example in here: one thing to
       | be wary of when interacting with ChatGPT and the like is that we
       | should never put anything that may be confidential into the
       | prompt, as that would be a security risk. Business rules, any
       | code from a real project - all these must not enter the
       | interaction with ChatGPT.
       | 
       | Remember, when storing your business code on Github servers
       | hosted by Microsoft, it is important to not place real code from
       | a project into OpenAI servers hosted by Microsoft. That would be
       | a security risk.
        
         | hbn wrote:
         | The hosting is not the issue. Github would have different
         | security requirements for code hosted in a private repo for a
         | paying org than OpenAI would for free users sending prompts to
         | an LLM. It can and should be assumed anything you type into
         | ChatGPT is being logged to be potentially read by a human.
        
       | nbzso wrote:
        | Useful as a form of learning and experimentation. Not
        | applicable at all, in my view, due to the lack of ownership of
        | the generated code. There is no ability to copyright and
        | protect the intellectual output of generative AI processes,
        | even when your prompts are clearly the pseudocode that scopes
        | the generated response.
        | 
        | Until this situation is legally cleared up, I will be very
        | cautious about using LLMs outside rapid prototyping and the
        | conceptual phase. Not to mention the madness of AutoGPT or the
        | more realistic approach of LangChain.
        | 
        | It is early in the game and the hype train is moving faster
        | than crypto and web3 combined.
        | 
        | I see a lot of AI startups introducing the same capabilities
        | through the OpenAI API and prompts, without considering prompt
        | injection risk. So we will see who survives.
        
       | mk89 wrote:
        | For me, ChatGPT and phind (which is based on GPT-4, if I
        | understood right) are great documentation tools and also
        | general productivity tools; no complaints there.
        | 
        | The main issue is that sometimes they really f** it up badly.
        | They make you rethink your knowledge quite deeply (do I
        | remember wrong? did I maybe understand this wrong? is ChatGPT
        | wrong?), and for me that can be worse than having to do it
        | myself, because it creates a sort of insecurity: you always
        | have to challenge your own thinking, and that is not how we
        | work in our daily jobs, is it? At least this doesn't happen
        | too frequently - from time to time we have arguments in the
        | team, but this kind of "wrong information" feels more like a
        | "hidden" trap than someone arguing back (with valid arguments,
        | of course).
        
         | themodelplumber wrote:
         | One thing that really bothers me is that I want it to use best
         | practices and it doesn't really know which ones I'm talking
         | about, and then I realize they are _my_ set of best practices,
         | made from others' nameless best practices.
         | 
         | So I have to decide if it's just a matter of manually
         | converting the 5-10 little things like using `env bash` in the
         | header, etc. Or do I ask it to remember that and proceed to the
         | next layer of the project, and feel like Katamari Coder, which
         | is quite a feeling of what-is-this-fresh-encumbrance at times.
         | 
         | There is a nascent sense that the interface is not even close
         | to where it needs to be to efficiently support that kind of
         | recall for working memory on the coder's end.
         | 
         | I can definitely see a new LLM relativistic-symbolic
         | instruction code & IDE-equivalent (with yet-unseen
         | presentational and let's even say modal editing factors) being
         | extremely useful, which is a bit funny but also that's what
         | those things are good for... Right now I can scroll up through
         | my prompts to supplement my working memory, but that's another
         | place where the whole activity starts to seem very tedious.
         | 
         | (Is the LLM coming for the coders, or are coders coming for the
         | LLM?)
        
           | dpkirchner wrote:
           | > Or do I ask it to remember that and proceed to the next
           | layer of the project
           | 
           | I think this could be solved with a good browser extension.
           | Something that provides an easy to access (e.g., keyboard-
           | only) way to paste customized prompt preludes that enforce
           | your style (or styles if, say, you're using multiple
           | languages).
           | 
           | It looks like Maccy could do the job, albeit not as an
           | extension. I haven't tried it yet.
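            | 
            | A rough sketch of the kind of prelude such an extension
            | would paste (the prelude text is just an invented
            | example):
            | 
            |     # Hypothetical per-language style preludes.
            |     PRELUDES = {
            |         "bash": "Use '#!/usr/bin/env bash', set -euo "
            |                 "pipefail, and quote all expansions.",
            |         "python": "Target Python 3.10+, use type "
            |                   "hints, prefer pathlib over os.path.",
            |     }
            | 
            |     def wrap(prompt: str, lang: str) -> str:
            |         # Prepend the saved style prelude so every
            |         # prompt enforces the same conventions.
            |         return f"{PRELUDES[lang]}\n\n{prompt}"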
        
             | themodelplumber wrote:
             | I tried one kinda like this. Setting aside the extension
             | feel of it, what I'd like to see is a move from prompt-
             | helper to pattern language for visually reporting the
             | process of working with the LLM, to which the LLM has
             | parsing access.
             | 
             | So, let's say you can see your conversation as normal, but
             | you can also see your actual code project as a node-based
             | procedural design layout in an editable window. The
             | relevant conversation details are used to draw the nodes.
             | 
             | You go to one node representing a bash script and click its
             | Patterns tab and search-to-type for the community pattern,
             | "Joe's Best Bash Practices". It's added to your quick
             | palette and LLM offers to add similar patterns to other
             | nodes in Nim and Pascal and ABS, but actually for ABS
             | there's a "concept" symbol that indicates it's only going
             | to be able to guess what you would want based on the
             | others.
             | 
             | Then it offers to gradually teach you node-shorthand as you
             | edit the project, so eventually you don't need to write any
             | prompts, just basic shorthand syntax. Where the syntax gets
             | clunky, or when you buy a custom keyboard just for this
             | syntax but with a few gotchas, you can work together and
             | change syntax to fit.
             | 
             | Nbdy hus lrnd shrtnd nos knda whr m gng wths.
        
           | mason55 wrote:
           | I think that Copilot is much better/more promising for this
           | kind of thing because it's looking at the code you've already
           | written without you having to constantly prompt it.
           | 
           | I had a lot of the same hangups as you when I had played
           | around with ChatGPT. How do I get it to handle the monotonous
           | stuff without me having to spend all my time teaching it?
           | 
           | I finally tried Copilot the other day and it was stunning. I
           | had a half-written golang client that was a wrapper around an
           | undocumented and poorly structured API for a tool we use. I
           | had written the get and create methods. Then I added a
           | comment with an example URL for delete and Copilot auto-
           | completed the entire method in the same style as the two
           | methods I had already written. In some cases, like formatting
           | & error handling, it was exactly the same as what I'd
            | written, but in other cases, like variable naming, string
            | templating, etc., it replicated the spirit of my style,
            | adapted for this new "delete" method.
           | 
           | I think ChatGPT is just the wrong interface for this kind of
           | thing (at least right now).
        
             | Filligree wrote:
             | They're complementary, I'd say. GPT-4 handles greenfield
             | development better; you can tell it to write a quick
             | script, and usually it more or less works. Copilot doesn't
             | do much when you're looking at a blank page.
             | 
             | This would make copilot the better tool in 90% of cases,
             | but I've been using GPT-4 to script a lot of things I
             | previously would never have scripted at all. It reduces the
             | cost to where even one-off scripts for a twenty minute job
             | are usually worth writing.
        
         | rootusrootus wrote:
         | One thing ChatGPT (specifically, the GPT4 version) keeps doing
         | to me is confidently lying, and when I call it out, apologizing
         | and spitting out another response. Sometimes the right answer,
         | sometimes another wrong one (after a couple tries it then says
         | something like "well, I guess I don't have the right answer
          | after all, but here is a general description of the problem").
         | 
         | Part of me laughs out loud (literally, out loud for once) when
         | it does that. But the other part of me is irritated at the
         | overconfidence. It is a potentially handy tool but keep the
         | real documentation handy because you'll need it.
        
         | moonchrome wrote:
          | Honestly, for me it happens more often than not - but maybe
          | that's because I've tried it in cases where I'd already used
          | traditional approaches to come up with the answer, then gone
          | to GPT and phind to benchmark their viability.
          | 
          | I've mentioned it in another thread, but phind's "google-fu"
          | is weak: it does a shallow pass, and the Bing index (I'm
          | assuming) is worse than Google's. It's also slow as hell
          | with GPT-4, which makes digging deeper slower than just
          | going in manually.
        
       | isaacfrond wrote:
       | The article stresses to _never put anything that may be
        | confidential into the prompt_. Yet, ChatGPT offers an opt-out
        | from using your data for training.
        | 
        | For most purposes that seems to be sufficient, doesn't it? Or
        | are there reasons not to trust OpenAI on this one?
        
         | vharuck wrote:
         | I will never have full trust in an assertion unless (a) it's
         | included in a contract that binds all parties, (b) the same
         | contract includes a penalty for breaking the assertion that's
         | severe enough to discourage it, and (c) I know the financial
         | and other costs of litigation won't be severe for me.
         | 
         | In short, unless my large employer will likely win in punishing
         | OpenAI should they break a promise, that promise is just
         | aspirational marketing speak.
         | 
         | For data retention and usage, I'd also need a similar
         | contractual agreement to tie the hands of any company that
         | would acquire them in the future.
        
         | twelve40 wrote:
         | Copilot for individuals stores code snippets by default
         | according to their TOS. Sure, you can probably find a way to
         | opt out of that somewhere as well, but you'd have to read the
         | TOS for every plugin and service you use, find the opt-out
          | links and make sure you don't opt in again via some other
          | route (such as not Copilot but ChatGPT proper, or some other
          | GitHub or VSCode plugin, service, button, or knob).
        
         | themodelplumber wrote:
          | > Or are there reasons not to trust OpenAI on this one?
         | 
         | Yes, more related to general tech history and not a dig on
         | OpenAI though.
        
         | blowski wrote:
         | From a GDPR or commercial confidentiality perspective, it
         | doesn't matter what OpenAI say they'll do with your data, you
         | can't share it with them.
         | 
          | Let's say your doctor enters sensitive info about you, and
          | despite having been told not to train on it, OpenAI uses it
          | anyway due to a bug. A year from now, ChatGPT is telling
          | anyone and everyone about your sensitive info.
         | 
         | Would you exclusively blame ChatGPT?
        
         | dustypotato wrote:
          | There was a bug where the chat history of some users was
          | visible to others.
        
         | clarge1120 wrote:
         | > are there reasons not to trust OpenAi on this one?
         | 
         | Yes, the fact that they are closed, not open, for one. And that
         | they switched from open to closed the moment it benefited them
         | to do so.
        
       | [deleted]
        
       | pcthrowaway wrote:
       | I've tried using ChatGPT for writing Vitest tests, and it can't
       | do it, full stop.
       | 
       | If you look at the end, it parroted out some tests for _jest_.
        | True, the APIs are mostly compatible and you can probably
        | change that to Vitest with a couple of lines of code changed,
        | but for more advanced tests, that won't necessarily work.
       | 
       | Really disappointed to see this so highly upvoted, when it's pure
       | garbage
        
         | joshribakoff wrote:
         | That library doesn't even appear to have a stable release yet,
         | and was at v0.0.x as of a year or so ago... you also may be
         | using chatGPT 3.5 which may predate this library. As a dev with
         | 15 years of experience I haven't even switched over from jest
         | (but plan to)... all this to say, maybe we can give the bot
         | some slack here. It should be possible to include vitest docs
         | and examples in your prompts to teach it in context, did you
         | try that?
        
           | pcthrowaway wrote:
            | Sure, I realize it's unsuccessful at using vitest
            | _because_ it's (relatively) new.
           | 
           | I'm just saying, this was a really telling example of how to
           | use it for prompting.
           | 
           | A _very large_ chunk of the tools I use in Javascript-land
           | are  "too new" for ChatGPT to work with properly.
           | 
           | Giving context unfortunately doesn't really work as ChatGPT
           | usually prioritizes what it's absorbed through the corpus
           | over anything you tell it.
           | 
            | To be clear, it does _fine_ with new information if the
            | things you ask it for don't match token sequences it's
            | already been trained on. So if you give it a fictional
            | library and ask it to perform some task that doesn't look
            | too much like what it might do with another library that
            | accomplishes a similar thing with a similar API, it will
            | actually use the custom code more successfully.
           | 
           | But for Vitest, it can't accept enough of the docs you might
           | provide for it to be useful to you (though admittedly,
           | sometimes it will show how to do something with jest that at
           | least makes finding the right thing in vitest easier).
           | 
           | By the way, if you are planning to switch over in the future,
           | the path for doing that is seemingly well documented by
           | vitest and seems to be pretty straightforward as well, though
           | I haven't meaningfully used Jest for comparison
           | 
           | edit: to be clear, I'm very impressed with ChatGPT's
           | capabilities, and I think there are good examples of
           | prompting where it does meaningful work in tandem with the
           | human driver exercising their own judgment.
           | 
           | This was an example of a person asking it for things while
           | not pointing out its limitations, which downplays the extent
           | to which one needs to exercise one's judgment when using it.
            | If they failed to point out the things ChatGPT got wrong
            | which _I_ know about, why would I trust that the things I
            | don't know enough to check are accurate?
        
       | [deleted]
        
       | upwardbound wrote:
        | Public service announcement that I and others are actively
       | trying to poison the training data used for code generation
       | systems. https://codegencodepoisoningcontest.cargo.site/
       | 
       | See previous discussion here:
       | https://news.ycombinator.com/item?id=35545442
        
       | irrational wrote:
       | Maybe this will get people to finally sit down and do some
       | thinking, planning, pseudo-code, etc. before diving in and
       | starting to code.
        
       | afro88 wrote:
       | The article shows everything that works for this approach. But
       | it's a bit disingenuous. At the end:
       | 
       | > Once this is working, Xu Hao can repeat the process for the
       | rest of the tasks in the master plan.
       | 
       | No, he can't. After that much back and forth and getting it to
       | fix little things where it gives responses with the full code
       | listing again, he would have easily hit the token limit (at least
       | with any chat LLM capable of this quality code and conversation -
       | ChatGPT). The LLM will start hallucinating the task list, the
       | names of functions it wrote earlier etc. and the responses would
       | get less and less useful with more and more "this doesn't work,
       | can you fix X".
       | 
       | So anyone following this approach will hit a footgun after task
       | 1.
       | 
       | For anyone that really wants to follow this approach, the next
        | step is to start a new chat and copy/paste the initial requirement
       | prompt, put the task list in there, any relevant code, adjust the
       | instruction (ie "help me with task 2") and go from there.
       | 
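        | Roughly, the fresh prompt just reassembles the state (a
        | sketch; the section labels are placeholders):
        | 
        |     def fresh_chat_prompt(requirements, tasks, code, n):
        |         # Rebuild the context a brand-new chat needs: the
        |         # original requirement prompt, the master plan,
        |         # relevant code so far, then point it at one task.
        |         return "\n\n".join([
        |             requirements,
        |             "Master plan:\n" + tasks,
        |             "Relevant code so far:\n" + code,
        |             f"Help me with task {n}.",
        |         ])
        | 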
       | It is of limited utility though. By step 3 (or even 2) you end up
       | with so much code that you're at the token limit anyway and it
       | can't write code that fits together.
       | 
       | Where I've found ChatGPT 4 useful is getting me going on
       | something, providing boilerplate, and unblocking me.
       | 
       | If you don't know how to approach a problem like the "awareness
       | layer" (like I didn't before reading the post), you can get a
       | great breakdown and starting point from ChatGPT. Similarly, if
       | you're not sure how to approach that view model, or write tests
       | etc. And if you want a first draft of code or tests.
       | 
        | All that said, I'm looking forward to much larger and more
        | affordable token limits in the future.
        
         | Tostino wrote:
          | You iterate on your plan step by step after it is generated.
          | You go back and edit the prompt chain you started for step
          | 1, and modify it to start working on step 2 (including any
          | ideas or fixes you identified while implementing step 1).
          | Repeat until complete.
         | 
         | You can still absolutely hit the context limit, but you are far
         | less likely to do so if you go back and start a new prompt
         | chain for each different thought process you are going through
         | with it.
        
           | afro88 wrote:
            | Great idea. But doesn't it get hard to navigate back to
            | something in older chat histories?
           | 
           | I find a new separate chat with the revised initial prompt to
           | be easier.
        
             | williamcotton wrote:
             | I've been using another call to an LLM to write or rewrite
             | code that is separate from the main "conversation".
             | 
             | What I mean is that I've got a dialog going with an LLM and
             | I've trained it to call a build() function with
             | instructions that then returns the function, with the text
             | of the function kept out of the dialog with the main
             | thread.
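              | 
              | Roughly (a sketch, assuming the 2023-era openai Python
              | client; build() and the prompt wording are my own):
              | 
              |     import openai
              | 
              |     def build(instructions: str) -> str:
              |         # A separate, stateless call just for code
              |         # generation; its output never enters the
              |         # main dialog's message history.
              |         resp = openai.ChatCompletion.create(
              |             model="gpt-4",
              |             messages=[
              |                 {"role": "system",
              |                  "content": "Return only code."},
              |                 {"role": "user",
              |                  "content": instructions},
              |             ],
              |         )
              |         return resp["choices"][0]["message"]["content"]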
        
         | ryanjshaw wrote:
         | Your experience matches mine closely. I've had ChatGPT-4 do
         | great and then it just gets confused after a while. I can
         | literally tell it "task X is done" and it'll apologise and show
         | me a list of tasks where X is still not done - this is clearly
         | not just a context window issue, as I have repeated variations
         | of my statement over and over in the same session and the issue
         | persists.
         | 
         | I have ended up using it the same way you have - it's honestly
         | the best anti-procrastination tool I've ever used because I can
         | tell it my intentions, what I've thought of so far... and it'll
         | spit out a list of bite-sized chunks that get me going. I find
         | myself looking forward to telling the AI I've completed a task.
         | 
         | Similarly, if I'm facing a tricky design decision, I find that
         | just writing it out for ChatGPT is extremely helpful for
         | clarifying my thought process. I actually used to do this
         | conversational decision making process in a text editor long
         | before ChatGPT, but when I know there's an AI on the other end
         | my thinking becomes clearer and more goal-oriented. And unlike
         | talking to myself or a human friend, it's happy to just say
         | "well if these are your concerns, let's start HERE and then see
         | what happens".
        
           | peterashford wrote:
           | Ooh! That's a really good point - ChatGPT is effectively
           | rubber-ducky as a service =)
        
             | Fuzzwah wrote:
             | This is exactly how I've been explaining LLM tech to my
             | "non-geek" friends and family. I start by explaining rubber
             | ducking, and how I now use chatgpt as a more advanced
             | version of the process.
        
           | travisjungroth wrote:
           | Good rule of thumb with ChatGPT: you can't exit loops. Once
           | you've gone A > B > A, your best move is to start a new chat.
            | Even then it may reproduce the loop, and you should do a
            | similar but different task. Remember that it's a
            | prediction engine,
           | weighing heavily on the existing prompt. So you say B again,
           | or B1 and it's like, I know what to do! A! Cause last time
           | was A->B so let's do it again.
           | 
           | In your case this would be "[]Task1", "Task1 is done",
           | "[]Task1", [here is where you start a new chat or fix it
           | yourself if possible].
        
           | hn_throwaway_99 wrote:
           | Hmm, I also use ChatGPT as an anti-procrastination tool and
           | task manager, and it's never made a mistake with keeping
           | track of my task list (except that when it sums the estimated
           | times of subgroups of tasks, sometimes those sums are wrong).
           | 
           | Note that it outputs my updated task list every time I add or
           | remove a task (I only asked it to do that one time), so even
           | if old messages go outside of the context window, it's not a
           | big deal because the full updated state of the list is output
           | basically every other message.
        
         | quijoteuniv wrote:
         | It's great to see that there's now a term for the type of
         | prompting, "generated knowledge". I've been experimenting with
         | this technique since the beginning, and I've noticed a
         | significant improvement in version 4. The process involves
         | outlining the project, creating tasks, and feeding them back to
         | chatGPT as you progress. This approach has helped me complete
         | projects that would have otherwise taken me much longer to
         | finish.
         | 
         | It's also useful for creating practical tutorials. While there
         | are plenty of tutorials available online, sometimes you need
         | guidance on a specific set of technologies. By using generated
         | knowledge prompts, you can get a good outline and tasks to help
         | you understand how these technologies interact.
         | 
          | One thing to keep in mind is to avoid derailing the
          | conversation with questions that are not relevant to the
          | core tasks. If you get stuck on something and need to debug,
          | it's best to use a separate conversation, to avoid derailing
          | the project's progress and the hallucinations and
          | forgetfulness.
        
           | AzzieElbab wrote:
           | Something must be wrong with me. I could never get anything
           | useful from Martin Fowler's writings, and coincidentally I
           | cannot get any functional code out of ChatGPT. Even the
           | boilerplate it produces for me needs to be corrected. I still
           | use chatGPT to produce examples of abstract things but was
           | not able to get any working code that matches concrete
           | problems or even compiles.
        
             | afro88 wrote:
             | Are you using the GPT4 model? There's a very significant
             | improvement between 3.5 (the free one) and 4.
        
               | AzzieElbab wrote:
               | I am supposedly on GPT4 via GPT+. I try using it for
               | boilerplatey things, like terraform, and the results are
               | simply incorrect. It seems more helpful in providing
               | examples, even for some far more complex tech - like rust
               | code.
        
               | simonw wrote:
               | Does it say GPT-4 at the top of the screen?
        
           | afro88 wrote:
           | Absolutely, and same here. I've done multiple tools that
           | would have taken 2-3 days each in 2-3 hours each.
           | 
            | > One thing to keep in mind is to avoid derailing the
            | conversation with questions that are not relevant to the
            | core tasks. If you get stuck on something and need to
            | debug, it's best to use a separate conversation, to avoid
            | derailing the project's progress and the hallucinations
            | and forgetfulness.
           | 
           | Definitely. Great advice.
           | 
           | Another tip: don't bother asking it to fix small things. Just
           | mention you fixed it in the next reply and move on.
        
       | hinkley wrote:
       | What I would really love is if we had a broader linting tool
       | built on this sort of tech that could go the other way.
       | 
       | So often we are halfway through refactoring the code from a bad
       | pattern that has a proven track record of issues, to one that at
       | least prevents the worst ravages of the old one. There are never
       | any guarantees that you will get everyone on board for this.
        | Someone will defect and keep copying and pasting the old
        | pattern, and if they code faster than you, you never get to
        | the end.
       | 
       | Give me a way to mark a bunch of code as 'the old way' and hook
       | that information into autocomplete or even just a linter that
       | runs at code review time.
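        | 
        | Even a dumb regex version of this would catch defections at
        | review time. A sketch (the pattern registry here is
        | hypothetical):
        | 
        |     import pathlib
        |     import re
        |     import sys
        | 
        |     # Registry of banned "old way" patterns and the
        |     # replacement to suggest instead.
        |     OLD_WAYS = {
        |         r"LegacyConnectionPool\(": "use PooledClient",
        |     }
        | 
        |     def lint(paths):
        |         hits = 0
        |         for path in paths:
        |             text = pathlib.Path(path).read_text()
        |             for pattern, advice in OLD_WAYS.items():
        |                 for m in re.finditer(pattern, text):
        |                     line = text.count("\n", 0, m.start()) + 1
        |                     print(f"{path}:{line}: old way; {advice}")
        |                     hits += 1
        |         return hits
        | 
        |     if __name__ == "__main__":
        |         sys.exit(1 if lint(sys.argv[1:]) else 0)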
        
       | 1024core wrote:
       | For some reason, this reminds me of how we used to give
       | instructions to Indian coders in the 90s and early 2000s. You
       | would have to spell out everything. What you got back was nearly
       | there, but some back-and-forth was involved.
       | 
       | This brings back some terrible memories.
        
         | mpaepper wrote:
          | The big difference is that you get the results immediately
          | and iterations take minutes, not days.
        
           | DonHopkins wrote:
           | And no time zone differences!
        
           | krupan wrote:
            | Yes, you can get a ton more code that you have to check
            | over with a fine-toothed comb in much less time! Is that a
            | win?
        
       | m3kw9 wrote:
        | Asking an LLM to write complex code can take about as long as
        | writing the code yourself. Having it plan things out can,
        | however, kick-start a nice direction. LLMs are great for
        | single, clear functions.
        
       | acomjean wrote:
        | Isn't the point of code to express what we want in a succinct
        | and expressive way?
        | 
        | If we need all this software to help us, maybe we should look
        | at the languages we're using and make better, more intuitive
        | ones.
        
       | cwp wrote:
       | To me, this is a great illustration of why chat is a terrible
       | interface for a coding tool. I've gone down this path as well,
       | learning that you need to have a detailed prompt that establishes
       | a lot of context, and iteratively improve it to generate better
       | code. And yup, generating a task list and working from that is
       | definitely a key strategy for getting GPT to do anything bigger
       | than a few paragraphs.
       | 
       | But compare that to Copilot: Copilot doesn't help much when
       | you're starting from scratch, and there's nothing for it to work
       | with. But once you have a bit of structure, it starts to make
        | recommendations. Rather than generating large chunks of code,
        | the recommendations are small: chunks of a few lines or maybe
        | even one line at a time. And it's sooooo good at picking up on
       | patterns. As soon as you start something with built-in
       | symmetries, it'll quickly generate all the permutations. It's
       | sort of prompting by pointing.
       | 
        | This is so. much. better. than writing prompts for the chat
       | interface. I'm really excited to see where these kinds of tools
       | lead.
        
         | mjr00 wrote:
         | Absolutely. People will quickly realize that for coding, the
         | natural language part of LLMs is a distraction. Copilot is
         | _much_ better for someone actually writing code, but
          | unfortunately doesn't get as emphasized due to the narrative
         | surrounding LLMs right now.
        
           | moffkalast wrote:
           | Has the Copilot backend been updated to use anything more
           | advanced yet? I tried it out when it was new and free for a
           | while and it really struggled with anything that wasn't
           | incredibly common. GPT 4 in its chat form works a whole lot
           | better for niche stuff than that one did.
        
             | gunapologist99 wrote:
              | It's definitely far better than when it was free, but
              | not GPT-4 yet for most people.
             | 
             | It's the opposite of chatGPT: it takes more time to produce
             | useful output but it gets much better in more complex
             | programs while ChatGPT gets worse.
        
             | kgeist wrote:
             | Copilot's original underlying model is currently
             | deprecated, if I remember correctly
        
           | yodsanklai wrote:
           | > Copilot is much better for someone actually writing code
           | 
            | I haven't used Copilot yet, but I occasionally use ChatGPT
            | with prompts such as "write a bash/python script that
            | takes these parameters and performs these tasks". Then I
            | iterate if needed, and usually I can get what I want
            | faster than without it. It's not a game changer, but it's
            | a performance boost.
           | 
            | How is natural language a distraction here? And how would
            | Copilot do much better for the same task?
        
             | visarga wrote:
             | > It's not a game changer, but it's a performance boost.
             | 
              | The story of all AI in 2023 - maybe 2x performance
              | improvement, maybe a bit less. The big problem is that
              | you can't trust it on its own, so it doesn't improve
              | productivity 100x. Not even a receipt reader is good
              | enough to reach 100%: you've got to check the total in
              | case it missed the decimal point, so you don't get the
              | 100x boost after all.
        
             | mjr00 wrote:
             | Try not using natural language and just type what you'd
             | type into Google. You'll get the same results and realize
             | that all of the natural language fluff is totally
             | unnecessary. I just typed in "bash script recursive chmod
             | 777 all files" (as a dumb toy example) and got a resulting
             | script back. It was surrounded by two natural language GPT
             | comments:
             | 
             | > It's generally not recommended to give all files and
             | directories the 777 permission as it can pose a security
             | risk. However, if you still want to proceed with this,
             | here's a bash script that recursively changes the
             | permission of all files and directories to 777: [...] Make
             | sure to replace "/path/to/target/directory" with the path
             | of the directory you want to modify. To run the script,
             | save it as a file (e.g., "chmod_all.sh"), make it
             | executable with the command "chmod +x chmod_all.sh", and
             | then run it with "./chmod_all.sh".
             | 
             | It's up to the reader to decide if those are necessary, but
             | I'd lean towards no.
        
               | [deleted]
        
               | gunapologist99 wrote:
               | No script needed:                 chmod ugo+rwX . -R
               | 
               | (This is for GNU chmod like in Linux, BSD will be
               | slightly different)
               | 
               | Of course, that's not exactly what you asked for (it's
               | better, read the chmod man page: X applies executable
               | only to directories) but you could just replace ugo+rwX
               | with 777 or 0777.
        
               | kenjackson wrote:
               | I tried this with the following:
               | 
               | "Bash script to add a string I specify to the beginning
               | of every file in a directory, unless the file begins with
               | "archive""
               | 
               | I tried looking for this on Google and didn't find
               | anything that did this -- although I could cobble
               | together a solution with a couple of queries.
               | 
                | The interesting thing is that I wanted ChatGPT to
                | append the string to the filename -- that's what I
                | meant. But it actually appended the string to the
                | file's contents. That's what I said, so I give it
                | credit for doing what I said rather than what I meant.
                | And honestly my intent isn't necessarily obvious.
               | 
               | I definitely see this as a value add over just searching
               | with Google.
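                | 
                | For comparison, the literal reading is only a few
                | lines (a Python sketch of what I said, not what I
                | meant):
                | 
                |     from pathlib import Path
                | 
                |     def prepend(directory, prefix):
                |         for f in Path(directory).iterdir():
                |             # Skip non-files and anything
                |             # named archive*.
                |             if (not f.is_file()
                |                     or f.name.startswith("archive")):
                |                 continue
                |             # Prepend to the file's contents,
                |             # not to its name.
                |             f.write_text(prefix + f.read_text())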
        
               | scarface74 wrote:
               | > Try not using natural language and just type what you'd
               | type into Google. You'll get the same results and realize
               | that all of the natural language fluff is totally
               | unnecessary.
               | 
               | I can get _similar_ results with Google sometimes and I
               | can put together what I learned from different places.
               | 
               | But I can get scripts that meet my exact requirements
               | with ChatGPT. Most of my ChatGPT related code is
               | scripting AWS related code and CloudFormation templates.
               | 
                | I've asked it to translate AWS related Python code to
                | Node for different projects, and to a bash shell
                | script. It's well trained on AWS related code.
               | 
               | I don't know PowerShell from a hole in the wall. But I
               | needed to write PS scripts and it did it. I've also used
               | it to convert CloudFormation to Terraform
        
               | mjr00 wrote:
               | I think you (and kenjackson above) are misinterpreting
               | what I was saying. I'm not saying use Google instead of
               | ChatGPT; I'm saying _pretend ChatGPT is Google_ and
               | interact with the ChatGPT text prompt the same way. You
                | don't need fully formed coherent sentences like you
               | would when talking to a person; just drop in relevant
               | keywords and ChatGPT will get you what you want.
        
               | scarface74 wrote:
               | Isn't that the game changer though that you can use
                | natural language and treat it like the "world's
                | smartest intern" and just give it the list of my
               | requirements?
               | 
               | It's the difference between:
               | 
               | "Python script to return all of the roles with a given
               | policy AWS" (answer found on StackOverflow with Google)
               | 
               | And with ChatGPT
               | 
               | "Write a Python script that returns AWS IAM roles that
               | contain one or more policies specified by one or more -p
               | arguments. Use argparse to accept parameters and output
               | the found roles as a comma separated list"
        
               | mjr00 wrote:
               | > "Write a Python script that returns AWS IAM roles that
               | contain one or more policies specified by one or more -p
               | arguments. Use argparse to accept parameters and output
               | the found roles as a comma separated list"
               | 
               | Again, this is completely unnecessary. This is like in
               | the old days when technically illiterate people would
               | quite literally Ask Jeeves[0] and search for full
               | questions because they didn't know how to interface with
               | a search engine.
               | 
               | A prompt that does exactly what you're asking: "python
               | script get AWS IAM roles that contain a policy, policy as
               | -p command line argument, output csv"
               | 
                | We'll see more of that terse, efficient style as people
               | get more comfortable, similar to how people have (mostly)
               | stopped using full questions to search on Google. The
               | "talk to ChatGPT like a human" part is entirely a
               | distraction from taking advantage of the LLM for coding
               | purposes. Perhaps more importantly, the responses being
               | humanized is a distraction, too.
               | 
               | [0] https://en.wikipedia.org/wiki/Ask.com
        
               | scarface74 wrote:
                | At first, when I didn't specify "use argparse", it
                | would use raw argument parsing.
               | 
               | It also thought I actually wanted a file called
               | "output.csv" based on your text and gave me an actual
               | argument to specify the output file that I didn't want.
               | 
               | There is a lot of nuance to my requirements that ChatGPT
               | missed with your keywords.
               | 
               | Sidenote: there is a bug in both versions and also when I
               | did this for real. Most AWS list APIs use pagination. You
               | have to tell it that "this won't work with more than 50
               | roles" and it will fix it.
        
               | vorticalbox wrote:
                | You can always include the instruction to only return
                | the code and no other text.
        
               | mjr00 wrote:
               | Sure, but I want a system built for coding that does that
               | by default... like Copilot.
        
               | iudqnolq wrote:
               | ... and it'll describe the code anyway, at least to me.
        
           | avereveard wrote:
          | Idk, autocomplete is for sure a great interface for someone
          | coding in the IDE, but LLMs can understand requirements as a
          | whole, spit out full classes, and validate that the output
          | from the server matches the specs; they work great from
          | outside an IDE.
        
         | supernikio2 wrote:
          | Exactly this. I've tried to incorporate ChatGPT into my
          | daily workflow, but you have to give it an excruciating
          | level of detail to get something that remotely resembles
          | real code I'd use. Even then you have to hold its hand to
          | guide it in the correct direction, and still make some
          | manual final touches at the end.
         | 
         | This is why I'm looking forward to Copilot X so much. It will
         | hold much more context than the current implementation, and
         | integrate the Chat interface that's so natural to us.
        
         | maroonblazer wrote:
         | As a hobbyist developer with no formal training, I wish Copilot
         | had a 'teaching' or "Senior Dev" mode, where I can play the
         | role of the Junior Dev. I'd like it to pick up on what I'm
         | trying to write, and then prompt me with questions or hints,
         | but not straight up give me the code.
         | 
          | Or, if that's too annoyingly Clippy-like, let me prompt it when
         | I'm stuck, and only then suggest hints or ask suggestive
         | questions that guide me to a solution.
         | 
         | I agree, very exciting to see where all this goes.
        
           | cwp wrote:
           | One thing you might try with Copilot is to ask it to explain
           | the code. It can often give insight, even on code that you
           | yourself wrote a few minutes ago.
        
           | ukuina wrote:
           | The Github Copilot Labs extension has "codebrushes" that can
           | transform and explain existing code instead of generating new
            | code, but none of them only give "hints". Maybe one of the
           | codebrushes can take a custom prompt.
        
             | SparkyMcUnicorn wrote:
             | You can create custom brushes, or open the "CoPilot Labs"
             | panel and "explain" with a custom prompt.
        
         | SamPatt wrote:
         | I've noticed that after using copilot on a code base for a
         | while, you can effectively prompt the AI just by creating a
         | descriptive comment.
         | 
         | // This function ends the call by sending a disconnection
         | message to all connected peers
         | 
         | Bam, copilot will recommend at least the first line, with
         | subsequent lines usually being pretty good, and more and more
         | frequently, it will recommend the whole function.
         | 
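          | The same trick in Python (hypothetical names; the body is
          | the kind of thing copilot fills in):
          | 
          |     # This function ends the call by sending a
          |     # disconnection message to all connected peers
          |     def end_call(peers, call_id):
          |         for peer in peers:
          |             peer.send({"type": "disconnect",
          |                        "call": call_id})
          |         peers.clear()
          | 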
         | I still use GPT-4 a lot, especially for troubleshooting errors,
         | but I'm always pleasantly surprised at how good copilot can be.
        
           | armchairhacker wrote:
           | Copilot is a game-changer and very underrated IMO. GPT4 is
           | smart but not really used in production yet. Copilot is
           | reportedly generating 50% of new code and I can't imagine
           | going without it.
        
             | Keyframe wrote:
             | I would really love to see that. So far, all I've seen is
             | cookie cutter code to reduce a bit of typing time.
             | Everything else was more or less hot garbage that just
             | stood in the way of typing. Maybe in a few iterations or
             | years. So far, personally, I haven't seen anything useful.
             | Not saying there isn't anything, just that I haven't seen
             | any use and code offered by it stank. Is there a demo of
             | someone using it to showcase this game-changing power?
        
               | armchairhacker wrote:
               | Copilot only writes boilerplate, it can't really handle
               | anything non-trivial. But I write a lot of boilerplate,
               | even using abstraction and in decent programming
               | languages. A surprising amount of code is just
               | boilerplate, even just keywords and punctuation; and
               | there's a lot of small, similar code snippets that you
               | _could_ abstract, but it would actually produce more code
                | and/or make your code harder to understand, so it isn't
               | worth the effort.
               | 
               | Plus, tests and documentation (Copilot doubles as a good
               | sentence/"idea" completer when writing).
        
               | SamPatt wrote:
               | It surprises me to hear this. Have you used it as I
               | described by writing a descriptive comment first then
               | waiting to see its response?
               | 
               | I only noticed it getting good at this after I was
               | somewhat far along on a project, so I assume it requires
               | an overall knowledge of what you're trying to do first.
        
             | smashface wrote:
             | Where do you get that 50% number? Do you mean 50% of all
             | new code in the industry? That seems beyond extremely
             | unlikely.
        
               | moyix wrote:
               | The number is 40%, and it's 40% of code written _by
                | Copilot users_. It's also just for Python:
               | 
               | > In files where it's enabled, nearly 40% of code is
               | being written by GitHub Copilot in popular coding
               | languages, like Python--and we expect that to increase.
               | 
               | https://github.blog/2022-06-21-github-copilot-is-
               | generally-a...
        
               | Nifty3929 wrote:
               | It's all about the denominator!
        
               | iudqnolq wrote:
               | I wonder if this properly counts cases where copilot
               | writes a bunch of code and then I delete it all and
               | rewrite it manually.
        
               | moyix wrote:
               | From what I remember they check in at a few intervals
               | after the suggestion is made and use string matching to
               | check how much of the Copilot-written code remains.
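                | 
                | Something like this would do it (a guess at the
                | methodology, not GitHub's actual code):
                | 
                |     import difflib
                | 
                |     def retained(suggestion, current):
                |         # Fraction of the accepted suggestion
                |         # still present in the file some
                |         # interval after it was accepted.
                |         sm = difflib.SequenceMatcher(
                |             None, suggestion, current)
                |         kept = sum(b.size for b in
                |                    sm.get_matching_blocks())
                |         return kept / max(len(suggestion), 1)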
        
               | jvanderbot wrote:
               | There was some discussion by the copilot team that x% of
               | new code _in enabled IDEs_ was generated by copilot.
               | 
               | It varies, but here's one post with x=46 from last month.
               | So, very close to half.
               | 
               | https://github.blog/2023-02-14-github-copilot-for-
               | business-i...
        
               | 2devnull wrote:
               | Measuring output by LOC is not a very useful metric. The
                | sort of code that's most suited to AI is closer to data
               | than code.
        
               | [deleted]
        
               | fnordpiglet wrote:
               | (I read it as 50% of their code)
        
             | jvanderbot wrote:
             | For my side projects, copilot easily generates 80% of the
             | code. It snoops around the local filesystem and picks up my
             | naming schemes and style to help recommend better. It makes
             | me so much more productive.
             | 
              | For work projects (we're still not allowed to use it
              | for real work, for IP reasons), I tried it on some
              | throwaway code. It is very good at finding small
              | utility functions to help with DRY, and can help with
              | step-by-step work, but it can't generate helpful code
              | quite as easily, since some of our API and codebase
              | just doesn't follow its own norms or conventions, and
              | it seems to me that copilot makes a lot of guesses
              | based on the conventions it detects.
        
               | iudqnolq wrote:
               | > It snoops around the local filesystem and picks up my
               | naming schemes and style to help recommend better.
               | 
               | Are you sure about this? It doesn't seem to work on my
               | machine. I think it will infer things that might be in
               | other modules, but only based on the name. I'm basing
               | this on the fact it assumes my code has an API shape
               | that's popular but that I don't write (eg free functions
               | vs methods).
        
               | armchairhacker wrote:
               | It looks at your recently-viewed files in your IDE. I
               | don't think it looks at anything outside your open
               | workspace but maybe...
        
         | throwaway202303 wrote:
         | People have different preferences and habits. Having tried both
         | models I much prefer having a conversation in one window and
         | constructing my code from that in another. Although copilot is
         | about to add some interesting features that may win me back.
        
         | barbariangrunge wrote:
          | Either way, you're sending your company's biggest asset to
         | another company, aren't you? I'll try these tools when they
         | start being able to run locally
        
           | koonsolo wrote:
           | I surely hope they use my copyrighted code and make millions
           | out of it. Ideal case for me to sue them for lots of money.
        
             | freedomben wrote:
              | How would you ever know? It will come in chunks of a
              | dozen or fewer lines at a time, and it will be written
              | into your competitor's proprietary codebase (which you
              | don't have access to).
        
               | survirtual wrote:
               | Right.
               | 
               | If you are building something truly valuable locally, and
               | it is innovative or otherwise disruptive and relies on
               | being a first mover, centrally hosted LLMs are a non-
               | starter.
               | 
               | Most software corps have countless millions of lines of
               | code. You'd be spending lifetimes tracing where someone
               | ripped your "copyrighted" techniques and methods.
               | 
               | People's complete lack of security awareness, and
               | their willingness to compromise privacy for
               | convenience, deeply saddens me.
        
               | HALtheWise wrote:
               | > GitHub Copilot [for business] transmits snippets of
               | your code from your IDE to GitHub to provide Suggestions
               | to you. Code snippets data is only transmitted in real-
               | time to return Suggestions, and is discarded once a
               | Suggestion is returned. Copilot for Business does not
               | retain any Code Snippets Data.
               | 
               | Likely, some employee would whistleblow that they're not
               | complying with their privacy policy, and either
               | government litigation or a class action lawsuit would
               | ensue. That legal process would involve subpoenas and
               | third-party auditors being granted access to
               | GitHub/Microsoft's internal code and communications
               | history, which makes it pretty hard to hide something as
               | big as collecting, storing, and then training from a huge
               | amount of uploaded code snippets they promised not to.
               | 
               | It's not inconceivable that they're noncompliant, but
               | my bet would be that if they _are_ collecting data they
               | explicitly promise not to, it's an accidental or
               | malicious action by an individual employee, and they
               | will _freak out_ when they discover it and delete
               | everything as soon as they can. If they intended to
               | collect that data, it would be much easier to write
               | that into the policy than to deal with all the risk.
               | 
               | Notably, this applies to Copilot for Business, which is
               | _presumably_ what you're using if you are at work.
        
               | zelphirkalt wrote:
               | Couldn't it happen more subtly, without the code lying
               | around for long? The model could be doing online
               | learning (in the ML sense) and only discard the code
               | it is sent after learning from it. This means your
               | code could appear in other people's
               | completions/suggestions without having to be stored
               | anywhere; it is basically baked into the model. The
               | code could appear almost or even completely verbatim
               | on someone else's machine, possibly that of someone
               | working for a competitor. Even the fact that it is
               | your code would not be obvious, because MS could claim
               | that Copilot merely constructed the same code by
               | accident from other learned code.
               | 
               | Not sure that this is how the model works, but it is
               | conceivable.
        
           | SanderNL wrote:
           | I sort of disagree that code is the biggest asset. Take the
           | Yandex leak. What can you do with it? Outcompete them?
        
             | visarga wrote:
             | > Take the Yandex leak. What can you do with it?
             | 
             | Obviously, add it to the big training set of the next code
             | model.
        
           | throwaway202303 wrote:
           | No; otherwise no company would be able to use it. As you
           | type, fragments of code are sent and discarded after use.
           | You do need to trust Microsoft to actually do the
           | discarding, but they are contractually bound to, and you
           | can sue them if they accidentally or deliberately keep
           | your code around or otherwise mismanage it.
        
             | xiphias2 wrote:
              | They are obligated to give data to the government, and
              | the government took part in spying on Brazil on behalf
              | of Boeing in the past, but I guess they use this
              | capability only against a few strategic companies, and
              | most companies are not that.
        
             | zelphirkalt wrote:
              | But that is naive, isn't it? Who has the money and time
              | in their life to actually sue MS? Even if "you" is a
              | business, few will have the resources for that.
        
               | justinhj wrote:
               | Individuals do not (although a class action would be
               | feasible), but large companies that use Github and other
               | Microsoft products, of course they have both the means to
               | sue Microsoft and the motivation should their business be
               | impacted.
        
               | throwaway202303 wrote:
               | Exactly
        
       | [deleted]
        
       | themodelplumber wrote:
       | If somebody thinks an LLM is coming for everybody's coding job,
       | I'd say this article is a great counterpoint just for existing.
       | 
       | You could tell someone from decades ago that we now use a very
       | high level language for complex tasks in complex code ecosystems,
       | never even mention AI, explain that the parser is really
       | generalist-biased, and this article would make perfect sense as
       | an example of exemplary code by a modern coder working for a
       | living.
       | 
       | That's code in there, the stuff Xu Hao is writing.
       | 
       | And also, that's not even getting into the debugging part...
       | Which will be about other code, that looks different.
        
         | notacoward wrote:
         | The problem is that it's not _quite_ code. It's _almost_
         | code, but without the precision, which puts it into a sort
         | of Uncanny Valley of code-ness. It's detailed instructions
         | for someone to write code, but the someone in this case is
         | an alien or insane or on drugs, so they might interpret it
         | the way you meant it or they might go off on some weird
         | tangent. You never know, and that means you'll need to
         | check it with almost as much care as you'd take writing it.
         | 
         | Also, having it write its own tests doesn't mean those
         | tests will themselves be correct, let alone complete. This
         | is a problem we already have with humans, because any blind
         | spot they had while writing the code will still be present
         | while writing the tests. Who _hasn't_ found a bug in tests,
         | leading to acceptance of broken code and/or rejection of
         | correct alternatives? There's no reason to believe this
         | problem won't also exist with an AI, and they have more
         | blind spots to begin with.
        
         | Veedrac wrote:
         | 'Artists' jobs are safe because AI is bad at hands.'
        
           | themodelplumber wrote:
           | Artists' jobs are safe in part because they can also use AI,
           | and most already use relevant ecosystems that now incorporate
           | AI.
           | 
           | Consumers who can operate AI for clip art purposes are simply
           | still part of the same non-artist-paying demographic they
           | always were.
           | 
           | Same with code.
        
             | Veedrac wrote:
             | As farmers' jobs were safe because farmers can use farming
             | tools.
             | 
             | These arguments don't track even vaguely. You are doing the
             | equivalent of analyzing the future of solar power by
             | assuming solar will cost the same in 10 years as it does
             | today, and that each new watt of solar is matched 1:1 with
              | new units of demand. Neither of these is sensible.
             | 
             | It may be that ML code tools never displace many people, or
             | even that they supercharge demand, but you don't get to
             | justified conclusions by assuming the future is just the
             | present but with a bigger UNIX timestamp.
        
               | all2 wrote:
               | Industrialization has made farming tools incredibly
               | complex, so I believe the statement "farmers' jobs were
               | safe because farmers can use farming tools" is correct.
               | You still need a farmer to farm, but you now need less
               | manpower to farm. The specialist is secure while the
               | untrained laborer is at risk.
        
             | ldhough wrote:
             | Sadly I don't think this is true for art:
             | 
             | https://restofworld.org/2023/ai-image-china-video-game-
             | layof...
             | 
             | I really hope it doesn't end up being the same with code :|
        
         | twelve40 wrote:
         | Exactly, I actually liked the systematic approach in the
         | article, but it seemed pretty labor-intensive and ... not that
         | much different from other types of programming
        
           | sanderjd wrote:
           | To me, that's the whole point of this. I think it is directly
           | analogous to the jump between assembly and higher level
           | compiled languages. You could have said about that, "it still
           | seems pretty labor intensive and not that much different than
           | writing assembly", and that's true, but it was still a big
           | improvement. Similarly, AI-assisted tools haven't solved the
           | "creating software requires work" problem. But I think
           | they're in the process of further shifting the cost curve,
           | making more software possible to make.
        
         | nextworddev wrote:
         | The opposite might be true, and here's why: 1) by using
         | English as the spec, the barrier to entry is lower; 2) LLMs
         | can also write prompts and self-introspect to debug.
        
           | notacoward wrote:
           | > LLMs can also write prompts and self-introspect to debug.
           | 
           | Why should we assume that won't lead to a rabbit hole of
           | misunderstanding or outright hallucination? If it doesn't
           | know what "correct" really is, even infinite levels of
           | supervision and reinforcement might still drive toward an
           | incorrect goal.
        
             | ModernMech wrote:
             | It's like when you continually refine a Midjourney image.
             | At first refining it gets better results, but if you keep
             | going the pictures start coming out...really weird. It's up
              | to the human to figure out when to stop, using some sort
              | of external measure of aesthetics.
        
             | ben_w wrote:
             | To which the normal response[0] is: that's just like
             | humans.
             | 
             | Of course, it's still bad that humans do it; but despite
             | the scientific method etc., even successful humans often
             | work towards an incorrect goal.
             | 
             | [0] I am cultured, you're quoting memes, that AI is just a
             | stochastic parrot:
             | https://en.wikipedia.org/wiki/Emotive_conjugation
        
               | notacoward wrote:
               | But it's _not_ just like humans. For one thing it's
               | built differently, with a different relationship
               | between training and execution. It doesn't learn from
               | its mistakes until it gets the equivalent of a brain
               | transplant, and in fact extant AIs are _notorious_ for
               | doubling down instead of accepting correction. Even
               | more importantly, the AI doesn't have real-world
               | context, which is often helpful to notice when
               | "correct" (to the spec) behavior is not useful,
               | acceptable, or even safe in practice. This is why the
               | idea of an AI controlling a physical system is so
               | terrifying. Whatever requirement the prompter forgot
               | to include will not be recognized by the AI either,
               | whereas a human who knows about physical properties
               | like mass or velocity or rigidity will _intuitively_
               | honor requirements related to those. Adding layers is
               | as likely to magnify errors as to correct them.
        
               | ben_w wrote:
               | > But it's not just like humans. For one thing it's built
               | differently
               | 
               | I'm referring to the behaviour, not the inner nature.
               | 
               | > in fact extant AIs are notorious for doubling down
               | instead of accepting correction.
               | 
               | My experience suggests ChatGPT is _better_ than, say,
               | humans on Twitter.
               | 
               | I've had the misfortune of several IRL humans who were
               | also much, much worse; but the problem was much rarer
               | outside social media.
               | 
               | > Even more importantly, the AI doesn't have real-world
               | context, which is often helpful to notice when "correct"
               | (to the spec) behavior is not useful, acceptable, or even
               | safe in practice.
               | 
               | Absolutely a problem. Not only for AI, though.
               | 
               | When I was a kid, my mum had a kneeling stool she
               | couldn't use, because the woodworker she'd asked to
               | reinforce it didn't understand it and put a rod where
               | your legs should go.
               | 
               | I've made the mistake of trying to use RegEx for what I
               | thought was a limited-by-the-server subset of HTML,
               | despite the infamous StackOverflow post, because I
               | incorrectly thought it didn't apply to the situation.
               | 
               | There's an ongoing two-way "real-world context"
               | mismatch between those who want the state to be able
               | to pierce encryption and those who consider that to be
               | an existential threat to all digital services.
               | 
               | > a human who knows about physical properties like mass
               | or velocity or rigidity will intuitively honor
               | requirements related to those
               | 
               | Yeah, kinda, but also no.
               | 
               | We can intuit within the range of our experience, but we
               | had to invent counter-intuitive maths to make most of our
               | modern technological wonders.
               | 
               | --
               | 
               | All that said, with this:
               | 
               | > It doesn't learn from its mistakes until it gets the
               | equivalent of a brain transplant
               | 
               | You've boosted my optimism that an ASI probably won't
               | succeed if it decided it preferred our atoms to be
               | rearranged to our detriment.
        
               | notacoward wrote:
               | > I'm referring to the behaviour, not the inner nature.
               | 
               | Since the inner nature does affect behavior, that's a
               | _non sequitur_.
               | 
               | > we had to invent counter-intuitive maths to make most
               | of our modern technological wonders.
               | 
               | Indeed, and that's worth considering, but we shouldn't
               | pretend it's the common case. In the common case, the
               | machine's lack of real-world context is a disadvantage.
               | Ditto for the absence of any actual understanding beyond
               | "word X often follows word Y" which would allow it to
               | predict consequences it hasn't seen yet. Because of these
               | deficits, any "intuitive leaps" the AI might make are
               | less likely to yield useful results than the same in a
               | human. The ability to form a coherent - even if novel -
               | theory and an experiment to test it is key to that kind
               | of progress, and it's something these models are
               | fundamentally incapable of doing.
        
               | ben_w wrote:
               | > Since the inner nature does affect behavior, that's a
               | non sequitur.
               | 
               | I would say the reverse: we humans exhibit diverse
               | behaviour despite similar inner nature, and likewise
               | clusters of AI with similar nature to each other display
               | diverse behaviour.
               | 
               | So from my point of view, the fact that I can draw
               | clusters -- based on similarities of failures -- that
               | encompass both humans and AI makes it a non sequitur
               | to point to the internal differences.
               | 
               | > The ability to form a coherent - even if novel - theory
               | and an experiment to test it is key to that kind of
               | progress, and it's something these models are
               | fundamentally incapable of doing.
               | 
               | Sure.
               | 
               | But, again, this is something most humans demonstrate
               | they can't get right.
               | 
               | IMO, most people act like science is a list of facts, not
               | a method, and also most people mix up correlation and
               | causation.
        
           | have_faith wrote:
           | English as a spec is incredibly "fuzzy"; there are many
           | valid interpretations of intent. I don't think that can be
           | avoided?
        
             | sarchertech wrote:
             | It can't. Legalese is an attempt to do so, and it's
             | impenetrable by non experts and still frequently ambiguous.
        
           | mooreds wrote:
           | > by using English as the spec, the barrier to entry is
           | > lower,
           | 
           | I'm not sure that is true. The level of back and forth and
           | refinement needed indicates to me that the "English" used
           | is not the normal language I use when talking to people.
           | 
           | It's almost like a refined version of cucumber with syntax
           | that is slightly more forgiving.
           | 
           | Maybe I'm being a codger, but LLMs seem (at least for now)
           | far better for summarizing and giving high level overviews of
           | concepts rather than nailing precise code requirements.
        
             | ben_w wrote:
             | > It's almost like a refined version of cucumber with
             | syntax that is slightly more forgiving.
             | 
             | I don't know if "cucumber" is autocorrupt or an actual non-
             | vegetable thing; can you clarify?
        
               | omnicognate wrote:
               | https://cucumber.io/
               | 
               | That "did they actually mean that or was it autowrong?"
               | feeling is going to get worse I fear.
        
               | mjr00 wrote:
               | Not a typo.[0]
               | 
               | In the 00s/early 10s, software went through a fad phase
               | where people earnestly thought that by implementing
               | Gherkin frameworks like Cucumber, you'd be able to hand
               | off writing tests to "business people" in "plain
               | English." It went about as well as you'd expect.
               | 
               | [0] https://cucumber.io/docs/gherkin/
        
               | ben_w wrote:
               | Thanks!
               | 
               | Despite that period being when I finished my Software
               | Engineering degree, got my first job, and then attempted
               | self-employment, I'd never heard of it before.
               | 
               | Looking at the book titles -- "Cucumber Recipes" in
               | particular -- even if I had encountered it, I might have
               | assumed the whole thing was a joke.
        
               | gwright wrote:
               | https://cucumber.io/docs/guides/overview/
        
           | agentultra wrote:
           | But you can't determine if a statement is true by simply
           | reading more words.
           | 
           | It's also not efficient for doing higher level work. There
           | was a time before we had algebra where people were still
           | expressing the same ideas but the notation wasn't there.
            | Mathematics was expressed in "plain language." It's
            | extremely difficult for us to read. For mathematicians of
            | the time there was no other way to explain algorithms or
            | expressions.
           | 
           | For simple programs I have no doubt that these tools enable
           | more people to generate code.
           | 
           | However it's not going to be helpful for people working on
           | hypervisors, networking stacks, operating systems,
           | distributed databases, cryptography, and the like yet. For
           | that you need a more precise language and an LLM that can
           | _reason_ about semantics and generate understandable proofs:
           | _not boilerplate_ proofs either -- they have to be elegant so
           | that a human reading them can understand the problem as well.
            | We're still a ways from being able to do that.
        
             | nextworddev wrote:
              | Arguably, reading code can't lead to definitive
              | conclusions about its bug-freeness.
        
               | agentultra wrote:
               | Precisely! And neither can generating a handful of unit
               | tests. As EWD would say, _they only prove the existence
               | of one error._ Not that there are no errors.
               | 
               | If we want more programs that are correct with respect to
               | their specifications we need to write better, precise
               | specifications... not wave our hands around.
               | 
               | However for a lot of line-of-business tasks we're
               | generally fine with ambiguous, informal specifications.
               | We're not certain our programs are correct with respect
                | to the specifications, had we written them out
                | formally, but it's good enough.
               | 
               | I think most businesses that are writing software that
               | needs to be reliable and precise are not going to benefit
               | from these kinds of tools for some time.
        
               | tjr wrote:
               | This is true in aerospace software. Lots of process, lots
               | of specification, lots of verification. I wouldn't want
                | to say that GPT-esque tools would be useless here, but I
               | really don't see them offering the same kind of magic
               | leverage that they might offer on some other projects.
               | 
               | And vice-versa! Most software projects do not benefit
               | from the rigor used in aerospace, because it's just not
               | needed, and would be a waste of time.
               | 
               | I am definitely seeing ways that GPT tools could speed up
               | some aerospace work, but we need to be really really sure
               | that things are being done correctly... not just mostly
               | correct, or seemingly correct.
        
               | staunton wrote:
                | Reading and proving against a spec can, though. LLMs
                | are in principle capable of doing that. (If your
                | objection is that the spec itself might have bugs,
                | then "bug free" is subjective and nothing at all can
                | ever lead to definitive conclusions about it.)
        
           | gumballindie wrote:
            | I mean, sure, if the world were to run on basic code.
            | Perhaps WordPress developers may feel slightly threatened,
            | but even that is well above all examples of "AI" code I've
            | seen.
        
           | ZephyrBlu wrote:
           | I think English as a spec actually makes the barrier of entry
           | higher, not lower. Code itself is far easier to understand
           | than an English description of the code.
           | 
           | To understand an English description of code you already have
           | to have a deeper understanding of what the code is doing. For
           | code itself you can reference the syntax to understand what's
           | going on.
           | 
           | The prompt in this case is using very technical language that
           | a beginner will have no idea about. But if you gave them the
           | code they could at least struggle along and figure it out by
           | looking things up.
        
             | nextworddev wrote:
              | Yes, but LLMs can also be used by laypeople to explain
              | the issue in plain English too. That's the problem. Not
              | that LLMs would need a human to guide the debugging
              | process anyway (at least in a few years).
        
               | ZephyrBlu wrote:
               | You still have the same problem... You cannot describe a
               | technical field with plain English. If you did so the
               | semantics would be incorrect. There is a reason jargon
               | exists.
               | 
                | The first two paragraphs alone are absolutely
                | chock-full of terms that would not be easily explained
                | to a layperson:
               | 
               |  _" The current system is an online whiteboard system.
               | Tech stack: typescript, react, redux, konvajs and react-
               | konva. And vitest, react testing library for model, view
               | model and related hooks, cypress component tests for
               | view._
               | 
               |  _All codes should be written in the tech stack mentioned
               | above. Requirements should be implemented as react
               | components in the MVVM architecture pattern._ "
               | 
               | What is every library in that list? What is a model? What
               | is a view model? What is a hook, component test, view,
               | MVVM, etc?
               | 
               | If a layperson could understand explanations for all
               | these things then they would not be a layperson.
        
             | [deleted]
        
             | lcnPylGDnU4H9OF wrote:
             | This reminds me of rubber ducking[0] in how it necessitates
             | a certain understanding. If one is able to explain it in
             | plain English it's because it is understood.
             | 
             | [0] https://en.wikipedia.org/wiki/Rubber_duck_debugging
        
           | bartimus wrote:
            | But there's still going to have to be a human who has the
            | ability to form a mental model of the thing that needs to
            | be implemented, functionally and technically. The results
            | of the LLM will vary depending on the level of know-how
            | the human instructor has.
        
         | SanderNL wrote:
         | Except you now have a way "upwards" from an abstraction POV.
         | Regular code is severely limited and highly surgical, by
         | design. This is not.
         | 
         | All these abstraction layers were invented to serve old style
         | manual coders. Why bother explaining in great detail about
         | "Konva" layers and react anymore? Give it a few years and let
          | it fine-tune on IT tech, and I can see this being reduced to
          | "I want a whiteboard app with X general characteristics", at
          | which point I'd no longer speak about "programming".
        
           | themodelplumber wrote:
           | That "upwards" excludes a lot of relevant systems design
           | logic that won't go away though, insofar as it is abstraction
           | ad infinitum in the direction of fewer-relevant-details.
           | 
           | What'll happen is, details will continue to be relevant as
           | tastes adjust to the new normal.
           | 
           | Like for my work, today, React is enterprise-ready, which is
           | not good for me. It means it will likely dip my projects in
           | unnecessary maintenance costs as compared to another widget
           | of its type that does what I want in a lightweight manner.
           | When I troubleshoot something of React's complexity, even my
           | prompts will likely need to be longer.
           | 
           | But also, that's just one component of one component. And you
           | have to experience this stuff in the first place, to know
           | that you should pay attention to these details and not those
           | other ones, for a given job, for a given client, in a given
           | industry, with given specs.
           | 
           | So, if I was able to wave my hands I'd simply have all the
           | problems I had back when I was a beginner. Ergo, it comes
           | back to the clip art problem: Being able to buy clip art
           | never made anyone a designer. But it made a lot of designers'
           | jobs way easier.
           | 
           | We are simply regressing toward the mean with regard to
           | programming. It was never about computers in the first place,
           | never so concerned with syntax.
           | 
           | Anyway, back to browsing my theater program...
        
             | SanderNL wrote:
             | Fair enough, but don't we abstract "upwards" all the time?
             | Assembly won't go away, but do you deal with it?
        
               | themodelplumber wrote:
               | For one, assembly ceases to be a relevant detail and is
               | replaced by other relevant details.
               | 
               | So, I can't code fast games in a 1984 workplace,
               | currently, being too out of touch with assembly on a
               | given chipset. But I also can't wave my hands at an LLM
               | and expect a modern, fast game of the desired quality to
               | code itself. (Even though a clip art-style result is
               | possible, the requirements are always going to be special
               | details)
               | 
               | The upwards direction example is also interesting because
               | it's foundational to the cognitive functionality of one
               | of the Jungian personality types. But other personality
               | perspectives also apply to coding, which means in part
               | that the directional, metaphorical-abstraction view can
               | effectively be a blind spot if we map it as the preferred
               | view on outcomes.
               | 
               | The most common blind spot for this personality involves
               | questions of relevant details, and their intersection
               | with planning for yet-unknowns. There is a tendency to
               | hand-wave which ends up being similar to prophetic
               | behavior. Jung called this the "voice in the wilderness"
               | noting that it can easily detach from sensibility
               | (rationality) by departing from life details. Kind of
               | interesting stuff.
               | 
               | (Ni-dominant type)
        
               | SanderNL wrote:
               | Now you got me on the edge of my seat. What is this
               | personality type?
        
               | themodelplumber wrote:
               | Ni-dominant. It exists nowadays in various post-Jungian
               | models, many of which are really fascinating, having
               | fleshed it out a lot.
               | 
               | The opposing function to Ni is Se, which creates a
               | dichotomy of planning/foreseeing vs. doing/performing.
               | The functions oscillate as a kind of duty cycle, so a lot
               | of sages out there have hobbies as musicians, stage
               | magicians, etc.
               | 
               | This dichotomy also effectively shuts out detail memory
               | for context, dealing mostly with present vs. future. Even
               | nostalgia is often ignored on the daily. So a Ni-dom will
               | usually describe their memory as pattern-based, gestalt,
               | more vague or general, etc.
        
               | rootusrootus wrote:
               | I would like to subscribe to your newsletter.
               | 
               | Even if approximately 75% of that sailed right over my
               | head.
        
               | themodelplumber wrote:
               | Best I can do is RSS!
        
               | SanderNL wrote:
                | I couldn't quite tell if you found a beautiful way to
                | insult me, but it is fascinating indeed. I _am_
                | hand-wavey and I understand its failure modes quite
                | well, unfortunately. It's cool to talk about it at
                | this level of abstraction.
        
               | themodelplumber wrote:
               | No insult intended... I don't really know how much it
               | applies in your case, but since you really took on that
               | viewpoint, that's when the personality theory side of me
               | goes, "well if this is a favored viewpoint then there IS
               | this idea about the population that favors this
               | viewpoint" :-) And thoughts about GPT are generally
               | crafted from general personality positions, in the
               | absence of other relevant self-development experience.
               | 
               | I agree, it's cool stuff
        
         | wpietri wrote:
         | Yeah, I think there's a "stone soup" effect going on with AI.
         | 
         | It's the same sort of thing you see happening with the
         | customers of psychics. People often have poor awareness of how
          | much they're putting into a conversation. Or it's a bit like
         | the way Tom Sawyer tricks other kids into painting the fence
         | for him. For me a lot of the magic here is in knowing what
         | questions to ask and when the answers aren't right. If you have
         | those skills, is pounding out the code that hard?
         | 
         | The interesting part for me is not generating new bits of code,
         | but the long-term maintenance of a whole thing. A while back
         | there was a fashion for coding "wizards", things that would ask
         | some questions and then generate code for you. People were very
         | excited, as they saw it as lowering the barrier to entry. But
         | the fashion died out because it just pushed all the problems a
         | bit further down the road. Now you had novice developers trying
         | to understand and improve code they weren't competent to write.
         | 
         | I suspect that in practice, anything a person can get a LLM to
         | wholly write is also something that could be turned into a
         | library or framework or service or no-code tool that they can
         | just use. That, basically, if the novelty is low enough that an
         | LLM can produce it, the novelty is low enough that there are
         | better options than writing the code from scratch over and
         | over.
        
           | baq wrote:
            | I mostly agree, except for one critical detail: LLMs are
            | _the_ low-code/no-code service. You literally tell them
            | what you want, and if they're fine-tuned on the problem
            | domain, you're all set. Microsoft demoed the Office 365
            | integration, and if it works half as well in practice
            | they'll own the space as thoroughly as they did in 1997.
        
             | wpietri wrote:
             | Maybe they will be, but that's not proven yet. We'll see!
             | If anything, the article we're looking at suggests that the
             | "tell them what you want" step is not obviously much less
             | rigorous or effortful than coding. Tuning could make the
             | difference, or it could be one of those things that
             | produces better demos than results.
        
           | harlanlewis wrote:
           | Great points (and after checking your user name, I've been
           | nodding my head to posts of yours for about a decade now).
           | 
            | This is a bit tangential - your reference to stone soup is a
           | wonderful example of the information density possible with
           | natural language. And all the meaning and story behind the
           | phrase is accessible to LLMs.
           | 
            | I'll have to start experimenting with idiom-driven
            | development, especially when prompt golfing.
        
           | 62951413 wrote:
           | I believe the Model Driven Architecture fad
           | (https://en.wikipedia.org/wiki/Model-driven_architecture) is
            | a better analogy than wizards. Back then, the holy grail
            | of a complete round trip UML->code->UML never got
            | practical enough to justify the effort.
        
       | zackmorris wrote:
       | This is an amazing demonstration, but I'm worried that when this
       | goes mainstream, we'll inherit a ton of baggage from today's
       | programming. Specifically:
       | 
        | * The tests are written in BDD-style "it('should xyz')"
        | clauses, which programmers write in code like this for
        | convenience. But if we're automating their creation, then
        | actual human-readable Cucumber clauses would be more useful;
        | maybe the tests could even be transpiled (see the sketch
        | after these bullets). This isn't the AI's fault, but more a
        | symptom of how the original spirit of BDD as a means for
        | nonprogrammers to test business logic seems to have been
        | lost.
       | 
       | * React hooks and Redux syntax are somewhat contrived/derivative.
       | The underlying concepts like functional reactive programming and
       | reducers are great, but the syntax is often repetitive or
       | verbose, with a lot of boilerplate to accomplish things that
       | might be one-liners in other languages/frameworks. This is more
       | of a critique of the state of web programming than of the AI's
       | performance.
       | 
       | * MVVM is a fine pattern, but at the end of the day, it's an
       | awful lot of handwaving to accomplish limited functionality. What
       | do I mean by that? Mainly that I question whether the frontend
       | needs models, routes, controllers (which I realize are MVC), etc.
       | I mourn that we lost the idempotent #nocode HTML of the 90s and
       | are back to manually writing app interfaces by hand in Javascript
       | (like we did for native desktop apps in the C++ OOP days) when
       | custom elements/components would have been so much easier. HTMX
       | combined with some kind of distributed serverless lambda
       | functions (that are actually as simple as they should be) would
       | reduce pages of code to a WYSIWYG document that nonprogrammers
       | could edit.
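        | 
        | To make the first bullet concrete, here is roughly the
        | contrast I mean: a made-up vitest-style test next to the
        | human-readable clause it could be transpiled to (makeBoard
        | is a hypothetical helper, sketched inline):
        | 
        |     import { describe, it, expect } from 'vitest';
        | 
        |     // Hypothetical minimal board model, just for
        |     // illustration:
        |     function makeBoard() {
        |       let selection: string[] = [];
        |       return {
        |         get selection() { return selection; },
        |         select(id: string) { selection = [id]; },
        |         pressKey(k: string) {
        |           if (k === 'Escape') selection = [];
        |         },
        |       };
        |     }
        | 
        |     // What programmers write today:
        |     describe('whiteboard selection', () => {
        |       it('should clear the selection on Escape', () => {
        |         const board = makeBoard();
        |         board.select('shape-1');
        |         board.pressKey('Escape');
        |         expect(board.selection).toEqual([]);
        |       });
        |     });
        | 
        |     // What it might transpile to for nonprogrammers:
        |     //   Scenario: Clearing the selection
        |     //     Given a shape is selected
        |     //     When the user presses Escape
        |     //     Then nothing is selected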
       | 
       | What I'm really getting at is that I envisioned programming going
       | a different direction back in the late 90s. We got
       | GPUs/TensorFlow and Docker and WebAssembly and Rust etc etc etc.
       | And these things are all fine, but they're contrived/derivative
       | too. More formal systems might look like multicore/multimemory
       | transputers (or Lisp machines), native virtual machines with full
       | sandboxing built in so anything can run anywhere, immutable and
       | auto-parallelized languages like HigherOrderCO/HVM or true vector
       | processing with GNU Octave (MATLAB) so that we don't have to
       | manually manage vertex buffers or free memory, etc.
       | 
       | I've had architectures in mind for better hardware and
       | programming languages for about 25 years (that's why I got my
       | computer engineering degree) but I will simply never have time to
       | implement them. All I do is work and cope. I just keep watching
       | as everyone reinvents the same imperative programming wheel over
       | and over again. And honestly it's gone on so long that I almost
       | don't even care anymore. It feels more appealing in middle age to
       | maybe just go be a hermit, get out of tech. I've always known
       | that someday I'd have to choose between programming and my life.
       | 
       | Anyway, now that I'm way too old to begin the training, I wonder
        | if AI might help to rapidly prototype truly innovative tools.
       | Maybe more like J.A.R.V.I.S. where it's just on all of the time
       | and can iterate on ideas at a superhuman rate to assist humans in
       | their self-actualization.
       | 
       | Then again, once we have that, it becomes trivial to implement
       | the stuff that I rant about. Maybe we only have about 5-10 years
       | until all of the problems are solved. I mean all of them,
       | everywhere, in physics/chemistry/biology/etc. Rather than
       | automating creative acts and play as AI is doing now. If the
       | Singularity arrives in 2030 instead of 2040, that also seems like
       | a strong incentive to go be a hermit.
       | 
       | Does any of this resonate with anyone? That somehow everything
       | has gone terribly wrong, but it's more of a hiccup than a crisis?
       | That maybe the most impactful thing that any of us can do is..
       | wait for things to get better?
        
         | davidthewatson wrote:
         | This is deeply resonant with me for the following reasons:
         | 
         | 1) Age
         | 
          | 2) BDD-style, or what I call a madlib proxy for playing
          | cucumber on TV. Not a fan; having used it in an RoR context,
          | I can only call it hipster engineering, not what DHH
          | described.
         | 
         | 3) I just had the discussion on redux vs. datomic vs. riak with
         | friends yesterday.
         | 
          | 4) Ditto the conversation on MVVM and the implied constraint
          | complexity of putting nodejs and chromium in the same
          | deployment package, calling it Electron, and carrying on
          | about how simple it is relative to... a world where
          | everything is actually native all the way down?
         | 
         | 5) Me too on the CASE era.
         | 
          | 6) Cue Donald Knuth on literate programming. That is one
          | thing cucumber is not, but I think taking another iteration
          | at literate programming in light of GPT or LLMs is a good
          | idea, since Knuth is never wrong, just 50 years ahead of
          | his time. But we need a collaboration of human-computer
          | agents, patterned on a sensemaking protocol, that can
          | resolve subjective truth by consensus of man and machine.
          | How else could you possibly resolve the fact that the SOTA
          | lies to me on a daily basis while defending itself and its
          | lack of veracity with force, in what can only be seen as
          | emulating the culture of one's parents?
         | 
          | 7) Yes, AI should help on the iterations. Those short
          | sketch-to-demo rounds we used to do at the design studio,
          | sketch on Monday and demo on Friday, should be much easier
          | today, breakfast sketch to dinner demo, but I don't think
          | they are. The tooling is radically better, but that better
          | has come at the cost of complexity and going sideways,
          | neither of which is being fully felt and accounted for
          | reflectively. That is, this is not yet how you get to
          | typing less and having the tools do the work, because when
          | they break, the debugging is mind-crushing.
         | 
          | 8) I think the thing that's missing in the trivial part is
          | that it's not actually trivial. The software is the
          | message, and that insight stems from the fact that software
          | has emergent properties such as extensibility,
          | composability, and a resultant rate of change that make it
          | very difficult to compare decade to decade. Software's
          | fundamental disequilibrium stems from the fact that the
          | full stack is in constant flux, a mad hatter's pop culture
          | where we never sing the same song twice. There's value in
          | theme and variations if it can be modeled as
          | improvisational human-computer design pairing rather than
          | yet another orchestration. Joe Beda was as right about
          | improvisation as Knuth is about the art of computer
          | programming.
         | 
         | 9) I guess the t-shirt is: I'm not waiting...
         | 
         | 10) In the immortal words of Raymond Loewy: Never leave well
         | enough alone.
         | 
         | If there's a set of artifacts in software that achieve what I
         | hope for with AI, it's somewhere between Bret Victor and
         | https://iolanguage.org/
        
       | romland wrote:
       | I started a bit of an exploration around prompts and code a week
       | or three back. I want to figure out the down/up-sides and create
       | tools for myself around it.
       | 
       | So, for this project (a game), I decided "for fun" to try to not
       | write any code myself, and avoid narrow prompts that would just
       | feed me single functions for a very specific purpose. The LLM
        | should be responsible for this, not me! It's pretty painful,
        | since I still have to debug and understand the potentially
        | garbage code I was given; after understanding what is wrong, I
        | get rid of it and change/add to the prompt to get new code.
        | Very often completely new code[1]. Rinse and repeat until I
        | have what I need.
       | 
       | The above is a contrived scenario, but it does give some
        | interesting insights. A nice one is that since there are one
        | or more prompts connected to all the code (and its commits),
        | the
       | intention of the code is very well documented in natural
       | language. The commit history creates a rather nice story that I
       | would not normally get in a repository.
       | 
        | Another thing: getting an LLM (mostly ChatGPT) to fix a bug is
        | really hit and miss, and mostly miss, for me. Say a buggy
        | piece comes from the LLM and I feel that this could almost be
        | what I need.
       | need. I feed that back in with a hint or two and it's very rare
       | that it actually fixes something unless I am very very specific
       | (again, needing to read/understand the intention of the
       | solution). In many cases I, again, get completely new code back.
       | This, more than once, forced my hand to "cheat" and do human
       | changes or additions.
       | 
       | Due to the nature of the contrived scenario, the code quality is
       | obviously suffering but I am looking forward to making the LLM
       | refactor/clean things up eventually.
       | 
       | On occasion ChatGPT tells me it can't help me with my homework.
       | Which is interesting in itself. They are actually trying (but
       | failing) to prevent that. I am really curious how gimped their
       | models will be going forward.
       | 
        | I've been programming for quite a long time. I've come to
        | realize that I don't need to be programming in the traditional
        | sense. What I like is creating. If that means I can massage an
        | LLM to do a bit of grunt work, I'm good with that.
       | 
       | That said, it still often feels very much like programming,
       | though.
       | 
        | [1] The completely new code issue can likely be alleviated by
        | tweaking the transformer's sampling settings (e.g. the
        | temperature)
       | 
       | Edit: For the curious, the repo is here:
       | https://github.com/romland/llemmings and an example of a commit
       | from the other day:
       | https://github.com/romland/llemmings/commit/466babf420f617dd... -
       | I will push through and make it a playable game, after that, I'll
       | see.
        
         | celeritascelery wrote:
          | That is a really interesting experiment! I have so many
          | questions.
         | 
         | - do you feel like this could be a viable work model for real
         | projects? I recognize it will most likely be more effective to
          | balance LLM code with hand-written code in the real world.
         | 
         | - some of your prompts are really long. Do you feel like the
         | code you get out of the LLM is worth the effort you put in?
         | 
          | - given that the code returned is often wrong, do you feel
          | like it could be feasible for someone who knows little to no
          | code?
         | 
          | - it seems like you already know all the technology behind
          | what you are building well (i.e. you know how to write a
          | game in JS). Do you think you could do this without already
          | having that
         | background knowledge?
         | 
         | - how many times do you have to refine a prompt before you get
         | something that is worth committing?
        
           | romland wrote:
            | I think it could be viable, even right now, with a big
            | caveat: you will want to do some "human" fixes in the code
            | (not just the glue between prompts).
           | you might miss out on parts of the nice natural language
           | story in the commit history. But the upside is you will save
           | a lot of time.
           | 
           | Down the line you will be able to (cheaply) have LLMs know
           | about your entire code-base and at that point, it will
           | definitely become a pretty good option.
           | 
           | On prompt-length, yeah, some of those prompts took a long
           | time to craft. The longer I spend on a prompt, the more
           | variations of the same code I have seen -- I probably get
           | impatient and biased and home in on the exact solution I want
           | to see instead of explaining myself better. When it's gone
           | that far, it's probably not worth it. Very often I should
           | probably also start over on the prompt as it probably can be
           | described differently. That said, if it was in the real world
           | and I was fine with going in and massaging the code fully,
           | quite some time could be saved.
           | 
           | If you don't know how to code, I think it will be very hard.
           | You would at the very least need a lot more patience. But on
           | the flip side, you can ask for explanations of the code that
           | is returned and I must actually say that that is often pretty
           | good -- albeit very verbose in ChatGPT's case. I find it hard
           | to throw a real conclusion out there, but I can say that
           | domain knowledge will always help you. A lot.
           | 
           | I think if you know javascript, you could easily make a game
           | even though you had never ever thought about making a game
           | before. The nice thing about that is that you will probably
           | not do any premature optimization at least :-)
           | 
            | All in all, some prompts were nailed down on the first
            | try; the simple particle system was one such example. Some
            | other prompts -- for instance the map generation with
            | Perlin noise -- might take 50 attempts.
           | 
           | A lot of small decisions are helpful, such as deciding
           | against any external dependencies. It's pretty dodgy to ask
            | for code around something (e.g. some noise library) that
            | you
           | need to fit into your project. I decided pretty early that
           | there should be no external dependencies at all and all
           | graphics would be procedurally generated. It has helped me as
           | I don't need to understand any libraries I have never used
           | before.
           | 
            | Another note related to the above: the upside and downside
            | of a high-ish temperature is that you get varying results.
            | I think I should probably change my behaviour around that
            | and possibly tweak it depending on how exact I feel my
            | prompt is.
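            | 
            | For anyone following along: temperature here is just the
            | sampling setting on the API call. A minimal sketch,
            | assuming the standard chat completions REST endpoint (the
            | prompt text is made up):
            | 
            |     // Node 18+, plain fetch against the REST API.
            |     const res = await fetch(
            |       'https://api.openai.com/v1/chat/completions',
            |       {
            |         method: 'POST',
            |         headers: {
            |           'Content-Type': 'application/json',
            |           Authorization:
            |             `Bearer ${process.env.OPENAI_API_KEY}`,
            |         },
            |         body: JSON.stringify({
            |           model: 'gpt-3.5-turbo',
            |           // 0 = near-deterministic, ~1+ = more varied
            |           temperature: 0.2,
            |           messages: [
            |             { role: 'user', content: 'Write a JS...' },
            |           ],
            |         }),
            |       },
            |     );
            |     const data = await res.json();
            |     console.log(data.choices[0].message.content);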
           | 
            | I find myself often wondering where the cap of today's
            | LLMs is, even if we go in the direction of multi-models
            | with a base that does the reasoning -- and I have to say I
            | keep finding myself getting surprised.
           | possibility that this will be the way some kinds of
           | development will be. But, well, we'd need good local models
           | for that if we work on projects that might be of a sensitive
           | nature.
           | 
           | Related to amount of prompt attempts: I think the game has
           | cost me around $6 in OpenAI fees so far.
           | 
           | One particularly irritating (time consuming) prompt was
           | getting animated legs and feet: https://github.com/romland/ll
           | emmings/commit/e9852a353f89c217...
        
             | [deleted]
        
         | ChatGTP wrote:
         | Just curious, you're using which version?
        
           | romland wrote:
           | I have experimented quite a bit with various flavours of
           | LLaMa, but have had little success in actually getting not-
           | narrow outputs out of them.
           | 
           | Most of the code in there now is generated by gpt-3.5-turbo.
           | Some commits are by GPT-4, and that is mostly due to context
           | length limitations. I have tried to put which LLM was used in
           | every non-human commit, but I might have missed it in some.
        
         | sk0g wrote:
         | That's a beautiful readme, starred!
         | 
         | Out of curiosity, right now would you say you have saved time
         | by (almost) exclusively prompting instead of typing the code up
         | yourself? Do you see that trending in another direction as the
         | project progresses?
        
           | romland wrote:
            | It was far easier to get big chunks of work done in the
            | beginning, but that is pretty much how it works for a human
            | too (at least for me). The thing that limits you is the
            | context-length limit of the LLM, so you have to be rather
            | picky about which existing code you feed back in. With that
            | comes the issue of all the glue between the prompts, so I
            | can see that the more polished things need to become, the
            | more human intervention is required -- a trend I already
            | very much see.
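            | 
            | Being picky boils down to budgeting tokens. A rough sketch
            | of what I mean (assuming the tiktoken package; the file
            | list and budget are made up):
            | 
            |     import tiktoken
            | 
            |     def pick_context(files, budget=3000):
            |         # Greedily include files (most relevant first)
            |         # until the token budget is spent.
            |         enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
            |         chosen, used = [], 0
            |         for path, text in files:
            |             cost = len(enc.encode(text))
            |             if used + cost <= budget:
            |                 chosen.append(path)
            |                 used += cost
            |         return chosen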
           | 
            | If there is time saved, it is mostly because I don't fear
            | upcoming grunt work. Take, for instance, creating the
            | "Builder" lemming. You know pretty much exactly how to do
            | it, but you know there will be a lot of off-by-one errors
            | and subtle issues. It's easier to go at it by throwing
            | together a prompt a bit half-heartedly and seeing where it
            | goes.
           | 
           | On some prompts, several hours were spent, mostly reading and
           | debugging outputs from the LLM. This is where it eventually
           | gets a bit dubious -- I now know pretty much exactly how I
           | want the code to look since I have seen so many variants. I
           | might find myself massaging the prompt to narrow in on my
           | exact solution instead of making the LLM "understand the
           | problem".
           | 
           | Much of this is due to the contrived situation (human should
           | write little code) -- in the real world you would just fix
           | the code instead of the prompt and save a lot of time.
           | 
           | Thank you, by the way! I always find it scary to share links
           | to projects! :-)
        
             | sk0g wrote:
             | No worries, going to check out some of the commits when I
             | get a bit more free time as well. The concept is
             | intriguing!
             | 
              | The usefulness of LLMs for engineering things is very
              | hard to gauge, and your project is going to be quite
              | interesting as you progress. No doubt they help with
              | writing new things, but I spend maybe ~15% of my time
              | working on something new, vs. maintenance and extensions.
              | The more common activities are very infrequently
              | demonstrated: either the usefulness diminishes as the
              | required context grows, or they simply make for less
              | exciting examples. Though someone in my org has brought
              | up an LLM tool that tries to remedy bugs on the fly (at
              | runtime), which sounds absolutely horrific to me...
             | 
              | That sounds similar to my experience with Copilot. In
              | small, self-contained bits of code -- much more common in
              | new projects or microservices, for example -- it can save
              | a lot of cookie-cutter work. Sometimes it will get me 80%
              | of the way there, and I have to manually tweak the rest.
              | Quite often it produces complete garbage that I ignore.
              | All that to say: if I weren't an SE, Copilot would bring
              | me no closer to tackling anything beyond hello world.
             | 
              | One big benefit, though, is with the simpler test cases.
              | If I start them with a "GIVEN ... WHEN ... THEN ..."
              | comment, the autocompletes for those can be terrific,
              | requiring maybe some alterations to suit my taste. I get
              | positive feedback in PRs and from people debugging the
              | test cases too, because the intention behind them is
              | clear without needing to guess the rationale for the
              | test. Win win!
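              | 
              | A sketch of the kind of comment-led test I mean (pytest;
              | the Cart/Item classes are made up for illustration):
              | 
              |     from dataclasses import dataclass
              | 
              |     @dataclass
              |     class Item:
              |         name: str
              |         price: float
              | 
              |     class Cart:
              |         def __init__(self):
              |             self.items, self.discount = [], 0.0
              |         def add(self, item):
              |             self.items.append(item)
              |         def apply_discount(self, code):
              |             # Hypothetical single discount code.
              |             self.discount = 0.10 if code == "SAVE10" else 0.0
              |         def total(self):
              |             subtotal = sum(i.price for i in self.items)
              |             return subtotal * (1 - self.discount)
              | 
              |     # GIVEN a cart with two items
              |     # WHEN a 10% discount code is applied
              |     # THEN the total reflects the discounted sum
              |     def test_discount_applied_to_cart_total():
              |         cart = Cart()
              |         cart.add(Item("book", 20.0))
              |         cart.add(Item("pen", 5.0))
              |         cart.apply_discount("SAVE10")
              |         assert cart.total() == 22.5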
        
       | moonchrome wrote:
       | This feels like a lot of ceremony and back and forth, and
       | considering GPT-4's speed, I suspect I would fly past this
       | approach just using Copilot and coding myself.
       | 
       | I look forward to offloading these kinds of tasks to LLMs, but
       | I'm not seeing the value right now. Using them feels slow and
       | unsatisfying: you need to triple-check everything and specify
       | everything relevant for context.
       | 
       | Also, maybe it's just me, but verbalizing requirements
       | unambiguously can often be harder than writing the code for
       | them. And it's not fun. If GPT-4 were as fast as GPT-3.5, it
       | would probably be a completely different story.
        
       | mrbonner wrote:
       | I can't help thinking: isn't this way more work than just
       | coding it myself? Does anybody else have the same thought?
        
         | all2 wrote:
         | It depends on how you use it. I've been using it to skip
         | boilerplate coding and get straight to the meaty bits. It
         | took me a few days to sketch out an application using ChatGPT
         | to handle the boilerplate, including dependency management
         | (Python, Poetry, etc.).
         | 
         | I've had to handle the specific pieces of implementation
         | myself. Especially unit testing new pieces of code. When asked
         | to generate unit tests, it does ok, but it doesn't get the
         | spirit of the code (my intended purpose) and so I'm left
         | filling in a bunch of blanks.
        
       | [deleted]
        
       | greenhearth wrote:
        | Is it me, or does this just create a bunch of extra steps and
        | gratuitous complexity? These tools don't seem all that
        | efficient, nor do they make anything easier. I'm sorry to the
        | enthusiasts here -- I'm usually excited about AI and a student
        | of computational linguistics, but I think this emperor is
        | naked.
        
         | wudangmonk wrote:
         | I have been trying to find a use case for these LLMs, and I
         | keep an eye out in case someone figures out a way to use them
         | that is useful in my workflow. My only use for them so far is
         | as an exploratory tool for tasks I'm not familiar with, such
         | as having to work with programming languages I never use. For
         | such things it's great: not only do I not have to go digging
         | through the documentation, I also do not have to then search
         | the web for examples of how it's actually used.
         | 
         | This is taking into account that I have reduced the cost of
         | using it as much as possible: I do not have to switch to a
         | browser tab, ask my question, wait for a reply, and then copy
         | any useful text to my editor. I have it set up as a function
         | call inside my REPL, with history saved to a local file in
         | case I need it.
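         | 
         | Something like this minimal sketch (assuming the openai
         | package; the history file name is made up):
         | 
         |     import json
         |     import openai
         | 
         |     HISTORY = "llm_history.jsonl"  # local, append-only log
         | 
         |     def ask(question, model="gpt-3.5-turbo"):
         |         # One-shot helper meant to be called from a REPL.
         |         reply = openai.ChatCompletion.create(
         |             model=model,
         |             messages=[{"role": "user", "content": question}],
         |         )["choices"][0]["message"]["content"]
         |         with open(HISTORY, "a") as fh:
         |             fh.write(json.dumps({"q": question, "a": reply})
         |                      + "\n")
         |         return reply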
         | 
         | Even with this convenient way of using it, I notice that
         | pretty much the only time I use it when working on my actual
         | projects is to save myself a Google search for trivial
         | things, such as looking up word definitions or synonyms for
         | naming things, or anything else where I would expect to find
         | the answer with just a bit of googling. I can quickly fire
         | off my request, continue with whatever I was doing, and
         | return for my answer later.
        
       | mov_eax_ecx wrote:
       | How to overengineer with an LLM: don't state the requirements
       | clearly, shove your pet patterns in first, treat following the
       | slice/Redux/awareness-hook pattern as more important than
       | having a working solution, never trust your developers to make
       | decisions, and worry more about how it is built than about
       | building a solution.
       | 
       | My way of working with an LLM is to start with a good, clear
       | requirement, have the LLM write a possible file organization,
       | then query it for the contents of each file (just the code, no
       | comments) and quickly assemble a working prototype. Then you
       | can iterate over the requirements and evolve from there.
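       | 
       | As a rough sketch of that two-step loop (Python, assuming the
       | openai package; the requirement string and the tree parsing
       | are simplified for illustration):
       | 
       |     import openai
       | 
       |     REQ = "an online whiteboard app (react, konva)"
       | 
       |     def chat(prompt):
       |         return openai.ChatCompletion.create(
       |             model="gpt-4",
       |             messages=[{"role": "user", "content": prompt}],
       |         )["choices"][0]["message"]["content"]
       | 
       |     # 1) Ask only for the file layout.
       |     layout = chat(REQ + ". Propose a file organization; "
       |                   "print the file layout tree, no explanations.")
       | 
       |     # 2) Ask for each file separately; the context is always
       |     #    just requirement + layout + the one file wanted.
       |     for path in layout.splitlines():  # naive tree parsing
       |         print(chat(REQ + ". File layout:\n" + layout +
       |                    "\nWrite the full contents of " + path +
       |                    ". Just the code, no comments."))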
        
         | lyjackal wrote:
         | Generally, I agree that that approach works well. It's going
         | to perform better if it's not trying to follow your team's
         | existing patterns. On the other hand, allowing lots of
         | inconsistencies of style in a large code base seems like a
         | quick way to create a hot mess, and chat prompts seem like a
         | really difficult way to communicate code style and
         | conventions. A sibling comment to yours mentions that
         | Copilot-style autocomplete seems like a much better pattern
         | for working in an existing code base, and I tend to agree
         | it's much more promising: read the existing code, and
         | recommend small pieces as you type.
        
         | moonchrome wrote:
         | How often do you get working code that way? Unless it's
         | something trivial that fits in its scope, I'd say that's
         | going to produce garbage. I've seen it steer into garbage on
         | longer prompt chains about a single class (of medium
         | complexity) -- I doubt it would work at the project level.
         | Mind sharing the projects?
        
           | mov_eax_ecx wrote:
           | I work only with closed-source codebases, so I use this
           | approach for prototypes. But, using the same example as the
           | blog, I prompt: "the current system is an online whiteboard
           | system. Tech stack: react, use some test framework, use
           | konva for the canvas, propose a file organization, print
           | the file layout tree (without explanations)." The trick is
           | that for every chat the context is the requirement + the
           | file layout + the specific file, so you never have the
           | entire codebase in the context, only the current file.
           | Also, use GPT-4; GPT-3 is not good enough.
           | 
           | My main point is that the blog post's final output is
           | mocks, tests, awareness hooks, and Redux -- where an
           | architect feels good seeing his patterns -- while with my
           | approach you have a prototype online whiteboard system.
        
       | blatant303 wrote:
        | Is there a tool that lets you do this from within a text
        | editor (for instance VS Code)? Using a selection instead of
        | copy-pasting, having the LLM store its output directly in
        | local files, maybe giving it access to a shell to run the
        | tests on its own?
        
         | mxuribe wrote:
         | If there isn't a tool currently... then give it time, and
         | eventually there will be tools like what you described. I'd
         | guess they'll be called something like prompt editors (like
         | text editors, etc.). ...Or maybe they'll be called
         | chatditors... no, no, prompt editors is better. ;-)
        
         | xenospn wrote:
         | Copilot X will include that capability.
        
       | amelius wrote:
        | What I want is a prompt that continuously watches whatever I'm
        | doing, so I can ask it to complete the task.
       | 
       | For example, say I'm converting all identifiers in a file from
       | lowercase to CamelCase. Then after doing like 3 of them, I can
       | ask the LLM to take over and do the remainder.
        
         | chrisco255 wrote:
          | I mean, that kind of task is easy to do today. You could
          | probably just create a VS Code extension where you type
          | "convert all identifiers in this file that match this
          | pattern from lowercase to camel case" and it pipes that to
          | the GPT API to do it instantly (without even needing to give
          | it the first 3 examples).
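          | 
          | The core of such an extension could be little more than this
          | sketch (a Python stand-in for the extension backend,
          | assuming the openai package; the editor glue is omitted):
          | 
          |     import openai
          | 
          |     def transform(source, instruction):
          |         # Send a whole file plus an edit instruction and
          |         # return the rewritten file.
          |         return openai.ChatCompletion.create(
          |             model="gpt-4",
          |             messages=[
          |                 {"role": "system", "content":
          |                  "Rewrite the given code per the instruction."
          |                  " Return only the full, modified file."},
          |                 {"role": "user",
          |                  "content": instruction + "\n\n" + source},
          |             ],
          |             temperature=0,  # edits should be deterministic
          |         )["choices"][0]["message"]["content"]
          | 
          |     # e.g. transform(open("x.py").read(),
          |     #      "convert identifiers from lowercase to CamelCase")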
        
           | amelius wrote:
           | Sometimes just doing stuff takes less energy than thinking
           | about how it can be automated.
        
         | clarge1120 wrote:
          | Great example of how a GPT can reason on your behalf and
          | dramatically improve your performance. For instance, it
          | could watch for inconsistent approaches to design, or even
          | continue a complex implementation you've started just from
          | examining context signals.
        
       | gdubs wrote:
       | There's an unfortunately common take on AI that goes basically
       | like this:
       | 
       | "I tried it and it didn't do what I wanted, not impressed."
       | 
       | My suggestion is to tune out the noise and really try
       | experimenting with these tools - and know that they're rapidly
       | improving. Even if ultimately you have criticisms or decide one
       | way or another, at least really investigate them for your own
       | use-cases rather than jumping on a bandwagon that's either "AI is
       | bad" or the breathless hype-machine at the other end.
        
         | mise_en_place wrote:
          | I was very impressed when it showed me the different
          | techniques for deep reinforcement learning. Where it
          | struggles, however, is in building an agent, because you
          | need a large number of tokens to template a prompt in the
          | case of langchain or AutoGPT.
        
         | rootusrootus wrote:
         | I agree it's a good idea to take a moderate approach. The hype
         | that LLMs are going to replace SWEs is clearly just that, hype,
         | if you've done any real work trying to get GPT4 to give you the
         | code you want. But it's also clearly a very useful tool. I
         | think it'll absolutely destroy Stack Overflow.
        
           | z3c0 wrote:
            | I am very critical of the LLM hype, but the threat to
            | Stack Overflow is evident. As with Stack Overflow, I never
            | use code verbatim, even when it comes from GPT-4. I
            | frequently find issues in the output, as the code I write
            | is generally very context-specific. However, I find that
            | the back-and-forth, with interesting tidbits of info
            | dropped here and there, amounts to something like
            | rubber-duck debugging on steroids.
        
           | [deleted]
        
           | tarruda wrote:
           | > The hype that LLMs are going to replace SWEs is clearly
           | just that, hype
           | 
            | LLMs cannot replace anyone, but it is clear that engineers
            | who master LLM usage might multiply their productivity by
            | a lot.
            | 
            | The question is: if one LLM-assisted engineer can work 10x
            | faster, will companies reduce their engineering staff by
            | 90%?
        
             | majormajor wrote:
             | I've worked at far more companies with miles of product
             | idea backlog we never get to than ones with nothing for
             | engineering to do.
             | 
             | Now product will be able to use an LLM to come up with
             | feature proposals and design docs even faster! :o
             | 
             | So: are you working at a company where engineering is a
             | cost center or a revenue center? The latter wants to get
             | more done at the same cost _much_ more than it wants to
             | just cut spend.
        
             | nuancebydefault wrote:
              | To answer your question with a question, if I may: when
              | did a productivity increase in software ever result in
              | headcount reduction? The competition will also see a
              | similar productivity gain.
        
               | MacsHeadroom wrote:
               | >when did productivity increase in software ever result
               | in headcount reduction? The competition also will have
               | similar productivity gain.
               | 
               | The average AI company has like 1 employee per $25M
               | valuation. That's around 25x fewer employees than the
               | typical tech company.
        
           | drowsspa wrote:
           | Yet, the whole movement of getting blue collar workers to
           | code seems to have lost its steam.
        
             | gumballindie wrote:
              | Probably because "graduating" from a bootcamp doesn't
              | make one an SWE, and people figured out it's a scam?
        
           | peterashford wrote:
            | Of course, there's the issue that a lot of the data that
            | makes LLMs useful probably comes from places like Stack
            | Overflow.
        
           | lcnPylGDnU4H9OF wrote:
           | > destroy Stack Overflow
           | 
           | It'll be interesting to see how future training data is
           | sourced.
        
             | rootusrootus wrote:
             | Github would be my first guess.
        
               | lcnPylGDnU4H9OF wrote:
               | That does seem like a likely option. Discussions on
               | issues alongside the actual working (and not working)
               | code.
        
               | [deleted]
        
             | svachalek wrote:
             | You simply need the system to train itself on its own
             | interactions, like how search engines improve results by
             | counting clicks.
        
               | lcnPylGDnU4H9OF wrote:
                | I'm not wondering how the system will determine what's
                | most helpful, but how it will determine what's even
                | "correct". A model can learn what's "correct" from
                | Stack Overflow by finding accepted or highly-voted
                | answers, but when it can't find such content anymore
                | (in this case because Stack Overflow is hypothetically
                | gone), what would even exist to generate these
                | discussions to be used as training data?
                | 
                | GitHub, per the sibling comment, is a good example,
                | because projects have issues (tied to the individual
                | repository of source code, which serves as a working
                | implementation of the idea) where such discussions
                | happen.
        
               | LawTalkingGuy wrote:
                | The topics for which AI replaces the forums won't need
                | discussion. People won't be confused about those
                | things, because the coding AI knows the details. Soon
                | that will cover most syntax questions, then simple to
                | mid-level algorithms, etc.
                | 
                | People will move on to higher-level questions.
        
               | ok_dad wrote:
               | When Google search became important, people structured
               | their information so that Google could best index it.
               | When AIs become important in the same way, people will
               | start to structure their information so that a particular
               | class of AI can best index it. If that involves API
               | documentation, perhaps there will be a standard format
               | that AIs understand the best.
        
         | spaceman_2020 wrote:
         | People also forget that the model is trained on older data. At
         | first, it will default to referencing out of date frameworks
         | and solutions, but if you tell it that its code isn't working,
         | it will usually correct itself.
        
         | alexashka wrote:
         | You may be underestimating how much meaning people derive
         | _from_ jumping on bandwagons and having a simple to understand
         | group identity.
         | 
         | Your suggestion would make many people unhappy. They can't win
         | the competence game and hence 'really investigating' is a
         | losing proposition for them. What they _can_ do is jump on
         | bandwagons very quickly, hoping to score a first mover
         | advantage.
         | 
         | How much of an advantage would one get from taking a couple of
         | years to _really_ investigate Bitcoin and the algorithms
         | involved, vs buying some as early as possible and telling
         | everyone else how great it is? :)
        
         | joseph_grobbles wrote:
         | [dead]
        
       ___________________________________________________________________
       (page generated 2023-04-18 23:00 UTC)