[HN Gopher] Prompt engineering vs. blind prompting
       ___________________________________________________________________
        
       Prompt engineering vs. blind prompting
        
       Author : Anon84
       Score  : 216 points
       Date   : 2023-04-22 16:44 UTC (6 hours ago)
        
 (HTM) web link (mitchellh.com)
 (TXT) w3m dump (mitchellh.com)
        
       | popcorncowboy wrote:
       | If you don't apply a rigorous evaluation and testing framework
       | you are a Prompt Alchemist at best.
        
       | jw1224 wrote:
       | I spent 3 hours today carefully "engineering" a complex prompt. I
        | ran it on a loop, then fed the results back into my database.
       | 
       | The 3 hours I spent has saved my business ~$15k in costs.
        
         | rideontime wrote:
         | How?
        
         | woutr_be wrote:
         | Do you mind explaining this more? How did your prompt save your
         | business $15K in costs?
        
       | azubinski wrote:
       | Prompt engineering is a kind of prompting that is (in a sense) a
       | kind of an engineering, but it's impossible to understand in what
       | sense it is a kind of what engineering, that's why it is so hard
       | to understand _coherent texts_ if they are not completely
       | meaningful.
       | 
       | All this is so exciting and it promises many new jobs that
       | require accelerated education of prompt engineers.
        
       | mcs_ wrote:
        | Prompt engineers only exist if they can measure the efficiency
        | of their inputs.
        | 
        | How to measure the effectiveness of a given prompt seems like a
        | big deal to me now.
        
       | tgcandido wrote:
        | does your approach take into consideration that the temperature
       | parameter is equal to 0?
        
         | majormajor wrote:
          | There are also no instances of "sample" or "random" in the
         | article.
         | 
         | If you say "can be developed based on real experimental
         | methodologies" but don't talk about randomness or temp (or
         | top_p, though personally I haven't played with that one as
         | much) then I'm going to be _very_ skeptical.
         | 
          | Once you're beyond the trivial ("a five sentence prompt worked
          | better than a five word one"), if you get different things
          | for the same prompt then you need to do a LOT of work to be
          | sure that your modified prompt is "better" than your first, vs
          | just "had a better outcome that time."
        
           | mitchellh wrote:
           | Author here. As I noted in the post, this is an elementary
           | post to help people understand the very basics. I didn't want
           | to bring in anything more than a "101"-level view.
           | 
           | I do mention output sampling briefly (Cmd-F "self-
           | consistency"). And yes, there are a lot of good techniques on
           | the validation set too. At the most basic, you can sample, of
           | course, but you can also perform uncertainty analysis on each
           | individual test case so that future tests sample either the
           | most uncertain, or a diverse set of uncertain and not test
           | cases. I also didn't go into few-shot very much, since
            | choosing the exemplars for few-shot is a whole thing unto
           | itself. And this benefits from "sampling" (of sorts) as well.
           | But again, a whole topic on its own. And so on.
           | 
           | As for top_p, for classification this is a very good tool,
           | and I do talk about top_p as well (Cmd-F "confusion matrix")!
            | Again, I felt it was too specific or too advanced to dive
            | into more deeply in this blog post, but I linked to various
           | research if people are interested.
           | 
           | To the grandparent re: temperature: when I first tweeted
           | about this, I noted in a tweet that I ran all these tests
           | with some fixed parameters (i.e. temp) but in a realistic
           | environment and depending on the problem statement, you'd
           | want to take those into account as well.
           | 
           | There's a lot that could be covered! But the post was getting
           | long so I wanted to keep this really as... baby's first guide
           | to prompt eng.
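            | 
            | To make the output-sampling idea concrete, here's a minimal
            | sketch of self-consistency voting, assuming a hypothetical
            | complete() helper that wraps whatever LLM API you use at
            | temperature > 0:
            | 
            |     # Sample several completions and majority-vote over the
            |     # parsed answers; ties go to the first-seen answer.
            |     from collections import Counter
            | 
            |     def complete(prompt, temperature=0.7):
            |         raise NotImplementedError("call your LLM API here")
            | 
            |     def self_consistent_answer(prompt, n_samples=5):
            |         answers = [complete(prompt).strip().lower()
            |                    for _ in range(n_samples)]
            |         return Counter(answers).most_common(1)[0][0]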
        
             | haensi wrote:
             | Thanks for this 101 article! The entire LLMOps field is
             | developing so fast and is being defined as we speak.
             | 
             | Somehow, this time feels to me like the early days of
             | computer science, when Don Knuth was barely known and a
             | Turing award was only known to Turing award winners. I met
             | Don Knuth in Palo Alto in March and we talked about LLMs.
              | His take: "Vint Cerf told me he was underwhelmed when he
              | asked the LLM to write a biography on Vinton Cerf."
             | 
              | There are also tools being built and released for prompt
              | engineering [1]. Full transparency: I work at W&B.
             | 
             | LangChain and other connecting elements will vastly
             | increase the usability and combinations of different tools.
             | 
             | [1]: https://wandb.ai/wandb/wb-
             | announcements/reports/Introducing-...
        
           | de_nied wrote:
           | Try following the links in the article. They give much more
            | detailed information. For example, your temperature
            | explanation can be found here[1] (Ctrl+F), which is also
           | linked in the article.
           | 
           | [1]: https://huyenchip.com/2023/04/11/llm-
           | engineering.html#prompt...
        
       | ilaksh wrote:
       | This nonsense illustrates the most typical way that people
       | misunderstand the word "engineering" in a software context.
        | Software engineering and prompt engineering are not about the
       | self-proclaimed level of rigor or formality that you apply. It's
       | about the actual knowledge and processes used and especially
       | their _effectiveness_ as measured in closed feedback loops.
       | 
       | But the starting point for this is that the term "prompt
       | engineering" is an obvious exaggeration that people are using to
       | promote a skill set which is real and very useful but a big
       | stretch to describe as a whole new engineering discipline.
       | 
       | Regardless of what you call it, like software engineering, it
       | really is a process of trial and error for the most part. With
       | the capabilities of the latest OpenAI models, you should be
       | aiming for a level of generality where most tasks are not going
       | to have a simple answer that you can automatically check to
       | create an accuracy score. EDIT: after thinking about it, there
       | certainly are tasks that you could check for specific answers to
       | create an accuracy score, but I still think it would make more
       | sense in most cases to instead spend time iterating on user
       | feedback rather than trying to think of comprehensive test cases
       | on your own. There are a few things to know, such as the idea of
       | providing examples, the necessary context, and telling the model
       | to work step-by-step.
       | 
       | Actually I would say that there are two major things that could
       | be improved in the engineering described in this article related
       | to actually closing the feedback loops he mentions. He really
       | should at least mention the possibility of coming up with a new
       | prompt candidate after he was done with the first round of tests
       | and also after the users found some problem cases.
       | 
       | The main thing is to close the feedback loops.
        
         | [deleted]
        
         | manojlds wrote:
         | I think prompt engineering is social engineering but for AI.
        
         | LegitShady wrote:
         | writers are now word engineers, artists are image engineers,
         | and politicians are now bullshit engineers.
         | 
         | Prompt engineering is just people putting effort into studying
         | the behaviour of ML models and how input affects the output.
         | They're more like ML psychologists than engineers. Calling
         | themselves engineers just makes them feel better about being
         | glorified prompt testers.
        
           | jimbokun wrote:
           | I love "bullshit engineer" as a job description for
           | politician.
        
           | H8crilA wrote:
           | TIL that developing something like the Google Search ranking
           | function is not engineering.
        
             | LegitShady wrote:
             | psychologists and sociologists use statistics to evaluate
             | their results, does that make them engineers too?
        
               | kgwgk wrote:
               | Engineers of the human soul:
               | https://en.wikipedia.org/wiki/Engineers_of_the_human_soul
        
               | LegitShady wrote:
                | so you're taking classification advice from joseph
                | stalin? Maybe reconsider whether that's applicable to the
                | situation or not.
        
         | cfn wrote:
         | I believe it is a bit too much to call the article nonsense.
         | The process described mirrors what we have been doing in
          | Machine Learning for a long time: you set up a training set and
          | a validation set, put them through the system under test, and
          | draw conclusions from the statistical analysis of the results.
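          | 
          | As a rough sketch of that loop (toy labeled cases, and a
          | hypothetical run_prompt helper that fills a template and
          | calls the model):
          | 
          |     # Compare prompt templates by accuracy on a labeled
          |     # validation set.
          |     def run_prompt(template, text):
          |         raise NotImplementedError("fill template, call LLM")
          | 
          |     validation_set = [
          |         ("dinner with alice next tuesday", "event"),
          |         ("pay rent", "todo"),
          |     ]
          | 
          |     def accuracy(template):
          |         hits = sum(run_prompt(template, text).strip().lower()
          |                    == label for text, label in validation_set)
          |         return hits / len(validation_set)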
        
       | charlieyu1 wrote:
       | I almost always prefer the old school way of prompting, keywords
       | and commands only. Has been working well for fine-tuning Google
        | search results for the last 20 years. Why do I suddenly have to
        | talk to computers in natural language?
        
       | Shirine wrote:
       | Omggg
        
       | alphanullmeric wrote:
       | Mostly just a push by "people skills" people to insert themselves
       | into the bleeding edge of STEM and pretend like they add any
       | value.
        
       | cs702 wrote:
       | When I read guides like this one, I wonder if "prompt
       | engineering" is a misguided effort to pigeonhole a _formal
       | language_ that by necessity is precise and unambiguous (like a
       | programming language) into natural language, which by necessity
       | has evolved to be imprecise and ambiguous.
       | 
       | It's like trying to fit a square peg inside an irregularly shaped
       | hole, without leaving any space unfilled around the edges of the
       | square.
        
       | williamcotton wrote:
       | Here is an example of some prompt engineering in order to build
       | augmentations for factual question-and-answer as well as building
       | web applications:
       | 
       | https://github.com/williamcotton/transynthetical-engine
        
         | m3kw9 wrote:
          | The problem with this is that it requires the software to know
          | what the target is when the question is asked, and I don't see
          | it as reliable since there are many ways to ask and there could
          | be many targets.
        
           | williamcotton wrote:
           | I don't really understand your criticism but I'd be happy to
            | continue a dialog to find out what you mean!
           | 
           | There's probably a little too much going on with that
           | project, including generating datasets for fine-tuning, which
           | is the reason for comparing with a known answer.
           | 
           | It is very similar to the approach used by the Toolformer
           | team.
           | 
           | But by teaching an agent to use a tool like Wikipedia or Duck
           | Duck Go search it dramatically reduces factual errors,
           | especially those related to exact numbers.
           | 
           | Here's a more general overview of the approach:
           | 
           | From Prompt Alchemy to Prompt Engineering: An Introduction to
           | Analytic Augmentation
           | 
           | https://github.com/williamcotton/empirical-
           | philosophy/blob/m...
        
       | svilen_dobrev wrote:
        | heh. i wonder what the "SEO" equivalent would be in this domain?
        | "Agenda engineer"? "Prompt influencer"?
        
       | pbowyer wrote:
       | > There are fantastic deterministic libraries out there that can
       | turn strings like "next Tuesday" into timestamps with extremely
       | high accuracy.
       | 
       | Which libraries? I know of Duckling [0] but what others?
       | 
       | 0. https://github.com/facebook/duckling
        
         | gregsadetsky wrote:
         | A few libs come up for "human" format date parsing. The Python
         | "dateparser" below is definitely well known.
         | 
         | https://dateparser.readthedocs.io/en/latest/
         | 
         | https://sugarjs.com/dates/#/Parsing
         | 
         | https://github.com/wanasit/chrono
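          | 
          | For instance, dateparser handles the "next Tuesday" case from
          | the article in one call, e.g.:
          | 
          |     # Parse a relative date string with dateparser.
          |     import dateparser
          | 
          |     dt = dateparser.parse("next Tuesday")
          |     print(dt)  # a datetime relative to the current time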
        
       | jmount wrote:
       | Nice, this got me to thinking on variations on the topic:
       | https://win-vector.com/2023/04/22/the-sell-as-scam/
        
       | skybrian wrote:
       | It's a good start. Also, it's good to use a toy problem for
       | explaining how to do it. It would be great if more people
       | published the results of careful experiments like this, perhaps
       | for things that aren't toy problems? It would be so much better
       | than sharing screenshots!
       | 
       | However, when you do have such a simple problem, I wonder if you
       | couldn't ask ChatGPT to write a script to do it? Running a script
       | would be a lot cheaper than calling an LLM in production.
        
       | cloudking wrote:
       | What is a business problem you solved with LLMs that you couldn't
       | solve as efficiently without them?
        
         | H8crilA wrote:
          | Translation. GPT-4 leaves other translation tools like DeepL
          | or Google Translate in the dust. At a much higher cost, of
          | course.
        
         | potatoman22 wrote:
         | Named entity recognition - extracting structured data from text
        
           | rolisz wrote:
           | But there are fairly good models for doing NER that are not
           | LLMs. Models that are open source and you can even run on a
              | CPU, with parameter counts in the hundreds of millions, not
           | billions.
        
             | billythemaniam wrote:
             | While true, GPT-4 kinda just gets a lot of the classic NLP
             | tasks, such as NER, right with zero fine-tuning or minimal
             | prompt engineering (or whatever you want to call it). I
             | haven't done an extensive study, but I do NLP daily as part
             | of my current job. I often reach for GPT-4 now, and so far
             | it does a better job than any other pretrained models or
             | ones I've trained/fine-tuned, at least for data I work on.
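              | 
              | For instance, a minimal zero-shot NER prompt (using the
              | 2023-era openai Python client; the model name and JSON
              | output format here are just illustrative):
              | 
              |     # Zero-shot NER: ask the chat model for JSON so the
              |     # result is machine-parseable.
              |     import json
              |     import openai  # assumes OPENAI_API_KEY is set
              | 
              |     PROMPT = (
              |         "Extract PERSON, ORG and LOCATION entities "
              |         "from the text below. Reply with JSON like "
              |         '{"PERSON": [], "ORG": [], "LOCATION": []}.'
              |     )
              | 
              |     def extract_entities(text):
              |         resp = openai.ChatCompletion.create(
              |             model="gpt-4",
              |             temperature=0,
              |             messages=[{"role": "user",
              |                        "content": PROMPT + "\n\n" + text}],
              |         )
              |         return json.loads(resp.choices[0].message.content)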
        
               | rolisz wrote:
               | But what about cost? There was a recent article saying
               | that Doordash makes 40 billion predictions per day, which
               | would result in 40 million dollars per day if using GPT4.
               | 
               | Sure, GPT4 is great for experimenting with and I often
               | try it out, but at the end of the day, for deploying a
               | widely used model, the cost benefit analysis will favor
               | bespoke models a lot of the time.
        
             | og_kalu wrote:
             | GPT-4 generally performs better than expert human workers
             | on NLP tasks, nevermind bespoke models.
             | https://www.artisana.ai/articles/gpt-4-outperforms-elite-
             | cro....
        
               | rolisz wrote:
               | The article you linked says that GPT4 performed better
               | than crowdsourced workers, not than experts. The experts
               | performed better than GPT4 in all but 1 or 2 cases. And
               | in my experience with Mechanical Turk, the workers from
               | MT are often barely better than random chance.
        
               | og_kalu wrote:
               | Fair on the wording I suppose but
               | 
               | First of all, the dataset used for evaluation was created
               | by those researchers, weighing it in their favor.
               | 
               | Second, GPT-4 still performs better in 6 of those. Hardly
               | 1 or 2. And when it doesn't, it's usually very close.
               | 
               | All of this is to say that GPT-4 will smoke any bespoke
               | NLP model/API which is the main point.
        
         | jstx1 wrote:
         | ChatGPT with GPT4 has made me much better and faster at solving
         | programming problems, both at work and for working on personal
         | projects.
         | 
         | Many people are still sleeping on how useful LLMs are. There's
         | a lot of related things to be skeptical about (big promises,
         | general AI, does it replace jobs, all the new startups that are
         | basically dressed up API calls...) but if you do any kind of
          | knowledge work, there's a good chance that you could do it much
         | better if you also used an LLM.
        
           | jabradoodle wrote:
           | The parent is asking for a specific example use case.
        
             | michaelbuckbee wrote:
             | An eye opening example for me was that I was working with a
             | Ruby/Rails class that was testing various IP configurations
             | [1] and I was able to just copy and paste it into chatgpt
             | and say "write some tests for this".
             | 
             | It wasn't really anything I couldn't have written in a half
             | hour or so but it was so much faster. The real kicker is
             | that by default chatgpt wrote Rspec and I was able to say
             | "rewrite that in minitest" and it worked.
             | 
             | 1 - https://wafris.org/ip-lookup
        
             | hayksaakian wrote:
             | I can't speak for OP, but for me I literally never use
             | stack overflow any more, and I spend about 90% less time on
             | Google
        
               | styfle wrote:
               | Curious if that's because AI provides better answers?
               | 
               | It's certainly not quicker answers, right?
        
               | 8organicbits wrote:
               | Are you not fact checking chatgpt? I've seen wrong info,
               | especially subtle things. It seemed reckless to use as-
               | is.
        
               | blowski wrote:
               | Both ChatGPT and StackOverflow suffer from content
               | becoming outdated. So some highly-upvoted answer on
               | StackOverflow has been out of date since 2011, and now
               | ChatGPT is trained on it.
               | 
               | I see the future as writing test cases (perhaps also with
               | ChatGPT), and separately using ChatGPT to write the
               | implementation. Perhaps we will just give it a bunch of
               | test cases and it will return code (or submit a PR) that
               | passes those tests.
        
               | etimberg wrote:
                | For fun I tried asking chatgpt to create a simple example
                | using an opensource project I maintain. The generated
                | answer
               | was sort of correct but not correct enough to copy and
               | paste. It missed including a plugin, used a version of
               | the project that doesn't exist yet, and generated data
               | that wasn't valid datetimes.
        
               | hallway_monitor wrote:
               | Yep exactly. I guess I haven't hit the 25 messages in 3
               | hours limit, but whenever there's an API or library I'm
                | not familiar with, I can get my exact example in about 10
                | seconds from ChatGPT 4.
        
               | iudqnolq wrote:
               | Are those popular APIs?
               | 
               | I've found Copilot useful when writing greenfield code,
               | but very unhelpful generating code that uses APIs not
               | popular enough to have significant coverage on
               | StackOverflow. Even if I have examples of correct usage
               | in the same file it still guesses plausible but wrong
               | types.
               | 
               | I haven't bought GPT 4 but I'm curious if it's much
               | better at this.
        
               | lstamour wrote:
               | If you don't mention a library by name it is liable to
               | make something up by picking a popular library in another
               | language and converting the syntax to the language you
               | asked for.
               | 
               | If you ask for something impossible in a library it will
               | also frequently make up functions or application
               | settings. If you ask for something obscure but hard to
                | do, it might reply that it's impossible, even though it
                | is possible if you know how and teach it.
               | 
               | I sort of compare prompt engineering to Googling - you
               | sometimes have to search for exactly the right terms that
               | you want to appear in the result in order to get the
               | answer you're looking for. It's just that the flexibility
               | of ChatGPT in writing a direct response sometimes means
               | it will completely make up an answer.
               | 
               | There's also a limitation that the web interface doesn't
               | actually let you upload files and has a length limit for
               | inputs. For Copilot, I'm looking forward to Copilot X:
               | https://www.youtube.com/watch?v=3surPGP7_4o
        
               | iudqnolq wrote:
               | This was neither. I've forgotten the exact words I typed
               | but it was something like this.
               | 
                | Prompt:
                | 
                |     fn encode(value: Foo) {
                |         capnproto::serialize_packed::serialize_message(value);
                |     }
                | 
                |     fn decode(input: &[u8]) {
                | 
                | Expected:
                | 
                |     capnproto::serialize_packed::deserialize_message(input);
                | 
                | Generated:
                | 
                |     capnproto::PackedMessageDeserializer::deserialize(input)
        
           | nice_byte wrote:
           | what kind of problems are you trying to solve that make gpt-4
           | so helpful to you?
        
           | cbm-vic-20 wrote:
           | I'm really trying to do the same, for both my work, and
           | personal projects. But the type of answers I need for work
           | (enterprise software, large codebase built over 20+ years)
           | requires a ton of context that I simply cannot provide to
           | ChatGPT, not only for legal reasons, but just due to the
           | amount of code that would be required to provide enough
           | context for the LLM to chew on.
           | 
            | Even for personal projects, where I'm learning new languages
            | and libraries, I've found that the code that gets generated
            | in most cases is incorrect at best, and won't compile at
            | worst. So I have to go through and double-check all of its
            | "work" anyway - just like I'd have to do if I had a junior
            | engineer
           | sidekick who didn't know how to run the compiler.
           | 
           | I think for the work problems, if our company could train and
           | self-host an LLM system on all of our internal code, it would
           | be interesting to see if that could be used to assist
           | building out new features and fixes.
        
         | nomel wrote:
         | Documentation of old undocumented code bases. Feed in the
         | functions, with a little context, and it works surprisingly
         | well.
        
         | Fordec wrote:
         | Name one business problem solved with any tool that can only be
         | solved by that tool and nothing else.
         | 
          | It's not about uniqueness. The name of the game is efficiency:
          | scaling problems already solvable by multiple or skilled
          | humans, and reducing one of those dimensions.
        
           | cloudking wrote:
           | Fair, edited to include "as efficiently". I'm cutting through
           | the noise to find some signals for how people are using these
           | APIs.
        
           | [deleted]
        
         | drc500free wrote:
         | Writing emails that I had been putting off for weeks.
        
         | capableweb wrote:
         | Not sure what counts as a "business problem" for you, but
         | personally I couldn't have gotten as far as I've come with game
         | development without it, as I really struggle with the math and
         | I don't know many people locally who develop games that I could
          | get help from. GPT4 has been instrumental in helping me
         | understand concepts I've tried to learn before but couldn't,
         | and helps me implement algorithms I don't really understand the
         | inner workings of, but I understand the value of the specific
         | algorithm and how to use it.
         | 
         | In the end, it sometimes requires extensive testing as things
         | are wrong in subtle ways, but the same goes for the code I
          | write myself too. I'm happy to just get further than has been
         | possible for the last ~20 years I've tried to do it on my own.
         | 
         | Ultimately, I want to finish games and sell them, so for me
         | this is a "business problem", but I could totally understand
         | that for others it isn't.
        
           | moonchrome wrote:
            | Sounds like you need to learn to search. There are tons of
            | resources on game dev. I can sort of see the value of using
            | GPT here, but have you tried using it in an area you're an
            | expert in? The rate of convincing bullshit vs correct
           | answers is astonishing. It gets better with Phind/Bing but
           | then it's a roulette that it will hit valid answers in the
           | index fast enough.
           | 
           | My point is - learning with GPT at this point sounds like
           | setting yourself up for failure - you won't know when it's
            | bullshitting you and you're missing out on learning how to
           | actually learn.
           | 
           | By the time LLMs are reliable enough to teach you, whatever
           | you're learning is probably irrelevant since it can be solved
           | better by LLM.
        
             | space_fountain wrote:
             | > By the time LLMs are reliable enough to teach you,
             | whatever you're learning is probably irrelevant since it
             | can be solved better by LLM.
             | 
             | For solving the really common problem of working in a new
             | area LLMs being unreliable isn't actually a big deal. If I
             | just need to know what some math is called or understand
             | how to use an equation, it's often very easy to verify an
             | answer, but can be hard to find it through google. I might
                | not know the right terms to search, or my options might
                | be hard-to-locate documentation or SEO spam.
        
               | moonchrome wrote:
               | This is fair, using it as a starting point to learning
               | could be useful if you're ready/able to do the rest of
               | the process. Maybe I was too dismissive because it read
               | to me like OP couldn't do that and thought he found the
               | magic trick to skip that part.
        
             | nicetryguy wrote:
              | > Sounds like you need to learn to search. There are tons
              | of resources on game dev.
             | 
             | I have been making games since / in Flash, HTML5, Unity,
             | and classic consoles using ASM such as NES / SNES /
             | Gameboy: Tons of resources are WRONG, tutorials are
             | incomplete, engines are buggy, answers you find on
             | stackoverflow are outdated, even official documentation can
              | be littered with gaping holes and unmentioned gotchas.
             | 
             | I have found GPT incredibly valuable when it comes to
             | spitting out exact syntax and tons of lines that i
             | otherwise would have spent hours and hours to write combing
             | through dodgy forum posts, arrogant SO douchebags, and the
             | questionable word salad that is the "official
             | documentation"; and it just does it instantly. What a
             | godsend!
             | 
              | > you won't know when it's bullshitting you and you're
             | missing out on learning how to actually learn.
             | 
             | Have you tried ...compiling it? You can challenge,
             | question, and iterate with GPT at a speed that you cannot
             | with other resources: i doubt you are better off combing
             | pages and pages of Ctrl+F'ing PDFs / giant repositories or
             | getting Just The Right Google Query to get exactly what you
             | need on page 4. GPT isn't perfect but god damn it is a hell
              | of a lot better and faster than anything that has ever
             | existed before.
             | 
             | > whatever you're learning is probably irrelevant since it
             | can be solved better by LLM.
             | 
             | Not true. It still makes mistakes (as of Apr '23) and still
             | needs a decent bit of hand holding. Can / should you take
             | what it says as fact? No. But my experience says i can say
             | that about any resource honestly.
        
               | moonchrome wrote:
               | >I have found GPT incredibly valuable when it comes to
               | spitting out exact syntax and tons of lines that i
               | otherwise would have spent hours and hours to write
               | combing through dodgy forum posts, arrogant SO
               | douchebags, and the questionable word salad that is the
               | "official documentation"; and it just does it instantly.
               | What a godsend!
               | 
               | IMO if you're learning from GPT you have to double check
                | its answers, and then you have to go through the same
               | song and dance. For problems that are well documented you
               | might as well start with those. If you're struggling with
                | something, how do you know it's not bullshitting you?
               | Especially for learning, I can see "copy paste and test
               | if it works" flying if you need a quick fix but for
               | learning I've seen it give right answers with wrong
               | reasoning and wrong answers with right reasoning.
               | 
               | I'm not disagreeing with you on code part, my no.1 use
               | case right now is bash scripting/short scripts/tedious
               | model translations - where it's easy to provide all the
               | context and easy to verify the solution.
               | 
                | I'd disagree on the fastest tool part; part of the reason
                | I'm not using it more is that it's so slow (and
               | responses are full of pointless fluff that eats tokens
               | even when you ask it to be concise or give code only).
               | Iterating on nontrivial solutions is usually slower than
               | writing them out on my own (depending on the problem).
        
             | williamcotton wrote:
             | Funny enough, I'd been wanting to learn some assembly for
             | my M1 MacBook but had given up after attempts at googling
              | for help, as I ran into really basic issues, and since I
              | was just messing around I had plenty of actually productive
              | things to work on instead.
             | 
             | A few sessions with ChatGPT sorted out various platform
             | specific things and within tens of minutes I was popping
             | stacks and conditionally jumping to my heart's delight.
        
               | erichocean wrote:
               | Yup, ChatGPT is, paradoxically, MOST USEFUL in areas you
               | already know something about. It's easy to nudge it
               | (chat) towards the actual answers you're looking for.
               | 
               | GP is way off base IMO.
        
               | moonchrome wrote:
                | After trying to use it as such so far:
               | 
               | Nontrivial problem solutions are wishful thinking
               | hallucinations, eg. I ask it for some way to use AWS
               | service X and it comes up with a perfect solution - that
               | I spend 10 minutes desperately trying to uncover - and
               | find out that it doesn't exist and I've wasted 15 minutes
               | of my life. "Nudging it" with followups how it's
               | described solutions violate some common patterns on the
               | platform, it doubles down on it's bullshit by inventing
               | other features that would support the functionality. It's
               | the worst when what you're trying to do can't really be
               | done with constraints specified.
               | 
               | It gives out bullshit reasoning and code, eg. I wanted it
               | to shorten some function I spitballed and it made the
               | code both subtly wrong (by switching to unordered
               | collection) and slower (switching from list to hash map
                | with no benefit). And then it even claims its solution is
                | faster because it avoids allocations! (where my solution
               | was adding new KeyValuePair to the list, which is a value
               | type and doesn't actually allocate anything). I can
               | easily see a newbie absorbing this BS - you need
                | background knowledge to break it down. Another example:
                | I wanted to check the rationale behind some lint warning;
               | not only was it off base but it even said some blatantly
               | wrong facts in the process (like default equality
               | comparison in C# being ordinal ignore case ???).
               | 
                | In my experience working with junior/mid members, the
                | amount of half-assed/seemingly-working solutions that I've
                | had to PR in the last couple of months has increased, a
                | lot (along with "shrug, ChatGPT wrote it").
               | 
               | Maybe in some areas like ASM for a specific machine
               | there's not a lot of newbie friendly material and ChatGPT
               | can grok it correctly (or it's easy to tweak the outputs
               | because you know what it should look like) - but that's
               | not the case for gamedev. Like there are multiple books
               | titled "math for game developers" (OP use case).
        
             | ghaff wrote:
             | With respect to writing I've used it for things I know
             | enough to write--and will have to look up some quotes,
             | data, etc. in any case. GPT gives me a sort of 0th draft
             | that saves me some time but I don't need to check every
             | assertion to see if it's right or reasonable because I
             | already know.
             | 
             | But it doesn't really solve a business problem for me. Just
             | saves some time and gives me a starting point. Though on-
              | the-fly spellchecking and, to a lesser degree, grammar
             | checking, help me a lot too--especially if I'm not going to
             | ultimately be copyedited.
        
             | capableweb wrote:
              | > Sounds like you need to learn to search
             | 
             | Sounds like you need to not be condescending :)
             | 
              | Of course I've searched and tried countless avenues to
              | pick this up. I'm not saying it's absolutely not possible
             | without GPT, just that I found it the easiest way of
             | learning.
             | 
             | And it's not "Write a function that does X" but more
             | employing the Socratic method to help me further understand
              | a subject that I can then dive deeper into myself.
             | 
              | But having a rubber duck is of infinite worth; if you
              | happen to be a programmer, you can probably see the value
              | in this.
             | 
              | > have you tried using it in an area you're an expert in?
             | The rate of convincing bullshit vs correct answers is
             | astonishing. It gets better with Phind/Bing but then it's a
             | roulette that it will hit valid answers in the index fast
             | enough.
             | 
             | Yes, programming is my expertise, and I use it daily for
             | programming and it's doing fine for me (GPT4 that is,
             | GPT3.5 and models before are basically trash).
             | 
             | Bing is probably one of the worst implementations of GPT
             | I've seen in the wild, so it seems like our experience
             | already differs quite a bit.
             | 
              | > you won't know when it's bullshitting you and you're
             | missing out on learning how to actually learn.
             | 
              | Yeah, you can tell relatively easily if it's bullshitting
              | and making things up, if you're paying any sort of
              | attention to what it tells you.
             | 
             | > By the time LLMs are reliable enough to teach you,
             | whatever you're learning is probably irrelevant since it
             | can be solved better by LLM.
             | 
             | Disagree, I'm not learning in order to generate more money
             | for myself or whatever, I'm learning because the process of
             | learning is fun, and I want to be able to build games
             | myself. A LLM will never be able to replace that, as part
             | of the fun is that I'm the one doing it.
        
               | moonchrome wrote:
                | >Yeah, you can tell relatively easily if it's bullshitting
               | and making things up, if you're paying any sort of
               | attention to what it tells you.
               | 
               | It's trained on generating the most likely completion to
                | some text; it's not at all easy to tell if it's
               | bullshitting you if you're a newbie.
               | 
                | Agreed that I was condescending and dismissive in my
                | reply; I've been dealing with people trying to use ChatGPT
                | to get a free lunch without understanding the problem
                | recently, so I just assume at this point. My bad.
        
               | ohmahjong wrote:
               | I have personally found the rubber-ducking to be really
               | helpful, especially for more exploratory work. I find
               | myself typing "So if I understand correctly, the code
               | does this this and this because of this" and usually get
               | some helpful feedback.
               | 
               | It feels a bit like pair programming with someone who
               | knows 90% of the documentation for an older version of a
               | relevant library - definitely more helpful than me by
               | myself, and with somewhat less communication overhead
                | than actually pairing with a human.
        
           | lamontcg wrote:
           | I don't particularly have a big problem with math at the
           | level that AIs tend to be useful for, and find that it tends
           | to hallucinate if you ask it anything which is moderately
           | difficult.
           | 
           | There's sort of a narrow area where if you ask it for
           | something fairly common but moderately complicated like a
           | translation matrix that it usually can come up with it, and
           | can write it in the language that you specify. But guarding
           | against hallucinations is almost as much trouble as looking
           | it up on wikipedia or something and writing it yourself.
           | 
           | The language model really needs to be combined with the hard
           | rules of arithmetic/algebra/calculus/dimensional-analysis/etc
           | in a way that it can't violate them and just mash up some
            | equations that it's been trained on even though the result is
           | absolute nonsense.
        
       | binarymax wrote:
       | The techniques in this article are good practice for general
       | model tuning and testing with a _correct answer_. So for tasks
       | like extraction, labelling, classification, this is a great
       | guide.
       | 
       | The challenge comes when the response is a _subjective answer_.
        | Tasks like summarization, open question answering/generation,
        | and search query/question/result generation are the hard things
        | to test. Those typically will need another manual step in the
       | process to grade the success of each result, and then you need to
       | worry about bias/subjectivity of your expert graders. So then you
       | might need multiple graders and consensus metrics. In short it
       | makes the process very very slow, expensive, and tedious.
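        | 
        | A toy sketch of the consensus step (simple mean pairwise
        | agreement rather than a proper chance-corrected statistic like
        | Cohen's kappa):
        | 
        |     # Fraction of items on which each pair of graders agrees,
        |     # averaged over all pairs of graders.
        |     from itertools import combinations
        | 
        |     grades = {  # 1 = good result, 0 = bad, per test case
        |         "grader_a": [1, 1, 0, 1],
        |         "grader_b": [1, 0, 0, 1],
        |         "grader_c": [1, 1, 0, 0],
        |     }
        | 
        |     def agreement(a, b):
        |         return sum(x == y for x, y in zip(a, b)) / len(a)
        | 
        |     scores = [agreement(a, b)
        |               for a, b in combinations(grades.values(), 2)]
        |     print(sum(scores) / len(scores))  # ~0.67 here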
        
         | jimbokun wrote:
         | Just like it is with grading a student's English class essay,
         | for example.
        
         | IsaacL wrote:
         | I pretty much agree. The "scientific" approach the author
         | pushes for in the article -- running experiments with multiple
         | similar prompts on problems where you desire a short specific
         | answer, and then running a statistical analysis -- doesn't
         | really make much sense for problems where you want a long,
         | detailed answer.
         | 
         | For things like creative writing, programming, summaries of
         | historical events, producing basic analyses of
         | countries/businesses/etc, I've found the incremental, trial-
         | and-error approach to be best. For these problems, you have to
         | expect that GPT will not reliably give you a perfect answer,
         | and you will need to check and possibly edit its output. It can
         | do a very good job at quickly generating multiple revisions,
         | though.
         | 
         | My favourite example was having GPT write some fictional
         | stories from the point of view of different animals. The
         | stories were very creative but sounded a bit repetitive. By
         | giving it specific follow-up prompts ("revise the above to
         | include a more diverse array of light and dark events; include
         | concrete descriptions of sights, sounds, tastes, smells,
         | textures and other tangible things" -- my actual prompts were a
         | lot longer) the quality of the results went way up. This did
         | not require a "scientific" approach but instead knowledge of
         | what characterized good creative writing. Trying out variants
         | of these prompts would not have been useful. Instead, it was
         | clear that:
         | 
          | - asking an initial prompt for background knowledge to set
          |   context
          | 
          | - writing quite long prompts (for creative writing I saw
          |   better results with 2-3 paragraph prompts)
          | 
          | - revising intelligently
          | 
          | consistently led to better results.
         | 
         | On that note, this was the best resource I found for more
         | complex prompting -- it details several techniques that you can
         | "overlap" within one prompt:
         | 
         | https://learnprompting.org/docs/intro
        
       | alpark3 wrote:
        | I use GPT-4 pretty consistently (I set up a discord bot for
        | myself). What I found myself doing was tending towards the
        | simplest prompt that the LLM would still understand - if I asked
        | a human
       | expert the types of prompts I was giving GPT, I most likely
       | would've gotten a clarifying question rather than an answer like
       | the LLM was giving me, simply because I'm talking in such short
       | and concise sentences.
       | 
       | I think the interesting thing is that the more concise a message
       | is to a fellow human, the more work needs to be done by the other
       | party in order to actually decode my message, even if it is
       | ultimately understandable. Whereas with LLMs, shorter token
       | length doesn't really matter: matrices of the same size are being
       | multiplied anyways.
        
         | LegitShady wrote:
         | I think because a human actually wants to figure out what you
          | want, and you're just going to keep prompting that ML model
         | until you get something similar to what you want, something
         | that would annoy a human and probably waste their time or make
         | an endeavor extremely expensive.
         | 
          | I don't think it's really fundamental to LLMs, it's just that
          | you
         | don't treat a human the same way you treat an unthinking
         | unfeeling computer system whose transactions are cheap and
         | relatively near instant compared to requesting from a human.
        
       | rvz wrote:
        | This article reads like so much nonsense, I would not be
       | surprised to see that some of the content has been generated by
       | ChatGPT. I mean just look at this:
       | 
       | > Citations required! I'm sorry, I didn't cite the experimental
       | research to support these recommendations. The honest truth is
       | that I'm too lazy to look up the papers I read about them (often
       | multiple per point). If you choose not to believe me, that's
       | fine, the more important point is that experimental studies on
       | prompting techniques and their efficacy exist. But, I promise I
       | didn't make these up, though it may be possible some are outdated
       | with modern models.
       | 
        | This person appears to be just in the hype phase of the LLM and
        | prompt mania, attempting to justify this new snake oil with all
        | this jargon, when not even they understand the inner workings of
        | an AI model that hallucinates frequently.
       | 
       | "Prompt Engineering" and "Blind Prompting" is different branding
       | of the same snake oil.
        
       | mistercheph wrote:
       | This article was written by chatgpt.
        
         | kingforaday wrote:
         | ...or maybe it's really Mitchell hiding behind every ChatGPT
         | response? After all he is a machine.
        
       | voidhorse wrote:
       | I recall there used to be a school of thought that argued that
       | making programming languages more like natural language was a
       | futile effort, as the benefits of having a precise, limited,
       | deterministic, if abstract, language for describing our ideas
       | were far superior to any "close enough" approximation we could
       | achieve with natural language. Where have those people gone?
       | 
       | When I step back and think about this LLM craze, the only stance
       | I'm left with is that I find it baffling that people are so
       | excited about what is ultimately _a stochastic process, and what
       | will always be a stochastic process_. It 's like the world has
       | suddenly shifted from valuing deterministic, precise, behaviors
       | to preferring this sort of "close enough, good enough" cavalier
       | attitude to _everything_. All it took was for something shiny and
       | new to gloss over all our concerns around precision and
       | certainty. Sure, LLMs are great for _getting approximations
        | quickly_, but approximations are still just approximations.
       | Where have the lovers of certainty and deduction gone? I can't
       | help but think our general laziness and acceptance of "close
       | enough" fast solutions is going to bite us in the end.
        
         | jiggawatts wrote:
         | Deterministic processes are great at dealing with objective
         | data, but less great at dealing with free-form text produced by
         | humans.
         | 
         | Each tool should be used for the right job. Until now, we had
         | only cheap plastic tools for language processing. Suddenly, we
         | have a turbo power tool that can parse through pages of English
         | like a hot knife through butter.
         | 
         | We're all excited by the shiny new tool in the workshop, and
         | we're putting everything through it just to see what it can do.
         | Eventually the exuberance will subside and we'll put it to work
         | where it is the most applicable.
         | 
         | That doesn't mean we'll abandon other tools and methods.
        
         | frabjoused wrote:
         | I think you're just not thinking hard enough of ways to use it
         | -- use cases where "close enough" can be augmented by
         | deterministic validation, cleanup and iteration to perform
         | real-world work that is "all the way".
         | 
         | I'm currently littering my platform with small, server-side
         | decisions made by LLM prompts and it's doing real work that is
         | working. There are a ton of other people doing this right now.
         | You can be as angry as you want about it, but in a year or two
         | you'll be using the result of this work every day.
        
       | [deleted]
        
       | unbearded wrote:
        | Maybe it should be called Prompt Science or Prompt Discovery or
        | even
       | Prompt Craft.
       | 
       | I have a 40 million BERT-embedding spotify-annoy index that I
       | keep experimenting with to make a better query vector.
       | 
        | One way I'm doing this is taking only the token vectors with the
        | highest sum across the whole vector and averaging the top
        | vectors to use as the query vector.
       | 
       | Another way is zeroing many dimensions randomly on the query
       | vector to introduce diversity.
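        | 
        | In rough numpy terms (dimensions made up), the two tricks look
        | like this:
        | 
        |     # 1) average the k token vectors with the highest sums;
        |     # 2) randomly zero some dimensions for diversity.
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     token_vecs = rng.normal(size=(12, 768))  # (tokens, dim)
        | 
        |     k = 4
        |     top = token_vecs[np.argsort(token_vecs.sum(axis=1))[-k:]]
        |     query = top.mean(axis=0)
        | 
        |     mask = rng.random(query.shape) > 0.2  # drop ~20% of dims
        |     query = query * mask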
       | 
       | But after experimenting with "prompt engineering" I found out
       | that prefixing the sentences for the query vectors with "prompts"
        | yields very interesting results.
       | 
       | But I don't see much engineering. It's more trial, feedback and
       | trying again. Maybe even Prompt Art. Just like on chatGPT.
        
         | z3c0 wrote:
         | I like "prompt injection", personally. It's not as pretentious
         | as "prompt engineering".
        
           | d0gbread wrote:
           | I think that's already taken and more about hacking via
           | variables in the prompt like SQL injection.
           | 
            | I would just go with prompt tuning.
        
       | avoinot wrote:
       | From this post: If you want to learn some more advanced
       | techniques, Prompt Engineering by Lilian Weng provides a
       | fantastic overview.
        
         | gregsadetsky wrote:
         | https://lilianweng.github.io/posts/2023-03-15-prompt-enginee...
         | ?
        
       | slowhadoken wrote:
       | ridiculous
        
         | dang wrote:
         | " _Please don 't post shallow dismissals, especially of other
         | people's work. A good critical comment teaches us something._"
         | 
         | " _When disagreeing, please reply to the argument instead of
         | calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be
          | shortened to '1 + 1 is 2, not 3.'_"
         | 
         | https://news.ycombinator.com/newsguidelines.html
        
       | epberry wrote:
       | My personal next step with LLMs is to use them as completion
       | engines versus just asking them questions. Few shot prompting is
       | another intermediate skill I want to incorporate more.
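        | 
        | A few-shot prompt is just labeled examples followed by the case
        | you want completed, e.g. (toy example):
        | 
        |     prompt = """Classify each note as 'event' or 'todo'.
        | 
        |     Note: dinner with alice next tuesday
        |     Label: event
        | 
        |     Note: pay rent
        |     Label: todo
        | 
        |     Note: dentist appointment on friday
        |     Label:"""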
        
       | doubtfuluser wrote:
       | A bit of an unpopular opinion as it seems, but I would actually
       | bet that the current prompt engineering is just a short term
        | thing. As the performance of LLMs continues to improve, I
        | actually expect that they will become much better at
        | understanding not-so-well-formed prompts. Especially when you
        | take into consideration that they are now trained with RLHF on
        | _real_ users' input. So it will probably become less of an
        | engineering problem and more an articulation of what exactly you
        | want.
        
         | thrashh wrote:
         | I don't know.
         | 
         | To talk to other humans, we literally have a whole writing
          | field: courses that teach how to write technical
          | documentation or research grants, and so much more.
         | 
          | There's already a whole industry on how to talk to the human
          | language model, and humans are currently way smarter.
        
         | Hackbraten wrote:
         | Even as LLMs get better over time at understanding ill-formed
         | prompts, I expect that API prices will still continue to depend
         | on the number of tokens used. That's an incentive to minimize
         | tokens, so "prompt engineering" might stick around, even if
         | just for cost optimization.
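          | 
          | e.g. counting the tokens a prompt will be billed for, with
          | OpenAI's tiktoken library:
          | 
          |     # Count billable prompt tokens for a given model.
          |     import tiktoken
          | 
          |     enc = tiktoken.encoding_for_model("gpt-4")
          |     prompt = "Identify the date mentioned in the text below."
          |     print(len(enc.encode(prompt)))  # number of prompt tokens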
        
           | charcircuit wrote:
           | Do you not expect a trend of token prices decreasing over
            | time? There will be businesses using a less cutting-edge
            | model, and the difference in how many words a prompt is won't
            | be a big contributing factor to the total spend of the
            | business.
        
         | burtonator wrote:
         | The next major leap in LLMs (in the next year) is probably
         | going to be the prompt context size. Right now we have 2k, 4k,
         | 8k ... but OpenAI also has a 32k model that they're not really
         | giving access to unfortunately.
         | 
         | The 8k model is nice but it's GPT4 so it's slow.
         | 
         | I think the thing that you're missing is that zero shot
         | learning is VERY hard but anything > GPT3 is actually pretty
         | good once you give it some real world examples.
         | 
         | I think prompt engineering is going to be here for a while just
          | because, on a lot of tasks, examples are needed.
         | 
         | Doesn't mean it needs to be a herculean effort of course. Just
         | that you need to come up with some concrete examples.
         | 
         | This is going to be ESPECIALLY true with Open Source LLMs that
         | aren't anywhere near as sophisticated as GPT4.
         | 
         | In fact, I think there's a huge opportunity to use GPT4 to
         | train the prompts of smaller models, come up with more
         | examples, and help improve their precision/recall without
         | massive prompt engineering efforts.
        
           | kiratp wrote:
           | You can't commercially use anything you train off OpenAI
           | outputs.
        
             | sebzim4500 wrote:
             | You can as long as the resulting model does not compete
             | with OpenAI.
        
             | rufius wrote:
             | Can you elaborate?
        
               | 411111111111111 wrote:
                | They're probably talking about the TOS a user
                | would've had to agree to when using their services.
                | It's actually a lot more permissive than I expected.
               | 
               | > _Restrictions. You may not (i) use the Services in a
               | way that infringes, misappropriates or violates any
               | person's rights; (ii) reverse assemble, reverse compile,
               | decompile, translate or otherwise attempt to discover the
               | source code or underlying components of models,
               | algorithms, and systems of the Services (except to the
               | extent such restrictions are contrary to applicable law);
               | (iii) use output from the Services to develop models that
               | compete with OpenAI;_
        
               | kiratp wrote:
               | Their API TOS basically forbid it. Simple as that.
        
               | MacsHeadroom wrote:
               | Someone who acquires these outputs who has never
               | consented to their ToS is not bound by their ToS.
        
               | reissbaker wrote:
               | Sure, but the ways of acquiring those outputs legally
               | have vampiric licensing that bind you to those ToS, since
               | the re-licenser is bound by the original ToS.
               | 
                | It's like distributing GPL code in a nonfree
                | application. Even if you didn't "consent to [the
                | original author's] ToS," you are still bound to it
                | via the redistributor's license.
        
           | throwawayForMe2 wrote:
           | >> The next major leap in LLMs (in the next year) is probably
           | going to be the prompt context size. Right now we have 2k,
           | 4k, 8k ... but OpenAI also has a 32k model that they're not
           | really giving access to unfortunately.
           | 
           | Saw this article today about a different approach that opens
           | up orders of magnitude larger contexts
           | 
           | https://hazyresearch.stanford.edu/blog/2023-03-07-hyena
        
         | delusional wrote:
         | How does that make sense? LLMs are machines that produce
         | output from input; the position and distribution of that
         | input in the latent space is highly predictive of the
         | output. It seems fairly uncontroversial to expect that some
         | knowledge of the tokens and their individual contributions
         | to that distribution in combination with the others, plus
         | some intuition for the multivariate nonlinear behavior of
         | the hidden layers, is exactly what would let you utilize
         | this machine for anything useful.
         | 
         | Regular people type all sorts of shit into google, but power
         | users know how to query google effectively to work with the
         | system. Knowing the right keywords is often half the work. I
         | don't understand how the architecture of current LLMs is
         | going to work around that feature.
        
         | ransom1538 wrote:
         | I expect the exact opposite. As more rules and regulations get
         | put in, prompt engineering is going to be the new software
         | development. "I would like you to pretend i need a lawyer
         | dealing in a commercial lease that..."
        
         | nicetryguy wrote:
         | I remember being a "good google querier" before autocomplete
         | rendered that mostly irrelevant. While I think you're right
         | to some degree, you still have to articulate exactly what
         | you want and need from this machine, and no amount of the
         | LLM guessing your intent will ever replace specifically and
         | explicitly stating your needs and goals. I see the required
         | complexity of the request continuing to track the complexity
         | of the task.
        
           | james-revisoai wrote:
           | Google autocomplete using your query history also reduces
           | the information you learn from suggestions as you search...
           | 
           | While in the past "indexDB.set undefined in " might
           | autocomplete to show "safari" first, indicating a vendor-
           | specific bug, it'll now often prefill with some noun from
           | whatever you last searched (e.g. "main window") to "help"
           | you.
           | 
           | I haven't found a way to disable that; it's annoying for
           | understanding bugs, situations/context, and root causes.
        
           | ethbr0 wrote:
           | Not just auto-complete, but Google removing power search
           | capabilities (quotes, plus, etc).
           | 
           | Here's hoping LLMs-as-a-service don't fall into the same
           | trap.
           | 
           | It's fine to optimize for the 80% of your users who write
           | badly, but for god's sake _keep a bail-out for power users
           | who want more control_.
           | 
           | You don't have to make it the default... but just don't
           | remove it!
        
           | UltimateEdge wrote:
           | Being able to compose a good query is still relevant, I
           | think! A peer once asked me for help with a mathematical
           | problem they could not find help for online; given the
           | same information/problem statement, I found a relevant
           | page after not much searching.
        
             | [deleted]
        
         | theK wrote:
         | Not so sure about that. The biggest part of prompt engineering
         | I am seeing is of the kind that sets up context to bootstrap a
         | discussion on a predetermined domain.
         | 
         | As I've said elsewhere, in most knowledge work context is key
         | to getting viable results. I don't think something like this is
         | ever going to get automated away, especially in the cases where
         | the context comes from proprietary knowledge.
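         | 
         | A minimal sketch of that pattern (file name, domain, and
         | question are placeholders), using a system message to inject
         | proprietary context:
         | 
         |   # Placeholder context and question, for illustration only.
         |   internal_docs = open("acme_docs.txt").read()
         |   user_question = "How do I reset my API key?"
         | 
         |   messages = [
         |       {"role": "system",
         |        "content": "You are a support assistant for "
         |                   "AcmeCorp. Answer only from the "
         |                   "documentation below.\n\n" + internal_docs},
         |       {"role": "user", "content": user_question},
         |   ]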
        
         | dr_dshiv wrote:
         | It isn't just engineering vs blind prompting. There is also
         | "prompt vibing" where intuition comes into play.
        
         | petetnt wrote:
         | People spent years and years learning how to get the best
         | answers with the least possible effort, and search engines
         | evolved with them. Seems pretty insane to me that we have
         | now devolved into asking insanely specific and obtuse
         | questions to receive obtuse answers.
        
         | skybrian wrote:
         | Learning to say what you want is a skill. Much like you can get
         | better at searching, you can get better at saying what you
         | want.
         | 
         | The framework described in the blog post seems like a more
         | formal way to do it, but there are other ways to iterate in
         | conversation. After seeing the first result, you can explain
         | better what you want. If you're not expecting to repeat the
         | query then maybe that's good enough?
         | 
         | I expect there will be better UIs that encourage iteration.
         | Maybe you see a list of similar suggested prompts and decide
         | which one you really want?
        
         | ryanjshaw wrote:
         | It depends on how you define "short term". If you mean until
         | AGI, then sure. Until then, however, for anything that is
         | going to potentially generate revenue, you will need to
         | consider the points raised by the article to keep costs
         | manageable, to avoid performance regressions, etc.
        
       ___________________________________________________________________
       (page generated 2023-04-22 23:00 UTC)